Tuesday, June 16, 2009

String Instructions - scasb, scasw, scasd, scasq

The x86 architecture offers several instructions for string operations. The scan string instruction is one of them. It comes in four flavors: scasb (byte), scasw (word), scasd (double word) and scasq (quad word, available only in 64-bit mode).

scasb: Compares the byte in AL with the byte in memory at ES:EDI and sets the flags accordingly.
scasw: Compares the word in AX with the word in memory at ES:EDI and sets the flags accordingly.
scasd: Compares the dword in EAX with the dword in memory at ES:EDI and sets the flags accordingly.
scasq: Compares the qword in RAX with the qword in memory at ES:(E/R)DI and sets the flags accordingly.

After the comparison, each form also adjusts EDI by the operand size, up or down depending on the direction flag.

When the scas* instructions are used with a repeat prefix they become very powerful. For example, scasb can be used with the repne (repeat while not equal) prefix to compute the length of a string. The prefix repeats the instruction while ECX is non-zero and the last comparison did not match, decrementing ECX on every iteration.
Here is an algorithm showing how the scasb instruction works when used with the repne prefix:

1. If ECX == 0, jump to 6.
2. cmp AL with the byte at ES:EDI (this sets ZF if they are equal)
3. if(DF==0) EDI = EDI+1 else EDI=EDI-1
4. ECX = ECX-1
5. If ZF==0 (not equal) jump to 1 else goto 6.
6. DONE
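
The loop can also be modelled in portable C, which makes it easy to check what the registers hold when the scan stops. This is only a sketch of the documented behavior, not how the processor implements it: the scas_state struct and the repne_scasb function are hypothetical names invented for this illustration, EDI is modelled as an index into a byte array, and zf simply starts at 0 (the real instruction leaves the flags untouched when ECX is already 0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register snapshot used by this model; not a real API. */
typedef struct {
    size_t   edi;  /* index into mem, standing in for ES:EDI */
    uint32_t ecx;  /* remaining iteration count */
    int      zf;   /* zero flag from the last comparison */
} scas_state;

/* Models `repne scasb`: look for the byte al in mem, at most ecx times. */
static scas_state repne_scasb(const uint8_t *mem, size_t edi, uint32_t ecx,
                              uint8_t al, int df)
{
    scas_state s = { edi, ecx, 0 };
    assert(mem != NULL);
    while (s.ecx != 0) {
        s.zf = (mem[s.edi] == al);  /* compare AL with the byte at ES:EDI */
        if (df)                     /* EDI moves even when the bytes match */
            s.edi--;
        else
            s.edi++;
        s.ecx--;                    /* the count drops on every iteration */
        if (s.zf)
            break;                  /* repne stops once a match sets ZF */
    }
    return s;
}
```

Scanning "Siddharth" for a 0 byte from index 0 with a count of 255 and DF clear leaves edi at 10, one position past the terminator, and ecx at 245.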

The DF above is the direction flag, which controls the direction in which the string operation proceeds. If DF is 0, the value in EDI is incremented after every iteration. If DF is 1, the value in EDI is decremented after every iteration. The step size depends on which version of scas is used: 1 byte for scasb, 2 for scasw, 4 for scasd and 8 for scasq. For the string length, scasb keeps it simple because each step is exactly one character.
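
The pointer update can be written as a one-line C function. This is an illustrative sketch only: scas_next_edi is a hypothetical helper name, and width is the operand size in bytes of the scas variant being modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the value of EDI after one scas of the given operand width.
   width is 1 (scasb), 2 (scasw), 4 (scasd) or 8 (scasq). */
static uint64_t scas_next_edi(uint64_t edi, unsigned width, int df)
{
    assert(width == 1 || width == 2 || width == 4 || width == 8);
    return df ? edi - width : edi + width;  /* DF=1 walks downward */
}
```

For instance, with EDI at 100, a scasd with DF clear moves it to 104, while a scasw with DF set moves it to 98.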

To control the direction flag, use the std/cld (set/clear direction flag) instructions. Assume DF is clear and AL holds 0 (the null character that terminates the string). Because EDI is advanced even on the iteration that finds the match, it ends up pointing one byte past the null terminator. So if you subtract the initial value of edi from the final value of edi and then subtract one, you end up with the string length.

string length = final edi - initial edi - 1;
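
To see the formula at work outside of assembly, here is a small C sketch that walks a pointer the way EDI moves with DF clear and then applies the same subtraction. The function name scan_strlen and its max_count parameter (playing the role of ECX) are assumptions made for this illustration, and returning (size_t)-1 when no terminator is found within the limit is a choice of this sketch, not something the instruction does.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the length of s using final - initial - 1, or (size_t)-1
   if no null byte is found within max_count bytes. */
static size_t scan_strlen(const char *s, size_t max_count)
{
    const char *initial = s;  /* plays the saved initial EDI (EBP) */
    const char *p = s;        /* plays EDI */
    assert(s != NULL);
    while (max_count-- != 0) {
        char c = *p++;        /* compare, then advance past the byte */
        if (c == '\0')
            return (size_t)(p - initial) - 1;  /* final - initial - 1 */
    }
    return (size_t)-1;        /* count ran out before a terminator */
}
```

For "Siddharth" the pointer stops 10 bytes past the start, so the result is 10 - 1 = 9.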

Here is an example program in NASM syntax for 32-bit Linux:


-----------------------------------------------
section .data
mystring db "Siddharth", 0
mystrlen dd 0


section .text
global _start
_start:
nop
mov ax, ds
mov es, ax ; Initialize ES
mov edi, mystring ; Initialize EDI and EBP to point to the
mov ebp, mystring ; string in memory.
cld ; Clear eflags.df so edi is incremented
mov ecx, 255 ; Maximum number of bytes to scan
mov al, 0 ; Initialize al with null character.
repne scasb ; Scan bytes until the null is found or ecx reaches 0
dec edi ; edi is one past the null, so step back onto it
sub edi, ebp ; edi = final - initial - 1 = string length
mov dword [mystrlen], edi ; store string length in memory

; Use the stringlength as the exit-code
mov ebx, [mystrlen]
mov eax,1 ; 'exit' system call
int 80h ; call the kernel
---------------------------------------------


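After assembling, linking and running the program as a 32-bit Linux executable (it uses the int 80h system call interface), the exit code holds the string length.
Executing 'echo $?' in the shell prints the exit code of the last command it ran. In this case it prints 9, which is the length of "Siddharth". Keep in mind that an exit code can only carry values from 0 to 255, so this trick is only good for short strings.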


