Thursday, July 23, 2009

 

SSE2 Data Transfer/Packed Arithmetic Instruction - Example

SSE2: Streaming SIMD Extensions 2
SIMD: Single Instruction, Multiple Data

This example shows the operation of 3 SSE2 instructions:

a) MOVLPD - SSE2 Data Transfer Instruction
b) MOVHPD - SSE2 Data Transfer Instruction
c) ADDPD - SSE2 Packed Arithmetic Instruction

The registers used in the example are the XMM registers. They were introduced with SSE and are separate from the older 64-bit MMX registers. The x86 architecture provides 16 XMM registers (xmm0 to xmm15) in 64-bit mode and 8 (xmm0 to xmm7) in 32-bit mode.

The XMM registers are 128-bit registers. For double-precision work they can be imagined as having 2 parts: a lower and an upper part of 64 bits each.

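MOVLPD - Moves a 64-bit double-precision value into the lower part of the XMM register (bits 63:0). The upper part is left unchanged.
MOVHPD - Moves a 64-bit double-precision value into the upper part of the XMM register (bits 127:64). The lower part is left unchanged.
ADDPD - Adds the two packed double-precision values in the source register to the corresponding values in the destination register and saves the result in the destination register.

The instruction addpd xmm1, xmm0 works as follows: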

xmm1[63:0] <- xmm0[63:0] + xmm1[63:0]
xmm1[127:64] <- xmm0[127:64] + xmm1[127:64]
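
The lane-by-lane behaviour described above can be sketched as a small C model. This is only an illustration of the semantics, not how the hardware or any library implements them: the xmm_model type and the model_* function names are made up for this sketch, and a register is simply represented as two doubles.

```c
#include <stdio.h>

/* Hypothetical model of one XMM register as two double-precision lanes.
   lane[0] stands for bits 63:0, lane[1] for bits 127:64. */
typedef struct {
    double lane[2];
} xmm_model;

/* MOVLPD xmm, m64: write the low lane, leave the high lane untouched. */
static void model_movlpd(xmm_model *dst, const double *src)
{
    dst->lane[0] = *src;
}

/* MOVHPD xmm, m64: write the high lane, leave the low lane untouched. */
static void model_movhpd(xmm_model *dst, const double *src)
{
    dst->lane[1] = *src;
}

/* ADDPD dst, src: add lane by lane and store both sums in dst. */
static void model_addpd(xmm_model *dst, const xmm_model *src)
{
    dst->lane[0] += src->lane[0];
    dst->lane[1] += src->lane[1];
}

/* Print both lanes, low lane first. */
static void model_print(const char *name, const xmm_model *r)
{
    printf("%s = {low: %g, high: %g}\n", name, r->lane[0], r->lane[1]);
}
```

To try it, zero-initialise two xmm_model values, fill each lane with model_movlpd and model_movhpd, call model_addpd, and print the destination with model_print. The point of the model is that a single addpd call updates both lanes, while each move touches only one.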

Here is a simple example that utilizes all these instructions:

1. The goal of this example is to add mm0_data_low (1.5) to mm1_data_low (2.5) and mm0_data_high (2.5) to mm1_data_high (2.0).

2. With SIMD instructions, the 2 pairs of floating point numbers are added by a single instruction. Hence the name SIMD - Single Instruction Multiple Data.


//////////////////////////////////
section .data
mm0_data_low dq 1.5
mm0_data_high dq 2.5
mm1_data_high dq 2.0
mm1_data_low dq 2.5

section .text

global _start

_start:
nop

; xmm0[63:0] <- 1.5
movlpd xmm0, [mm0_data_low]

; xmm0[127:64] <- 2.5
movhpd xmm0, [mm0_data_high]

; xmm1[63:0] <- 2.5
movlpd xmm1, [mm1_data_low]

; xmm1[127:64] <- 2.0
movhpd xmm1, [mm1_data_high]

; xmm1[127:64] <- xmm0[127:64] + xmm1[127:64]

; xmm1[63:0] <- xmm0[63:0] + xmm1[63:0]

addpd xmm1,xmm0

; exit(0) through the 32-bit Linux int 0x80 interface:
; eax = 1 (sys_exit), ebx = exit status
mov eax, 1
mov ebx, 0
int 0x80
//////////////////////////////////////////////


Let's run this program through gdb and see what the values are.
We expect the following values in XMM1:
xmm1[127:64] = 4.5
xmm1[63:0] = 4.0

After loading the low-part of xmm0:

(gdb) p $xmm0
$2 = v2_double = {1.5, 0}
xmm0 low-part is 1.5

Now load the upper-part of xmm0:

(gdb) next
14 movhpd xmm0, [mm0_data_high]
(gdb) p $xmm0
$3 = v2_double = {1.5, 2.5}
xmm0 upper-part is 2.5 and xmm0 low-part is 1.5

Now load the low-part of xmm1:

(gdb) next
15 movlpd xmm1, [mm1_data_low]
(gdb) p $xmm1
$4 = v2_double = {2.5, 0}
xmm1 low-part is 2.5

Now load the upper-part of xmm1:

(gdb) next
16 movhpd xmm1, [mm1_data_high]
(gdb) p $xmm1
$5 = v2_double = {2.5, 2}
xmm1 upper-part is 2.0 and low-part is 2.5

Finally, the addpd:
(gdb) next
17 addpd xmm1,xmm0
(gdb) p $xmm1
$6 = v2_double = {4, 4.5}

This agrees with our expected result of xmm1[127:64] = 4.5 and xmm1[63:0] = 4.0. Note that gdb prints the low element first, so {4, 4.5} means low-part 4.0 and upper-part 4.5.
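
For comparison, the same two-pair addition can be sketched in C using the SSE2 intrinsics from emmintrin.h. The intrinsics _mm_loadl_pd, _mm_loadh_pd and _mm_add_pd correspond to MOVLPD, MOVHPD and ADDPD, although a compiler is free to pick different instructions when it optimises. The helper name add_two_pairs and the HAVE_SSE2_INTRINSICS macro are made up for this sketch, and the plain scalar fallback is only there so the code still builds on targets without SSE2.

```c
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>   /* SSE2 intrinsics */
#define HAVE_SSE2_INTRINSICS 1
#else
#define HAVE_SSE2_INTRINSICS 0
#endif

/* Hypothetical helper:
   out[0] = *a_low  + *b_low   (bits 63:0 of the result)
   out[1] = *a_high + *b_high  (bits 127:64 of the result) */
static void add_two_pairs(const double *a_low, const double *a_high,
                          const double *b_low, const double *b_high,
                          double out[2])
{
#if HAVE_SSE2_INTRINSICS
    __m128d x0 = _mm_setzero_pd();
    __m128d x1 = _mm_setzero_pd();

    x0 = _mm_loadl_pd(x0, a_low);   /* low lane of x0,  like movlpd */
    x0 = _mm_loadh_pd(x0, a_high);  /* high lane of x0, like movhpd */
    x1 = _mm_loadl_pd(x1, b_low);   /* low lane of x1 */
    x1 = _mm_loadh_pd(x1, b_high);  /* high lane of x1 */

    x1 = _mm_add_pd(x1, x0);        /* both lanes added at once, like addpd */

    _mm_storeu_pd(out, x1);         /* out[0] = low lane, out[1] = high lane */
#else
    /* Scalar fallback (assumption: target has no SSE2). */
    out[0] = *a_low + *b_low;
    out[1] = *a_high + *b_high;
#endif
}
```

Called with the values from the assembly example (1.5 and 2.5 for the first register, 2.5 and 2.0 for the second), out[0] holds 4.0 and out[1] holds 4.5, matching the gdb session.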

