Thursday, July 23, 2009
SSE2 Data Transfer/Packed Arithmetic Instruction - Example
SIMD: Single Instruction Multiple Data
This example shows the operation of 3 SSE2 instructions:
a) MOVLPD - SSE2 Data Transfer Instruction
b) MOVHPD - SSE2 Data Transfer Instruction
c) ADDPD - SSE2 Packed Arithmetic Instruction
The registers used in the example are the XMM registers. Despite the similar name, they are separate from the 64-bit MMX registers, which alias the x87 floating-point registers. The x86 architecture provides 16 XMM registers in 64-bit mode and 8 in 32-bit mode.
The XMM registers are 128-bit registers. Each can be thought of as having 2 parts: a lower and an upper part of 64 bits each.
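The lower/upper split can also be seen from C. The following is only a minimal sketch, assuming a compiler that ships `<emmintrin.h>` with SSE2 enabled (the default on x86-64, `-msse2` on 32-bit GCC/Clang). The helper name `xmm_halves` is made up for this illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d, _mm_storeu_pd */

/* Hypothetical helper: copies the two 64-bit halves of a 128-bit value
   into an array. out[0] receives bits 63:0 (the lower part) and
   out[1] receives bits 127:64 (the upper part). */
static void xmm_halves(__m128d reg, double out[2])
{
    _mm_storeu_pd(out, reg);  /* unaligned store of both doubles */
}
```

Calling `xmm_halves(_mm_set_pd(2.5, 1.5), out)` leaves 1.5 in `out[0]` and 2.5 in `out[1]`. Note that `_mm_set_pd` takes the upper value first, while memory order (and the order gdb prints for `v2_double`) is lower value first.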
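MOVLPD - Moves a 64-bit double-precision value into the lower part of the XMM register (bits 63:0). The upper part is left unchanged.
MOVHPD - Moves a 64-bit double-precision value into the upper part of the XMM register (bits 127:64). The lower part is left unchanged.
ADDPD - Adds the packed double-precision values in the two registers, lower to lower and upper to upper, and saves the result in the destination register.
The instruction addpd xmm1, xmm0 works as follows: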
xmm1[63:0] <- xmm0[63:0] + xmm1[63:0]
xmm1[127:64] <- xmm0[127:64] + xmm1[127:64]
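For readers more at home in C, the three instructions have close counterparts among the SSE2 intrinsics: `_mm_loadl_pd`, `_mm_loadh_pd` and `_mm_add_pd`. The sketch below is an illustration under assumptions, not part of the assembly example: the helper name `add_two_pairs` and its parameters are invented here, and a compiler is free to pick different instructions than MOVLPD/MOVHPD/ADDPD when it translates the intrinsics.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: computes (a_low + b_low) and (a_high + b_high)
   with a single packed add. sums[0] receives the lower result,
   sums[1] the upper result. */
static void add_two_pairs(const double *a_low, const double *a_high,
                          const double *b_low, const double *b_high,
                          double sums[2])
{
    __m128d a = _mm_setzero_pd();
    __m128d b = _mm_setzero_pd();

    a = _mm_loadl_pd(a, a_low);   /* like movlpd: replaces bits 63:0 only   */
    a = _mm_loadh_pd(a, a_high);  /* like movhpd: replaces bits 127:64 only */
    b = _mm_loadl_pd(b, b_low);
    b = _mm_loadh_pd(b, b_high);

    b = _mm_add_pd(b, a);         /* like addpd: both halves added at once  */

    _mm_storeu_pd(sums, b);       /* sums[0] = lower part, sums[1] = upper  */
}
```

On 32-bit x86 with GCC or Clang this needs `-msse2`; on x86-64, SSE2 is part of the baseline.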
Here is a simple example that utilizes all these instructions:
1. The goal of this example is to add mm0_data_low (1.5) to mm1_data_low (2.5) and mm0_data_high (2.5) to mm1_data_high (2.0).
2. With SIMD instructions, 2 different pairs of floating-point numbers are added by a single instruction. Hence the name SIMD - Single Instruction Multiple Data.
//////////////////////////////////
section .data
mm0_data_low dq 1.5
mm0_data_high dq 2.5
mm1_data_high dq 2.0
mm1_data_low dq 2.5
section .text
global _start
_start:
nop
; xmm0[63:0] <- 1.5
movlpd xmm0, [mm0_data_low]
; xmm0[127:64] <- 2.5
movhpd xmm0, [mm0_data_high]
; xmm1[63:0] <- 2.5
movlpd xmm1, [mm1_data_low]
; xmm1[127:64] <- 2.0
movhpd xmm1, [mm1_data_high]
; xmm1[127:64] <- xmm0[127:64] + xmm1[127:64]
; xmm1[63:0] <- xmm0[63:0] + xmm1[63:0]
addpd xmm1,xmm0
; exit(0) using the 32-bit Linux system call interface (eax = 1 is sys_exit)
mov eax, 1
mov ebx, 0
int 0x80
//////////////////////////////////////////////
Let's run this program through gdb and see what the values are:
We expect the following values in XMM1:
xmm1[127:64] = 4.5
xmm1[63:0] = 4.0
After loading the low-part of xmm0:
(gdb) p $xmm0
$2 = v2_double = {1.5, 0}
xmm0 low-part is 1.5
Now load the upper-part of xmm0:
(gdb) next
14 movhpd xmm0, [mm0_data_high]
(gdb) p $xmm0
$3 = v2_double = {1.5, 2.5}
xmm0 upper-part is 2.5 and xmm0 low-part is 1.5
Now load the low-part of xmm1:
(gdb) next
15 movlpd xmm1, [mm1_data_low]
(gdb) p $xmm1
$4 = v2_double = {2.5, 0}
xmm1 low-part is 2.5
Now load the upper-part of xmm1:
(gdb) next
16 movhpd xmm1, [mm1_data_high]
(gdb) p $xmm1
$5 = v2_double = {2.5, 2}
xmm1 upper-part is 2.0 and low-part is 2.5
Finally, the addpd:
(gdb) next
17 addpd xmm1,xmm0
(gdb) p $xmm1
$6 = v2_double = {4, 4.5}
This agrees with our expected result of xmm1[127:64] = 4.5 and xmm1[63:0] = 4.0.