Inline assembler

Re: Inline assembler

Postby Provoni » Apr 02, 2017 12:09

That was an error on my part, but it seems to be working as intended, since the following behaviour holds for both code examples.

When the line "movd [a],xmm1" is changed to "movd [a],xmm0", the value 26 is returned, indicating that at least one value of the mula and mulb arrays is moved into the xmm# registers. After the multiplication step, however, the product register returns 0.

What I wanted to do:

Code:

mov r8,[mula_ptr] 'memory offset of mula into r8
mov r9,[mulb_ptr] 'memory offset of mulb into r9
   
movaps xmm0,[r8] 'move mula into xmm0
movaps xmm1,[r9] 'move mulb into xmm1
   
mulps xmm1,xmm0 'multiply xmm1 by xmm0 (packed single-precision floats)
movd [a],xmm1 'get result (lowest dword of xmm1)

which seems equivalent to:

Code:

movaps xmm0,[mula] 'move mula into xmm0
movaps xmm1,[mulb] 'move mulb into xmm1
   
mulps xmm1,xmm0 'multiply xmm1 by xmm0 (packed single-precision floats)
movd [a],xmm1 'get result (lowest dword of xmm1)

Postby Provoni » Apr 03, 2017 10:30

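Problem solved. I was not using the right multiplication instruction: mulps multiplies packed single-precision floats, while the arrays hold 32-bit integers (dwords), which need pmulld. Here is a good list of instructions that helped me sort it out: https://cmpsb.net/asm/x86/instr/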
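To illustrate why the float multiply returned 0, here is a minimal sketch in plain FreeBASIC (no inline assembler). It reinterprets the bits of two small integers as Single values, which is roughly what mulps does with the register contents. The variable names are made up for the illustration. Small integers read this way are denormal floats, and the product of two denormals underflows to zero.

```freebasic
' Hypothetical demo: reinterpret small integers as single-precision floats,
' which is what mulps does with the raw bits held in the xmm registers.
dim as long ia = 26, ib = 7

' Read the same 32 bits as a Single instead of a Long.
dim as single fa = *cptr(single ptr, @ia)
dim as single fb = *cptr(single ptr, @ib)

' Both values are denormals (far below 1e-38), so their product underflows.
dim as single prod = fa * fb

print "26 seen as a float: "; fa
print "7 seen as a float : "; fb
print "float product     : "; prod
print "product bits      : "; hex(*cptr(ulong ptr, @prod), 8)
```

The last line shows the bit pattern that movd would store into a Long, which is why the integer result looked like 0.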

Here is an example of SSE multiplication and horizontal addition with 32-bit integers. Note that pmulld requires SSE4.1 and phaddd requires SSSE3.

Code:

'64-bit

screenres 800,600

dim as long a
dim as long mula(3)
dim as long mulb(3)

'xmm0
mula(0)=26
mula(1)=676
mula(2)=17576
mula(3)=456976

'xmm1
mulb(0)=7
mulb(1)=13
mulb(2)=8
mulb(3)=24

asm
      
   movupd xmm0,[mula] 'copy mula array into xmm0 (unaligned 128-bit load)
   movupd xmm1,[mulb] 'copy mulb array into xmm1 (unaligned 128-bit load)
   
   pmulld xmm1,xmm0 'xmm1 = xmm1 * xmm0, four packed 32-bit products
   
   'sum all 4 dwords in the xmm1 register
   phaddd xmm1,xmm1 'horizontal addition: a+b, c+d
   phaddd xmm1,xmm1 'horizontal addition: a+b+c+d
   
   movd [a],xmm1 'store the lowest dword (the sum) in a
   
end asm

print "value check: ";(7*26)+(13*676)+(8*17576)+(24*456976)
print "sse value  : ";a

sleep
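
For readers who want to follow the register contents, the sketch below models the pmulld and phaddd steps lane by lane in plain FreeBASIC. It is only an illustration of what the instructions compute, not how the CPU does it. The arrays x and t are hypothetical stand-ins for the four dwords of xmm1.

```freebasic
' Hypothetical lane-by-lane model of pmulld followed by two phaddd steps.
' x(0) stands for the lowest dword of xmm1, which is the one movd reads.
dim as long mula(3) = {26, 676, 17576, 456976}
dim as long mulb(3) = {7, 13, 8, 24}
dim as long x(3), t(3)

' pmulld xmm1,xmm0: four independent 32-bit products
for i as integer = 0 to 3
   x(i) = mula(i) * mulb(i)
next

' phaddd xmm1,xmm1 twice: each step adds adjacent pairs; because source and
' destination are the same register, the pair sums appear in both halves
for n as integer = 1 to 2
   t(0) = x(0) + x(1)
   t(1) = x(2) + x(3)
   t(2) = t(0)
   t(3) = t(1)
   for i as integer = 0 to 3
      x(i) = t(i)
   next
next

print "lowest dword after both steps: "; x(0)
```

After the second step every lane holds the full sum, so the printed value should match the "value check" line of the program above.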

Postby Provoni » Apr 05, 2017 8:08

How do you work with mixed variable sizes in assembler: 16-bit, 32-bit and 64-bit? The following program is supposed to print "123" but does not.

Thanks

Code:

'64-bit

screenres 800,600

dim as short word1=123
dim as long dword1=123456
dim as integer qword1=1234567

dim as any ptr word1_ptr=@word1
dim as any ptr dword1_ptr=@dword1
dim as any ptr qword1_ptr=@qword1

dim as short w
dim as long dw
dim as integer qw

asm
   
   'following code needs to return 123
   mov rax,[word1_ptr] 'copy memory offset to rax
   mov rax,[rax] 'copy value at memory offset to rax
   mov [qw],rax 'copy rax to qword variable qw

end asm

print qw

sleep

Postby Stonemonkey » Apr 05, 2017 10:25

On x86 you need to use registers of the appropriate size. For 8 bit, use AL, AH, BL, BH etc.
For 16 bit, use AX, BX etc.
For 32 bit, use EAX, EBX etc.

Not sure how that translates to 64 bit processors though.
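
In 64-bit mode the same rule holds: RAX, RBX etc. are the full 64-bit registers, and EAX, AX and AL remain their lower parts. The sketch below, in plain FreeBASIC with made-up data, shows what goes wrong when a 64-bit load is pointed at a 16-bit value: the load also picks up the six bytes that follow it in memory.

```freebasic
' Hypothetical demo: four 16-bit values stored back to back in memory.
dim as ushort w(3) = {123, &h1111, &h2222, &h3333}

' A 64-bit load from the address of w(0) picks up all four values,
' like "mov rax,[rax]" does when rax points at a 16-bit variable.
dim as ulongint q = *cptr(ulongint ptr, @w(0))
print "64-bit load: "; hex(q, 16)

' A 16-bit load reads only the intended value, like "mov ax,[rax]".
dim as ushort s = *cptr(ushort ptr, @w(0))
print "16-bit load: "; s
```

Only the lowest four hex digits of the 64-bit result (7B, i.e. 123) belong to the variable; the rest is whatever happens to be stored next to it.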

Postby adele » Apr 05, 2017 11:08

Hi Provoni,

Provoni wrote: The following program is supposed to return "123" but does not.

123 is decimal; let us try with hex encoding (quick and dirty).

You'll have to define some kind of Union to access the lower parts of qword values. Maybe later, but this should help at least a bit:

Code:

'64-bit

'screenres 800,600

dim as short word1=&h123
dim as long dword1=&h123456
dim as integer qword1=&h1234567

dim as any ptr word1_ptr=@word1
dim as any ptr dword1_ptr=@dword1
dim as any ptr qword1_ptr=@qword1

dim as short w
dim as long dw
' dim as integer qw ' OS / CPU dependent; LongInt is safer/more distinct
Dim As LongInt qw
asm
   ' this is _just_ a demonstration, _not_  good code! .adi
   'following code needs to return 123
   mov rax,[word1_ptr] 'copy memory offset to rax
   xor rdx,rdx   ' later, we'll write back _all_ 64 bits, so "zap" them (poor coding by myself!)
   mov dx,word ptr [rax] 'copy value at memory offset to 16 bit register
   mov qword ptr [qw],rdx ' copy 64 bits rdx to qword variable qw

end asm

print Hex(qw,16)
print Hex(qw)

sleep

I won't go too deep into it, but it seems you are still playing around with the code, which IMO is the best way to learn ASM.

adi

Postby Provoni » Apr 05, 2017 15:11

Thanks for the help, Stonemonkey and adele.

The FreeBASIC-generated code handles the conversion with the "movsx" instruction, which sign-extends a smaller value into a larger register. The following now works:

Code:

   mov rax,[word1_ptr] 'copy memory offset to rax
   mov rax,[rax] 'copy value at memory offset to rax
   movsx rax,ax '<--- sign-extend the low 16 bits of rax into all of rax
   mov [qw],rax 'copy rax to qword variable qw
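
As a rough illustration of what movsx does to the value, here is a small sketch in plain FreeBASIC. It compares sign extension with zero extension (the movzx behaviour) for a negative 16-bit number. The variable names are invented for the example.

```freebasic
' Hypothetical demo of sign extension versus zero extension of a 16-bit value.
dim as short s = -123

' Assigning a Short to a LongInt sign-extends it, as movsx does.
dim as longint sign_ext = s

' Masking to the low 16 bits shows what zero extension (movzx) would give.
dim as longint zero_ext = s and 65535

print "sign-extended: "; sign_ext; "  hex "; hex(sign_ext, 16)
print "zero-extended: "; zero_ext; "  hex "; hex(zero_ext, 16)
```

For positive values such as 123 both variants give the same result, which is why the difference only shows up with negative numbers.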

Is it possible to shorten the following, which gets the value from the pointer word1_ptr?

Code:

   mov rax,[word1_ptr] 'copy memory offset to rax
   mov rax,[rax] 'copy value at memory offset to rax

The brackets around word1_ptr are not needed; the result is the same without them.

Postby Stonemonkey » Apr 05, 2017 16:24

I don't know how 16-bit words are handled in 64-bit code, but in 32-bit x86, loading a full register from a 16-bit variable reads past the end of the variable and could cause a memory violation. Writing a full register to it could do the same, or overwrite other data.

You'd do something like:

Mov eax,dword ptr[word1_ptr]
Movsx eax,word ptr[eax]

So in 64 bit it might be

Mov rax,qword ptr[word1_ptr]
Movsx rax,word ptr[rax]
Mov qword ptr[qw],rax

But I'm not sure tbh.

The pointer stores an address, which points to the memory location where the data is stored. A pointer is a variable and can be altered to point to different locations, so its value has to be loaded into a register before the location it points to can be accessed. Variables in functions are stored on the stack and the assembler addresses them relative to the base pointer, so if you have a function with variable a as integer you could write

Mov eax,dword ptr[a]

to move the value in variable a into eax, but it might assemble to something like

Mov eax,dword ptr[ebp-12]

Again, I'm talking about 32 bit and not really sure about 64 bit.
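
The two-step access described above (load the pointer, then load what it points to) can be sketched in plain FreeBASIC. This is only an illustration. The typed pointer and the second variable word2 are assumptions added for the example and are not part of the original program.

```freebasic
' Hypothetical demo: the two loads written out in plain FreeBASIC.
dim as short word1 = 123
dim as short ptr word1_ptr = @word1   ' typed pointer, so the compiler knows the size

dim as longint qw

' Step 1 reads the address stored in word1_ptr, step 2 reads the 16-bit value
' at that address; the assignment to a LongInt sign-extends the value.
qw = *word1_ptr
print qw

' Pointing the same variable at other data changes what step 2 reads.
dim as short word2 = -5
word1_ptr = @word2
qw = *word1_ptr
print qw
```

Because the pointer's contents can change at run time, the address has to be fetched into a register first, so the two loads cannot be folded into one instruction.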

Postby Provoni » Apr 06, 2017 9:11

Stonemonkey wrote:Mov eax,dword ptr[a]

To move the value in variable a into eax
But it might assemble to something like

Mov eax,dword ptr[ebp-12]

[ebp-12] is the memory offset on the stack where the variable a is stored, right?

Postby Provoni » Apr 06, 2017 9:27

What baffles me is that reducing the number of instructions does not always make code faster. With my FreeBASIC programs I often noticed that using 16-bit arrays was faster than using 64-bit arrays. I looked at the generated code with the compiler option -rr.

If only 64-bit arrays are used, a piece of code looks like this:

Code:

add   rbp, QWORD PTR [rsi+rax*8]

When a 16-bit array is used, the compiler has to use the movsx instruction and the code becomes faster!

Code:

movsx   rax, WORD PTR [rsi+rax*2]
add   rbp, rax

Why is this faster?

- Specific to the CPU? Mine is i7 930.
- Fewer bits have to be moved?
- Offset calculation with the add instruction is not as efficient as with movsx?

Postby Stonemonkey » Apr 06, 2017 16:22

Provoni wrote: [ebp-12] is the memory offset on the stack where the variable a is stored, right?

Yes, the variables in a function/sub are created on the stack and ebp points to them, so be careful with that register or avoid modifying it. The assembler knows the offset of each variable from ebp, so if ebp is altered it no longer knows where to find them.
Variables declared within different scopes in a function can share the same location on the stack too.

Postby Stonemonkey » Apr 06, 2017 16:30

Provoni wrote: Why is this faster?

It's possible that the qword isn't aligned and crosses an 8-byte boundary in memory, so the CPU has to do 2 loads from memory (assuming a 64-bit data bus) to load the value to add.

Postby Provoni » Apr 06, 2017 17:32

Thanks for the feedback Stonemonkey.

How can you align data? Does FreeBASIC support this yet?

I've done a search and found: http://freebasic.net/forum/viewtopic.php?t=22975
And: https://sourceforge.net/p/fbc/bugs/659/

Postby greenink » Apr 06, 2017 23:34

Code:

   .align 16


You can also put static data in the asm block, e.g.:

Code:

   lea rdi,[rdi+64]
   jnz flipAlp
   ret
 flipshift:   .int 1,2,4,8,16,32,64,128
           .int 256,512,1024,2048,4096,8192,16384,32768
 flipmask:      .int 0x80000000,0x80000000,0x80000000,0x80000000
 rndphi:      .quad 0x9E3779B97F4A7C15
 rndsqr3:      .quad 0xBB67AE8584CAA73B

I forget whether you can still use the .text, .data and .bss sections with the current version of the compiler.

Postby Stonemonkey » Apr 07, 2017 6:10

If you print the hex address of a variable you can see whether it is aligned or not; for 64-bit (8-byte) alignment it should end in either 0 or 8.
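
A minimal sketch of that check in plain FreeBASIC follows. The helper show_alignment and the test variables are hypothetical names made up for the illustration. It prints each address in hex together with the remainder modulo the wanted alignment, where 0 means aligned.

```freebasic
' Hypothetical helper: report the address of a variable and its remainder
' modulo a given alignment (0 means aligned).
sub show_alignment(byval caption as string, byval p as any ptr, byval alignment as uinteger)
   dim as uinteger addr = cast(uinteger, p)
   print caption; " at &h"; hex(addr); ", address mod "; alignment; " = "; addr mod alignment
end sub

dim as longint q
dim as long arr(3)

show_alignment("q  ", @q, 8)        ' 8-byte alignment for a qword
show_alignment("arr", @arr(0), 16)  ' 16-byte alignment for movaps/movapd
```

The results depend on the compiler version, options and platform, so the check is best run inside the actual program rather than in a separate test.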

Postby Provoni » Apr 08, 2017 8:28

Thanks greenink and Stonemonkey,

Some data is evidently not aligned, because instructions that require aligned data crash the program. I tried using ".align 16" before the FreeBASIC initialization of the arrays but it didn't work.
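
One possible workaround, sketched below under stated assumptions, is to over-allocate a buffer and round its address up to the next multiple of 16. This is a generic technique rather than a FreeBASIC alignment feature. The names raw, addr and ALIGNMENT are hypothetical, and mula is reused here as a pointer instead of an array.

```freebasic
' Hypothetical workaround: over-allocate, then round the address up to the
' next multiple of 16. This is a generic technique, not a FreeBASIC feature.
const ALIGNMENT = 16

dim as ubyte ptr raw = allocate(4 * sizeof(long) + ALIGNMENT - 1)

' Round the raw address up to a 16-byte boundary.
dim as uinteger addr = (cast(uinteger, raw) + ALIGNMENT - 1) and not cast(uinteger, ALIGNMENT - 1)
dim as long ptr mula = cast(long ptr, addr)

mula[0] = 26 : mula[1] = 676 : mula[2] = 17576 : mula[3] = 456976

print "raw     address: &h"; hex(cast(uinteger, raw))
print "aligned address: &h"; hex(addr); "  (mod 16 = "; addr mod 16; ")"

deallocate(raw)   ' free the original pointer, not the aligned one
```

Since mula is then a pointer, the asm block would presumably load it into a register first (as with mula_ptr earlier in the thread) before using an aligned instruction such as movaps on it.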

Here's my assembler code. It is around 15 to 20% faster than the -O max generated FreeBASIC code. I went through various optimization guides and wonder if anyone could offer further advice.

Code:

'64-bit

'- the following asm block is a small part of an inner loop

asm
      
   '- would it be worthwhile to store the following memory lookups
   'in an xmm# register or similar earlier on?
                  
   movapd xmm0,[mul0] 'can use movapd here instead of movupd (aligned)
   xor rax,rax
     mov r8,[map2_ptr]
     lea r9,[sol] 'can use lea here
     mov r11,[g5_ptr] 'can't use lea here, why?
     mov r12,[ngrams_ptr] 'can't use lea here, why?
     movsx rsi,word ptr[r8+2]
     movsx rcx,word ptr[r8]
     
   l1:
   
      '- various operations have been moved around to break dependencies
      '- sse operations are used to calculate 5-dim array lookup, is that
      'actually a worthwhile optimization to consider?
   
      movupd xmm1,[4+r9+rsi*4] 'must use movupd here (unaligned)
      movsx rbx,dword ptr[r9+rsi*4]
      pmulld xmm1,xmm0 'multiply 4 values by xmm0
      add r8,2
      movsx r13,word ptr[r12+rsi*2]
      phaddd xmm1,xmm1 'horizontal addition a+b,c+d
      sub rax,r13
      phaddd xmm1,xmm1 'horizontal addition a+b+c+d
      movsx rsi,word ptr[r8+2] 'get rsi for next loop iteration and break dependencies
      movd r10d,xmm1 'get a+b+c+d
      add rbx,r10
      movsx rdx,word ptr[r11+rbx*2]
      add rax,rdx
      dec rcx
      jnz l1
         
   add [new_ngram_score],rax

end asm
