24bit to 16bit

Eternal_Pain
Posts: 32
Joined: Aug 25, 2007 17:07

24bit to 16bit

Post by Eternal_Pain »

This function converts 24-bit RGB values to 16-bit (5-6-5) RGB values: it keeps the top 5 bits of red, the top 6 bits of green and the top 5 bits of blue.

Code:

Function Hrgb (Byval red   As Ubyte, _
               Byval green As Ubyte, _
               Byval blue  As Ubyte) As Ushort
               
    Return (((red Shr 3) Shl 11)+((green Shr 2) Shl 5)+(blue Shr 3))
End Function
Example:

Code:

Dim RGB16 as UShort

RGB16 = HRGB(255,255,255)

?RGB16

sleep
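To make the 5-6-5 layout visible, here is a small sketch that packs one color and pulls the three fields back out. The macros R5, G6 and B5 are hypothetical helpers added for illustration; Hrgb is written with Or instead of +, which gives the same result because the three fields never overlap.

```freebasic
' Same packing as Hrgb above. Or equals + here because the fields never
' overlap: RRRRR GGGGGG BBBBB (bit 15 down to bit 0)
Function Hrgb (ByVal red As UByte, ByVal green As UByte, ByVal blue As UByte) As UShort
    Return ((red Shr 3) Shl 11) Or ((green Shr 2) Shl 5) Or (blue Shr 3)
End Function

' Hypothetical helper macros that extract each field of a 565 value
#define R5(v) (((v) Shr 11) And &h1F)
#define G6(v) (((v) Shr 5) And &h3F)
#define B5(v) ((v) And &h1F)

Dim As UShort c = Hrgb(255, 128, 0)
Print Bin(c, 16)            ' 1111110000000000
Print R5(c), G6(c), B5(c)   ' fields: red 31, green 32, blue 0
```

Red 255 keeps its top 5 bits (31), green 128 keeps its top 6 bits (32) and blue 0 stays 0, so each 8-bit channel simply loses its low-order bits.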
For those who like it faster :o), here is a MACRO/ASM version (thanks to Volta):

Code:

#MACRO hrgb(red,green,blue,rgb16)
  ASM
    mov al, BYTE PTR [red]
    SHL eax, 5
   
    mov al, BYTE PTR [green]
    SHL eax, 6
   
    mov al, BYTE PTR [blue]
    SHR eax, 3
   
    mov word PTR [rgb16], ax
  END ASM
#ENDMACRO
Example:

Code:

DIM red   AS UBYTE
DIM green AS UBYTE
DIM blue  AS UBYTE
DIM rgb16 AS USHORT

red=255
green=255
blue=255
hrgb (red,green,blue,rgb16)
?rgb16

sleep
Mysoft
Posts: 836
Joined: Jul 28, 2005 13:56
Location: Brazil, Santa Catarina, Indaial (ouch!)

Post by Mysoft »

I guess the ASM version seems to be useless, because it is very annoying for a major data conversion: using that macro adds a 20-40% code (speed) overhead. But, well, it's code =)
Eternal_Pain
Posts: 32
Joined: Aug 25, 2007 17:07

Post by Eternal_Pain »

I think the asm code is very useful :D
Here is an example. I used 'counter.bas' from MichaelW to measure the cycles...

Here is the code with the BASIC function; it took ~17873875 cycles:

Code:

'#Include "Counter.bas"
Const Pitch=640

Function Hrgb (Byval red   As Ubyte, _
               Byval green As Ubyte, _
               Byval blue  As Ubyte) As Ushort
               
    Return (((red Shr 3) Shl 11)+((green Shr 2) Shl 5)+(blue Shr 3))
End Function

Screen 18,16

Dim ColIndex(0 to 8) as Integer={ _
    &hFFFFFF, &hFF0000, &hFFFF00, _
    &h0000FF, &hFF00FF, &h808080, _
    &h157070, &h4080FF, &h00FFFF}
    
'Dim red    as ubyte
'Dim green  as ubyte
'Dim blue   as ubyte

Dim getrgb as ubyte ptr

Dim Scr as UShort ptr=ScreenPtr

Dim rgb16 as UShort

Dim Adr as Uinteger
Dim ColCount as UInteger

ScreenLock
'Counter_Begin
Adr=0
For y as integer = 0 to 479
for x as integer = 0 to 639
    
    getrgb=cast(ubyte ptr,@ColIndex(ColCount))
    rgb16=Hrgb (getrgb[2], getrgb[1], getrgb[0])
    
    Scr[Adr+x]=rgb16
    
    ColCount += 1
    If ColCount > 8 Then ColCount = 0   ' wrap around to the first of the 9 colors
    
next x
Adr+=Pitch
next y
'Counter_end
ScreenUnlock

'?Counter_Cycles
Sleep
And here is the ASM/MACRO version; it took just 12209035 cycles:

Code:

'#Include "Counter.bas"
Const Pitch=640

#MACRO hrgb(red,green,blue,rgb16)
  ASM
    mov al, Byte Ptr [red]
    Shl eax, 5
   
    mov al, Byte Ptr [green]
    Shl eax, 6
   
    mov al, Byte Ptr [blue]
    Shr eax, 3
   
    mov word Ptr [rgb16], ax
  End ASM
#ENDMACRO
 

Screen 18,16

Dim ColIndex(0 to 8) as Integer={ _
    &hFFFFFF, &hFF0000, &hFFFF00, _
    &h0000FF, &hFF00FF, &h808080, _
    &h157070, &h4080FF, &h00FFFF}
    
Dim red    as ubyte
Dim green  as ubyte
Dim blue   as ubyte

Dim getrgb as ubyte ptr

Dim Scr as UShort ptr=ScreenPtr

Dim rgb16 as UShort

Dim Adr as Uinteger
Dim ColCount as UInteger

ScreenLock
'Counter_Begin
Adr=0
For y as integer = 0 to 479
for x as integer = 0 to 639
    
    getrgb=cast(ubyte ptr,@ColIndex(ColCount))
    red   = getrgb[2]
    green = getrgb[1]
    blue  = getrgb[0]
    Hrgb(red, green, blue, rgb16)
    
    Scr[Adr+x]=rgb16
    
    ColCount += 1
    If ColCount > 8 Then ColCount = 0   ' wrap around to the first of the 9 colors
    
next x
Adr+=Pitch
next y
'Counter_end
ScreenUnlock

'?Counter_Cycles
Sleep
counting_pine
Site Admin
Posts: 6323
Joined: Jul 05, 2005 17:32
Location: Manchester, Lancs

Re: 24bit to 16bit

Post by counting_pine »

Hi. I know this was a while ago now but, since this has come up again, I just thought I'd throw in a couple of comments:
- Interesting technique, using the al register like that. Is it actually faster that way? I'm not an asm expert, but I thought mixing instructions with e.g. al and eax actually slowed things down a bit?
- From the description I would have expected to pass an Integer and get back a Short. Could it be easily adapted to do that?
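A sketch of one way to adapt it (Hrgb32 is a hypothetical name, not from the posts in this thread): take the packed &hRRGGBB value directly, in the same layout as the ColIndex entries in the benchmark, and mask and shift each channel straight into place. It assumes &hRRGGBB or &hAARRGGBB order and ignores the alpha byte.

```freebasic
' Sketch: pack a &hRRGGBB color (any alpha byte is ignored) into 16-bit 565.
' Each mask keeps the top bits of one channel; each shift moves them into place:
'   red   bits 23..19 -> 15..11
'   green bits 15..10 -> 10..5
'   blue  bits  7..3  ->  4..0
Function Hrgb32 (ByVal c As ULong) As UShort
    Return ((c And &hF80000) Shr 8) Or ((c And &hFC00) Shr 5) Or ((c And &hF8) Shr 3)
End Function

Print Hex(Hrgb32(&hFFFFFF), 4)          ' FFFF
Print Hex(Hrgb32(&h4080FF), 4)          ' 441F
Print Hex(Hrgb32(RGB(255, 128, 0)), 4)  ' FC00
```

Because each mask drops the low bits before shifting, this gives the same result as calling Hrgb on the three separated bytes; it is also the same arithmetic as the ffmpeg-derived RGB32to16 macro that appears later in the thread.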
MichaelW
Posts: 3500
Joined: May 16, 2006 22:34
Location: USA

Re: 24bit to 16bit

Post by MichaelW »

Using partial registers can interfere with out-of-order execution through several mechanisms, depending on the processor.

I noticed at the time it was posted that the above code did not look right but never got around to testing it, until now. The only bits that the code preserves are those that it shifts out of the low-order byte, before it overwrites the byte, and the blue bits that are not discarded by the shr. I think my version is closer to what it should be, and it avoids partial registers, but I’m not sure about some of the details and don’t have time to do a thorough test.

Code:

#MACRO hrgb_(red,green,blue,rgb16)
  ASM
    mov al, [red]
    Shl eax, 5

    mov al, [green]
    Shl eax, 6

    mov al, [blue]
    Shr eax, 3

    mov [rgb16], eax
  End ASM
#ENDMACRO
/'
           000RRRRR
      000RRRRR00000
      000RR00GGGGGG
 000RR00GGGGGG000000
 000RR00GGGG000BBBBB
   000RR00GGGG000BB
'/
#MACRO _hrgb_(red,green,blue,rgb16)
  asm
    movzx eax, BYTE PTR [red]
    and eax, 0x1f
    shl eax, 6
    movzx edx, BYTE PTR [green]
    and edx, 0x3f
    or eax, edx
    shl eax, 5
    movzx edx, BYTE PTR [blue]
    and edx, 0x1f
    or eax, edx
    mov [rgb16], eax
  end asm
#ENDMACRO
/'
            000RRRRR
       00RRRRR000000
            00GGGGGG
       00RRRRRGGGGGG
  00RRRRRGGGGGG00000
            000BBBBB
  00RRRRRGGGGGGBBBBB
'/

Function Hrgb (Byval red   As Ubyte, _
               Byval green As Ubyte, _
               Byval blue  As Ubyte) As Ushort

    Return (((red Shr 3) Shl 11)+((green Shr 2) Shl 5)+(blue Shr 3))
End Function

dim as uinteger rgb16
dim as ubyte r = &b11111, g, b = &b11111
hrgb_(r,g,b,rgb16)
print bin(rgb16,16)
rgb16 = Hrgb(r,g,b)
print bin(rgb16,16)
_hrgb_(r,g,b,rgb16)
print bin(rgb16,16)

sleep

Code:

0001100000000011
0001100000000011
1111100000011111
counting_pine
Site Admin
Posts: 6323
Joined: Jul 05, 2005 17:32
Location: Manchester, Lancs

Re: 24bit to 16bit

Post by counting_pine »

MichaelW wrote:

Code:

/'
           000RRRRR
      000RRRRR00000
      000RR00GGGGGG
 000RR00GGGGGG000000
 000RR00GGGG000BBBBB
   000RR00GGGG000BB
'/
I think the idea is that it preserves the high-order bits: it takes values in the range 0..255 and scales them down, like this:

Code:

           RRRRR000
      RRRRR00000000
      RRRRRGGGGGG00
RRRRRGGGGGG00000000
RRRRRGGGGGGBBBBB000
   RRRRRGGGGGGBBBBB
So I think it's working correctly, but as you say it is using partial registers, so there may well be quicker ways.
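One way to check this reading without an assembler is to model the register sequence in plain FreeBASIC. SimHrgbAsm is a hypothetical helper: it treats "mov al, x" as replacing only the low byte of EAX, applies the shifts to the whole 32-bit value, and keeps only the low 16 bits at the end, as the "ax" store does. The result is compared with the BASIC Hrgb for every possible input. The starting EAX value is an arbitrary assumption, since the real macro never clears the register.

```freebasic
' Plain-BASIC model of the hrgb_ asm macro (a simulation, not real asm).
' Assumed x86 semantics:
'   mov al, x -> replace bits 0..7 of EAX, keep bits 8..31 unchanged
'   shl / shr -> shift the whole 32-bit register
Function SimHrgbAsm (ByVal red As UByte, ByVal green As UByte, _
                     ByVal blue As UByte, ByVal reg As ULong) As UShort
    reg = (reg And &hFFFFFF00) Or red    ' mov al, [red]
    reg = reg Shl 5                      ' shl eax, 5
    reg = (reg And &hFFFFFF00) Or green  ' mov al, [green]
    reg = reg Shl 6                      ' shl eax, 6
    reg = (reg And &hFFFFFF00) Or blue   ' mov al, [blue]
    reg = reg Shr 3                      ' shr eax, 3
    Return reg And &hFFFF                ' mov [rgb16], ax: low 16 bits only
End Function

' The BASIC version from the first post
Function Hrgb (ByVal red As UByte, ByVal green As UByte, ByVal blue As UByte) As UShort
    Return (((red Shr 3) Shl 11)+((green Shr 2) Shl 5)+(blue Shr 3))
End Function

' Compare both for every 24-bit input; &hDEADBEEF is an arbitrary stand-in
' for whatever EAX happened to hold before the macro ran
Dim As Integer mismatches = 0
For r As Integer = 0 To 255
    For g As Integer = 0 To 255
        For b As Integer = 0 To 255
            If SimHrgbAsm(r, g, b, &hDEADBEEF) <> Hrgb(r, g, b) Then mismatches += 1
        Next
    Next
Next
Print "mismatches:"; mismatches   ' expected: 0
```

For example, SimHrgbAsm(&b11111, 0, &b11111, 0) returns 0001100000000011 in binary, which matches the first two lines of MichaelW's output above: the macro keeps the top bits of each byte, just like Hrgb.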
I3I2UI/I0
Posts: 90
Joined: Jun 03, 2005 10:39
Location: Germany

Re: 24bit to 16bit

Post by I3I2UI/I0 »

The asm macro (hrgb_) is the fastest.
Test:

Code:

#Macro hrgb_(red,green,blue,rgb16)
Asm
  mov al, [red]   '           RRRRRRRR
  Shl eax, 5      '      RRRRRRRR00000
  mov al, [green] '      RRRRRGGGGGGGG
  Shl eax, 6      'RRRRRGGGGGGGG000000
  mov al, [blue]  'RRRRRGGGGGGBBBBBBBB
  Shr eax, 3      '000RRRRRGGGGGGBBBBB
  mov [rgb16], eax'   RRRRRGGGGGGBBBBB
End Asm
#EndMacro

#Macro _hrgb_(red,green,blue,rgb16)
Asm
  movzx eax, Byte Ptr [red]
  And eax, 0x1f
  Shl eax, 6
  movzx edx, Byte Ptr [green]
  And edx, 0x3f
  Or eax, edx
  Shl eax, 5
  movzx edx, Byte Ptr [blue]
  And edx, 0x1f
  Or eax, edx
  mov [rgb16], eax
End Asm
#EndMacro

#Macro _hrgb_x(red,green,blue,rgb16)
Asm
  movzx eax, Byte Ptr [red]
  And eax, 0xf8
  Shl eax, 8
  movzx edx, Byte Ptr [green]
  And edx, 0xfc
  Shl edx, 3
  Or eax, edx
  movzx edx, Byte Ptr [blue]
  Shr edx, 3
  Or eax, edx
  mov [rgb16], eax
End Asm
#EndMacro

Function Hrgb (ByVal red   As UByte, _
  ByVal green As UByte, _
  ByVal blue  As UByte) As UShort
  Return (((red Shr 3) Shl 11)+((green Shr 2) Shl 5)+(blue Shr 3))
End Function

ScreenRes 600,220,32
Dim As UShort rgb16
Dim As Integer i,j
Dim As UByte r , g , b
Dim As Any Ptr phrgb_, p_32
Dim As Double t
p_32 = ImageCreate(256, 80, 0, 32)

For i = 0 To 255
  Line p_32, (i,0)-(i,79), RGBA(i, Abs(128-i), 255-i, 255)
Next
Put (0,0), p_32,pset
Sleep 4000

ScreenRes 600,220,16
phrgb_ = ImageCreate(256, 80, 0, 16)
Dim As Byte Ptr pb = p_32
pb = pb + 32
Dim As Short Ptr ps = phrgb_
ps = ps + 16

t=Timer
For j=1 To 2000
  For i = 0 To 80*256-1
    r=pb[i*4+2]
    g=pb[i*4+1]
    b=pb[i*4]
    hrgb_(r,g,b,rgb16)
    ps[i]=rgb16
  Next
Next
t=Timer-t
Put (0,20), phrgb_,pset
? "hrgb_(r,g,b,rgb16) ",t

t=Timer
For j=1 To 2000
  For i = 0 To 80*256-1
    r=pb[i*4+2]
    g=pb[i*4+1]
    b=pb[i*4]
    _hrgb_(r,g,b,rgb16)
    ps[i]=rgb16
  Next
Next
t=Timer-t
Put (0,120), phrgb_,PSet
? "_hrgb_(r,g,b,rgb16) ",t

t=Timer
For j=1 To 2000
  For i = 0 To 80*256-1
    r=pb[i*4+2]
    g=pb[i*4+1]
    b=pb[i*4]
    _hrgb_x(r,g,b,rgb16)
    ps[i]=rgb16
  Next
Next
t=Timer-t
Put (300,120), phrgb_,PSet
? "_hrgb_x(r,g,b,rgb16) ",t

t=Timer
For j=1 To 2000
  For i = 0 To 80*256-1
    r=pb[i*4+2]
    g=pb[i*4+1]
    b=pb[i*4]
    ps[i]=Hrgb(r,g,b)
  Next
Next
t=Timer-t
Put (300,20), phrgb_,PSet
? "Hrgb(r,g,b)     ",t

Sleep
counting_pine
Site Admin
Posts: 6323
Joined: Jul 05, 2005 17:32
Location: Manchester, Lancs

Re: 24bit to 16bit

Post by counting_pine »

I get these timings on a P4:

Code:

hrgb_(r,g,b,rgb16)           0.6907146019952895
_hrgb_(r,g,b,rgb16)          0.6480161584784026
_hrgb_x(r,g,b,rgb16)         0.5942196564087809
Hrgb(r,g,b)                  1.674187641166678
I3I2UI/I0
Posts: 90
Joined: Jun 03, 2005 10:39
Location: Germany

Re: 24bit to 16bit

Post by I3I2UI/I0 »

Oh,

Code:

hrgb_(r,g,b,rgb16)          0.6107652472974487
_hrgb_(r,g,b,rgb16)         0.6577056870029688
_hrgb_x(r,g,b,rgb16)        0.6453796180415029
Hrgb(r,g,b)                 0.9810234092454984
on an Atom 1.66 GHz
1000101
Posts: 2556
Joined: Jun 13, 2005 23:14
Location: SK, Canada

Re: 24bit to 16bit

Post by 1000101 »

Interesting speed optimization for 24/32-bit -> 15/16-bit.

Just out of curiosity, shouldn't it be writing ax and not eax to the result?
I3I2UI/I0
Posts: 90
Joined: Jun 03, 2005 10:39
Location: Germany

Re: 24bit to 16bit

Post by I3I2UI/I0 »

Hi 1000101
Yes, you are absolutely right!
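Why the ax store matters: in the benchmark, rgb16 is a 2-byte UShort, so "mov [rgb16], eax" also writes 2 bytes into whatever follows it in memory. The sketch below reproduces that effect in plain FreeBASIC with pointer stores instead of asm; the Type Pair and its field names are hypothetical, and little-endian x86 byte order is assumed.

```freebasic
' Two adjacent 16-bit fields stand in for a UShort variable and whatever
' happens to sit right after it in memory (x86 is little-endian)
Type Pair
    rgb16     As UShort
    neighbour As UShort
End Type

Dim As Pair p

' Like "mov [rgb16], eax": a 4-byte store through the address of a 2-byte field
p.neighbour = &h1234
*CPtr(ULong Ptr, @p.rgb16) = &hABCDF81F
Print Hex(p.rgb16, 4), Hex(p.neighbour, 4)   ' F81F, ABCD: neighbour clobbered

' Like "mov [rgb16], ax": a 2-byte store touches only rgb16
p.neighbour = &h1234
*CPtr(UShort Ptr, @p.rgb16) = &hF81F
Print Hex(p.rgb16, 4), Hex(p.neighbour, 4)   ' F81F, 1234: neighbour intact
```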
MichaelW
Posts: 3500
Joined: May 16, 2006 22:34
Location: USA

Re: 24bit to 16bit

Post by MichaelW »

My _hrgb_ code above is wrong, and for the hrgb_ code I could not find any code that avoided partial registers and also seemed likely to be significantly faster than the original. This is my attempt to verify that the original code produces the correct result.

Code:

''=======================================================================================

#MACRO hrgb_(red,green,blue,rgb16)
  ASM
    mov al, [red]
    Shl eax, 5
    mov al, [green]
    Shl eax, 6
    mov al, [blue]
    Shr eax, 3
    mov [rgb16], ax
  End ASM
#ENDMACRO

''=======================================================================================

function rgb16 naked( byval red as uinteger, _
                      byval green as uinteger, _
                      byval blue as uinteger ) as ushort
    asm
        mov al, [esp+4]
        shl eax, 5
        mov al, [esp+8]
        shl eax, 6
        mov al, [esp+12]
        shr eax, 3
        ret 12
    end asm
end function

''=======================================================================================

#define RGB_R( c ) ( CUInt( c ) Shr 16 And 255 )
#define RGB_G( c ) ( CUInt( c ) Shr  8 And 255 )
#define RGB_B( c ) ( CUInt( c )        And 255 )

''=======================================================================================

''--------------------------------------------------------------------------------------
'' This ported from code I found here:
''  http://crpppc19.epfl.ch/doc/ffmpeg-doc/html/rgb2rgb__template_8c_source.html#l00188
''--------------------------------------------------------------------------------------

#define RGB32to16(rgb32)((((rgb32) and &hff) shr 3) + _
                         (((rgb32) and &hfc00) shr 5) + _
                         (((rgb32) and &hf80000) shr 8))

''=======================================================================================

dim as any ptr i1, i2, i3, i4
dim as ushort c16
dim as uinteger c, r, g, b
dim as uinteger ptr p1
dim as ushort ptr p2, p3, p4

screenres 640,480,16
width 80,30

i1 = imagecreate(200,200,rgb(127,127,127),32)
i2 = imagecreate(200,200,0)
i3 = imagecreate(200,200,0)
i4 = imagecreate(200,200,0)

imageinfo(i1,,,,,p1)
imageinfo(i2,,,,,p2)
imageinfo(i3,,,,,p3)
imageinfo(i4,,,,,p4)

c = *p1
print bin(RGB32to16(c),16)
r = RGB_R(c)
g = RGB_G(c)
b = RGB_B(c)
hrgb_( r, g, b, c16 )
print bin(c16,16)
c16 = rgb16( r, g, b )
print bin(c16,16)
print
print bin(r,8),bin(g,8),bin(b,8)
print

for y as integer = 0 to 199
    for x as integer = 0 to 199
        c = p1[y*200+x]
        r = RGB_R(c)
        assert( r = &b01111111 )
        g = RGB_G(c)
        assert( g = &b01111111 )
        b = RGB_B(c)
        assert( b = &b01111111 )
        hrgb_(r,g,b,c16)
        assert( c16 = &b0111101111101111 )
        p2[y*200+x] = c16
        c16 = rgb16( r, g, b)
        assert( c16 = &b0111101111101111 )
        p3[y*200+x] = c16
    next
next

for y as integer = 0 to 199
    ImageConvertRow( @p1[y*200], 32, @p4[y*200], 16, 200 )
next

print
print bin(*p4,16)

put(10,250),i2
put(220,250),i3
put(430,250),i4

sleep
This cycle count code shows that passing ubytes instead of uintegers slows the code down significantly.

Code:

''===================================================================================
#include "counter.bas"
''===================================================================================
''
'' The newer cycle count macros are available here:
''
''    http://www.freebasic.net/forum/viewtopic.php?f=7&t=20003
''
''===================================================================================


#MACRO hrgb_(red,green,blue,rgb16)
  ASM
    mov al, [red]
    Shl eax, 5
    mov al, [green]
    Shl eax, 6
    mov al, [blue]
    Shr eax, 3
    mov [rgb16], ax
  End ASM
#ENDMACRO

''===================================================================================

function rgb16ub naked( byval red as ubyte, _
                        byval green as ubyte, _
                        byval blue as ubyte ) as ushort
    asm
        mov al, [esp+4]
        shl eax, 5
        mov al, [esp+8]
        shl eax, 6
        mov al, [esp+12]
        shr eax, 3
        ret 12
    end asm
end function

function rgb16 naked( byval red as uinteger, _
                      byval green as uinteger, _
                      byval blue as uinteger ) as ushort
    asm
        mov al, [esp+4]
        shl eax, 5
        mov al, [esp+8]
        shl eax, 6
        mov al, [esp+12]
        shr eax, 3
        ret 12
    end asm
end function

''===================================================================================

dim as uinteger red, green, blue
dim as ushort c

SetProcessAffinityMask( GetCurrentProcess(), 1)

sleep 5000

for i as integer = 1 to 4

    counter_begin( 10000000, REALTIME_PRIORITY_CLASS, THREAD_PRIORITY_TIME_CRITICAL )
    counter_end()
    print counter_cycles;" cycles"

    counter_begin( 10000000, REALTIME_PRIORITY_CLASS, THREAD_PRIORITY_TIME_CRITICAL )
        hrgb_( red, green, blue, c )
    counter_end()
    print counter_cycles;" cycles"

    counter_begin( 10000000, REALTIME_PRIORITY_CLASS, THREAD_PRIORITY_TIME_CRITICAL )
        c = rgb16ub( red, green, blue )
    counter_end()
    print counter_cycles;" cycles"

    counter_begin( 10000000, REALTIME_PRIORITY_CLASS, THREAD_PRIORITY_TIME_CRITICAL )
        asm
            movzx eax, byte ptr [ebp-16]
            push eax
            movzx eax, byte ptr [ebp-12]
            push eax
            movzx eax, byte ptr [ebp-8]
            push eax
            call _RGB16UB@12
            mov word ptr [ebp-20], ax
        end asm
    counter_end()
    print counter_cycles;" cycles"

    counter_begin( 10000000, REALTIME_PRIORITY_CLASS, THREAD_PRIORITY_TIME_CRITICAL )
        c = rgb16( red, green, blue )
    counter_end()
    print counter_cycles;" cycles"
    print

next

sleep
The problem appears to be the three additional partial register accesses, but I have no idea why they have such a large effect:

Code:

mov al, byte ptr [ebp-16]
push eax
mov al, byte ptr [ebp-12]
push eax
mov al, byte ptr [ebp-8]
push eax
call _RGB16UB@12
Edit:
After taking a break, I do have some idea of why the effect is so large: the partial register accesses change only the lower 8 bits of EAX. According to Agner Fog’s optimization manuals, on a P3 each instance causes a partial register stall, with a delay of 5-6 clock cycles before the whole register can be pushed. On a P4 the nature of the problem is different, but it still involves a delay.

So to test this, I did as the microarchitecture manual recommends and substituted a movzx instruction, which zeros the upper 24 bits of the register (see the cycle-count code above and the results below). And before someone points it out: there is no way 15 instructions can execute in 0 cycles, so there must be some non-obvious effect at work here. Fortunately, this and similar P4 anomalies are no longer worth working around.

This optimization apparently never made it into the asm emitter, but AFAICT it should be just a matter of recognizing the problem areas and changing the instruction mnemonic.

Running on a P3:

Code:

 0 cycles
 23 cycles
 54 cycles
 32 cycles
 34 cycles

 0 cycles
 23 cycles
 54 cycles
 32 cycles
 34 cycles

 0 cycles
 23 cycles
 54 cycles
 32 cycles
 34 cycles

 0 cycles
 23 cycles
 54 cycles
 32 cycles
 34 cycles
Running on a P4 (Northwood):

Code:

 0 cycles
 5 cycles
 21 cycles
 3 cycles
 5 cycles

 0 cycles
 3 cycles
 22 cycles
 0 cycles
 6 cycles

 0 cycles
 3 cycles
 22 cycles
 0 cycles
 5 cycles

 0 cycles
 3 cycles
 21 cycles
 0 cycles
 5 cycles
I corrected the problem in the test code by passing uintegers.
1000101
Posts: 2556
Joined: Jun 13, 2005 23:14
Location: SK, Canada

Re: 24bit to 16bit

Post by 1000101 »

I know this topic is old, but people wondered about partial registers.

Internally, the register size is independent of the memory model (real mode, protected mode, long mode): the CPU takes the same number of [partial*] cycles for an 8/16/32/64-bit register-to-register transaction in any memory model.

The first case of partial registers you need to worry about is memory access. If the partial register is the source or destination, and not the memory reference, you don't have to worry: a partial register transfer won't affect performance, with the exceptions noted next. A register used as a memory reference, however, must be the same width as the memory model: no 16-bit addresses in 32-bit mode.

The other case where you need to worry about partial register access is independent of register width: it is when the address is not aligned to the register width (32-bit memory alignment for 32-bit registers). In this case, depending on the memory bus architecture, the transfer may be split into partial register moves aligned to the memory bus, which can add memory latency. Severely unaligned access can take an additional 0 + (address & alignment) cycles to complete. A 32-bit transaction misaligned by two bytes (16 bits) can be handled in two transactions (split into two aligned 16-bit transfers). A 128-bit transaction misaligned by a single byte can take up to an additional 7 cycles to complete, since the only way to realign the transfers is byte by byte. Some architectures and memory controllers** can reduce this to the minimum required; the 128-bit transfer above could, in the best case, become a series of 8-bit, 16-bit, 32-bit, 64-bit and 8-bit transactions.

Of course, all of this is transparent to code running at any level, but it is important to know in order to avoid major latency situations.

* Depending on architecture the operation will take a different number of micro-ops which may or may not take a full processor cycle.
** There is no way to predict how the transactions will be handled without knowing the exact hardware and even then a hardware revision of the same architecture could change that.
TESLACOIL
Posts: 1769
Joined: Jun 20, 2010 16:04
Location: UK

Re: 24bit to 16bit

Post by TESLACOIL »

Convert 24bit to 16bit colour using ASM
http://www.freebasic.net/forum/viewtopi ... ur#p174265
1000101
Posts: 2556
Joined: Jun 13, 2005 23:14
Location: SK, Canada

Re: 24bit to 16bit

Post by 1000101 »

TESLACOIL wrote:Convert 24bit to 16bit colour using ASM
http://www.freebasic.net/forum/viewtopi ... ur#p174265

Yeah, stop opening new threads for existing topics. That one should be locked, since it's essentially just a reply to this one.