Pload()

Thrawn89
Posts: 477
Joined: Oct 08, 2005 13:12

Postby Thrawn89 » May 05, 2006 1:07

Ok, well, here's the verdict:

NonInterlaced Types [Passed]:
Passed - 1 Bps Color 0 [Grayscale]
Passed - 2 Bps Color 0 [Grayscale]
Passed - 4 Bps Color 0 [Grayscale]
Passed - 8 Bps Color 0 [Grayscale]
Passed - 16Bps Color 0 [Grayscale]

Passed - 8 Bps Color 2 [RGB]
Passed - 16Bps Color 2 [RGB]

Passed - 1 Bps Color 3 [Paletted]
Passed - 2 Bps Color 3 [Paletted]
Passed - 4 Bps Color 3 [Paletted]
Passed - 8 Bps Color 3 [Paletted]

Passed - 8 Bps Color 4 [Grayscale Alpha]
Passed - 16Bps Color 4 [Grayscale Alpha]

Passed - 8 Bps Color 6 [RGBA]
Passed - 16Bps Color 6 [RGBA]

Interlaced Types [Failed]:
KindOf - 1 Bps Color 0 [Grayscale]
Passed - 2 Bps Color 0 [Grayscale]
Passed - 4 Bps Color 0 [Grayscale]
Passed - 8 Bps Color 0 [Grayscale]
*Failed - 16Bps Color 0 [Grayscale]

Passed - 8 Bps Color 2 [RGB]
*Failed - 16Bps Color 2 [RGB]

KindOf - 1 Bps Color 3 [Paletted]
Passed - 2 Bps Color 3 [Paletted]
Passed - 4 Bps Color 3 [Paletted]
Passed - 8 Bps Color 3 [Paletted]

Passed - 8 Bps Color 4 [Grayscale Alpha]
*Failed - 16Bps Color 4 [Grayscale Alpha]

Passed - 8 Bps Color 6 [RGBA]
*Failed - 16Bps Color 6 [RGBA]

Filter Test [All NonInterlaced, Passed]:
Passed - 8 Bps Color 0 [Grayscale] Filter 0
Passed - 8 Bps Color 2 [RGB] Filter 0

Passed - 8 Bps Color 0 [Grayscale] Filter 1
Passed - 8 Bps Color 2 [RGB] Filter 1

Passed - 8 Bps Color 0 [Grayscale] Filter 2
Passed - 8 Bps Color 2 [RGB] Filter 2

Passed - 8 Bps Color 0 [Grayscale] Filter 3
Passed - 8 Bps Color 2 [RGB] Filter 3

Passed - 8 Bps Color 0 [Grayscale] Filter 4
Passed - 8 Bps Color 2 [RGB] Filter 4

Weird Sizes [All are Color 3 Paletted, Failed]:
Passed - 1 Bps 1x1 Interlaced
Passed - 1 Bps 1x1 NonInterlaced

Failed - 1 Bps 2x2 Interlaced
Passed - 1 Bps 2x2 Non Interlaced

Failed - 1 Bps 3x3 Interlaced
Kindov - 1 Bps 3x3 Non Interlaced

Failed - 1 Bps 4x4 Interlaced
Kindov - 1 Bps 4x4 Non Interlaced

Kindov - 1 Bps 5x5 Interlaced
Kindov - 1 Bps 5x5 Non Interlaced

Kindov - 2 Bps 6x6 Interlaced
Kindov - 2 Bps 6x6 Non Interlaced

Kindov - 2 Bps 7x7 Interlaced
Kindov - 2 Bps 7x7 Non Interlaced

Kindov - 2 Bps 8x8 Interlaced
Passed - 2 Bps 8x8 Non Interlaced

Failed - 2 Bps 9x9 Interlaced
Kindov - 2 Bps 9x9 Non Interlaced

Passed - 4 Bps 32x32 Interlaced
Passed - 4 Bps 32x32 Non Interlaced

Failed - 4 Bps 33x33 Interlaced
Failed - 4 Bps 33x33 Non Interlaced

Kindov - 4 Bps 34x34 Interlaced
Passed - 4 Bps 34x34 Non Interlaced

Kindov - 4 Bps 35x35 Interlaced
Failed - 4 Bps 35x35 Non Interlaced

Kindov - 4 Bps 36x36 Interlaced
Passed - 4 Bps 36x36 Non Interlaced

Failed - 4 Bps 37x37 Interlaced
Kindov - 4 Bps 37x37 Non Interlaced

Kindov - 4 Bps 38x38 Interlaced
Passed - 4 Bps 38x38 Non Interlaced

Kindov - 4 Bps 39x39 Interlaced
Kindov - 4 Bps 39x39 Non Interlaced

Kindov - 4 Bps 40x40 Interlaced
Passed - 4 Bps 40x40 Non Interlaced


Ok, as you can see, it now works for most interlaced images. However, the 1 bit per sample images are rendered wrong, and it now crashes on 16 bits per sample.

I'm pretty sure that's down to a flaw in my code, though. Tomorrow I'll look at the binary of the scanlines and see what's going wrong.

Anyway, most of the weird-size images now load but render wrong...I gotta check those algos.

But that's it for tonight, before I type code in my sleep again...that wasn't good...lol

~Thrawn~
yetifoot
Posts: 1710
Joined: Sep 11, 2005 7:08
Location: England

Postby yetifoot » May 05, 2006 10:59

Hi, this looks very good; good to see you got the interlacing down. I've been learning the PNG format after reading the specification. It took me hours to get interlacing working on bit depths < 8; I found it very tough. (Again, thanks Adam7.)

I just wanted to add that, in the last version I downloaded, inlining the Paeth predictor gives a significant speed boost.
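For reference, a minimal sketch of the Paeth predictor as the PNG specification defines it, assuming the usual a = left, b = above, c = above-left byte naming. The function name, the tiny test rows and `bpp` are made up for illustration and are not taken from Pload(); "inlining" would mean pasting the function body into the unfilter loop instead of calling it once per byte.

```freebasic
' Paeth predictor from the PNG specification.
' a = byte to the left, b = byte above, c = byte above-left.
Function paeth_predictor(ByVal a As Integer, ByVal b As Integer, ByVal c As Integer) As Integer
    Dim As Integer p  = a + b - c      ' initial estimate
    Dim As Integer pa = Abs(p - a)     ' distance of each neighbour from the estimate
    Dim As Integer pb = Abs(p - b)
    Dim As Integer pc = Abs(p - c)
    If pa <= pb AndAlso pa <= pc Then Return a
    If pb <= pc Then Return b
    Return c
End Function

' Hypothetical data: one already-decoded row and one filter-type-4 row.
Const bpp = 1                                      ' bytes per pixel (assumed)
Dim As UByte prev_row(0 To 3) = {10, 20, 30, 40}
Dim As UByte cur_row(0 To 3)  = {1, 2, 3, 4}       ' filtered bytes

For i As Integer = 0 To 3
    Dim As Integer a_left = 0, c_upleft = 0        ' treated as 0 before the first pixel
    If i >= bpp Then
        a_left   = cur_row(i - bpp)
        c_upleft = prev_row(i - bpp)
    End If
    cur_row(i) = (cur_row(i) + paeth_predictor(a_left, prev_row(i), c_upleft)) And 255
    Print cur_row(i);                              ' reconstructs 11 22 33 44
Next
Print
```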

I'm now stuck, I think, at the same position as you: images smaller than 8x8 not only fail but cause a seg fault. I know the spec says that if the image has fewer than 5 rows or columns, then a pass may be empty. I'm having trouble fitting this into my code.
Last edited by yetifoot on May 06, 2006 17:43, edited 1 time in total.
counting_pine
Site Admin
Posts: 6174
Joined: Jul 05, 2005 17:32
Location: Manchester, Lancs

Postby counting_pine » May 05, 2006 13:35

yetifoot wrote:I'm now stuck, I think, at the same position as you: images smaller than 8x8 not only fail but cause a seg fault. I know the spec says that if the image has fewer than 5 rows or columns, then a pass may be empty. I'm having trouble fitting this into my code.

Try putting this line in:

Code: Select all

...

For pass = 1 To 7
   
    'insert this line here:
    If PNG_Header->width <= xoff(pass) Then Continue For

    ...
If the image stops before the first column for that pass, it goes on to the next pass.

You could do a similar check for the height but it isn't necessary because a pass height of 0 just means the pass is effectively skipped anyway.
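To make the empty-pass case concrete, here is a rough sketch that lists the Adam7 pass sizes for a small image. `xoff` and `widthfactor` follow the names used in this thread; `yoff`, `heightfactor`, `img_width` and `img_height` are assumed names, and the tables are the standard Adam7 offsets and steps from the PNG specification. The explicit height check is only there so the printout shows which passes are empty.

```freebasic
' Adam7 pass geometry from the PNG specification (passes 1 to 7).
Dim As Integer xoff(1 To 7)         = {0, 4, 0, 2, 0, 1, 0}   ' first column of the pass
Dim As Integer yoff(1 To 7)         = {0, 0, 4, 0, 2, 0, 1}   ' first row of the pass
Dim As Integer widthfactor(1 To 7)  = {8, 8, 4, 4, 2, 2, 1}   ' column step
Dim As Integer heightfactor(1 To 7) = {8, 8, 8, 4, 4, 2, 2}   ' row step

' Hypothetical image size; a 3x3 image leaves passes 2 and 3 empty.
Dim As Integer img_width = 3, img_height = 3

For pass As Integer = 1 To 7
    If img_width <= xoff(pass) OrElse img_height <= yoff(pass) Then
        Print "pass"; pass; ": empty, skipped"
        Continue For
    End If
    ' Ceiling division: number of columns/rows this pass actually covers.
    Dim As Integer pw = (img_width  + widthfactor(pass)  - xoff(pass) - 1) \ widthfactor(pass)
    Dim As Integer ph = (img_height + heightfactor(pass) - yoff(pass) - 1) \ heightfactor(pass)
    Print "pass"; pass; ":"; pw; " x"; ph
Next
```

For the 3x3 case the non-empty passes cover 1 + 1 + 2 + 2 + 3 = 9 pixels, which matches the image.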

By the way, this calculation:

Code: Select all

interlace_width = PNG_Header->width \ widthfactor(pass)
If (PNG_Header->width mod widthfactor(pass)) > xoff(pass) Then
    interlace_width += 1
End If

Could be written more simply as this:

Code: Select all

interlace_width = (PNG_Header->width + widthfactor(pass) - xoff(pass) - 1) \ widthfactor(pass)
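A quick way to convince yourself the two forms agree is to compare them over a range of widths. This is only a sketch: the 1 to 64 range is arbitrary and `mismatches` is a made-up counter; the tables are the standard Adam7 values.

```freebasic
Dim As Integer xoff(1 To 7)        = {0, 4, 0, 2, 0, 1, 0}
Dim As Integer widthfactor(1 To 7) = {8, 8, 4, 4, 2, 2, 1}
Dim As Integer mismatches = 0

For w As Integer = 1 To 64              ' hypothetical range of image widths
    For pass As Integer = 1 To 7
        ' Two-step form
        Dim As Integer a = w \ widthfactor(pass)
        If (w Mod widthfactor(pass)) > xoff(pass) Then a += 1
        ' Single-expression form
        Dim As Integer b = (w + widthfactor(pass) - xoff(pass) - 1) \ widthfactor(pass)
        If a <> b Then mismatches += 1
    Next
Next

Print "mismatches:"; mismatches         ' should report 0
```

The two agree because xoff(pass) is always smaller than widthfactor(pass), so the extra term only rounds up when the remainder is greater than the offset.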

Postby yetifoot » May 05, 2006 14:32

ahh. thanks for the tip, i had tried doing

Code: Select all

If PNG_Header->width < xoff(pass)


or something similar, so i'll try out your tip.

I just love a good challenge, and Adam7 has definitely been that!

Postby yetifoot » May 05, 2006 18:15

Thanks for that counting_pine, works a treat. I still had another bug to iron out. All my loops like

Code: Select all

For pixel = 0 To 7

Next pixel


were of course buggy if the image width was not a multiple of the number of pixels per byte, and that was destroying the heap. I hope now I too can sleep easy, without 8x8 grids of numbers floating into my mind!
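A minimal sketch of one way to guard that kind of loop, assuming 1-bit pixels packed with the leftmost pixel in the high bit (as PNG does). All names and the sample bytes here are hypothetical and not taken from either loader.

```freebasic
' Unpack one scanline of 1-bit pixels without writing past the end of the row.
Const IMG_WIDTH       = 11                 ' hypothetical width, not a multiple of 8
Const BITS_PER_PIXEL  = 1
Const PIXELS_PER_BYTE = 8 \ BITS_PER_PIXEL

Dim As UByte packed(0 To 1) = {&hA5, &hE0}  ' hypothetical scanline bytes
Dim As UByte pixels(0 To IMG_WIDTH - 1)

Dim As Integer x = 0
For byte_index As Integer = 0 To UBound(packed)
    For pixel As Integer = 0 To PIXELS_PER_BYTE - 1
        If x >= IMG_WIDTH Then Exit For     ' the remaining bits are padding
        Dim As Integer shift = 8 - BITS_PER_PIXEL * (pixel + 1)
        pixels(x) = (packed(byte_index) Shr shift) And ((1 Shl BITS_PER_PIXEL) - 1)
        x += 1
    Next
Next

For i As Integer = 0 To IMG_WIDTH - 1
    Print pixels(i);
Next
Print
```

Without the `x >= IMG_WIDTH` check, the second byte would write 16 pixels into an 11-pixel row, which is the sort of overrun that corrupts the heap.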

Postby Thrawn89 » May 05, 2006 20:43

Update:

16Bps interlaced images fixed [it wasn't accurately predicting the size of RawScanLine()...apparently that code depended on the other bugs to work, and once they were fixed, it broke...lol, now I see why I had to make it do odd behavior for no apparent reason before...*sigh*]. Now working on the 1 Bps.

EDIT: Oh, and I also made the Paeth predictor inline as yetifoot suggested, which turned out well...I'm thinking of also making the crc table inline, instead of having it calculate it....

~Thrawn~

Postby counting_pine » May 05, 2006 21:30

Thrawn89 wrote:...I'm thinking of also making the crc table inline, instead of having it calculate it....

Don't, it'll just add more bloat to the code size and it doesn't take long to calculate.
Personally, I'd suggest you use the crc32() function that's included with zlib. If you look at the Declare it should be pretty much self explanatory, or you could just look at how I did it in my PNG screenshot function.
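As a rough sketch of what that looks like for a PNG chunk: the chunk CRC covers the four type bytes followed by the data bytes, and zlib's crc32() can be fed in pieces by passing the previous result back in. The variable names are made up; the example uses the IEND chunk because it has no data and its CRC is well known. It needs zlib linked in, the same as the uncompress call.

```freebasic
#include "zlib.bi"

' Hypothetical chunk: IEND has a 4-byte type and no data.
Dim As String chunk_type = "IEND"
Dim As UInteger crc

' One call over the whole type field, starting from 0.
crc = crc32(0, CPtr(Any Ptr, StrPtr(chunk_type)), Len(chunk_type))
Print Hex(crc, 8)                       ' AE426082, the CRC stored in every IEND chunk

' The same bytes fed in two pieces, the way type and data would be chained.
crc = crc32(0, CPtr(Any Ptr, StrPtr(chunk_type)), 2)
crc = crc32(crc, CPtr(Any Ptr, StrPtr(chunk_type) + 2), 2)
Print Hex(crc, 8)                       ' same value again
```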

Are you updating the zip file you linked to in your first post, or are you just planning to do another release at some point?

Postby Thrawn89 » May 05, 2006 21:38

Eh, alright, but I already have some pretty efficient crc code that I ported from the Internet Draft, so there's no need to make another function call to zlib...

Oh, I haven't updated the zip file yet. Once I get this 1Bps bug fixed, I'll clean up the code and upload it...give me a little while...

@counting_pine: Check email

~Thrawn~

Postby yetifoot » May 06, 2006 13:07

I agree with counting_pine about using the crc32 from zlib (although I didn't!). If you're relying on zlib anyway for uncompress, you may as well use its crc function. I'm not sure, but it will probably be faster as well (although it's probably a tiny difference most of the time). I converted mine to ASM at one point, and it still didn't make much difference (although I'm a novice asm coder, so that may be why) here
voodooattack
Posts: 605
Joined: Feb 18, 2006 13:30
Location: Alexandria / Egypt

Postby voodooattack » May 06, 2006 14:23

yetifoot wrote:I agree with counting_pine about using the crc32 from zlib (although I didn't!). If you're relying on zlib anyway for uncompress, you may as well use its crc function. I'm not sure, but it will probably be faster as well (although it's probably a tiny difference most of the time). I converted mine to ASM at one point, and it still didn't make much difference (although I'm a novice asm coder, so that may be why) here


hmm i just had to test this..

Code: Select all

10MB:
crc32_update_asm(): 34ms
crc32_update(): 266ms

50MB:
crc32_update_asm(): 175ms
crc32_update(): 1186ms

500MB:
crc32_update_asm(): 18283ms
crc32_update(): 44398ms

1GB:
crc32_update_asm(): 86139ms (~1 minute)
crc32_update(): 257702ms (~4 minutes)


This test was done on a P4 2.4GHz with 512MB of DDR memory (Windows XP MC2005 with SP2).

Postby yetifoot » May 06, 2006 15:02

Wow, it doesn't make anywhere near that much difference on my machine (P4 1.8GHz); here it's only about a 5-10% increase.

Postby counting_pine » May 06, 2006 16:20

@voodooattack, maybe this is a dumb question, but are you compiling with anything like -g or -exx? That might slow down the non-asm code quite a bit.

Code: Select all

' Translated to FreeBASIC from RFC 1952 by yetifoot
'Small changes by counting_pine
'
' crc should be initialized to 0
'
' Dim crc As uInteger
' Dim teststr As String
'
'   teststr = "Hello"
'   crc = 0
'   crc = crc32_update(crc, strptr(teststr), Len(teststr))
'   Print HEX$(crc)

Option Explicit

#include "zlib.bi"

Dim Shared crc_table(0 to 255) As uInteger

sub make_crc_table constructor
 
  dim as uinteger i, j, k
 
  for i = 0 to 255
    k = i
    for j = 1 to 8
      k = (k shr 1) xor (-(k and 1) and &hedb88320)
    next j
    crc_table(i) = k
  next i
 
end sub


Function crc32_update(ByVal crc As uInteger, _
                      ByVal buf As Any ptr, _
                      ByVal buf_len As Integer) As uInteger
  Dim c As uInteger
  Dim n As Integer
 
  c = crc XOR 4294967295
  While n < buf_len
    c = crc_table((c XOR cptr(uByte ptr, buf)[n]) AND 255) XOR (c shr 8)
    n += 1
  Wend
  Return c XOR 4294967295
End Function

Function crc32_update_asm(ByVal crc As uInteger, _
                          ByVal buf As Any ptr, _
                          ByVal buf_len As uInteger) As uInteger
  ASM
    mov eax, dword ptr [crc]
    xor eax, -1
   
    mov ecx, dword ptr [buf_len]
    test ecx, ecx          ' mov does not set flags, so test the length explicitly
    jz crc_end_loop
    mov esi, dword ptr [buf]
    xor ecx, ecx
    crc_loop:
    '
    movzx ebx, byte ptr [esi+ecx]
    xor ebx, eax
    and ebx, 255
    shr eax, 8
    xor eax, dword ptr [crc_table+ebx*4]
    '
    inc ecx   
    cmp dword ptr [buf_len], ecx
    jne crc_loop
    crc_end_loop:
   
    xor eax, -1
    mov [Function], eax    ' store the return value without relying on the stack layout
  End ASM
End Function

const buffersize as integer = 1024 * 1024 * 100
Dim buffer As Any ptr
Dim As Double t1, t2
Dim As Integer i
Dim As uInteger crc
 
  buffer = Allocate(buffersize)
  if buffer = 0 then end
 
  For i = 0 To buffersize - 1
    cptr(uByte ptr, buffer)[i] = int(rnd * 256)
  Next i
 
  print "crc_update_asm"
 
  t1 = Timer
  crc = crc32_update_asm(0, buffer, buffersize)
  t2 = Timer
 
  Print Hex$(crc, 8)
  Print cInt((t2 - t1) * 1000); "ms"
 
  print
  print "crc_update"
 
  t1 = Timer
  crc = crc32_update(0, buffer, buffersize)
  t2 = Timer
 
  Print Hex$(crc, 8)
  Print cInt((t2 - t1) * 1000); "ms"
 
  print
  print "zlib's crc32"
 
  t1 = Timer
  crc = crc32(0, buffer, buffersize)
  t2 = Timer
 
  Print Hex$(crc, 8)
  Print cInt((t2 - t1) * 1000); "ms"
 
  sleep


Here's my slightly modified version. It includes the zlib crc32 routine, and randomizes the whole of the buffer instead of just the first byte. (This takes considerably longer than any of the CRC checks.)

I reduced the code size by replacing the crc table initialisation constants with a constructor that does the same thing.

I also made the buffer size a constant, for easy changing.
1GB proved too much for my PC so I didn't test that. On 100MB though, zlib won out.

Code: Select all

crc_update_asm
C764FAB7
 449ms

crc_update
C764FAB7
 721ms

zlib's crc32
C764FAB7
 229ms

Postby voodooattack » May 06, 2006 16:30

probably because of the available physical memory :)

the larger the buffer the more it'll be swapped to/from the pagefile, at smaller buffer sizes, this doesn't make a big difference, but, if the buffer goes larger, swapping is necessary and this slows down execution.. and this is where the asm code shows better performance :D

i also noticed that your asm proc is fully utilizing my cpu (100% usage), while the fb version doesn't..

EDIT:
@counting_pine, i only use -r to check the asm source when i'm done :)

btw, here's the 1GB results from your version:

Code: Select all

crc_update_asm
6EF49D5D
 152494ms

crc_update
6EF49D5D
 204161ms

zlib's crc32
6EF49D5D
 153504ms

Postby counting_pine » May 06, 2006 17:13

Wow, that's pretty close.

I think overall, it's best just to use the zlib crc32 function, rather than reinventing the wheel.

The only reason not to use it is if you need all the speed you can get, and you can write asm code that will definitely be faster.

Postby yetifoot » May 06, 2006 17:36

Yeah, looks like the zlib one is the best choice. The code that voodoo was testing was different as well, as I recoded it a bit. The strange bit for me was that the old code did seem faster even though it did some (seemingly) inefficient things. This may be down to pipeline/cache/alignment stuff which I haven't learnt about yet. (Another good reason to stick with the zlib one!)

Just for reference, here's the code voodoo was testing

here

Here's the modded version counting_pine was using (with the fix for the rnd)

here
