Does anyone know what algo XP -> full screen DOS?

Postby counting_pine » Dec 30, 2017 20:33

Huh... I'll put that down to a "cursor jump". Fixed now.
It may be possible to increase the speed further, since it does a floating-point multiply/divide for every horizontal pixel merge. But perhaps it's good enough for now.
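One way to drop the floating-point work from the horizontal merge is to compute the 0..255 weight that merge() needs with integer arithmetic, straight from the dxsw and sw counters that scalerow already keeps. This is a minimal sketch, assuming the rest of scalerow stays as posted; `mergeweight` is a hypothetical helper, not part of the posted code, and a merge variant that takes the integer weight directly would also be needed.

```freebasic
'' Hypothetical helper: merge weight in 0..255 using integers only.
'' In scalerow's merge branch 0 < dxsw < sw, so the result stays in 0..255.
function mergeweight(byval dxsw as uinteger, byval sw as uinteger) as integer
    '' (dxsw * 256) \ sw truncates just like int() on the positive float
    '' product (dxsw / sw) * 256, but with no float conversion or rounding
    return (dxsw shl 8) \ sw
end function

'' Sample values for a 320-pixel source row: prints 0, 128 and 255
print mergeweight(1, 320), mergeweight(160, 320), mergeweight(319, 320)
```

Since the weight then goes straight into the `shr 8` blend, no float-to-integer conversion is left in the inner loop.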

From a parallelization perspective, it might be difficult to sort out the dependencies. There are three different kinds of row operations:
- scalerow (expands a source row to fit a destination row, no dependencies)
- copyrow (copies the last scaled row, depends on the presence of the previous row)
- mergerow (merges the just-scaled row into the previous (scaled or copied) row, depends on the presence of both)
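To see how the three operations interleave, here is a dry run of the row bookkeeping used by scaleimg in the code posted below: the same dysh accumulator, but no pixels are touched. `rowplan` is a hypothetical helper written for illustration, and like the posted loop it assumes upscaling (sh <= dh).

```freebasic
'' Hypothetical dry run of scaleimg's row loop. One letter per destination row:
''   S = scalerow only
''   M = scalerow, then mergerow into the row above (which must already be done)
''   C = copyrow (needs the row above to be finished)
function rowplan(byval dh as uinteger, byval sh as uinteger) as string
    dim as string plan
    dim as uinteger dysh = 0
    for dy as uinteger = 0 to dh - 1
        if dysh < sh then
            if dysh > 0 then plan &= "M" else plan &= "S"
        else
            plan &= "C"
        end if
        '' advance the accumulator exactly as scaleimg does
        dysh += sh
        if dysh >= dh then dysh -= dh
    next dy
    return plan
end function

'' 4 source rows stretched to 10 destination rows
print rowplan(10, 4)
```

For 4 source rows stretched to 10 destination rows this gives SCCMCSCCMC: only rows 0 and 5 are free of dependencies on the row above.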

Whenever a row is scaled, it must also be merged into the row before it, unless there is a clean break between the two rows, in which case no merging is required.
It could be split cleanly into sections if the source and destination heights share a common divisor. For example, if both heights are divisible by 2, there will be a clean break halfway down the image, and both halves can be scaled separately.
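The clean breaks can be located with a greatest common divisor. In the posted scaleimg loop, dysh at the start of destination row dy equals (dy * sh) mod dh, so it returns to 0 (a break with no merge) every dh / gcd(sh, dh) rows. A small sketch for the 200 -> 768 row case used in the demo program; `gcd` is a plain Euclid helper written for this example.

```freebasic
'' Euclid's algorithm for the greatest common divisor
function gcd(byval a as uinteger, byval b as uinteger) as uinteger
    while b <> 0
        dim as uinteger t = a mod b
        a = b
        b = t
    wend
    return a
end function

'' Source height 200 stretched to destination height 768
dim as uinteger sh = 200, dh = 768
dim as uinteger g = gcd(sh, dh)
print "independent sections:"; g
print "destination rows per section:"; dh \ g
print "source rows per section:"; sh \ g
```

This reports 8 sections of 96 destination rows, each built from 25 source rows; any divisor of 8 would also give clean splits.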

It could be split in other ways, but every section will have to start with a scalerow(), and that scaled row may or may not need to be merged into the row before it once the previous section has finished rendering.
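For an arbitrary split, a worker needs the loop state at its first row. Assuming the same dysh bookkeeping as the posted scaleimg, the source row at destination row dy is (dy * sh) \ dh and the accumulator is (dy * sh) mod dh, so the state can be computed directly instead of replaying the loop. `sectionstart` is a hypothetical helper; it rejects rows that would be a copyrow and flags rows whose merge into the row above has to be deferred.

```freebasic
'' Hypothetical helper: loop state scaleimg would reach at destination row dy.
'' Returns false if row dy would be a copyrow (it depends on the row above).
function sectionstart(byval dy as uinteger, byval dh as uinteger, byval sh as uinteger, _
                      byref srcrow as uinteger, byref dysh as uinteger, _
                      byref needsmerge as boolean) as boolean
    '' 64-bit product avoids overflow for large images
    dim as ulongint acc = culngint(dy) * sh
    srcrow = cuint(acc \ dh)
    dysh = cuint(acc mod dh)
    '' merge into row dy-1 only after the previous section has finished it
    needsmerge = cbool(dysh > 0)
    return cbool(dysh < sh)
end function

dim as uinteger srcrow, dysh
dim as boolean needsmerge
'' 4 source rows stretched to 10 destination rows: can a section start at row 3?
if sectionstart(3, 10, 4, srcrow, dysh, needsmerge) then
    print "source row:"; srcrow; "  dysh:"; dysh; "  deferred merge: "; needsmerge
end if
```

A section starting at a valid row would offset dp by dy * dpitch and sp by srcrow * spitch, skip the merge for its first row, and perform it once the section above is done.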

Postby paul doe » Dec 30, 2017 21:09

counting_pine wrote: It may be possible to increase the speed further, since it does a floating-point multiply/divide for every horizontal pixel merge. But perhaps it's good enough for now.

More than enough, in fact, especially considering the OP left slamming the door (if at all =D).

A small speed gain (~30 FPS): replace the divisions with a multiplication by the reciprocal, especially when the divisor is a constant, as it is here:

#include "crt.bi"

function merge(c1 as ulong, c2 as ulong, w as single) as ulong
    dim as integer r1, g1, b1, r2, g2, b2
    dim as integer m = int(w * 256)

    assert(m >= 0 and m < 256)

    r1 = c1 shr 16 and 255: r2 = c2 shr 16 and 255
    g1 = c1 shr  8 and 255: g2 = c2 shr  8 and 255
    b1 = c1        and 255: b2 = c2        and 255

    r1 += ((r2 - r1) * m) shr 8
    g1 += ((g2 - g1) * m) shr 8
    b1 += ((b2 - b1) * m) shr 8

    return rgb(r1, g1, b1)
end function

'' Stretches one source row (sw pixels) across one destination row (dw pixels).
'' Intended for upscaling (sw <= dw).
sub scalerow(dp as ulong ptr, dw as uinteger, sp as ulong ptr, sw as uinteger)
    dim as uinteger dxsw = 0
    dim as ulong c, c2
    dim as single rsw = 1 / sw   '' reciprocal, computed once per row

    c = *sp

    for dx as uinteger = 0 to dw-1

        dxsw += sw
        select case dxsw
        case is <= dw
            '' output current pixel
            *dp = c
            if dxsw = dw then
                dxsw = 0: sp += 1
                '' move to next pixel, without reading past the end of the source row
                if dx < dw-1 then c = *sp
            end if
        case else 'is > dw
            '' merge with next pixel
            sp += 1: dxsw -= dw
            c2 = *sp

            '*dp = merge(c, c2, (dxsw / sw))
            *dp = merge(c, c2, (dxsw * rsw))

            c = c2
        end select
        dp += 1
    next dx
end sub

sub copyrow(dp as ulong ptr, sp as ulong ptr, w as uinteger)
    memcpy(dp, sp, w*4)
end sub

sub mergerow(dp as ulong ptr, sp as ulong ptr, wid as uinteger, w as single)
    dim as integer r1, g1, b1, r2, g2, b2
    dim as integer m = int(w * 256)

    assertwarn(m >= 0 and m < 256)

    for i as integer = 0 to wid-1
        r1 = dp[i] shr 16 and 255: r2 = sp[i] shr 16 and 255
        g1 = dp[i] shr  8 and 255: g2 = sp[i] shr  8 and 255
        b1 = dp[i]        and 255: b2 = sp[i]        and 255

        r1 += ((r2 - r1) * m) shr 8
        g1 += ((g2 - g1) * m) shr 8
        b1 += ((b2 - b1) * m) shr 8
       
        dp[i] = rgb(r1, g1, b1)
    next i
end sub

sub scaleimg(dp as any ptr, dw as uinteger, dh as uinteger, dpitch as uinteger, _
             sp as any ptr, sw as uinteger, sh as uinteger, spitch as uinteger)

    dim as uinteger dysh = 0
    dim as single rsh = 1 / sh   '' reciprocal, computed once per image

    for dy as integer = 0 to dh-1
        if dysh < sh then
            scalerow(dp, dw, sp, sw)
            if dysh > 0 then
                'mergerow(dp - dpitch, dp, dw, dysh / sh)
                mergerow(dp - dpitch, dp, dw, dysh * rsh)
            end if
        else
            'copyrow(dp, dp - dpitch, dw)
            memcpy( dp, dp - dpitch, dw *  sizeOf( ulong ) )
        end if

        dysh += sh
        if dysh >= dh then
            dysh -= dh
            sp += spitch
        end if
        dp += dpitch
    next dy

end sub

screenres 1024, 768, 32
dim as any ptr img = imagecreate(320, 200)
for y as integer = 16 to 199
    for x as integer = 0 to 319
        '' one-pixel checkerboard test pattern (white / black)
        pset img, (x, y), iif((x + y) mod 2 <> 0, rgb(255, 255, 255), rgb(0, 0, 0))
    next x
next y
draw string img, (0, 0), "Hello world! |\/|[](){}<>_-=+"

dim as integer dw, dh, dpitch
dim as any ptr dp
screeninfo dw, dh, ,, dpitch

dim as integer sw, sh, spitch
dim as any ptr sp
imageinfo img, sw, sh, , spitch, sp

dim as double t, sum
dim as integer count

do
    screenlock
        t = timer()
        dp = screenptr
        scaleimg(dp, dw, dh, dpitch, sp, sw, sh, spitch)
        t = timer() - t
    screenunlock

    sum += t
    count += 1

    '' average rate of the scaling step alone (excludes sleep and screen updates)
    windowtitle "Frames per second: " & int( 1 / ( sum / count ) )

    sleep 1
loop until len(inkey)

imagedestroy img

I don't think parallelizing would improve this further. Parallelizing rendering is tricky, and you often end up with highly parallel but surprisingly slower code: what you gain in parallelism, you lose in synchronization...
