Huh... I'll put that down to "cursor jump". Fixed now.
It may be possible to increase the speed further, since it does a floating-point multiply/divide for every horizontal pixel merge. But perhaps it's good enough for now.
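As a rough sketch of what removing the floating-point step could look like: the 0..255 blend weight can be computed with integer arithmetic from a precomputed fixed-point reciprocal. The helper names `weight_recip` and `weight256` are hypothetical, the width 320 is only an example, and the merge routine would have to accept the integer weight directly. Because the reciprocal is rounded down, the result can be one lower than the exact value.

```freebasic
'' Hypothetical helper: reciprocal of sw in fixed point, scaled so that
'' (dxsw * recip) shr 16 approximates dxsw * 256 / sw
function weight_recip(sw as uinteger) as uinteger
    return (256 shl 16) \ sw
end function

'' Hypothetical helper: integer blend weight in the range 0..255,
'' valid for 0 <= dxsw < sw (the product then stays below 2^24)
function weight256(dxsw as uinteger, recip as uinteger) as uinteger
    return (dxsw * recip) shr 16
end function

dim as uinteger sw = 320                  '' example source width
dim as uinteger recip = weight_recip(sw)  '' computed once per row or image

'' Compare the integer weight with the floating-point one
for dxsw as uinteger = 0 to sw - 1 step 64
    print dxsw, weight256(dxsw, recip), int(dxsw * 256 / sw)
next dxsw
```

Whether this is measurably faster than the floating-point version would need to be timed; it only shows that the per-pixel division and float conversion are not strictly required.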
From a parallelization perspective, it might be difficult to sort out the dependencies. There are three different kinds of row operations:
 scalerow (expands a source row to fit a destination row, no dependencies)
 copyrow (copies the last scaled row, depends on the presence of the previous row)
 mergerow (merges the just-scaled row into the previous (scaled or copied) row, depends on the presence of both)
Whenever a row is scaled, it must also be merged into the row before it, unless there is a clean break between the two rows, in which case no merging is required.
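The three cases can be traced for a small image. This is only an illustration: it assumes the same accumulator stepping as the scaleimg routine discussed in this thread (add the source height for each destination row, subtract the destination height on overflow). The function `rowkind` and the 3-to-8 row sizes are made up for the example.

```freebasic
'' Hypothetical helper: which operation a destination row needs, given the
'' running accumulator dysh at the start of that row
function rowkind(dysh as uinteger, sh as uinteger) as string
    if dysh >= sh then return "copyrow"             '' repeat the previous row
    if dysh > 0 then return "scalerow + mergerow"   '' blends into previous row
    return "scalerow"                               '' clean break, no merge
end function

'' Example sizes (assumed): 3 source rows scaled to 8 destination rows
dim as uinteger sh = 3, dh = 8
dim as uinteger dysh = 0

for dy as uinteger = 0 to dh - 1
    print "row " & dy & ": " & rowkind(dysh, sh)
    dysh += sh
    if dysh >= dh then dysh -= dh   '' move on to the next source row
next dy
```

Every "scalerow + mergerow" row writes into the row above it, which is the dependency that makes splitting the work at arbitrary rows awkward.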
It could be split cleanly into sections if the source and destination height share a common divisor greater than 1, e.g. if the source and destination height are both divisible by 2, then there will be a clean vertical break halfway down the image, and both halves can be scaled separately.
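A minimal sketch of finding those clean breaks: the greatest common divisor of the two heights gives the largest number of sections that can be scaled independently. The `gcd` function and the 200-to-768 sizes are assumptions chosen for illustration, not part of the posted code.

```freebasic
'' Greatest common divisor by Euclid's algorithm
function gcd(a as uinteger, b as uinteger) as uinteger
    while b <> 0
        dim as uinteger t = a mod b
        a = b
        b = t
    wend
    return a
end function

'' Example sizes (assumed): 200 source rows scaled to 768 destination rows
dim as uinteger sh = 200, dh = 768
dim as uinteger n = gcd(sh, dh)      '' number of independent sections
dim as uinteger srows = sh \ n       '' source rows per section
dim as uinteger drows = dh \ n       '' destination rows per section

for i as uinteger = 0 to n - 1
    print "section " & i & ": source rows " & (i * srows) & " to " & _
          ((i + 1) * srows - 1) & ", destination rows " & (i * drows) & _
          " to " & ((i + 1) * drows - 1)
next i
```

With these sizes the gcd is 8, so each section maps 25 source rows onto 96 destination rows and begins with a scalerow that needs no merge into the section before it.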
It could be split in other ways, but every section will start with a scalerow(). And that scaled row may or may not need to be merged into the row before it, once the previous section has finished rendering.
Does anyone know what algo XP > full screen DOS?
Re: Does anyone know what algo XP > full screen DOS?
counting_pine wrote: It may be possible to increase the speed further, since it does a floating-point multiply/divide for every horizontal pixel merge. But perhaps it's good enough for now.
More than enough in fact. Especially considering the OP left slamming the door (if at all =D)
A small speed gain (~30 FPS): replace each division with a multiplication by the reciprocal of the divisor (especially worthwhile when the divisor is constant, as it is here):
Code:
#include "crt.bi"
'' Blend c1 toward c2 by weight w (0 <= w < 1), one 8-bit channel at a time
function merge(c1 as ulong, c2 as ulong, w as single) as ulong
    dim as integer r1, g1, b1, r2, g2, b2
    dim as integer m = int(w * 256)
    assert(m >= 0 and m < 256)
    r1 = c1 shr 16 and 255: r2 = c2 shr 16 and 255
    g1 = c1 shr 8 and 255: g2 = c2 shr 8 and 255
    b1 = c1 and 255: b2 = c2 and 255
    r1 += ((r2 - r1) * m) shr 8
    g1 += ((g2 - g1) * m) shr 8
    b1 += ((b2 - b1) * m) shr 8
    return rgb(r1, g1, b1)
end function
'' Expand one source row of sw pixels into a destination row of dw pixels
sub scalerow(dp as ulong ptr, dw as uinteger, sp as ulong ptr, sw as uinteger)
    dim as uinteger dxsw = 0
    dim as ulong c, c2
    dim as single rsw = 1 / sw  '' reciprocal, computed once per row
    c = *sp
    for dx as uinteger = 0 to dw - 1
        dxsw += sw
        select case dxsw
        case is <= dw
            '' output current pixel
            *dp = c
            if dxsw = dw then
                dxsw = 0: sp += 1
                '' move to next pixel
                c = *sp
            end if
        case else 'is > dw
            '' merge with next pixel
            sp += 1: dxsw -= dw
            c2 = *sp
            '*dp = merge(c, c2, (dxsw / sw))
            *dp = merge(c, c2, (dxsw * rsw))
            c = c2
        end select
        dp += 1
    next dx
end sub
sub copyrow(dp as ulong ptr, sp as ulong ptr, w as uinteger)
    memcpy(dp, sp, w*4)
end sub
'' Blend row sp into row dp by weight w (0 <= w < 1)
sub mergerow(dp as ulong ptr, sp as ulong ptr, wid as uinteger, w as single)
    dim as integer r1, g1, b1, r2, g2, b2
    dim as integer m = int(w * 256)
    assertwarn(m >= 0 and m < 256)
    for i as integer = 0 to wid - 1
        r1 = dp[i] shr 16 and 255: r2 = sp[i] shr 16 and 255
        g1 = dp[i] shr 8 and 255: g2 = sp[i] shr 8 and 255
        b1 = dp[i] and 255: b2 = sp[i] and 255
        r1 += ((r2 - r1) * m) shr 8
        g1 += ((g2 - g1) * m) shr 8
        b1 += ((b2 - b1) * m) shr 8
        dp[i] = rgb(r1, g1, b1)
    next i
end sub
'' Scale the sw x sh source image into the dw x dh destination, row by row
sub scaleimg(dp as any ptr, dw as uinteger, dh as uinteger, dpitch as uinteger, _
             sp as any ptr, sw as uinteger, sh as uinteger, spitch as uinteger)
    dim as uinteger dysh = 0
    dim as single rsh = 1 / sh  '' reciprocal, computed once per image
    for dy as integer = 0 to dh - 1
        if dysh < sh then
            scalerow(dp, dw, sp, sw)
            if dysh > 0 then
                'mergerow(dp - dpitch, dp, dw, dysh / sh)
                mergerow(dp - dpitch, dp, dw, dysh * rsh)
            end if
        else
            'copyrow(dp, dp - dpitch, dw)
            memcpy( dp, dp - dpitch, dw * sizeOf( ulong ) )
        end if
        dysh += sh
        if dysh >= dh then
            dysh -= dh
            sp += spitch
        end if
        dp += dpitch
    next dy
end sub
screenres 1024, 768, 32
dim as any ptr img = imagecreate(320, 200)
for y as integer = 16 to 199
    for x as integer = 0 to 319
        pset img, (x, y), (x + y*1) mod 2 <> 0
    next x
next y
draw string img, (0, 0), "Hello world! \/[](){}<>_=+"
dim as integer dw, dh, dpitch
dim as any ptr dp
screeninfo dw, dh, ,, dpitch
dim as integer sw, sh, spitch
dim as any ptr sp
imageinfo img, sw, sh, , spitch, sp
dim as double t, sum
dim as integer count
do
    screenlock
    t = timer()
    dp = screenptr
    scaleimg(dp, dw, dh, dpitch, sp, sw, sh, spitch)
    t = timer() - t
    screenunlock
    sum += t
    count += 1
    '' 1 / (average seconds per frame) = average frames per second
    windowtitle "Average FPS: " & int( 1 / ( sum / count ) )
    sleep 1
loop until len(inkey)
imagedestroy img
I don't think that parallelizing will further improve this. Parallelizing rendering is tricky, and often you end up with highly parallelized but surprisingly slower code. What you gain in parallelization, you lose in synchronization...