Global Illumination rendering. (CPU killer)

D.J.Peters · Post by **D.J.Peters** » Feb 08, 2015 16:03

Global Illumination Rendering Intro

w=320 h=320 MAX_THREADS=8 MAX_SAMPLES_PER_PIXEL=256

w=256 h=256 MAX_THREADS=1 MAX_SAMPLES_PER_PIXEL=128

C++ version by kevin beason smallpt: Global Illumination in 99 lines of C++

const as double M_PI=atn(1)*4
type Vec
  declare constructor (a as double=0,b as double=0,c as double=0)
  declare function mult(byref b as const Vec) as Vec
  declare function norm() as Vec
  declare function dot(byref b as const Vec) as double
  as double x, y, z
end type
constructor Vec(a as double,b as double,c as double)
  x=a:y=b:z=c
end constructor
operator +(l as Vec,r as Vec) as Vec
  operator = Vec(l.x+r.x,l.y+r.y,l.z+r.z)
end operator
operator -(l as Vec,r as Vec) as Vec
  operator = Vec(l.x-r.x,l.y-r.y,l.z-r.z)
end operator
operator *(l as Vec,r as double) as Vec
  operator = Vec(l.x*r,l.y*r,l.z*r)
end operator
operator \(l as Vec,r as Vec) as Vec
  operator = Vec(l.y*r.z-l.z*r.y, l.z*r.x-l.x*r.z, l.x*r.y-l.y*r.x)
end operator
function Vec.mult(byref r as const Vec) as Vec
  function = Vec(x*r.x,y*r.y,z*r.z)
end function
function Vec.norm() as Vec
  this = this * (1/sqr(x*x+y*y+z*z))
  return this
end function
function Vec.dot(byref b as const Vec) as double
  return x*b.x+y*b.y+z*b.z
end function



type Ray
  declare constructor
  declare constructor(a as Vec,b as Vec)
  as Vec o, d 
end type
constructor Ray
  d.z=-1
end constructor
constructor Ray(a as Vec,b as Vec)
  o=a:d=b
end constructor

enum Refl_t 
  DIFF, SPEC, REFR  ' material types, used in radiance()
end enum

type Sphere
  declare constructor (rad_ as double, p_ as Vec,e_ as Vec,c_ as Vec,refl_ as Refl_t)
  declare function intersect(byref r as const Ray) as double
  as double rad,rad2       ' radius
  as Vec p ' position
  as Vec e ' emission, color
  as Vec c ' color
  as Refl_t refl  ' reflection type (DIFFuse, SPECular, REFRactive)
end type
constructor Sphere(rad_ as double, p_ as Vec,e_ as Vec,c_ as Vec,refl_ as Refl_t)
  rad =rad_ : p=p_ : e=e_ : c=c_ : refl=refl_
  rad2=rad*rad
end constructor

function Sphere.intersect(byref r as const Ray) as double
  dim as Vec op = p-r.o
  dim as double t
  dim as double eps=1e-4
  dim as double b=op.dot(r.d)
  dim as double det=b*b-op.dot(op)+rad2
  if (det<0) then  return 0
  det=sqr(det)
  t=b-det : if t>eps then return t 
  t=b+det : if t>eps then return t
  return 0
end function

dim shared as sphere spheres(...) = { _ Scene:
  _ ' radius,      position,        emission,     color,  material
  Sphere(600, Vec(50,681.6-.27,81.6),Vec(32,32,32),  Vec(), DIFF), _ 'Lite
  Sphere(16.5,Vec(27,16.5,47),       Vec(),Vec(.8 ,.8 ,.8 ),SPEC), _ 'Mirr
  Sphere(16.5,Vec(73,16.5,78),       Vec(),Vec(.8 ,.8 ,.8 ),REFR), _ 'Glas
  Sphere(10  ,Vec(10,2   ,90),       Vec(),Vec(1,.5,0),DIFF), _ 'Dif
  Sphere(1e5 ,Vec(50, 1e5, 81.6),    Vec(),Vec(.50,.25,0),DIFF), _ 'Botm
  Sphere(1e5 ,Vec(50,40.8, 1e5),     Vec(),Vec(1,1,1),DIFF), _ 'Back
  Sphere(1e5 ,Vec(50,-1e5+81.6,81.6),Vec(),Vec(.5,.5,.5),DIFF), _ 'Top
  Sphere(1e5 ,Vec( 1e5+1,40.8,81.6), Vec(),Vec(.8,.2,.2),DIFF), _ 'Left
  Sphere(1e5 ,Vec(-1e5+99,40.8,81.6),Vec(),Vec(.2,.2,.8),DIFF), _ 'Rght
  Sphere(1e5 ,Vec(50,40.8,-1e5+170), Vec(),Vec()           ,DIFF)  _ 'Frnt
}

#define clamp(_x) iif(_x<0,0,iif(_x>1,1,_x))

#define toInt(_x) int(clamp(_x)^1/2.2*255+.5)


function intersect(byref r as const Ray,byref t as double,byref id as integer) as integer
  dim as double n=ubound(spheres)
  dim as double d
  dim as double inf=1e20
  t=1e20
  for i as integer=0 to n
    d=spheres(i).intersect(r)
    if(d>0 andalso d<t) then
      t=d : id=i
    end if
  next
  return (t<inf)
end function

function radiance(byref r as Ray,depth as integer) as Vec
  dim as double t       ' distance to intersection
  dim as integer id=0 ' id of intersected object
  if (intersect(r, t, id)=0) then return Vec(0,0,0) ' if miss, return black

  dim as Sphere obj = spheres(id)        ' the hit object
  dim as Vec x = r.o + r.d * t
  dim as Vec n=(x-obj.p).norm()
  dim as Vec nl=iif(n.dot(r.d)<0,n,n*-1)
  dim as Vec f=obj.c
  dim as double p = iif(f.x>f.y andalso f.x>f.z,f.x,iif(f.y>f.z,f.y,f.z)) ' max refl
  depth+=1
  if (depth>5) then
    if (rnd()<p) then 
     f=f*(1/p)
     return vec(1,0,1) 'f
    else
     return obj.e 'R.R.
    end if
  end if

  ' Ideal DIFFUSE reflection
  if (obj.refl = DIFF) then
    dim as double r1=2*M_PI*rnd()
    dim as double r2=rnd()
    dim as double r2s=sqr(r2)
    dim as Vec w=nl
    dim as Vec u= iif(abs(w.x)>.1,Vec(0,1),(Vec(1) \ w).norm())
    dim as Vec v=w \ u
    dim as Vec d = (u*cos(r1)*r2s + v*sin(r1)*r2s + w*sqr(1-r2)).norm()
    return obj.e + f.mult(radiance(Ray(x,d),depth))
  elseif (obj.refl = SPEC) then
    ' Ideal SPECULAR reflection
    return obj.e + f.mult(radiance(Ray(x,r.d-n*2*n.dot(r.d)),depth))
  end if

   ' Ideal dielectric REFRACTION
  dim as Ray reflRay = Ray(x, r.d - n * 2 * n.dot(r.d))     
  dim as integer into = (n.dot(nl)>0)    ' Ray from outside going in?
  dim as double nc=1
  dim as double nt=1.5
  dim as double nnt=iif(into,nc/nt,nt/nc)
  dim as double ddn=r.d.dot(nl)
  dim as double cos2t
   cos2t=1-nnt*nnt*(1-ddn*ddn)
  if (cos2t<0) then   ' Total internal reflection
    return obj.e + f.mult(radiance(reflRay,depth))
  end if
  dim as Vec tdir = (r.d*nnt - n*(iif(into,1,-1)*(ddn*nnt+sqr(cos2t)))).norm()
  dim as double a=nt-nc
  dim as double b=nt+nc
  dim as double R0=a*a/(b*b)
  dim as double c = 1-iif(into,-ddn,tdir.dot(n))
  dim as double Re=R0+(1-R0)*c*c*c*c*c
  dim as double Tr=1-Re
  dim as double PP=.25+.5*Re
 
  ' Russian roulette
  if depth>2 then
    if rnd()<PP then
      dim as double RP=Re/PP
      obj.e=obj.e+f.mult(radiance(reflRay,depth)*RP)
    else
      dim as double TP=Tr/(1-PP)
      obj.e=obj.e+f.mult(radiance(Ray(x,tdir),depth)*TP)
    end if
  else
    obj.e=obj.e+f.mult(radiance(reflRay,depth)*Re+radiance(Ray(x,tdir),depth)*Tr)
  end if
  return obj.e
end function

type threaddata
  as any ptr id
  as integer w,h
  as integer y
  as integer samps
  as Ray cam
  as vec cx
  as vec cy
end type

sub RenderThread(byval arg as any ptr)
  dim as threaddata ptr td=arg
  dim as integer gy=td->h-1-td->y
  line (0,gy)-(td->w,gy),rgb(255,0,0)
  for x as integer=0 to td->w-1   ' Loop cols
    dim as vec c
    for sy as integer=0 to 1 ' 2x2 subpixel rows
      for sx as integer=0 to 1 ' 2x2 subpixel cols
        dim as Vec r
        for s as integer=0 to td->samps-1
          dim as double r1
          dim as double r2
          while r1=0:r1=2*rnd():wend
          while r2=0:r2=2*rnd():wend
          dim as double dx=iif(r1<1,sqr(r1)-1,1-sqr(2-r1))
          dim as double dy=iif(r2<1,sqr(r2)-1,1-sqr(2-r2))
          dim as Vec d = td->cx*(((sx+.5+dx)/2+x)/td->w-.5) + td->cy*(((sy+.5+dy)/2+td->y)/td->h-.5) + td->cam.d
          r = r + radiance(Ray(td->cam.o+d*140,d.norm()),0)*(1./td->samps)
        next ' Camera rays are pushed ^^^^^ forward to start in interior
        c = c + Vec(clamp(r.x),clamp(r.y),clamp(r.z))*.25
      next
    next
    pset (x,gy),rgb(toInt(c.x), toInt(c.y), toInt(c.z))  
  next
  td->id=0
end sub 

'
' main
'
const MAX_THREADS      =8
const SAMPLES_PER_PIXEL=64     ' more samples per pixel = higer image quality

dim as integer w=256
dim as integer h=256
dim as string imagefile = "_b" & w & "x" & h & "x" & SAMPLES_PER_PIXEL & ".bmp"
screenres w,h,32

dim as Ray cam=Ray(Vec(50,52,295.6),Vec(0.0,-0.1,-1).norm()) ' cam pos, dir
dim as Vec cx=Vec(w*.5135/h)
dim as Vec cy=(cx \ cam.d).norm()*.5135
dim as double p1=h*0.01
dim as integer i,nThreads
dim as THREADDATA tds(MAX_THREADS-1)
for t as integer=0 to MAX_THREADS-1
  for y as integer=t to h-1 step MAX_THREADS
    windowtitle "" & int(nThreads/p1) & " %"
    dim as integer tind=-1
    while tind=-1
      if tds(i).id=0 then sleep 10:tind=i:continue while 
      i=(i+1) mod MAX_THREADS
      sleep 5
    wend
    with tds(tind)
      .w=w
      .h=h
      .y=y
      .cx=cx
      .cy=cy
      .cam=cam
      .samps=SAMPLES_PER_PIXEL
    end with
    if MAX_THREADS>1 then
      tds(tind).id=threadcreate(@RenderThread,@tds(tind))
      nThreads+=1
    else
      RenderThread(@tds(tind))
    end if
    if asc(inkey())=27 then exit for,for
  next
next
windowtitle "100 % done "
for i=0 to MAX_THREADS-1
  if tds(i).id then 
    windowtitle "wait " & i & " !"
    ThreadWait(tds(i).id)
    windowtitle "done ..."
  end if
next
bsave imagefile,0
windowtitle "img saved ..."
sleep

Post by **fxm** » Feb 08, 2015 16:32

Have you seen my 2 remarks on your similar program:
http://www.freebasic.net/forum/viewtopi ... 73#p205073
http://www.freebasic.net/forum/viewtopi ... 89#p205089)

Post by **fxm** » Feb 09, 2015 10:31

The above program compiles and runs only with a fbc version >= 1.01.0.

I slightly modified the above program to be compatible with any fbc version >= 0.90.0:

Adding a shared mutex (lock, unlock) around the graphic instructions of the thread (Line and Pset), and also around the graphic instructions (Windowtitle) of each main program loop (when the threads run).
(because gfxlib is not thread-safe before fbc version 1.00.0)
Adding parentheses in the following line:
dim as Ray cam=Ray(Vec(50,52,295.6),(Vec(0.0,-0.1,-1)).norm()) ' cam pos, dir
(because such parentheses are required before fbc version 1.01.0)
Why make a special case for 'MAX_THREADS=1'?
Moreover, 'nThreads' being not incremented, the progress counter is stuck at 0:
''if MAX_THREADS>1 then tds(tind).id=threadcreate(@RenderThread,@tds(tind)) nThreads+=1 ''else ''RenderThread(@tds(tind)) ''end if

Code: Select all

const as double M_PI=atn(1)*4
type Vec
  declare constructor (a as double=0,b as double=0,c as double=0)
  declare function mult(byref b as const Vec) as Vec
  declare function norm() as Vec
  declare function dot(byref b as const Vec) as double
  as double x, y, z
end type
constructor Vec(a as double,b as double,c as double)
  x=a:y=b:z=c
end constructor
operator +(l as Vec,r as Vec) as Vec
  operator = Vec(l.x+r.x,l.y+r.y,l.z+r.z)
end operator
operator -(l as Vec,r as Vec) as Vec
  operator = Vec(l.x-r.x,l.y-r.y,l.z-r.z)
end operator
operator *(l as Vec,r as double) as Vec
  operator = Vec(l.x*r,l.y*r,l.z*r)
end operator
operator \(l as Vec,r as Vec) as Vec
  operator = Vec(l.y*r.z-l.z*r.y, l.z*r.x-l.x*r.z, l.x*r.y-l.y*r.x)
end operator
function Vec.mult(byref r as const Vec) as Vec
  function = Vec(x*r.x,y*r.y,z*r.z)
end function
function Vec.norm() as Vec
  this = this * (1/sqr(x*x+y*y+z*z))
  return this
end function
function Vec.dot(byref b as const Vec) as double
  return x*b.x+y*b.y+z*b.z
end function



type Ray
  declare constructor
  declare constructor(a as Vec,b as Vec)
  as Vec o, d
end type
constructor Ray
  d.z=-1
end constructor
constructor Ray(a as Vec,b as Vec)
  o=a:d=b
end constructor

enum Refl_t
  DIFF, SPEC, REFR  ' material types, used in radiance()
end enum

type Sphere
  declare constructor (rad_ as double, p_ as Vec,e_ as Vec,c_ as Vec,refl_ as Refl_t)
  declare function intersect(byref r as const Ray) as double
  as double rad,rad2       ' radius
  as Vec p ' position
  as Vec e ' emission, color
  as Vec c ' color
  as Refl_t refl  ' reflection type (DIFFuse, SPECular, REFRactive)
end type
constructor Sphere(rad_ as double, p_ as Vec,e_ as Vec,c_ as Vec,refl_ as Refl_t)
  rad =rad_ : p=p_ : e=e_ : c=c_ : refl=refl_
  rad2=rad*rad
end constructor

function Sphere.intersect(byref r as const Ray) as double
  dim as Vec op = p-r.o
  dim as double t
  dim as double eps=1e-4
  dim as double b=op.dot(r.d)
  dim as double det=b*b-op.dot(op)+rad2
  if (det<0) then  return 0
  det=sqr(det)
  t=b-det : if t>eps then return t
  t=b+det : if t>eps then return t
  return 0
end function

dim shared as sphere spheres(...) = { _ Scene:
  _ ' radius,      position,        emission,     color,  material
  Sphere(600, Vec(50,681.6-.27,81.6),Vec(32,32,32),  Vec(), DIFF), _ 'Lite
  Sphere(16.5,Vec(27,16.5,47),       Vec(),Vec(.8 ,.8 ,.8 ),SPEC), _ 'Mirr
  Sphere(16.5,Vec(73,16.5,78),       Vec(),Vec(.8 ,.8 ,.8 ),REFR), _ 'Glas
  Sphere(10  ,Vec(10,2   ,90),       Vec(),Vec(1,.5,0),DIFF), _ 'Dif
  Sphere(1e5 ,Vec(50, 1e5, 81.6),    Vec(),Vec(.50,.25,0),DIFF), _ 'Botm
  Sphere(1e5 ,Vec(50,40.8, 1e5),     Vec(),Vec(1,1,1),DIFF), _ 'Back
  Sphere(1e5 ,Vec(50,-1e5+81.6,81.6),Vec(),Vec(.5,.5,.5),DIFF), _ 'Top
  Sphere(1e5 ,Vec( 1e5+1,40.8,81.6), Vec(),Vec(.8,.2,.2),DIFF), _ 'Left
  Sphere(1e5 ,Vec(-1e5+99,40.8,81.6),Vec(),Vec(.2,.2,.8),DIFF), _ 'Rght
  Sphere(1e5 ,Vec(50,40.8,-1e5+170), Vec(),Vec()           ,DIFF)  _ 'Frnt
}

#define clamp(_x) iif(_x<0,0,iif(_x>1,1,_x))

#define toInt(_x) int(clamp(_x)^1/2.2*255+.5)


function intersect(byref r as const Ray,byref t as double,byref id as integer) as integer
  dim as double n=ubound(spheres)
  dim as double d
  dim as double inf=1e20
  t=1e20
  for i as integer=0 to n
    d=spheres(i).intersect(r)
    if(d>0 andalso d<t) then
      t=d : id=i
    end if
  next
  return (t<inf)
end function

function radiance(byref r as Ray,depth as integer) as Vec
  dim as double t       ' distance to intersection
  dim as integer id=0 ' id of intersected object
  if (intersect(r, t, id)=0) then return Vec(0,0,0) ' if miss, return black

  dim as Sphere obj = spheres(id)        ' the hit object
  dim as Vec x = r.o + r.d * t
  dim as Vec n=(x-obj.p).norm()
  dim as Vec nl=iif(n.dot(r.d)<0,n,n*-1)
  dim as Vec f=obj.c
  dim as double p = iif(f.x>f.y andalso f.x>f.z,f.x,iif(f.y>f.z,f.y,f.z)) ' max refl
  depth+=1
  if (depth>5) then
    if (rnd()<p) then
      f=f*(1/p)
      return vec(1,0,1) 'f
    else
      return obj.e 'R.R.
    end if
  end if

  ' Ideal DIFFUSE reflection
  if (obj.refl = DIFF) then
    dim as double r1=2*M_PI*rnd()
    dim as double r2=rnd()
    dim as double r2s=sqr(r2)
    dim as Vec w=nl
    dim as Vec u= iif(abs(w.x)>.1,Vec(0,1),(Vec(1) \ w).norm())
    dim as Vec v=w \ u
    dim as Vec d = (u*cos(r1)*r2s + v*sin(r1)*r2s + w*sqr(1-r2)).norm()
    return obj.e + f.mult(radiance(Ray(x,d),depth))
  elseif (obj.refl = SPEC) then
    ' Ideal SPECULAR reflection
    return obj.e + f.mult(radiance(Ray(x,r.d-n*2*n.dot(r.d)),depth))
  end if

   ' Ideal dielectric REFRACTION
  dim as Ray reflRay = Ray(x, r.d - n * 2 * n.dot(r.d))    
  dim as integer into = (n.dot(nl)>0)    ' Ray from outside going in?
  dim as double nc=1
  dim as double nt=1.5
  dim as double nnt=iif(into,nc/nt,nt/nc)
  dim as double ddn=r.d.dot(nl)
  dim as double cos2t
  cos2t=1-nnt*nnt*(1-ddn*ddn)
  if (cos2t<0) then   ' Total internal reflection
    return obj.e + f.mult(radiance(reflRay,depth))
  end if
  dim as Vec tdir = (r.d*nnt - n*(iif(into,1,-1)*(ddn*nnt+sqr(cos2t)))).norm()
  dim as double a=nt-nc
  dim as double b=nt+nc
  dim as double R0=a*a/(b*b)
  dim as double c = 1-iif(into,-ddn,tdir.dot(n))
  dim as double Re=R0+(1-R0)*c*c*c*c*c
  dim as double Tr=1-Re
  dim as double PP=.25+.5*Re
 
  ' Russian roulette
  if depth>2 then
    if rnd()<PP then
      dim as double RP=Re/PP
      obj.e=obj.e+f.mult(radiance(reflRay,depth)*RP)
    else
      dim as double TP=Tr/(1-PP)
      obj.e=obj.e+f.mult(radiance(Ray(x,tdir),depth)*TP)
    end if
  else
    obj.e=obj.e+f.mult(radiance(reflRay,depth)*Re+radiance(Ray(x,tdir),depth)*Tr)
  end if
  return obj.e
end function

type threaddata
  as any ptr id
  static as any ptr mutex
  as integer w,h
  as integer y
  as integer samps
  as Ray cam
  as vec cx
  as vec cy
end type

dim as any ptr threaddata.mutex

sub RenderThread(byval arg as any ptr)
  dim as threaddata ptr td=arg
  dim as integer gy=td->h-1-td->y
  mutexlock(td->mutex)
    line (0,gy)-(td->w,gy),rgb(255,0,0)
  mutexunlock(td->mutex)
  for x as integer=0 to td->w-1   ' Loop cols
    dim as vec c
    for sy as integer=0 to 1 ' 2x2 subpixel rows
      for sx as integer=0 to 1 ' 2x2 subpixel cols
        dim as Vec r
        for s as integer=0 to td->samps-1
          dim as double r1
          dim as double r2
          while r1=0:r1=2*rnd():wend
          while r2=0:r2=2*rnd():wend
          dim as double dx=iif(r1<1,sqr(r1)-1,1-sqr(2-r1))
          dim as double dy=iif(r2<1,sqr(r2)-1,1-sqr(2-r2))
          dim as Vec d = td->cx*(((sx+.5+dx)/2+x)/td->w-.5) + td->cy*(((sy+.5+dy)/2+td->y)/td->h-.5) + td->cam.d
          r = r + radiance(Ray(td->cam.o+d*140,d.norm()),0)*(1./td->samps)
        next ' Camera rays are pushed ^^^^^ forward to start in interior
        c = c + Vec(clamp(r.x),clamp(r.y),clamp(r.z))*.25
      next
    next
    mutexlock(td->mutex)
      pset (x,gy),rgb(toInt(c.x), toInt(c.y), toInt(c.z))  
    mutexunlock(td->mutex)
next
  td->id=0
end sub

'
' main
'
const MAX_THREADS      =8
const SAMPLES_PER_PIXEL=64     ' more samples per pixel = higer image quality

dim as integer w=256
dim as integer h=256
dim as string imagefile = "_b" & w & "x" & h & "x" & SAMPLES_PER_PIXEL & ".bmp"
screenres w,h,32

dim as Ray cam=Ray(Vec(50,52,295.6),(Vec(0.0,-0.1,-1)).norm()) ' cam pos, dir
dim as Vec cx=Vec(w*.5135/h)
dim as Vec cy=(cx \ cam.d).norm()*.5135
dim as double p1=h*0.01
dim as integer i,nThreads
dim as THREADDATA tds(MAX_THREADS-1)
threaddata.mutex = mutexcreate()
for t as integer=0 to MAX_THREADS-1
  for y as integer=t to h-1 step MAX_THREADS
    mutexlock(threaddata.mutex)
      windowtitle "" & int(nThreads/p1) & " %"
    mutexunlock(threaddata.mutex)
    dim as integer tind=-1
    while tind=-1
      if tds(i).id=0 then sleep 10:tind=i:continue while
      i=(i+1) mod MAX_THREADS
      sleep 5
    wend
    with tds(tind)
      .w=w
      .h=h
      .y=y
      .cx=cx
      .cy=cy
      .cam=cam
      .samps=SAMPLES_PER_PIXEL
      .id=threadcreate(@RenderThread,@tds(tind))
    end with
    nThreads+=1
    if asc(inkey())=27 then exit for,for
  next
next
windowtitle "100 % done "
for i=0 to MAX_THREADS-1
  if tds(i).id then
    mutexlock(threaddata.mutex)
      windowtitle "wait " & i & " !"
    mutexunlock(threaddata.mutex)
    ThreadWait(tds(i).id)
    mutexlock(threaddata.mutex)
      windowtitle "done ..."
    mutexunlock(threaddata.mutex)
  end if
next
mutexdestroy(threaddata.mutex)
bsave imagefile,0
windowtitle "img saved ..."
sleep

MrSwiss · Post by **MrSwiss** » Feb 11, 2015 14:07

@ D.J. Peters, fxm,

Speed comparison of your code:
D.J. Peters: runtime 272.5 Seconds
fxm: runtime 67.6 Seconds

FBC Version 1.02.0, Windows 64bit, standalone (2015-01-12)
Compiler switches: -s gui -mt
Set parameters = BMP: 256 x 256 x 256, number of threads: 8

OS: Win 8.1, pro, 64bit
CPU: Intel i7-3517U (2 x phys. cores, 4 x virt. cores) @1.9 - 2.4GHz

Only added the code to measure the "RunTime" and to display it in "WindowTitle" at prog.-end.
The code added is identical in both cases, simply amazing ...

Post by **fxm** » Feb 11, 2015 14:46

Very weird!
Is that a reproducible measurement?

The only functional difference between the two codes is the adding of mutex around each graphic instruction to avoid simultaneous graphic calls (because gfxlib is thread-unsafe before fbc version 1.00.0).
But in any cases the graphic drawing time (Line(), Pset()) is very small compared to total execution time (<5%)!

On my PC (dual cores 32bit Win XP), I measure 2 execution times very close, about 200 seconds.
(FreeBASIC Compiler - Version 1.02.0 (02-06-2015), built for win32 (32bit))

angros47 · Post by **angros47** » Feb 12, 2015 12:25

To avoid killing CPU, maybe the solution is to use GPU

http://madebyevan.com/webgl-path-tracing/

Post by **fxm** » Feb 12, 2015 19:51

But the aim of this example is really how use all the CPU resources!

Now that PCs generally have multiple cores, users have lost the habit of systematically putting a 'Sleep' (in each long time loop) in their normal program (without multiple threading).
But for an advanced program with multiple threading, when the number of threads simultaneously running is greater or equal to the number of cores, 'Sleep' must be simply put back in each thread code loop if user wants avoiding to hog the CPU.

sean_vn · Post by **sean_vn** » Feb 14, 2015 0:29

Java 8 has some very nice thread related classes that makes it easy to write multithreaded code. It is all based on work stealing thread pools. I found it easy to use compared to anything else I've tried. The problem with Java though is that they still haven't really thought out memory management. You can easily crash the JVM or even the entire OS with high memory usage applications. I have D.J. Peters thread pool code for FB somewhere. I must try it sometime.

Actually I am interested in Quasi-Random number generators at the moment in relation to hash tables.
http://en.m.wikipedia.org/wiki/Low-discrepancy_sequence
In particular additive recurrence. I am only ever interested in really simple things.
It looks like the image you generated is contaminated by Gaussian noise. You could try a median filter to get rid of that. If you search through the research papers I think there are some good ways of cleaning up such images.

brybry · Post by **brybry** » Feb 14, 2015 17:03

@MrSwiss:

Is it possible you accidentally ran Joshy's code with 256 samples per pixel and fxm's code with 64 samples per pixel? I have a hard time believing your CPU could run 256 samples per pixel in 67.6 seconds.

With FBC 1.01.0, on my Core i5 4570 (4x 3.2GHz, no hyperthreading), both codes take 49.5 seconds @ 64 spp and 194 seconds @ 256 spp. All tests where run 256x256 pixels and 8 threads. Windows 7 64bit.

@all:

I've spent the past few days testing various changes to Joshy's translation of smallpt against different compiler options. The results were interesting.

The reference time I used for my speed tests was 49.5 seconds, as mentioned above. 256x256 @ 64 spp. FBC 1.01.0 using the default -fpu x87 math mode.

When I compiled with -fpu SSE (no code modifications), the time was about 11.3% faster at 43.9 seconds.

The first modification I made was to remove the constructor calls within the Vec operators. This was a very simple change to make. This also enabled the opportunity for the compiler to vectorize the math. Here are my timings:

No Constructor Calls in Operators: 37.1 seconds (25% faster)
No Constructor Calls in Operators + -fpu SSE: 35.6 seconds (28% faster)
No Constructor Calls in Operators + -fpu SSE + -vec 1: 35.3 seconds (28.6% faster)
No Constructor Calls in Operators + -fpu SSE + -vec 2: 34.9 seconds (29.5% faster)

Sadly, the vectorization didn't help too much. I believe if single-precision math was used instead of double, the vectorization would have helped more. Unfortunately, single-precision math would introduce anomalies that would require other changes to the code.

More importantly, making a few changes to the Vec operators had a significant impact over the original code. Calling the constructor introduced significant overhead.

Next, I decided to see what would happen if I actually removed the call to the operator itself, i.e. inlining all the vector math...

Inline Vec Operators: 23.4 seconds (52.7% faster)
Inline Vec Operators + -fpu SSE: 23.8 seconds (51.9% faster)
Inline Vec Operators + -fpu SSE + -vec 1: 22.2 seconds (55.2% faster)
Inline Vec Operators + -fpu SSE + -vec 2: 22 seconds (55.6% faster)

So, the code wasn't as pretty as it could have been, but the speed increase was incredible. Most programs will not benefit from "inline" functions the way this path tracer did. A conservative estimate is that 100,000,000+ operator function calls were eliminated throughout the execution of the render.

The more samples & pixels rendered will save even more time with the optimizations. Using inline Vec operators with -fpu SSE and -vec 2, rendering 256 samples per pixel, I got a time of 81.6 seconds, which is 57.9% faster than the 194 seconds mentioned above.

MrSwiss · Post by **MrSwiss** » Mar 02, 2015 12:27

brybry wrote:@MrSwiss:
Is it possible you accidentally ran Joshy's code with 256 samples per pixel and fxm's code with 64 samples per pixel? I have a hard time believing your CPU could run 256 samples per pixel in 67.6 seconds.

@brybry:
Yes, you are right, that's exactly what happened. After setting 256 samples on both, the timings differ only marginally. Sorry ...

Global Illumination rendering. (CPU killer)

Global Illumination rendering. (CPU killer)

Re: Global Illumination rendering. (CPU killer)

Re: Global Illumination rendering. (CPU killer)

Re: Global Illumination rendering. (CPU killer)

Re: Global Illumination rendering. (CPU killer)

Re: Global Illumination rendering. (CPU killer)

Re: Global Illumination rendering. (CPU killer)

Re: Global Illumination rendering. (CPU killer)

Re: Global Illumination rendering. (CPU killer)

Re: Global Illumination rendering. (CPU killer)