How would you find the shortest distance in a set of points?

Tourist Trap · Post by **Tourist Trap** » Jun 28, 2020 15:06

Hello,

so as said in the tittle, how would you find the shortest distance in a set of points, in a fast way.

So far here is my try:

Code: Select all

type POINT2D
    as single   x
    as single   y
end type

dim as POINT2D   xy(1 to 1000)

'set up 1000 points
randomize TIMER()
for index as integer = lBound(XY) to uBound(XY)
    xy(index).x = 50 + 500*rnd()
    xy(index).y = 50 + 500*rnd()
next index

#macro _dist(xy1, xy2)
    sqr(((xy1.x) - (xy2.x))*((xy1.x) - (xy2.x)) + ((xy1.y) - (xy2.y))*((xy1.y) - (xy2.y)))
#endMacro

screenRes 600,600,32

dim as single   dist
dim as single   minDistanceSofar = 1e+9
for index as integer = lBound(XY) to uBound(XY) - 1
    line (xy(index).x, xy(index).y)-(xy(index + 1).x, xy(index + 1).y), rgb(100,200,200)
    if (xy(index + 1).x - xy(index).x)>minDistanceSofar then
        continue for
    end if
    if (xy(index + 1).y - xy(index).y)>minDistanceSofar then
        continue for
    end if
    '
    dist = _dist(xy(index + 1), xy(index))
    if dist<minDistanceSofar then
        minDistanceSofar = dist
    end if
next index

? "min distance found = ", minDistanceSofar

getKey()

Any other idea?

edit: Hum, yes there is something weird, missing a loop or something in the above example. Every couple of points should be tested, not only adjacent ones in index order of course.

jj2007 · Post by **jj2007** » Jun 28, 2020 16:15

- avoid the square roots, they are very slow
- check to what extent you can use the simple sum dx+dy as a proxy, e.g. to identify the best 10 matches
- with these 10 matches, use the exact method

Tourist Trap · Post by **Tourist Trap** » Jun 28, 2020 16:20

jj2007 wrote:- avoid the square roots, they are very slow
- check to what extent you can use the simple sum dx+dy as a proxy, e.g. to identify the best 10 matches
- with these 10 matches, use the exact method

I see, first try another distance formula than the pythagorean stuff if I follow you well. Of course if all is positive number, why not something like dx + dy? I have to give it a try anyway. Thanks.

Post by **counting_pine** » Jun 28, 2020 16:33

You could try partitioning the points into a grid of N*N cells. Then you’d only have to check the points in each cell against points in the same cell and neighbouring cells.

Note, depending on the distribution of points, you might find that most points are clustered together in the same cell. In that case you’d have to be more creative about how the cells are partitioned.

Post by **paul doe** » Jun 28, 2020 16:35

Tourist Trap wrote: I see, first try another distance formula than the pythagorean stuff if I follow you well. Of course if all is positive number, why not something like dx + dy? I have to give it a try anyway. Thanks.

Just use the squared distance( dx * dx + dy * dy ) to sort them. And you could use some fast data structure (like a Binary Heap, aka Priority Queues) to get the points desired:

add all points to the Binary Heap/Priority Queue, using the squared distance as the priority.
after all are added, popping the first will give you the nearest, popping again will give you the second nearest, and so on and so forth.

This, combined with a spatial partitioning scheme like counting_pine above suggests, should be pretty fast. Binary heaps are pretty easy to implement efficiently to boot, and are used extensively in eg. path finding algorithms (A*)

dodicat · Post by **dodicat** » Jun 28, 2020 20:43

Using the squared distance can yield wrong results, or so it seems here.
I have it set up for the squared distance (line 68 or so).
For the true distance the parameter is 0
You can see the optimised distance increasing after iteration 572 using the the squared distance comparison.

Code: Select all

 

#include "crt.bi" 

Type pt
    As double x,y
End Type

Sub setup(points() As pt)
    #define Intrange(f,l) Int(Rnd*(((l)+1)-(f)))+(f)
    For n As long=Lbound(points) To Ubound(points)
        points(n)=Type<pt>(IntRange(10,790),IntRange(10,590))
    Next n
End Sub


Function distance(points() As pt,f as long=0) As double
    #define rawlength(a,b) ((a.x-b.x)*(a.x-b.x)+(a.y-b.y)*(a.y-b.y))
    Dim As double total,L
    For n As long=Lbound(points) To Ubound(points)-1
        L= rawlength(points(n),points(n+1))
       if f=0 then total+=sqr(L) else total+=L
    Next n
    Return total
End Function

Sub shuffle(a() As pt)
    #define range(f,l) Int(Rnd*((l+1)-(f)))+(f)
    For n As long = Lbound(a) To Ubound(a)-1
        Swap a(n), a(range(n,Ubound(a)))
    Next n
End Sub


Sub show(c() As pt,clr As Ulong=Rgb(255,255,255))
    pset(c(1).x,c(1).y),clr
    For n As Integer=Lbound(c)+1 To Ubound(c)
        Line -(c(n).x,c(n).y),clr
    Next n
End Sub

sub copymem(toa() as pt,froma() as pt)
    for n as long=lbound(toa) to ubound(toa)
        toa(n)=froma(n)
    next
    end sub


'===================================
redim as pt p(1 to 1000)
redim as pt opt(1 to ubound(p))'an optimised array
redim as pt orig(1 to ubound(p))'a copy of the original to view/compare the lines (not used here)

dim as long size=ubound(p)*sizeof(pt)
dim as double d=1e20,dd

setup(p())
'memcpy(@orig(1),@p(1),size) 'to see the lines (not used here)
copymem(orig(),p())
dim as double t=timer
dim as double start=(distance(p()))

print "original distance ";start
print "pseudo dist","array dist"
print d
for n as long=1 to 20000
    shuffle(p())
    dd=distance(p(),1) '<<<  flag for distance method 0 (true)or 1(pseudo)
    if d > dd then 
    d=dd
    'memcpy(@opt(1),@p(1),size)
    copymem(opt(),p())
    print d,distance(opt(),0),n
    end if
next n
print
print "optimized distance";
print distance(opt())
print
print timer-t; " seconds"
    

sleep
end
screen 19,32
show(orig(),rgb(50,50,50))
show(opt(),rgb(255,255,255))
sleep

To be honest I was surprised at this.
The problem was not memcpy, I changed it to a direct copy.

badidea · Post by **badidea** » Jun 28, 2020 22:06

Tourist Trap wrote:edit: Hum, yes there is something weird, missing a loop or something in the above example. Every couple of points should be tested, not only adjacent ones in index order of course.

Yes, that is weird, why? I would expect a nested for loop. And be careful not to test each pair twice and don't test against self.

dodicat wrote:Using the squared distance can yield wrong results, or so it seems here

That is even weirder. Mathematically not possible. And a doubt some floating point rounding is the cause.

dodicat · Post by **dodicat** » Jun 28, 2020 22:39

does the same with ulongint.
dim as ulongint d=1e18,dd (and all doubles ulongint except the timer)
1e20 is too big for ulongint.

Post by **paul doe** » Jun 29, 2020 1:09

dodicat wrote:Using the squared distance can yield wrong results, or so it seems here.
...

That would depend on how it's used. If they're used just for comparisons (is this closer/farther than that?) it's exactly the same. The problem with using Manhattan Distance for this sort of comparisons is that the distance will frequently overshoot. This is seen more clearly in, say, an A* implementation: using squared distance yields better results (it's a better heuristic).

Code: Select all

#define rndRange( aMin, aMax ) _
  ( rnd() * ( ( aMax + 1 ) - aMin ) + aMin )
#define sqDistance( p1, p2 ) _
  ( ( p2.x - p1.x ) ^ 2 + ( p2.y - p1.y ) ^ 2 )
#define manhattanDistance( p1, p2 ) _
  ( abs( p2.x - p1.x ) + abs( p2.y - p1.y ) )

type _
  Point
 
  as integer _
    x, y
end type

type _
  Bounds
 
  as integer _
    minX, maxX, _
    minY, maxY
end type

function _
  randomPoint( _
    byref aBound as Bounds ) _
  as Point
 
  return( type <Point>( _
    rndRange( aBound.minX, aBound.maxX ), _
    rndRange( aBound.minY, aBound.maxY ) ) )
end function

sub _
  withRandomPoints( _
    p() as Point, _
    byref aBound as Bounds )
 
  for _
    i as integer => lbound( p ) _
    to ubound( p )
   
    p( i ) => randomPoint( aBound )
  next
end sub

sub _
  render( _
    p() as Point )
 
  for _
    i as integer => lbound( p ) _
    to ubound( p )
   
    circle _
      ( p( i ).x, p( i ).y ), _
      3, rgba( 255, 255, 255, 255 ), , , , f
  next
end sub

sub _
  renderClosestTo( _
    byref p0 as Point, _
    p() as Point )
   
  dim as integer _
    indexSq, indexMn, _
    minSq => 999999, minMn => 999999
  
  for _
    i as integer => lbound( p ) _
    to ubound( p )
    
    dim as integer _
      distSq => sqDistance( p0, p( i ) ), _
      distMn => manhattanDistance( p0, p( i ) )
    
    if( distSq < minSq ) then
      minSq => distSq
      indexSq => i
    end if
    
    if( distMn < minMn ) then
      minMn => distMn
      indexMn => i
    end if
  next
  
  if( indexSq = indexMn ) then
    circle _
      ( p( indexSq ).x, p( indexSq ).y ), _
      6, rgba( 0, 192, 0, 255 )
    circle _
      ( p( indexMn ).x, p( indexMn ).y ), _
      3, rgba( 255, 0, 0, 255 ), , , , f
  else
    circle _
      ( p( indexSq ).x, p( indexSq ).y ), _
      3, rgba( 0, 192, 0, 255 ), , , , f
    circle _
      ( p( indexMn ).x, p( indexMn ).y ), _
      3, rgba( 255, 0, 0, 255 ), , , , f
  end if
end sub

var _
  bounds => type <Bounds> ( 0, 800, 0, 600 )

screenRes( bounds.maxX, bounds.maxY, 32 )
randomize()

dim as integer _
  count => 100

dim as Point _
  points( 0 to count - 1 )

withRandomPoints( points(), bounds )

dim as integer _
  mx, my

do
  getMouse( mx, my )
 
  screenLock()
    cls()
    render( points() )
    renderClosestTo( _
      type <Point>( mx, my ), _
      points() )
  screenUnlock()
 
  sleep( 1, 1 )
loop until( len( inkey() ) )

jj2007 · Post by **jj2007** » Jun 29, 2020 1:47

I couldn't resist: a graphical implementation of the problem. With 800 points, it takes 3 milliseconds on my trusty Core i5 to generate the random points and find the closest pair.

dodicat · Post by **dodicat** » Jun 29, 2020 9:55

Thanks jj2007, paul doe
I have lost all faith in squared distance now, but I use it here and it seems OK.

Code: Select all

 

type pt
    as double x,y
end type

#define rawlength(a,b) ((a.x-b.x)*(a.x-b.x)+(a.y-b.y)*(a.y-b.y))


function getpair(p() as pt) as pt
dim as pt result
dim as double min=1000000
dim as double d
for n1 as long=lbound(p) to ubound(p)-1
    for n2 as long=n1+1 to ubound(p)
        d=rawlength(p(n1),p(n2))
        if min>d then min=d:result=type(n1,n2)
    next n2
next n1
return result
end function

Sub setup(points() As pt)
    #define range(f,l) Rnd*((l)-(f))+(f)
    For n As long=Lbound(points) To Ubound(points)
        points(n)=Type<pt>(Range(10,790),Range(10,590))
    Next n
End Sub

sub show(p() as pt)
    for n as long=lbound(p) to ubound(p)
        circle(p(n).x,p(n).y),2,rgb(255,255,255)
    next n
    end sub

dim as pt z(1 to 800),res
randomize 
screen 19,32,,64
do
    cls
    dim as double t=timer
setup(z())
show(z())
res=getpair(z())
print "time (secs)",,"seperation (pixels)"
var seperation=sqr(rawlength(type<pt>(z(res.x).x,z(res.x).y),type<pt>(z(res.y).x,z(res.y).y)))
print timer-t,seperation

dim as pt half=type((z(res.x).x+z(res.y).x)/2,(z(res.x).y+z(res.y).y)/2)

circle(half.x,half.y),10,rgba(200,0,0,100),,,,f
sleep
loop until inkey=chr(27)

sleep

jj2007 · Post by **jj2007** » Jun 29, 2020 13:17

I see no problem with squared distance - what could possibly go wrong??
My new version does 1,000 points in slightly less than 2 milliseconds.

dodicat · Post by **dodicat** » Jun 29, 2020 15:38

Hi jj2007
I found a problem (with squared distance) in my first snippet in this thread.
I am bamboozled.
I will never trust it again, from now on I am a square rooter.
I'll maybe come back to squared distance in the near future, .. depends ..

jj2007 · Post by **jj2007** » Jun 29, 2020 16:29

dodicat wrote:I found a problem (with squared distance) in my first snippet in this thread.

I saw that one but didn't understand at all what you are doing, or what the problem is. Some comments might be helpful ;-)

Post by **fxm** » Jun 29, 2020 18:10

@dodicat,

We can not compare:
SUMi(Xi^2)
with:
SUMi(SQR(Xi^2))
because the SQR function induces a non-linearity in the sum.

Examples where that fails:

X1 = 1, X2 = 6
SUMi(Xi^2) = 37
SUMi(SQR(Xi^2)) = 7

X1 = 3, X2 = 5
SUMi(Xi^2) = 34
SUMi(SQR(Xi^2)) = 8

Between these two examples, SUMi(Xi^2) decreases while SUMi(SQR(Xi^2)) increases.

How would you find the shortest distance in a set of points?

How would you find the shortest distance in a set of points?

Re: How would you find the shortest distance in a set of points?

Re: How would you find the shortest distance in a set of points?

Re: How would you find the shortest distance in a set of points?

Re: How would you find the shortest distance in a set of points?

Re: How would you find the shortest distance in a set of points?

Re: How would you find the shortest distance in a set of points?

Re: How would you find the shortest distance in a set of points?

Re: How would you find the shortest distance in a set of points?

Re: How would you find the shortest distance in a set of points?

Re: How would you find the shortest distance in a set of points?

Re: How would you find the shortest distance in a set of points?

Re: How would you find the shortest distance in a set of points?

Re: How would you find the shortest distance in a set of points?

Re: How would you find the shortest distance in a set of points?