A general string sort enhancement

Post your FreeBASIC source, examples, tips and tricks here. Please don’t post code without including an explanation.
Post Reply
adhay
Posts: 158
Joined: May 12, 2006 17:52

A general string sort enhancement

Post by adhay »

Any sorting algorithm can sort twenty-six 1000 item arrays much faster than it can sort one 26000 item array.
For example using the ubiquitous MRBubble:

Code: Select all

#define NUMSTRINGS 26000

RANDOMIZE TIMER

'Create NUMSTRINGS strings of 10 alpha caps
DIM AS STRING sortlist(NUMSTRINGS)
DIM AS STRING sortbox(26,NUMSTRINGS/10) ' On avg, each subarray will
'contain NUMSTRINGS/26 strings. NUMSTRINGS/10 should insure sufficient space  

DIM AS INTEGER boxcounter(26)
DIM AS INTEGER i, j, x

FOR i=1 TO NUMSTRINGS
    FOR j=1 TO 10
        sortlist(i)+= CHR$(65+INT(RND*26))
    NEXT
    'as each string is received, it is posted to it's initial letter box
    x=ASC(sortlist(i))-64
    boxcounter(x)+=1  'holds the number of elements in each subarray
    sortbox(x, boxcounter(x))=sortlist(i)
NEXT

DIM AS DOUBLE start_time
start_time=TIMER

DIM AS INTEGER ii, jj, flag
FOR ii=1 TO 26
    IF boxcounter(ii)>1 THEN 'we need to sort the subarray
    'sort starts here
        flag=1
        DO WHILE flag=1
            flag=0
            FOR jj=boxcounter(ii) TO 2 STEP -1
                IF sortbox(ii,jj)<sortbox(ii,jj-1) THEN
                    SWAP sortbox(ii,jj),sortbox(ii,jj-1)
                    flag=1
                ENDIF
            NEXT
        LOOP
    ENDIF
NEXT
?TIMER-start_time
'FOR ii=1 TO 26
'    IF boxcounter(ii)>0 THEN
'        FOR jj=1 TO boxcounter(ii)
'            ? sortbox(ii,jj)
'        NEXT
'    ENDIF
'NEXT
SLEEP
END
counting_pine
Site Admin
Posts: 6323
Joined: Jul 05, 2005 17:32
Location: Manchester, Lancs

Post by counting_pine »

There are a few things to note here.
- In the worst case, all the strings will begin with the same letter, and the array subdivison will need to big enough to hold every string.

- The speed increase will be much better with Bubblesort than with a more efficient algorithm such as Quicksort.
In the average case, Quicksort can manage a similar level of subdivison after about the fifth pass over the data.

But assuming the data is pretty uniform (which it should be in your case), it's a good optimisation, and should reach that level of subdivision faster than a generic Quicksort algorithm would.

Technically it also means that you can then skip the first character in string comparisons, although it would be hard to actually exploit a speed increase from that.
jevans4949
Posts: 1186
Joined: May 08, 2006 21:58
Location: Crewe, England

Post by jevans4949 »

I've used this technique before, but with linked lists. rather than arrays. it then doesn't matter if the lists are unbalanced.

With a linked list, you can also sort stuff as you load it.

One point; if you are processing data that isn't clean, you may want a 27th array for dud items.
Post Reply