How to reduce the filesize of a compiled file?

jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Post by jj2007 »

caseih wrote:
jj2007 wrote:Truth is that dense assembler code beats C any time (there is a small code cache, for example).
Beats it at what metric? A well-written C routine can easily best a naive assembler implementation, however dense. A well-written assembler routine would be approximately equal to a well-written C routine. Contrary to popular belief around here, hacking something out in assembler is not going to automatically make it faster in all cases.
Make a test with less naive assembler then. The example posted there is a pretty straightforward task: read the 250k lines of the disassembled fbc.exe and count something interesting, such as "esp".
Tell me if you can beat it with FB, C or Java or File::ReadAllLines, as suggested above by St_W.
It's interesting to note that the average number of LOC a programmer writes per day is still under a dozen. I know it's true for me. One week of coding can generate thousands of LOC, followed by two weeks of debugging and tweaking.
Absolutely ;-)
St_W
Posts: 1626
Joined: Feb 11, 2009 14:24
Location: Austria

Post by St_W »

jj2007 wrote:But something struck my eyes in the comments: Java still doesn’t take advantage of SSE
That information seems to be outdated. SSE & SSE2 support can be found in the changelog of this old version from the year 2003: http://www.oracle.com/technetwork/java/ ... 36374.html

Anyway, I fully agree with what caseih has written: that the manual higher level optimizations (algorithms, ...) are far more important than manual lower level optimizations, as the latter can be done by compilers. So if you (find and) use some O(n*log(n)) algorithm instead of some O(n²) one that would be far better than any optimizations in assembly language can ever be.

To return to the topic of this thread: executable size does not really influence performance in practice as long as there is not a huge difference. A big EXE might start a little bit slower, but that is usually negligible, even more so in times of SSDs and several GBs of RAM. (However, I have to admit that startup time may be an issue when the frameworks get really big, as in C#/Java, and need to be loaded and at some point compiled during runtime.) Storage isn't really an issue nowadays. Neither is network transfer, at least not for files below 3 MB, considering that a lot of websites are already that large. So one of the few application areas where the executable size really matters is the demoscene world, not for technical reasons but rather because of artificial limitations meant to challenge one's programming skills.

Post by jj2007 »

St_W wrote:the manual higher level optimizations (algorithms, ...) are far more important than manual lower level optimizations, as the latter can be done by compilers. So if you (find and) use some O(n*log(n)) algorithm instead of some O(n²) one that would be far better than any optimizations in assembly language can ever be.
Guess what? If I have the choice between a bubble sort and a quick sort, I often take the latter one. Glad that we can agree on something ;-)
caseih
Posts: 2157
Joined: Feb 26, 2007 5:32

Post by caseih »

jj2007 wrote:Make a test with less naive assembler then. The example posted there is a pretty straightforward task: read the 250k lines of the disassembled fbc.exe and count something interesting, such as "esp".
Tell me if you can beat it with FB, C or Java or File::ReadAllLines, as suggested above by St_W.
You do realize this is an I/O bound problem, not CPU, right?

Furthermore this is a great case where assembly should not be your first choice. For example, suppose you wanted to search through 250k lines of a UTF-8-encoded text file and find the number of a particular emoticon? UTF-16?

By the way grep can search my entire hard drive for strings in an incredibly fast manner. I regularly do recursive searches through whole file trees with millions of lines of text and find the answers very quickly, as fast as my disk allows, really.
Last edited by caseih on Aug 15, 2017 23:09, edited 1 time in total.

Post by caseih »

jj2007 wrote:
St_W wrote:the manual higher level optimizations (algorithms, ...) are far more important than manual lower level optimizations, as the latter can be done by compilers. So if you (find and) use some O(n*log(n)) algorithm instead of some O(n²) one that would be far better than any optimizations in assembly language can ever be.
Guess what? If I have the choice between a bubble sort and a quick sort, I often take the latter one. Glad that we can agree on something ;-)
Fortunately we can have our cake and eat it too. We can choose fast algorithms in portable languages and have good performance on many platforms, which is worth the small amount of overhead one might incur compared to hand-tuned assembly.

This is not to say assembly has no place. It surely does in helping build low-level OS components, but even there it's used judiciously.

You're well-versed and experienced in assembly. Most of us here aren't so skilled. So for most programmers, any time a program is slow, it's important to analyze the run time and determine where the CPU is spending its time. 90% of CPU time is usually spent in 10% of the code. Focus on that code and understand why it's running slow and then work on means to speed it up, such as changing the algorithm entirely, trading memory for speed, etc. Only after that has given you all the speed you can get should you think about assembly solutions.
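As a rough illustration of the "find out where the time goes" step, here is a minimal FreeBASIC sketch that times two stages with the built-in Timer function. TimeIt, BuildData and Summarize are hypothetical names made up for this example, and the workload is only a stand-in; a real profiler would give much finer detail.

```freebasic
' Minimal timing sketch. TimeIt, BuildData and Summarize are hypothetical
' names used only for this illustration.

Dim Shared As Long samples(0 To 999999)
Dim Shared As LongInt total

' Stage 1: fill the array with a simple repeating pattern
Sub BuildData()
    For i As Long = 0 To UBound(samples)
        samples(i) = i Mod 97
    Next
End Sub

' Stage 2: add everything up
Sub Summarize()
    total = 0
    For i As Long = 0 To UBound(samples)
        total += samples(i)
    Next
End Sub

' Run one stage and return the elapsed wall-clock time in seconds
Function TimeIt(ByVal stage As Sub()) As Double
    Dim As Double t0 = Timer
    stage()
    Return Timer - t0
End Function

Print "BuildData: "; TimeIt(@BuildData); " s"
Print "Summarize: "; TimeIt(@Summarize); " s"
Print "Checksum:  "; total
```

Whichever stage dominates the printed times is the one worth studying first; the other can usually be left alone.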

Post by jj2007 »

caseih wrote:You do realize this is an I/O bound problem, not CPU, right?

Code: Select all

14 ms to read the file into an array
Now checking 250281 lines for 'esp':
24 ms to count 'esp'
Reading the file into a string array, yes, partly. Counting the lines that contain "esp", definitely not.
Furthermore this is a great case where assembly should not be your first choice. For example, suppose you wanted to search through 250k lines of a UTF-8-encoded text file and find the number of a particular emoticon? UTF-16?
Just use .if Instr_(L$(ct), Utf8$(emoji$)) (and since we are talking code design: of course you would translate the UTF-16 emoji outside the loop)
By the way grep can search my entire hard drive for strings in an incredibly fast manner. I regularly do recursive searches through whole file trees with millions of lines of text and find the answers very quickly, as fast as my disk allows, really.
grep is a great tool, but it's not meant for parsing a string array, as a compiler would typically do. For such tasks, grep is very slow.
caseih wrote:You're well-versed and experienced in assembly.
Thanks. As you may know, practically everybody who started with BASIC thirty years ago needed to be proficient in assembly, too.
So for most programmers, any time a program is slow, it's important to analyze the run time and determine where the CPU is spending its time. 90% of CPU time is usually spent in 10% of the code. Focus on that code and understand why it's running slow and then work on means to speed it up, such as changing the algorithm entirely, trading memory for speed, etc. Only after that has given you all the speed you can get should you think about assembly solutions.
That is a great lesson. You should offer it to Microsoft: The guys who developed the snailware called "VS Community" need it desperately (my applications are blazing fast already, thanks).
BasicCoder2
Posts: 3906
Joined: Jan 01, 2009 7:03
Location: Australia

Post by BasicCoder2 »

jj2007 wrote:
caseih wrote:You're well-versed and experienced in assembly.
Thanks. As you may know, practically everybody who started with BASIC thirty years ago needed to be proficient in assembly, too.
I last used assembler on the old DOS machines and I never got to use all those high-level directives,
http://psut.jo/sites/qaralleh/uplab/doc ... torial.pdf
they use now,
https://win32assembly.programminghorizo ... rials.html
The problem was that, unlike on the old DOS machines, the assembler code simply became calls to the Windows API, and there was really no longer any speed advantage in using it. All that cool direct-to-the-metal stuff was no longer possible.

Post by jj2007 »

BasicCoder2 wrote:The problem was that, unlike on the old DOS machines, the assembler code simply became calls to the Windows API, and there was really no longer any speed advantage in using it. All that cool direct-to-the-metal stuff was no longer possible.
Of course it is possible! There are areas, I/O for example, where you MUST use the WinApi, but for everything else, you can go as close to the metal as you like. Btw this small talk might benefit from a quick test with FreeBasic. What is the FB equivalent of this:

Code: Select all

  For_ ct=0 To L$(?)-1
        .if Instr_(L$(ct), "esp")
                inc ctEsp
        .endif
  Next
Is there a ReadAllLines (or Recall as in good ol' GfaBasic) equivalent in FreeBasic? A high precision timer? Excuse my ignorance, I have little experience with FB but you have fans at the Masm32 forum ;-)
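For what it's worth, a rough FreeBASIC counterpart of that loop could look like the sketch below. It assumes the lines already sit in a string array; CountLinesContaining is a hypothetical helper name, and the three sample lines are made up for the demonstration.

```freebasic
' Hypothetical helper: how many elements of lines() contain needle
Function CountLinesContaining(lines() As String, ByRef needle As String) As Long
    Dim As Long hits = 0
    For ct As Long = LBound(lines) To UBound(lines)
        ' InStr returns the 1-based position of the match, or 0 if there is none
        If InStr(lines(ct), needle) > 0 Then hits += 1
    Next
    Return hits
End Function

' Made-up sample data standing in for the disassembly listing
Dim As String demo(0 To 2) = {"mov esp, ebp", "push eax", "sub esp, 8"}
Print CountLinesContaining(demo(), "esp"); " lines contain 'esp'"
```

Like the MasmBasic loop, this counts lines with at least one match rather than total occurrences.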
dodicat
Posts: 7983
Joined: Jan 10, 2006 20:30
Location: Scotland

Post by dodicat »

A simple Tally of characters in a string.

Code: Select all

' The MasmBasic loop from the previous post, kept inside a macro that is never invoked
#macro HideThisCode
  For_ ct=0 To L$(?)-1
        .if Instr_(L$(ct), "esp")
                inc ctEsp
        .endif
  Next
#endmacro

#define Intrange(f,l) Int(Rnd*(((l)+1)-(f))+(f))

dim as long size=50000000

dim as string L=string(size,0) 'create a string of (size) characters

for n as long=0 to size-1
    L[n]=Intrange(97,122) 'Fill the string "a"  to  "z"
next

'Count the non-overlapping occurrences of PartString in SomeString
Function TALLY(SomeString As String,PartString As String) As long
    dim as long LenP=Len(PartString),count
    Dim As long position=Instr(SomeString,PartString)
    If position=0 Then Return 0
    While position>0
        count+=1
        position=Instr(position+LenP,SomeString,PartString) 'resume after the last match
    Wend
    return count
End Function

dim as double t=timer
print Tally(L,"esp") & " times"
print "Time taken ";timer-t
sleep

Post by jj2007 »

dodicat wrote:A simple Tally of characters in a string.
That is simpler than what the other example does, but it's quite OK because the bottleneck here is Instr(), which is also what does the work inside Count(source, part):

include \masm32\MasmBasic\MasmBasic.inc ; download
Init
totalsize=50000000
mov ecx, totalsize
Let edi=New$(ecx)
.Repeat
    add Rand(26), 97 ; range 97... 122
    mov [edi+ecx], al
    dec ecx
.Until Sign?
PrintLine Left$(edi, 50)
NanoTimer()
Inkey Str$("Counting %i occurrences of 'esp' in the string", Count(edi, "esp")), Str$(" took %2f seconds", NanoTimer(ms)/1000)
EndOfCode


FB (I added a print Left(L, 50) to your code):

Code: Select all

iintqqhzzynrbrambmvvxhimlowgeojikhwmhrtmiqzknrspyv
2877 times
Time taken  0.1070166961545738
MB:

Code: Select all

elplktijppzomllgqzrlaususzikmccsuqpwcotverqhwgwqwo
Counting 2890 occurrences of 'esp' in the string took 0.041 seconds
The generated string looks a bit different because the two Basics use a different PRNG.

Back to topic: filesize is 35840 (FB) vs 26112 (MB). Both in the "ridiculously compact" class ;-)
Josep Roca
Posts: 564
Joined: Sep 27, 2016 18:20
Location: Valencia, Spain

Post by Josep Roca »

Code: Select all

#INCLUDE ONCE "windows.bi"
#INCLUDE ONCE "crt/string.bi"

' --> change the path
DIM wszFileName AS WSTRING * MAX_PATH = $"C:\Users\Pepe\FreeBasic64\inc\win\mshtmlc.bi"
DIM bSuccess AS LONG, dwFileSize AS DWORD, dwHighSize AS DWORD, dwBytesRead AS DWORD
DIM nCount AS DWORD
DIM hFile AS HANDLE = CreateFileW(@wszFileName, GENERIC_READ, FILE_SHARE_READ, NULL, _
                      OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL)
IF hFile = INVALID_HANDLE_VALUE THEN END
' // Get the size of the file
dwFileSize = GetFileSize(hFile, @dwHighSize)
IF dwHighSize THEN
   CloseHandle(hFile)
   END
END IF
DIM pBuffer AS UBYTE PTR
' // One extra zeroed byte keeps the buffer null-terminated for strstr
pBuffer = CAllocate(1, dwFileSize + 1)
bSuccess = ReadFile(hFile, pBuffer, dwFileSize, @dwBytesRead, NULL)
CloseHandle(hFile)
IF bSuccess THEN
   IF pBuffer THEN
      DIM pstr AS ANY PTR = pBuffer
      ' // Search for CRLF to get the number of lines
      DIM sf AS ZSTRING * 3 = CHR(13, 10)
      DIM t1 AS DOUBLE = TIMER
      DO
         pstr = strstr(pstr, sf)
         IF pstr = NULL THEN EXIT DO
         pstr += 2
         nCount += 1
      LOOP
      DIM t2 AS DOUBLE = TIMER
      PRINT "seconds: ", t2 - t1
      DeAllocate(pBuffer)
   END IF
END IF

print "Count: ", nCount

PRINT
PRINT "Press any key..."
SLEEP
Seconds: 0.002644868538482115
Count: 24974

Post by Josep Roca »

> Is there a ReadAllLines (or Recall as in good ol' GfaBasic) equivalent in FreeBasic?

No, but we can write one. How many intrinsic functions does MASM have?

> A high precision timer?

We can call QueryPerformanceCounter.
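A minimal sketch of both ideas, assuming Windows and the windows.bi header. ReadAllLines and HiResSeconds are hypothetical helper names written for this post, not existing FreeBASIC functions, and "disasm.txt" is a placeholder file name.

```freebasic
#include once "windows.bi"

' Hypothetical helper: load a text file into a dynamic string array.
' Returns the number of lines read, or -1 if the file cannot be opened.
Function ReadAllLines(ByRef fileName As String, lines() As String) As Long
    Dim As Long f = FreeFile
    If Open(fileName For Input As #f) <> 0 Then Return -1
    Dim As Long n = 0
    Do Until Eof(f)
        ' Grow the array in chunks so ReDim Preserve is not called for every line
        If n > UBound(lines) Then ReDim Preserve lines(0 To n * 2 + 255)
        Line Input #f, lines(n)
        n += 1
    Loop
    Close #f
    ' Trim the array to the number of lines actually read
    If n > 0 Then ReDim Preserve lines(0 To n - 1) Else Erase lines
    Return n
End Function

' Hypothetical helper: seconds since an arbitrary start point (Windows only)
Function HiResSeconds() As Double
    Dim As LARGE_INTEGER freq, ticks
    QueryPerformanceFrequency(@freq)
    QueryPerformanceCounter(@ticks)
    Return ticks.QuadPart / freq.QuadPart
End Function

Dim As String lines()
Dim As Double t0 = HiResSeconds()
Dim As Long n = ReadAllLines("disasm.txt", lines())   ' placeholder file name
Print n; " lines read in "; HiResSeconds() - t0; " s"
```

The array has to be a dynamic one (declared with empty parentheses), because the helper resizes it. Reading line by line is the simple route; loading the whole file with one ReadFile call and splitting it afterwards, as in the code posted above, is likely to be faster for big files.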

Post by jj2007 »

Josep Roca wrote: Seconds: 0.002644868538482115
Count: 24974

Code: Select all

include \masm32\MasmBasic\MasmBasic.inc
  Init
  Let esi=FileRead$("mshtmlc.bi")	; 2,488,437 bytes
  NanoTimer()
  Inkey Str$("%i line feeds", Count(esi, Lf$)), Str$(" found in %2f seconds", NanoTimer(ms)/1000) 
EndOfCode

Code: Select all

24974 line feeds found in 0.0030 seconds
So your code is a bit faster, my compliments. Which CPU (mine is a Core i5)?

Let's try something more challenging:

Code: Select all

include \masm32\MasmBasic\MasmBasic.inc
  Init
  Let esi=FileRead$("mshtmlc.bi")	; 2,488,437 bytes
  NanoTimer()
  mov ecx, Count(esi, "type", 4)
  sub ecx, Count(esi, "end type")
  Inkey Str$("%i type declarations", ecx), Str$(" found in %2f seconds", NanoTimer(ms)/1000) 
EndOfCode
1718 type declarations found in 0.0040 seconds


P.S.: I feel honoured to meet you, José Roca - your site has helped me out many times when I was stuck with the Windows API.

Post by Josep Roca »

> Which CPU (mine is a Core i5)?

Same as mine. Intrinsic BASIC string functions are slow because they have been designed for ease of use rather than for speed. This is a legacy from the past. But compilers like FB are low-level enough to allow the use of alternatives, even assembler. And Windows provides functions for almost everything, so there is no need to bloat the language, making it a nightmare to maintain. Add to the compiler mainly what only the compiler can do. That is, provide the bricks and let the masons build the house.

> Let's try something more challenging

Maybe later. Right now I'm busy adding support for low-level COM servers (nothing to do with the slow and bloated Automation). Fast and efficient COM servers with an overhead of only about 2 KB. There is also the misconception that COM servers are big, slow and bloated (this comes from the VB6 era and its infamous OCXs). BASIC has traditionally been a language geared to beginners, and speed was not the main concern (VB6 did not even have pointers, which were considered dangerous in the hands of a beginner). But the language has evolved.

BTW I'm going to reuse the code that I have posted to implement a wrapper function and add it to my framework for FB. I will call it AfxFileScan or something like that. It is a way of extending the language. I also have some ideas for a ReadAllLines function.

Post by jj2007 »

OK ;-)
Btw the mshtmlc.bi that I downloaded from https://fossies.org/linux/privat/FreeBA ... mlc.bi?m=b to test your code has no CrLf, just linefeeds; so initially your code didn't work. But Chr(10) did the job, of course (and MasmBasic's Instr is very slow for single byte searches).
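One way to sidestep the CrLf surprise is to count only the Chr(10) bytes, since every CRLF pair also ends in a line feed. A minimal FreeBASIC sketch, with CountLineFeeds as a hypothetical helper name and two made-up test strings:

```freebasic
' Hypothetical helper: count line feeds (byte 10) in a buffer held in a string.
' Works for both LF-only and CRLF text, because each CRLF pair ends in a line feed.
Function CountLineFeeds(ByRef buffer As String) As Long
    Dim As Long n = 0
    For i As Long = 0 To Len(buffer) - 1
        If buffer[i] = 10 Then n += 1   ' string indexing is 0-based and yields a byte
    Next
    Return n
End Function

Dim As String unixText = "a" & Chr(10) & "b" & Chr(10)
Dim As String dosText  = "a" & Chr(13, 10) & "b" & Chr(13, 10)
Print CountLineFeeds(unixText)   ' two line feeds
Print CountLineFeeds(dosText)    ' also two
```

A plain byte loop like this is unlikely to beat an optimized strstr or Count, but it does not depend on which line-ending convention the file uses.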