How to reduce the filesize of a compiled file?

General FreeBASIC programming questions.
dodicat
Posts: 7976
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: How to reduce the filesize of a compiled file?

Post by dodicat »

I agree, mshtmlc.bi has only newlines (chr(10)).

I could not get masmbasic.inc.
If masmbasic downloaded as a zip I would try it out.
But I don't install with .exe installers.
I like to see what is going on.
Silly I know!

So, with mshtmlc.bi, and a tweaked TALLY:

#include "file.bi"    'FileExists
#include "crt.bi"     'strstr

'Run with the best: 32-bit -gen gas, the FreeBASIC as free from C as possible (except msvcrt.dll or its Linux counterpart)

Function loadfile(file As String) As String
    If Fileexists(file)=0 Then Print file;" not found":Sleep:End
    Dim As Long  f=Freefile
    Open file For Binary Access Read As #f
    Dim As String text
    If Lof(f) > 0 Then              'use the file number returned by Freefile
        text = String(Lof(f), 0)
        Get #f, , text
    End If
    Close #f
    Return text
End Function

'Count non-overlapping occurrences of PartString in SomeString
Function Tally( SomeString As String,PartString As String) As Long
    Dim As Long i=Len(PartString),count
    If i=0 Orelse Len(SomeString)=0 Then Return 0
    Dim As zstring Ptr z=strstr(Strptr(SomeString),Strptr(PartString))
    While z
        count+=1
        z=strstr(z+i,Strptr(PartString))   'resume the search just past this match
    Wend
    Return count
End Function



Dim As String s=Chr(10)
Dim As String text=loadfile("mshtmlc.bi")
Dim As Double t=Timer
Print tally(text,s)
Print "Time taken ";Timer-t
sleep


 
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Re: How to reduce the filesize of a compiled file?

Post by jj2007 »

dodicat wrote: I agree mshtmlc.bi has only newline (chr(10))

I could not get masmbasic.inc.
If masmbasic downloaded as a zip I would try it out.
But I don't install with .exe installers.
I like to see what is going on.
Silly I know!
Not silly at all: You don't know me. Even Jotti would warn you that some of my exes are dangerous.
Besides, MasmBasic also needs the Masm32 package to work. Installation is not that difficult, but assembly is clearly not for everybody.
So, with mshtmlc.bi, and a tweaked TALLY:
Works fine, and with newline it's about as fast or slow as the MasmBasic Instr(). With a longer string, e.g. "style", MB is roughly 40% faster. Does FreeBasic allow case-insensitive and full word search for Instr()?
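A plain substring search such as C's strstr() matches anywhere, including inside longer words, so a whole-word search needs an extra boundary check around each hit. A minimal C sketch, assuming a "word" is a run of ASCII letters, digits and underscores (adjust `is_word_char` for other rules); `count_whole_words` is a hypothetical helper, not part of any library:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule: ASCII letters, digits and '_' count as word characters. */
static int is_word_char(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/* Count occurrences of `word` in `text` that are not part of a longer word.
   Matches are non-overlapping: the scan resumes just past each accepted match. */
size_t count_whole_words(const char *text, const char *word)
{
    size_t n = strlen(word), count = 0;
    const char *p = text;

    if (n == 0)
        return 0;
    while ((p = strstr(p, word)) != NULL) {
        int starts_word = (p == text) || !is_word_char((unsigned char)p[-1]);
        int ends_word = !is_word_char((unsigned char)p[n]); /* p[n] may be '\0' */
        if (starts_word && ends_word) {
            count++;
            p += n;
        } else {
            p += 1;   /* embedded in a longer word: try the next position */
        }
    }
    return count;
}
```

For example, `count_whole_words("style styles mystyle style_x style.", "style")` counts only the first and last "style", giving 2.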
dodicat
Posts: 7976
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: How to reduce the filesize of a compiled file?

Post by dodicat »

For a case-insensitive search, just set text = LCase(text) and look for "style" in the text.

5396 style
.0038 seconds here.
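The same idea (lowercase once, then search as usual) can be sketched in C as follows. `count_ci` is a hypothetical name, and the sketch assumes ASCII or single-byte text: `tolower()` works byte by byte and does not fold non-ASCII UTF-8 characters.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated lowercase copy of s (ASCII folding only), or NULL. */
static char *ascii_lower_dup(const char *s)
{
    size_t len = strlen(s);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    for (size_t i = 0; i <= len; i++)           /* <= also copies the '\0' */
        copy[i] = (char)tolower((unsigned char)s[i]);
    return copy;
}

/* Count non-overlapping, case-insensitive occurrences of needle in text.
   Returns -1 on allocation failure. */
long count_ci(const char *text, const char *needle)
{
    char *t = ascii_lower_dup(text);
    char *n = ascii_lower_dup(needle);
    long count = -1;

    if (t != NULL && n != NULL) {
        size_t step = strlen(n);
        count = 0;
        if (step > 0) {
            for (const char *p = strstr(t, n); p != NULL; p = strstr(p + step, n))
                count++;
        }
    }
    free(t);
    free(n);
    return count;
}
```

Lowercasing the whole text costs one extra copy, but it keeps the inner loop a plain strstr().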

You will perhaps notice that 64-bit FreeBASIC is slower at this task than 32-bit, even with -O3 gcc optimizations.
Question:
Is masmbasic backended by C?
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Re: How to reduce the filesize of a compiled file?

Post by jj2007 »

dodicat wrote: for case insensitive just make text =lcase(text) and look for "style" in the text.

5396 style
.0038 seconds here.
5396
Time taken 0.00446 (vs about 0.003 for MasmBasic)
You will perhaps notice that 64-bit FreeBASIC is slower at this task than 32-bit, even with -O3 gcc optimizations.
64-bit code is often slower, for various reasons. But I know one example, UAsm, where the 64-bit version is significantly faster.
Question:
Is masmbasic backended by C?
No. There is not a single call to msvcrt, and only a handful to the Masm32 library, all the rest is hand-written assembly and WinAPI.
St_W
Posts: 1619
Joined: Feb 11, 2009 14:24
Location: Austria

Re: How to reduce the filesize of a compiled file?

Post by St_W »

dodicat wrote: If masmbasic downloaded as a zip I would try it out.
But I don't install with .exe installers.
As we are already quite off-topic anyway: is there a download that doesn't require registration at the MASM forums?
jj2007 wrote: There is not a single call to msvcrt, and only a handful to the Masm32 library, all the rest is hand-written assembly and WinAPI.
Hm, I'm not sure whether replacing MSVCRT calls with Win32 API calls is a good or bad thing. When considering portability, it's definitely a bad thing.
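On the portability point: Win32 calls such as CreateFileW and ReadFile exist only on Windows, while C's stdio is available on any hosted C implementation. A minimal sketch of loading a whole file into a null-terminated buffer with standard C only; `read_whole_file` is a hypothetical helper name. It relies on `fseek(..., SEEK_END)` plus `ftell()`, which works on mainstream platforms, although the C standard does not strictly guarantee SEEK_END on binary streams, and files larger than `LONG_MAX` bytes are not handled.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read an entire file into a malloc'ed, null-terminated
   buffer and store the number of bytes read in *out_len.
   Returns NULL if the file cannot be opened, sized, or allocated. */
char *read_whole_file(const char *path, size_t *out_len)
{
    FILE *f = fopen(path, "rb");        /* binary mode: no newline translation */
    char *buf = NULL;
    long size = 0;

    if (f == NULL)
        return NULL;
    if (fseek(f, 0, SEEK_END) == 0 && (size = ftell(f)) >= 0 &&
        fseek(f, 0, SEEK_SET) == 0) {
        buf = malloc((size_t)size + 1);
        if (buf != NULL) {
            size_t got = fread(buf, 1, (size_t)size, f);
            buf[got] = '\0';            /* terminator so strstr() can scan the text */
            if (out_len != NULL)
                *out_len = got;
        }
    }
    fclose(f);
    return buf;
}
```

The caller owns the returned buffer and releases it with `free()`.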
caseih
Posts: 2157
Joined: Feb 26, 2007 5:32

Re: How to reduce the filesize of a compiled file?

Post by caseih »

jj2007 wrote: Is there a ReadAllLines (or Recall as in good ol' GfaBasic) equivalent in FreeBasic? A high precision timer? Excuse my ignorance, I have little experience with FB but you have fans at the Masm32 forum ;-)
There's no such equivalent in machine language either. Most of the code you've posted has barely any machine code you've written in it at all. Most of what you've done is use routines from the masm runtime library. It may be written in assembler, and it is inlined, but it's still a runtime library/toolbox. So really you're benchmarking the runtime library, not your own code. I appreciate that the modern macros and runtime routines allow assembler to be used in a high-level and modern way. But if you want to do a true comparison, let's skip the part about reading in a file (as we've demonstrated, that is I/O bound), and just do the search on a (very large) in-memory string. And do it entirely with hand-coded assembler, with no macros calling functions in the runtime library. Then I suspect a fast FB or C implementation would be nearly identical in performance.

I say fast as in using memory operations and not the FB string runtime. Jose has already explained about the tradeoffs made in the FB runtime. If we skip the runtime and do straight memory manipulation using proper string searching algorithms, FB's speed should be the same as anything you could code yourself in machine language.
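As one illustration of straight memory manipulation with a proper string searching algorithm, here is a minimal C sketch of Boyer-Moore-Horspool counting non-overlapping matches in an in-memory buffer. Horspool is one standard choice among several, not necessarily what grep or any particular runtime uses internally; `horspool_count` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of pat (length m) in text (length n)
   using the Boyer-Moore-Horspool bad-character shift table. */
size_t horspool_count(const unsigned char *text, size_t n,
                      const unsigned char *pat, size_t m)
{
    size_t shift[256];
    size_t count = 0, pos = 0;

    if (m == 0 || m > n)
        return 0;
    /* Default shift is the full pattern length ... */
    for (size_t c = 0; c < 256; c++)
        shift[c] = m;
    /* ... reduced for bytes that occur in the pattern (excluding its last byte). */
    for (size_t i = 0; i + 1 < m; i++)
        shift[pat[i]] = m - 1 - i;

    while (pos <= n - m) {
        unsigned char last = text[pos + m - 1];
        if (last == pat[m - 1] && memcmp(text + pos, pat, m - 1) == 0) {
            count++;
            pos += m;               /* non-overlapping: skip the whole match */
        } else {
            pos += shift[last];     /* align the window on this byte's last occurrence */
        }
    }
    return count;
}
```

Because it takes explicit lengths rather than relying on a terminating zero, it also works on buffers with embedded zero bytes, which strstr() cannot handle.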

As for our fans on the forum, it's the same for FB as it is for MS and their products. The compilers are fast enough to be very useful and generate reasonable code. The tradeoffs from that are well worth it.

You can say what you want about how slow MS Visual Studio is, but the speed it gives a developer in rapid code development, profiling, and debugging is what matters, and a decently performing output EXE is good too. And when you're dealing with an I/O-bound problem, CPU performance isn't nearly as important as a good I/O strategy. Hence most web development isn't even done in compiled languages.
caseih
Posts: 2157
Joined: Feb 26, 2007 5:32

Re: How to reduce the filesize of a compiled file?

Post by caseih »

jj2007 wrote: No. There is not a single call to msvcrt, and only a handful to the Masm32 library, all the rest is hand-written assembly and WinAPI.
Hand-written by whom? In your code snippets I saw very little machine code, mostly a lot of macros, which I consider the same as a runtime library, albeit one whose functions can be inlined. This masmbasic code is little different from C or FB code. Is that not correct? In fact it appears to be a very simple compiler at that, likely one-pass with 1:1 code generation (it is based on an assembler, after all). All you've proved is that the string routines your macros define are faster than the ones in FB's runtime, which is not surprising, since FB's strings are dynamic and accessed mainly through function calls.
caseih
Posts: 2157
Joined: Feb 26, 2007 5:32

Re: How to reduce the filesize of a compiled file?

Post by caseih »

jj2007 wrote: Just use .if Instr_(L$(ct), Utf8$(emoji$)) (and since we are talking code design: of course you would translate the UTF-16 emoji outside the loop)
Sure but what if you want to deal with UTF-16, UTF-8, UTF-32, or any other encoding? You're making assumptions about the bytes which are not always true in all cases. MASM would be the last tool I'd want to use when dealing with text generally.
grep is a great tool, but it's not meant for parsing a string array, as a compiler would typically do. For such tasks, grep is very slow.
Kind of funny, because these benchmarks you've been showing are not parsing. They are simply searching for strings. Grep's algorithm is highly optimized for this and will be as fast as anything you can crank out with masm basic (whatever you call the macros you are using). True parsing like a compiler would do is going to require a lot more work, even in your assembler macros. Show me a parser for a formal grammar written in masm.

Maybe had they used masm32 with the fancy high-level macros you use, PowerBasic would be alive and well today, but I kind of doubt it. The fate of PB is kind of a warning to me about any platform-dependent programming, particularly extended use of assembler (in small doses it's fine!).
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Re: How to reduce the filesize of a compiled file?

Post by jj2007 »

caseih wrote: Sure but what if you want to deal with UTF-16, UTF-8, UTF-32, or any other encoding?
UTF-16, UTF-8 and "any other encoding" (I assume you mean ordinary codepages - cyrillic and the like) are built into the language, but I admit I have never thought of UTF-32. Can you zip up a UTF-32 text and post it here please? I'd like to give it a try.
Tourist Trap
Posts: 2958
Joined: Jun 02, 2015 16:24

Re: How to reduce the filesize of a compiled file?

Post by Tourist Trap »

dodicat wrote: I could not get masmbasic.inc.
If masmbasic downloaded as a zip I would try it out.
But I don't install with .exe installers.
Maybe the version from here is ok?
http://www.phatcode.net/downloads.php?sub=compilers
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Re: How to reduce the filesize of a compiled file?

Post by jj2007 »

Tourist Trap wrote: Maybe the version from here is ok?
http://www.phatcode.net/downloads.php?sub=compilers
Definitely not. This is the official installation guide, but I suspect you need to register at Masm32 to see the downloads.
caseih
Posts: 2157
Joined: Feb 26, 2007 5:32

Re: How to reduce the filesize of a compiled file?

Post by caseih »

jj2007 wrote:
caseih wrote: Sure but what if you want to deal with UTF-16, UTF-8, UTF-32, or any other encoding?
UTF-16, UTF-8 and "any other encoding" (I assume you mean ordinary codepages - cyrillic and the like) are built into the language, but I admit I have never thought of UTF-32. Can you zip up a UTF-32 text and post it here please? I'd like to give it a try.
Any decent text editor can save into UTF-32 (and many other formats) on demand. Also note that both UTF-16 and UTF-32 can be either big-endian or little-endian, which adds complication.
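To make the endianness point concrete: a file saved as UTF-16 or UTF-32 usually starts with a byte order mark (BOM), and a reader has to check it before it can interpret the code units. A minimal C sketch; `detect_bom` and the enum names are hypothetical, and a file without a BOM is simply reported as unknown. The UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE), so the 4-byte checks come first; a UTF-16LE file whose first character is U+0000 would therefore be misread, which is rare in practice.

```c
#include <stddef.h>

enum text_encoding {
    ENC_UNKNOWN, ENC_UTF8, ENC_UTF16LE, ENC_UTF16BE, ENC_UTF32LE, ENC_UTF32BE
};

/* Inspect the first bytes of a buffer for a Unicode byte order mark.
   *bom_len receives the number of BOM bytes to skip (0 if none). */
enum text_encoding detect_bom(const unsigned char *b, size_t n, size_t *bom_len)
{
    *bom_len = 0;
    /* UTF-32 first: FF FE 00 00 would otherwise be taken for the UTF-16LE BOM. */
    if (n >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00) {
        *bom_len = 4;
        return ENC_UTF32LE;
    }
    if (n >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF) {
        *bom_len = 4;
        return ENC_UTF32BE;
    }
    if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) {
        *bom_len = 3;
        return ENC_UTF8;
    }
    if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE) {
        *bom_len = 2;
        return ENC_UTF16LE;
    }
    if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF) {
        *bom_len = 2;
        return ENC_UTF16BE;
    }
    return ENC_UNKNOWN;
}
```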

But you're missing the point. The point is that "plain text" is much more complicated than mere bytes, which is generally what assembler primitives work on. So bit twiddling isn't always going to buy you speed. In fact, searching UTF-8 is potentially quite slow, since you can't just look at individual bytes; rather, you have to decode each byte as you go, and possibly combine multiple bytes into a character.

You say UTF-8 and UTF-16 are "built into the language" but clearly no such routines are instructions in a CPU. So again you're talking about benchmarking runtime routines, not pure ASM (that you personally wrote) vs C or FB. It's neat that masm macros have such high-level functions, but it still doesn't prove to me that assembler is inherently faster than C for many everyday things, especially text processing. Byte processing maybe, and bit twiddling, yes, could be some advantage.
Last edited by caseih on Aug 18, 2017 3:51, edited 1 time in total.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Re: How to reduce the filesize of a compiled file?

Post by jj2007 »

caseih wrote: Any decent text editor can save into UTF-32
I have only indecent ones like Notepad++. Can you recommend one for a test? Maybe together with a short example showing how FreeBasic handles UTF-32 Instr()?
In fact searching UTF-8 is potentially quite slow, since you can't just look at individual bytes. Rather you have to decode each byte as you go, and possibly combine multiple bytes into a character.
Sure you can! As somebody noted in a recent thread:
TeeEmCee wrote: Normal INSTR also works with UTF8. The result is bytes, not characters, but actually that's usually what you want anyway.
And in the odd case that you desperately need the char positions:

Let MyUtf16$=wRec$(MyUtf8$)
Print Str$("The pos is %i", wInstr(MyUtf16$, wChr$("whatever")))
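Why a byte-level search is safe on UTF-8: the encoding is self-synchronizing, because continuation bytes (10xxxxxx) never look like lead bytes or ASCII, so a valid UTF-8 needle can only match at character boundaries of valid UTF-8 text. Turning the resulting byte offset into a character position only means counting the bytes that are not continuation bytes. A minimal C sketch, assuming valid UTF-8 input; `utf8_char_index` and `utf8_find_char_pos` are hypothetical names:

```c
#include <stddef.h>
#include <string.h>

/* Convert a byte offset into a valid UTF-8 string into a 0-based character
   index by counting the bytes that are not continuation bytes (10xxxxxx). */
size_t utf8_char_index(const char *s, size_t byte_offset)
{
    size_t chars = 0;
    for (size_t i = 0; i < byte_offset; i++)
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            chars++;
    return chars;
}

/* Byte-level search with strstr(), then report the character position,
   or -1 if the needle does not occur. */
long utf8_find_char_pos(const char *haystack, const char *needle)
{
    const char *hit = strstr(haystack, needle);
    if (hit == NULL)
        return -1;
    return (long)utf8_char_index(haystack, (size_t)(hit - haystack));
}
```

For "héllo wörld", "w" is found at byte offset 7 but character index 6, because "é" takes two bytes.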
You say UTF-8 and UTF-16 are "built into the language" but clearly no such routines are instructions in a CPU.
Sorry, that is about as meaningful as "FreeBasic doesn't have Instr() because the CPU has no such instructions". Truth is that in MasmBasic, Instr() is not a wrapper around C strstr. But Instr_() can handle UTF-8 and UTF-16, of course.
caseih
Posts: 2157
Joined: Feb 26, 2007 5:32

Re: How to reduce the filesize of a compiled file?

Post by caseih »

EDIT: deleted for now.
Josep Roca
Posts: 564
Joined: Sep 27, 2016 18:20
Location: Valencia, Spain

Re: How to reduce the filesize of a compiled file?

Post by Josep Roca »

Let's try something more challenging:

include \masm32\MasmBasic\MasmBasic.inc
  Init
  Let esi=FileRead$("mshtmlc.bi")	; 2,488,437 bytes
  NanoTimer()
  mov ecx, Count(esi, "type", 4)
  sub ecx, Count(esi, "end type")
  Inkey Str$("%i type declarations", ecx), Str$(" found in %2f seconds", NanoTimer(ms)/1000) 
EndOfCode
1718 type declarations found in 0.0040 seconds
I have finished my template code to make low-level COM servers with Free Basic.

Well, considering that there are many "type" without "end type" and that every "end type" must have a "type", we can use a small variation of my first example.

#include once "windows.bi"   ' CreateFileW, ReadFile, MAX_PATH, ...
#include once "crt.bi"       ' strstr

' --> change the path
DIM wszFileName AS WSTRING * MAX_PATH = $"C:\Users\Pepe\FreeBasic64\inc\win\mshtmlc.bi"
DIM bSuccess AS LONG, dwFileSize AS DWORD, dwHighSize AS DWORD, dwBytesRead AS DWORD
DIM nCount AS DWORD
DIM hFile AS HANDLE = CreateFileW(@wszFileName, GENERIC_READ, FILE_SHARE_READ, NULL, _
                      OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL)
IF hFile = INVALID_HANDLE_VALUE THEN END
' // Get the size of the file (files of 4 GB or more are not handled)
dwFileSize = GetFileSize(hFile, @dwHighSize)
IF dwHighSize THEN
   CloseHandle(hFile)
   END
END IF
' // One extra zeroed byte so the buffer is always null-terminated for strstr
DIM pBuffer AS UBYTE PTR = CAllocate(dwFileSize + 1, 1)
IF pBuffer = NULL THEN
   CloseHandle(hFile)
   END
END IF
bSuccess = ReadFile(hFile, pBuffer, dwFileSize, @dwBytesRead, NULL)
CloseHandle(hFile)
IF bSuccess THEN
   DIM pstr AS ANY PTR = pBuffer
   DIM sf AS ZSTRING * 5 = "type"
   DIM sf2 AS ZSTRING * 9 = "end type"
   DIM t1 AS DOUBLE = TIMER
   ' // Count every "type" (this includes the "type" inside each "end type")
   DO
      pstr = strstr(pstr, sf)
      IF pstr = NULL THEN EXIT DO
      pstr += 4
      nCount += 1
   LOOP
   ' // Subtract the "end type" occurrences
   pstr = pBuffer
   DO
      pstr = strstr(pstr, sf2)
      IF pstr = NULL THEN EXIT DO
      pstr += 8
      nCount -= 1
   LOOP
   DIM t2 AS DOUBLE = TIMER
   PRINT "Seconds: ", t2 - t1
END IF
DeAllocate(pBuffer)   ' // freed even if ReadFile failed

print "Count: ", nCount

PRINT
PRINT "Press any key..."
SLEEP
Seconds: 0.006208108010469005
Count: 1728

Not bad for a Basic compiler.

Apparently, my file has a few more "type" than yours.

How about string manipulation such as appending, replacing, deleting, inserting or concatenating Unicode strings? I do all my work using Unicode and only use FB's ANSI strings when I want to quickly allocate a byte buffer.
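On the appending question: a common way to make repeated appends cheap, in any language, is to keep a capacity larger than the current length and grow it geometrically, so most appends are a plain copy with no reallocation. A minimal C sketch of such a growable wide-character buffer; the `wbuf` type and its functions are hypothetical and are not FreeBASIC's own string implementation.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical growable wide-character string; initialize with {0}. */
typedef struct {
    wchar_t *data;   /* null-terminated once anything has been appended */
    size_t len;      /* characters in use, excluding the terminator */
    size_t cap;      /* characters allocated, including room for the terminator */
} wbuf;

/* Append n characters from s; returns 0 on success, -1 on allocation failure. */
int wbuf_append(wbuf *b, const wchar_t *s, size_t n)
{
    if (b->len + n + 1 > b->cap) {
        size_t new_cap = b->cap ? b->cap : 16;
        while (new_cap < b->len + n + 1)
            new_cap *= 2;                  /* geometric growth: amortized O(1) per char */
        wchar_t *p = realloc(b->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;                     /* old buffer is still valid */
        b->data = p;
        b->cap = new_cap;
    }
    wmemcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = L'\0';
    return 0;
}

void wbuf_free(wbuf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = b->cap = 0;
}
```

Usage: `wbuf b = {0}; wbuf_append(&b, L"Hello, ", 7); wbuf_append(&b, L"world", 5);` leaves `b.data` holding L"Hello, world" with `b.len == 12`.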