Quick and dirty self hosting BASICish compiler

yetifoot
Posts: 1710
Joined: Sep 11, 2005 7:08
Location: England
Contact:

Post by yetifoot »

As a bit of a self-challenge, I decided to write a self-hosting compiler of sorts as quickly as possible. I thought I'd share what I have so far before it gets even more unwieldy.

Self hosting means that the compiler can compile itself.

This program takes its input from stdin and outputs a source file that is compatible with g++.

NOTE: When copying the code from the forum, please add an extra blank line at the bottom of tinybasic.bas, otherwise a parsing issue will stop it working.

For example, with a small script like this:

fbc tinybasic.bas
./tinybasic < tinybasic.bas > tiny_src1.c
g++ tiny_src1.c -o tiny1
./tiny1 < tinybasic.bas > tiny_src2.c
g++ tiny_src2.c -o tiny2
./tiny2 < tinybasic.bas > tiny_src3.c
g++ tiny_src3.c -o tiny3
./tiny3 < tinybasic.bas > tiny_src4.c
g++ tiny_src4.c -o tiny4

You can see it compiling itself a few times and remaining stable.
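
One way to make "remaining stable" concrete is to check that successive generations emit identical source, for example that tiny_src2.c and tiny_src3.c match byte for byte. That reading of "stable" is an assumption, not something stated above. A minimal C++ sketch follows; slurp and same_contents are hypothetical helpers and not part of tinybasic.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper (not from tinybasic): read a whole file into a string.
std::string slurp(const std::string &path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// True only when both files can be opened and hold exactly the same bytes.
bool same_contents(const std::string &a, const std::string &b) {
    std::ifstream fa(a.c_str(), std::ios::binary);
    std::ifstream fb(b.c_str(), std::ios::binary);
    if (!fa || !fb) return false;  // a missing file never counts as a match
    return slurp(a) == slurp(b);
}
```

After running the script, same_contents("tiny_src2.c", "tiny_src3.c") returning true would suggest the compiler has reached a fixed point. A plain cmp or diff on the two files does the same job from the shell.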

Much of this code is dirty: I used quite a few hacks and just kept hacking until it worked. Any code other than that used in the compiler itself could well end up buggy, and the compiler doesn't do much syntax checking. However, while the code is still under 2k lines, it may well be of interest to people.

Seems the forum doesn't want to allow the code anymore, so I put the file here:

http://streetcds.co.uk/tinybasic.bas

Here is a much later version. It is more complex, but has more features, and some parts are cleaner.

http://streetcds.co.uk/tb_0.35.tar.gz

EDIT Jul 2011:

The smaller version can now be found here:

http://web.archive.org/web/200905031354 ... ybasic.bas

Files for the larger version can now be found via this thread:

http://www.freebasic.net/forum/viewtopic.php?t=18098
Last edited by yetifoot on Jul 02, 2011 22:54, edited 7 times in total.
jcfuller
Posts: 325
Joined: Sep 03, 2007 18:40

Post by jcfuller »

Just tried under WinXP Sp3 and it's a no go!
James

fb version:

Code:

D:\FreeBasic>fbc -version
FreeBASIC Compiler - Version 0.21.0 (11-07-2008) for win32 (target:win32)
Copyright (C) 2004-2008 The FreeBASIC development team.
Configured as standalone
objinfo enabled using FB BFD header version 217
g++ --version

Code:

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Documents and Settings\James>g++ --version
g++ (GCC) 3.4.5 (mingw special)
Copyright (C) 2004 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The first few lines of g++ output. This goes on for many more.

Code:

D:\FreeBasic\TinyBasic>g++ tiny_src1.c -o tiny1
tiny_src1.c:189: error: `Integer' does not name a type
tiny_src1.c:190: error: `Integer' does not name a type
tiny_src1.c:191: error: `Integer' does not name a type
tiny_src1.c:192: error: `Integer' does not name a type
tiny_src1.c:193: error: `Integer' does not name a type
tiny_src1.c:194: error: `Integer' does not name a type
tiny_src1.c:195: error: `Integer' does not name a type
tiny_src1.c:198: error: `Integer' does not name a type
tiny_src1.c:199: error: `String' does not name a type
tiny_src1.c:200: error: ISO C++ forbids declaration of `Any' with no type
tiny_src1.c:200: error: expected `;' before '*' token
tiny_src1.c:201: error: `Integer' does not name a type
tiny_src1.c:202: error: `Integer' does not name a type
tiny_src1.c:203: error: `Integer' does not name a type
tiny_src1.c:206: error: `Integer' does not name a type
tiny_src1.c:212: error: `Integer' does not name a type
tiny_src1.c:213: error: `String' does not name a type
tiny_src1.c:215: error: ISO C++ forbids declaration of `Any' with no type
tiny_src1.c:215: error: expected `;' before '*' token

Post by yetifoot »

Ah, that's a shame. I'm working on Linux with GCC 4.X. A lot changed between 3.X and 4.X, so I'm not really shocked it's incompatible.

For the most part it looks like it may just be the typedefs, though. I'll see if I can get it working on Windows too.

Post by yetifoot »

OK, thanks to nkk for helping me out with Windows testing, it should run fine on Windows now. The problem was that the forum capitalized Any and String, and I hadn't put an lcase() check on them. Also make sure tinybasic.bas ends in a blank line, or it will get confused by that too.

EDIT: I think there might still be another bug with the forum capitalizing the code, I'll take another look

EDIT2: OK, hopefully those bugs are fixed enough now!
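
The kind of lcase() check described above can be sketched in C++ as follows. This is only an illustration of case-insensitive keyword matching, not the actual tinybasic code: to_lower and is_type_keyword are hypothetical names, and the keyword list is simply the three type names that appear in the g++ error log.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Lower-case a token, in the spirit of BASIC's lcase(). Illustrative only.
std::string to_lower(std::string s) {
    for (std::string::size_type i = 0; i < s.size(); ++i)
        s[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(s[i])));
    return s;
}

// Hypothetical check: is this token one of the type names from the error log?
bool is_type_keyword(const std::string &token) {
    const std::string t = to_lower(token);
    return t == "integer" || t == "string" || t == "any";
}

// Small self-check: the spelling chosen by the forum no longer matters.
void example() {
    assert(is_type_keyword("Integer"));
    assert(is_type_keyword("ANY"));
    assert(!is_type_keyword("parse_file"));
}
```

Normalizing the token once, before any comparison, means source that has been re-capitalized in transit is treated the same as the original.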

Post by jcfuller »

Yep. Working fine now on WinXP.
Are you going anywhere in particular with this? Or just a little experiment?

James

Post by jcfuller »

Just curious about the size difference between the exes:

fbc on the first -> 50k

subsequent compiles using g++ 160k

James

Post by yetifoot »

It was just a bit of a self-challenge really. I may add a bit of polish and fix the issues and so on, but I don't plan to aim for the big time or anything.

You can make them a little smaller by doing

g++ -O2 -s tiny_src1.c -o tiny1

or similar. -O2 turns on some optimization, and -s strips the executable.

They still turn out a little bigger than the FB original, but that's because I don't do any kind of optimization at all really, and the code I make is pretty sloppy.
counting_pine
Site Admin
Posts: 6323
Joined: Jul 05, 2005 17:32
Location: Manchester, Lancs

Post by counting_pine »

Awesomeness...
How long did it take you to write it?

Post by yetifoot »

The first version that compiled itself came 24 hours in. I guess I had spent about 6 to 8 hours of real time on it to that point. What I've posted here had another few hours of work on top of that.
D.J.Peters
Posts: 8586
Joined: May 28, 2005 3:28
Contact:

Post by D.J.Peters »

Good job (cool idea).
Works fine, tested under XP:
D:\FBEXAM~1\compi>fbc tinybasic.bas
D:\FBEXAM~1\compi>tinybasic<tinybasic.bas>tinibasic.cpp
D:\FBEXAM~1\compi>g++ -W3 -O2 -s tinybasic.cpp -o tinybasic_cpp.exe
D:\FBEXAM~1\compi>tinybasic_cpp<tinybasic.bas>tinibasic.cpp
D:\FBEXAM~1\compi>dir *.exe
Volume in drive D: is DATEN
Volume serial number: 2835-1508

Directory of D:\FBEXAM~1\compi

05.12.2008 19:37 56.320 tinybasic.exe
05.12.2008 19:38 104.448 tinybasic_cpp.exe
2 File(s) 160.768 bytes
0 Dir(s), 8.568.307.712 bytes free
I will invest some time and take a deeper look at your code.

Joshy
fsw
Posts: 260
Joined: May 27, 2005 6:02

Post by fsw »

Hello yetifoot,
For a while now I've mostly been lurking rather than being more involved in this forum, but your code got me really interested.

So I went through your fascinating code in an attempt to understand how it works and have a question.
In the sub "parse_file" on line 1489 there is the following code which throws me off:

Code:

Dim As Integer is_func = Lcase( tk_str ) = "function"

What do the two equals signs do?

I'd be grateful for any (correct) answer to my question.

Thanks
fsw

Post by counting_pine »

Hi fsw, good question.

An expression of the form "a = (b = c)" (I added the brackets for clarity) means: "assign the truth value of (b is equal to c) to the variable a".

By comparison, "a = (b > c)" means: assign the truth value of (b is greater than c) to the variable a, and "a = (b + c)" assigns the value of (b plus c) to a.

"b = c" is a boolean expression, i.e. it returns either True or False. It means the same thing as it does when you see "if b = c then ...".
Things like '=', '>', '<=' are actually operators, just like '+', '-', '*', etc...
The only difference is, they return a value to represent True/False, instead of calculating their sum/difference/product/...


Anyway, whenever you see a statement of the form "variable = expression", you know that the value of expression will be assigned to variable. That first '=' is a special "assignment" operator. Any '='s inside expression are just equality tests.
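
The same idea can be sketched in C++, where the two roles of '=' are spelled differently: "==" compares and "=" assigns. This is a minimal illustration rather than tinybasic's actual output, and is_function_keyword is a hypothetical name. One difference worth noting: a FreeBASIC comparison yields -1 for true, whereas a C++ comparison stored in an int yields 1.

```cpp
#include <cassert>
#include <string>

// BASIC "is_func = tk_str = "function"": the second "=" is a comparison,
// the first one assigns its result. In C++ that reads "is_func = (tk_str == ...)".
int is_function_keyword(const std::string &tk_str) {
    int is_func = (tk_str == "function");  // bool result stored as 0 or 1
    return is_func;
}

// Other operators behave the same way: each yields a value that can be stored.
void example() {
    int b = 3, c = 5;
    int eq = (b == c);   // 0, because 3 is not equal to 5
    int gt = (b > c);    // 0, because 3 is not greater than 5
    int sum = (b + c);   // 8
    assert(eq == 0 && gt == 0 && sum == 8);
    assert(is_function_keyword("function") == 1);
    assert(is_function_keyword("sub") == 0);
}
```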


Sorry if my answer's a little wordy, but I hope you understand it.

Post by fsw »

@counting_pine

Thanks for this explanation, it makes total sense now.

If Basic languages would have the "==" operator like in c syntax languages all of this would make sense at first sight... well more or less :P

Thanks again
fsw

BTW: I should have looked closer at the C output... duh!
int is_func = (int)(lcase(tk_str) == str_temp("function"));
marcov
Posts: 3462
Joined: Jun 16, 2005 9:45
Location: Netherlands
Contact:

Post by marcov »

fsw wrote:
If Basic languages would have the "==" operator like in c syntax languages all of this would make sense at first sight... well more or less :P

A clean solution is simply to have a real boolean type that is not compatible with integer. The advantage is that unintended use of this construct is detected by the compiler, since a comparison then has a different result type than an assignment between non-boolean variables.
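
As a rough illustration of that point in C++: the built-in bool converts to int implicitly, so the sketch below wraps it in a type that does not. StrictBool and equals are made-up names for this example only, and the code needs C++11 or later.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical boolean wrapper: no implicit conversion to or from int.
class StrictBool {
public:
    explicit StrictBool(bool v) : value_(v) {}
    explicit operator bool() const { return value_; }
private:
    bool value_;
};

// A comparison helper that returns the strict type instead of a plain bool.
StrictBool equals(int b, int c) { return StrictBool(b == c); }

// The compiler now rejects mixing comparison results with integers.
static_assert(!std::is_convertible<StrictBool, int>::value,
              "a comparison result cannot be stored in an int by accident");
static_assert(!std::is_convertible<int, StrictBool>::value,
              "an int cannot silently become a boolean");

void example() {
    StrictBool a = equals(2, 2);      // fine: boolean stored in a boolean
    // int n = equals(2, 2);          // would not compile
    assert(static_cast<bool>(a));
    if (equals(1, 2)) assert(false);  // conditions still work via explicit operator bool
}
```

With such a type, a line like "Dim As Integer is_func = Lcase( tk_str ) = "function"" would be a type error unless is_func were declared as a boolean, which makes the comparison stand out.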

Post by yetifoot »

I spent a bit more time on this after I posted, but haven't done anything in the last month or so. I've now spent a little while cleaning it up as best I could, so that anyone interested can take a look.

A lot of stuff is cleaner and better, but many new quirks and hacks have been added. One part I had real trouble on was the string implementation, trying to make it clean up correctly.

Anyway, here it is,

http://streetcds.co.uk/tb_0.35.tar.gz
Last edited by yetifoot on Feb 27, 2009 16:57, edited 1 time in total.