Quick and dirty self hosting BASICish compiler
I decided for a bit of a self-challenge to write a self hosting compiler of sorts, as quickly as possible. I thought I'd share what I have so far before it gets even more unwieldy.
Self hosting means that the compiler can compile itself.
NOTE: when copying the code from the forum, please add an extra blank line at the bottom of tinybasic.bas, otherwise a parsing issue will stop it working.
This program takes its input from stdin and outputs a source file that g++ can compile. For example, with a small script like this:
fbc tinybasic.bas
./tinybasic < tinybasic.bas > tiny_src1.c
g++ tiny_src1.c -o tiny1
./tiny1 < tinybasic.bas > tiny_src2.c
g++ tiny_src2.c -o tiny2
./tiny2 < tinybasic.bas > tiny_src3.c
g++ tiny_src3.c -o tiny3
./tiny3 < tinybasic.bas > tiny_src4.c
g++ tiny_src4.c -o tiny4
You can see it compiling itself a few times and remaining stable.
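One way to check "stable" is to compare two consecutive generated sources, for example tiny_src2.c and tiny_src3.c from the script above. The sketch below assumes that stable means byte-identical output, which the post does not state outright. `same_contents` is a hypothetical helper, not part of tinybasic.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: true only if both files can be opened and hold
// exactly the same bytes.
bool same_contents(const std::string& path_a, const std::string& path_b) {
    std::ifstream a(path_a, std::ios::binary);
    std::ifstream b(path_b, std::ios::binary);
    if (!a || !b) {
        return false;  // a missing or unreadable file counts as "not the same"
    }
    // Read each file fully into memory; fine for sources of a few thousand lines.
    const std::string data_a((std::istreambuf_iterator<char>(a)),
                             std::istreambuf_iterator<char>());
    const std::string data_b((std::istreambuf_iterator<char>(b)),
                             std::istreambuf_iterator<char>());
    return data_a == data_b;
}
```

Calling `same_contents("tiny_src2.c", "tiny_src3.c")` after running the script would then tell you whether the compiler has reached a fixed point under that assumption. An ordinary file-comparison tool would do the same job.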
Much of this code is dirty. I used quite a few hacks and just hacked at the code until it worked, so any code other than that used in the compiler itself could well end up buggy, and the compiler doesn't do much syntax checking. However, while the code is still under 2k lines, it may well be of interest to people.
Seems the forum doesn't want to allow the code anymore, so I put the file here:
http://streetcds.co.uk/tinybasic.bas
Here is a much later version: more complex, but with more features, and some parts are cleaner.
http://streetcds.co.uk/tb_0.35.tar.gz
EDIT Jul 2011:
The smaller version can now be found here:
http://web.archive.org/web/200905031354 ... ybasic.bas
Files for the larger version can now be found via this thread:
http://www.freebasic.net/forum/viewtopic.php?t=18098
Last edited by yetifoot on Jul 02, 2011 22:54, edited 7 times in total.
Just tried under WinXP Sp3 and it's a no go!
James
fb version:
Code: Select all
D:\FreeBasic>fbc -version
FreeBASIC Compiler - Version 0.21.0 (11-07-2008) for win32 (target:win32)
Copyright (C) 2004-2008 The FreeBASIC development team.
Configured as standalone
objinfo enabled using FB BFD header version 217
g++ --version
Code: Select all
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.
C:\Documents and Settings\James>g++ --version
g++ (GCC) 3.4.5 (mingw special)
Copyright (C) 2004 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
The first few lines of g++ output. This goes on for many more.
Code: Select all
D:\FreeBasic\TinyBasic>g++ tiny_src1.c -o tiny1
tiny_src1.c:189: error: `Integer' does not name a type
tiny_src1.c:190: error: `Integer' does not name a type
tiny_src1.c:191: error: `Integer' does not name a type
tiny_src1.c:192: error: `Integer' does not name a type
tiny_src1.c:193: error: `Integer' does not name a type
tiny_src1.c:194: error: `Integer' does not name a type
tiny_src1.c:195: error: `Integer' does not name a type
tiny_src1.c:198: error: `Integer' does not name a type
tiny_src1.c:199: error: `String' does not name a type
tiny_src1.c:200: error: ISO C++ forbids declaration of `Any' with no type
tiny_src1.c:200: error: expected `;' before '*' token
tiny_src1.c:201: error: `Integer' does not name a type
tiny_src1.c:202: error: `Integer' does not name a type
tiny_src1.c:203: error: `Integer' does not name a type
tiny_src1.c:206: error: `Integer' does not name a type
tiny_src1.c:212: error: `Integer' does not name a type
tiny_src1.c:213: error: `String' does not name a type
tiny_src1.c:215: error: ISO C++ forbids declaration of `Any' with no type
tiny_src1.c:215: error: expected `;' before '*' token
OK, thanks to nkk for helping me out with Windows testing, it should run fine on Windows now. The issue was that the forum capitalized Any and String, and I hadn't put an lcase() check in for that. Also make sure tinybasic.bas ends in a blank line, or it will get confused by that as well.
EDIT: I think there might still be another bug with the forum capitalizing the code, I'll take another look
EDIT2: OK, hopefully those bugs are fixed enough now!
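The kind of fix described above, lower-casing a token before comparing it with a keyword, might look roughly like this. It is only a sketch: `to_lower` and `is_type_keyword` are hypothetical names, and the keyword list is an assumption taken from the names in the error output (Integer, String, Any), not from the compiler's source.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: return an ASCII lower-cased copy of s.
std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical check: does the token name a built-in type, regardless of
// how the forum (or the user) capitalized it?
bool is_type_keyword(const std::string& token) {
    const std::string t = to_lower(token);
    return t == "integer" || t == "string" || t == "any";
}
```

With a check like this, `Integer`, `INTEGER` and `integer` are all treated the same, so forum-capitalized source no longer leaks names such as `Integer` into the generated code.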
It was just a bit of a self-challenge really, I may add a bit of polish and fix the issues and so on, but I don't plan to aim for the big time or anything.
You can make them a little smaller by doing
g++ -O2 -s tiny_src1.c -o tiny1
or similar: -O2 turns on some optimization, and -s strips the executable.
They still turn out a little bigger than the FB original, but that's because I don't do any kind of optimization at all really, and the code I make is pretty sloppy.
good job (cool idea)
works fine tested under XP
I will invest some time and take a deeper look at your code.
D:\FBEXAM~1\compi>fbc tinybasic.bas
D:\FBEXAM~1\compi>tinybasic<tinybasic.bas>tinibasic.cpp
D:\FBEXAM~1\compi>g++ -W3 -O2 -s tinybasic.cpp -o tinybasic_cpp.exe
D:\FBEXAM~1\compi>tinybasic_cpp<tinybasic.bas>tinibasic.cpp
D:\FBEXAM~1\compi>dir *.exe
Volume in drive D is DATEN
Volume Serial Number is 2835-1508
Directory of D:\FBEXAM~1\compi
05.12.2008 19:37 56.320 tinybasic.exe
05.12.2008 19:38 104.448 tinybasic_cpp.exe
2 File(s) 160.768 bytes
0 Dir(s), 8.568.307.712 bytes free
Joshy
Hello yetifoot,
For a while now I've mostly been lurking rather than being more involved in this forum, but your code got me really interested.
So I went through your fascinating code in an attempt to understand how it works, and I have a question.
In the sub "parse_file" on line 1489 there is the following code which throws me off:
Code: Select all
Dim As Integer is_func = Lcase( tk_str ) = "function"
What do the 2 equal signs do?
I'm grateful for any (correct) answer to my question.
Thanks
fsw
Hi fsw, good question.
An expression of the form "a = (b = c)" (I added the brackets for clarity) means: "assign the truth value of (b is equal to c) to the variable a".
By comparison, "a = (b > c)" means: assign the truth value of (b is greater than c) to the variable a, and "a = (b + c)" assigns the value of (b plus c) to a.
"b = c" is a boolean expression - i.e. it returns either True or False. It means the same thing as it does when you see "if b = c then ...".
Things like '=', '>', '<=' are actually operators, just like '+', '-', '*', etc...
The only difference is, they return a value to represent True/False, instead of calculating their sum/difference/product/...
Anyway, whenever you see a statement of the form "variable = expression", you know that the value of expression will be assigned to variable. That first '=' is a special "assignment" operator. Any '='s inside expression are just equality tests.
Sorry if my answer's a little wordy, but I hope you understand it.
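The same idea can be sketched in C++, where the assignment and the equality test use different symbols. The function names here are hypothetical and only illustrate the explanation above. One difference worth knowing: a FreeBASIC comparison yields -1 for true and 0 for false, whereas a C++ comparison converted to int yields 1 or 0.

```cpp
#include <string>

// Hypothetical stand-in for the line being discussed; not the compiler's code.
int is_function_keyword(const std::string& tk_str) {
    // The first '=' initializes the variable; '==' is the equality test
    // whose truth value gets stored.
    int is_func = (tk_str == "function");  // bool converts to 1 or 0
    return is_func;
}

// FreeBASIC-style result: -1 for true, 0 for false.
int fb_style_equal(const std::string& a, const std::string& b) {
    return -static_cast<int>(a == b);
}
```

So `is_function_keyword("function")` gives 1 here, while the FreeBASIC original would store -1. Either way the value is non-zero and behaves as true in an `if`.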
@counting_pine
Thanks for this explanation, it makes total sense now.
If Basic languages would have the "==" operator like in c syntax languages all of this would make sense at first sight... well more or less :P
Thanks again
fsw
BTW: should have looked closer at the c output... duh!
int is_func = (int)(lcase(tk_str) == str_temp("function"));
fsw wrote:
If Basic languages would have the "==" operator like in c syntax languages all of this would make sense at first sight... well more or less :P
A clean solution is to simply have a real boolean type that is not compatible with integer. The advantage is that unintended use of this construct is detected by the compiler, since a comparison then has a different result type than an assignment between non-boolean variables.
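As a rough illustration of that suggestion, here is a minimal C++ sketch of a boolean type that does not silently convert to an integer. `StrictBool` and `equals` are hypothetical names invented for this example. It shows the general idea only, not how any BASIC dialect implements it.

```cpp
#include <string>
#include <type_traits>

// Hypothetical strict boolean: it can only be built from a bool explicitly,
// and it only converts back to bool explicitly (or in an if/while condition).
class StrictBool {
public:
    explicit constexpr StrictBool(bool v) : value_(v) {}
    explicit constexpr operator bool() const { return value_; }
private:
    bool value_;
};

// A comparison that returns the strict type instead of an integer.
StrictBool equals(const std::string& a, const std::string& b) {
    return StrictBool(a == b);
}

// The compiler rejects accidental integer use of a comparison result:
//   int n = equals("a", "b");   // error: no implicit conversion to int
static_assert(!std::is_convertible<StrictBool, int>::value,
              "StrictBool must not implicitly convert to int");
```

Writing `if (equals(x, y)) ...` still works, because a condition is one of the places where the explicit conversion to bool is allowed. Storing the result in an integer needs a deliberate cast, which is the kind of mistake the compiler can then flag.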
I spent a bit more time on this after I posted but haven't done anything in the last month or so, so I took a little while to clean it up as best I could, so that anyone interested could take a look.
A lot of stuff is cleaner and better, but many new quirks and hacks have been added. One part I had real trouble on was the string implementation, trying to make it clean up correctly.
Anyway, here it is,
http://streetcds.co.uk/tb_0.35.tar.gz
Last edited by yetifoot on Feb 27, 2009 16:57, edited 1 time in total.