Revision history for DevFbcParser


Revision [22563]

Last edited on 2019-02-16 20:26:24 by JeffMarshall [remove link to grammar]
Additions:
The structure of the parser has a very close relation to the FreeBASIC grammar. Basically there is a parsing function for every element of the grammar.
Deletions:
The structure of the parser has a very close relation to the [[FbGrammar|FreeBASIC grammar]]. Basically there is a parsing function for every element of the grammar.


Revision [20792]

Edited on 2016-03-12 13:27:06 by fxm [Formatting]

No Differences

Revision [19985]

Edited on 2016-02-10 15:48:20 by DkLwikki [Update link format]
Additions:
The structure of the parser has a very close relation to the [[FbGrammar|FreeBASIC grammar]]. Basically there is a parsing function for every element of the grammar.
Deletions:
The structure of the parser has a very close relation to the [[FbGrammar FreeBASIC grammar]]. Basically there is a parsing function for every element of the grammar.


Revision [15698]

Edited on 2012-01-16 02:20:21 by SirMud [Update link format]
Additions:
{{fbdoc item="back" value="DocToc|Table of Contents"}}


Revision [15676]

Edited on 2012-01-16 02:00:01 by SirMud [Update link format]
Additions:
{{fbdoc item="back" value="DevToc|FreeBASIC Developer Information"}}
Deletions:
{{fbdoc item="back" value="DevToc|Table of Contents"}}


Revision [14949]

Edited on 2010-10-25 17:55:42 by DkLwikki [Update link format]
Additions:
{{fbdoc item="title" value="Purpose"}}----
Deletions:
===Parser/Compiler===
==__Motivation__==
==__Top level of the parsing/compilation process__==
''##fb.bas:fbCompile()##'' is called from the fbc frontend for every input file. Parsing (and compiling) of the file begins here.
''##fb.bas:fbCompile()##''
- Open the input .bas
- Start the emitter (''##ir##'') (Open the output .asm)
- fbMainBegin() (Build the AST for the implicit main() or static constructor for module-level code)
- fbPreIncludes(): fbIncludeFile() for every preinclude (found on the fbc command line)
- cProgram()
- fbMainEnd() (Close the implicit main())
- Finish emitting (''##ir##'') (Finish generating the .asm and close it)
- Close the input .bas
''##fb.bas:fbIncludeFile()##''
- Include file search
- lexPush() (Push a new lexer context to parse this #include file without disturbing the lexer's state in the parent file)
- Open the include file
- cProgram()
- Close the include file
- lexPop() (Restore the lexer state to the parent file)
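The push/pop pattern in ##fbIncludeFile()## can be illustrated with a minimal Python sketch. This is not the compiler's code: the ##Lexer## class, ##include_file()##, ##parse_program()## and the in-memory ##files## dictionary (standing in for the include file search and file I/O) are all hypothetical names chosen for illustration. The toy also pushes a context for the top-level file, which the real ##fbCompile()## does not necessarily do.

```python
class Lexer:
    """Hypothetical lexer that keeps one context per open source file."""

    def __init__(self):
        self.contexts = []  # innermost (currently parsed) file is last

    def push(self, filename, text):
        # New context: the parent's position is left untouched on the stack.
        self.contexts.append({"file": filename, "lines": text.splitlines(), "pos": 0})

    def pop(self):
        # Drop the finished file; the parent context becomes current again.
        self.contexts.pop()

    @property
    def current(self):
        return self.contexts[-1]


def include_file(lexer, files, name, seen):
    text = files[name]          # stands in for "include file search" + "open"
    lexer.push(name, text)
    try:
        parse_program(lexer, files, seen)
    finally:
        lexer.pop()             # always restore the parent's lexer state


def parse_program(lexer, files, seen):
    ctx = lexer.current
    while ctx["pos"] < len(ctx["lines"]):
        line = ctx["lines"][ctx["pos"]].strip()
        ctx["pos"] += 1
        if line.startswith("#include "):
            include_file(lexer, files, line.split(None, 1)[1], seen)
        elif line:
            seen.append((ctx["file"], line))


files = {
    "main.bas": "print 1\n#include inc.bi\nprint 2",
    "inc.bi": "print 10",
}
lexer = Lexer()
seen = []
include_file(lexer, files, "main.bas", seen)
```

After the include has been parsed, parsing of ##main.bas## continues at the line following the ##~#include##, because the parent's position was kept in its own context.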
''##parser-toplevel.bas:cProgram()##'' is the root of the [[FbGrammar FB grammar]] and parses a file. Here is a quick rundown of what it does:
- cLine() repeatedly until EOF
- cLabel()
- cStatement()
- Declarations
- UDT declarations, typedefs
- Variables (DIM, VAR, ...)
- Procedure declarations (DECLARE)
- Procedure bodies (SUB, FUNCTION, ...)
(Procs temporarily replace the implicit module-level procedure, so any AST nodes go into them instead of the implicit main())
- Compound statements (IF/ELSE, DO/LOOP, EXIT/CONTINUE DO, ...)
- Procedure calls
- Function result assignments
- Quirk statements (special QB rtlib/gfxlib statements)
- ASM blocks
- Assignments
- Procedure pointer calls
and most of them use cExpression() at some point.
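The "one parsing function per grammar element" structure can be sketched as a tiny recursive-descent parser. This is a toy in Python, not the FreeBASIC parser: the grammar below (optional label, assignment statements only, ##+##/##-## expressions, whitespace-separated tokens) and the function names ##c_program()##, ##c_line()##, ##c_label()##, ##c_statement()## and ##c_expression()## are assumptions made for illustration.

```python
def c_expression(tokens, pos):
    """expression := operand (('+' | '-') operand)*"""
    if pos >= len(tokens):
        raise SyntaxError("expected expression")
    node = tokens[pos]
    pos += 1
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        if pos + 1 >= len(tokens):
            raise SyntaxError("expected operand after operator")
        node = (tokens[pos], node, tokens[pos + 1])  # left-associative
        pos += 2
    return node, pos


def c_label(tokens, pos):
    """label := NAME ':'   (optional)"""
    if pos + 1 < len(tokens) and tokens[pos + 1] == ":":
        return tokens[pos], pos + 2
    return None, pos


def c_statement(tokens, pos):
    """statement := NAME '=' expression   (assignments only in this toy)"""
    if pos + 1 >= len(tokens) or tokens[pos + 1] != "=":
        raise SyntaxError("expected assignment")
    name = tokens[pos]
    expr, pos = c_expression(tokens, pos + 2)
    return ("assign", name, expr), pos


def c_line(text):
    """line := [label] statement"""
    tokens = text.split()
    label, pos = c_label(tokens, 0)
    stmt, pos = c_statement(tokens, pos)
    if pos != len(tokens):
        raise SyntaxError("unexpected token: " + tokens[pos])
    return label, stmt


def c_program(source):
    """program := line*   (one line at a time until the end of the input)"""
    return [c_line(line) for line in source.splitlines() if line.strip()]


tree = c_program("start : x = 1 + 2\ny = x - 3")
```

Each function handles exactly one grammar rule and calls the functions for the rules it refers to, which is why the call structure mirrors the grammar.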
==__Symbols__==
To make the transition from tokens to AST, the parser needs to recognize functions, variables, types, etc. The ''##symb##'' module keeps track of all these symbols and their namespaces and scopes. The parser can do lookups in the current scope, or only in specific namespaces. Many AST nodes have a corresponding symbol (e.g. variables and functions).
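As a rough illustration of scoped and namespace-specific lookups, here is a minimal Python sketch. It does not reflect the actual ''##symb##'' module: the ##SymbolTable## class, its methods and the dictionary-based symbol records are hypothetical, and details such as case-insensitive names are ignored.

```python
class SymbolTable:
    """Hypothetical symbol table: a stack of scopes plus named namespaces."""

    def __init__(self):
        self.scopes = [{}]    # scopes[0] is the module level, innermost scope last
        self.namespaces = {}  # namespace name -> {symbol name -> symbol}

    def open_scope(self):
        self.scopes.append({})

    def close_scope(self):
        self.scopes.pop()

    def declare(self, name, kind, dtype, namespace=None):
        symbol = {"name": name, "kind": kind, "dtype": dtype}
        if namespace is None:
            self.scopes[-1][name] = symbol
        else:
            self.namespaces.setdefault(namespace, {})[name] = symbol
        return symbol

    def lookup(self, name):
        # Search from the current scope outwards to the module level.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

    def lookup_in(self, namespace, name):
        # Search one specific namespace only.
        return self.namespaces.get(namespace, {}).get(name)


symbols = SymbolTable()
symbols.declare("x", "variable", "integer")
symbols.open_scope()
symbols.declare("x", "variable", "double")   # shadows the outer x
inner_dtype = symbols.lookup("x")["dtype"]
symbols.close_scope()
outer_dtype = symbols.lookup("x")["dtype"]
symbols.declare("f", "function", "integer", namespace="ns")
```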
==__Data types__==
The parser has to check types, which are mostly a feature of the language, but also affect the assembly output (e.g. floating point operations). Symbols contain their data types, as well as AST nodes, although the AST mostly needs it for expressions and casting/conversions. For variables/functions, the data type is stored in both places.
A data type is represented as a combination of:
- ##dtype## integer
- 5 bits: raw type:
- ##void## (unknown type, e.g.: any ptr, type t as t)
- ##byte##, ##ubyte##
- ##char## (zstring pointers and their deref expressions)
- ##short##, ##ushort##
- ##wchar## (wstring pointers and their deref expressions)
- ##integer##, ##uinteger##
- ##enum## (integer)
- ##bitfield## (uinteger)
- ##long##, ##ulong##
- ##longint##, ##ulongint##
- ##single##, ##double##
- ##string## (variable length)
- ##fixstr## (fixed length strings, string * N, N is the type's length)
- ##struct## (UDT, -> subtype is used)
- ##namespace## (Used during name mangling?)
- ##function## (Used for function pointers, -> subtype contains full function declaration)
- ##forward reference## (will be changed to actual raw type when known, -> subtype is used)
- ##pointer## (could be removed; didn't the pointer count replace this?)
- ##xmmword## (Used by SSE emitter)

- 4 bits: PTR count
The number of PTRs on the type (maximum 8). If it is greater than 0, the data type is a pointer.

- 9 bits: CONST mask (8 PTR's + 1 "base")
%% Example CONST mask
const integer                  000000001   (first CONST bit set)
integer const ptr              000000001   (ditto)
const integer ptr              000000010   (pointer to const)
const integer ptr const ptr    000000101   (const pointer to pointer to const)%%

- ##subtype## symbol, points to one of the following:
- A UDT symbol
- A typedef symbol
- A forward reference symbol (will be replaced by actual subtype when known)
- ##length## integer
Found where the size is needed (e.g. structure size calculations, pointer arithmetic, stack offsets).
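
The ##dtype## integer described above can be sketched as a packed bit field. The following Python sketch is illustrative only: the field order (raw type in the lowest 5 bits, then the 4-bit PTR count, then the 9-bit CONST mask), the helper names and the ##RAW_INTEGER## value are assumptions, not taken from the compiler sources. The CONST mask convention follows the example table: bit 0 describes the outermost level of the type, bit 1 the level it points to, and so on.

```python
TYPE_BITS, PTR_BITS, CONST_BITS = 5, 4, 9
PTR_SHIFT = TYPE_BITS                  # assumed field order, see lead-in
CONST_SHIFT = TYPE_BITS + PTR_BITS
MAX_PTRS = 8

RAW_INTEGER = 7  # hypothetical raw type id


def pack_dtype(raw_type, const_levels):
    """Pack a dtype from a raw type and one CONST flag per level.

    const_levels lists the levels outermost first; every level except the
    last one is a PTR, so the PTR count is len(const_levels) - 1.
    """
    ptr_count = len(const_levels) - 1
    if not 0 <= ptr_count <= MAX_PTRS:
        raise ValueError("between 0 and 8 PTRs are supported")
    if not 0 <= raw_type < (1 << TYPE_BITS):
        raise ValueError("raw type does not fit into 5 bits")
    mask = 0
    for bit, is_const in enumerate(const_levels):
        if is_const:
            mask |= 1 << bit
    return raw_type | (ptr_count << PTR_SHIFT) | (mask << CONST_SHIFT)


def get_raw_type(dtype):
    return dtype & ((1 << TYPE_BITS) - 1)


def get_ptr_count(dtype):
    return (dtype >> PTR_SHIFT) & ((1 << PTR_BITS) - 1)


def get_const_mask(dtype):
    return (dtype >> CONST_SHIFT) & ((1 << CONST_BITS) - 1)


def is_pointer(dtype):
    return get_ptr_count(dtype) > 0


# "const integer ptr const ptr": const pointer -> pointer -> const integer
d = pack_dtype(RAW_INTEGER, [True, False, True])
mask_text = format(get_const_mask(d), "09b")
```

With these assumptions, ##mask_text## reproduces the last row of the example table, and the other rows correspond to ##[True]##, ##[True, False]## and ##[False, True]##.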


Revision [14898]

Edited on 2010-10-25 14:58:20 by DkLwikki [Update link format]
Additions:
{{fbdoc item="back" value="DevToc|Table of Contents"}}
Deletions:
@@[[DevToc Back to Table Of Content]]@@


Revision [14886]

Edited on 2010-10-24 15:20:59 by AgSwikki [Removed a couple of typos]
Additions:
The parser has to check types, which are mostly a feature of the language, but also affect the assembly output (e.g. floating point operations). Symbols contain their data types, as well as AST nodes, although the AST mostly needs it for expressions and casting/conversions. For variables/functions, the data type is stored in both places.
Deletions:
The parser has to check types, which are mostly a feature of the language, but also affect the assembly output (e.g. floating point operations). Symbols contain their data types, aswell as AST nodes, although the AST mostly needs it for expressions and casting/conversions. For variables/functions, the data type is stored in both places.


Revision [14882]

Edited on 2010-10-21 14:44:10 by DkLwikki [Removed a couple of typos]
Additions:
When parsing code, a corresponding AST is built up to represent the program. The AST is used to represent executable code, but also to hold temporary expressions, for example the values of constants or the initializers found while parsing type or procedure declarations. The AST does //not// contain nodes for code flow constructs like IF, DO/LOOP, GOTO, RETURN, EXIT DO, etc., but it contains labels and branches. Likewise, several operations (like IIF(), ANDALSO, ORELSE, field dereference, member access) are replaced by the corresponding set of lower-level operations in the AST.
Deletions:
When parsing code ("executable" code, not declarations) a corresponding AST is built up to represent the program. Here, code flow constructs like IF, DO/LOOP, GOTO, RETURN, EXIT DO, etc. are turned into conditional branches or jumps to labels.
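
How a code flow construct can be reduced to labels and branches is sketched below for an IF/ELSE block. This is a generic lowering written in Python, not the compiler's actual AST code: the tuple-based node format, ##lower_if()## and ##make_label_factory()## are hypothetical.

```python
import itertools


def make_label_factory(prefix="L"):
    """Return a function that produces fresh label names: L1, L2, ..."""
    counter = itertools.count(1)
    return lambda: f"{prefix}{next(counter)}"


def lower_if(cond, then_body, else_body, new_label):
    """Turn IF cond THEN ... ELSE ... END IF into a flat node list."""
    else_label = new_label()
    end_label = new_label()
    return (
        [("branch_if_false", cond, else_label)]  # skip THEN part if cond fails
        + list(then_body)
        + [("jump", end_label), ("label", else_label)]
        + list(else_body)
        + [("label", end_label)]
    )


nodes = lower_if(
    "x > 0",
    [("call", "foo")],
    [("call", "bar")],
    make_label_factory(),
)
```

The resulting list contains no IF node at all, only a conditional branch, an unconditional jump and two labels around the unchanged statement nodes.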


Revision [14878]

Edited on 2010-10-16 12:05:11 by DkLwikki [Removed a couple of typos]
Additions:
@@[[DevToc Back to Table Of Content]]@@


Revision [14877]

The oldest known version of this page was created on 2010-10-16 12:04:24 by DkLwikki [Removed a couple of typos]