FreeBASIC grammar (extracted from the FreeBASIC sources)

AGS · Post by **AGS** » Dec 18, 2008 23:04

The FreeBASIC grammar is spread out across many pages of the manual. So I thought it would be nice to get the grammar on one page. And to get the grammar I took the FreeBASIC sources and extracted the lines containing FreeBASIC grammar rules. So here it is, the FreeBASIC grammar according to the FreeBASIC developers.

* PatternSearchMatcher[(['][']\s*\w+\s+[=]\s+.+)|(['][']\s*[|]\s*.+),true]

Lexical Grammar
''char classes:

ALPHA 'A' - 'Z'
DIGIT '0' - '9'
HEXDIG 'A' - 'F' | DIGIT
OCTDIG '0' - '7'
BINDIG '0' | '1'
ALPHADIGIT ALPHA | DIGIT
ISUFFIX '%' | '&'
FSUFFIX '!' | '#'
SUFFIX ISUFFIX | FSUFFIX | '$'

EXPCHAR 'D' | 'E'

OPERATOR '=' | '<' | '>' | '+' | '-' | '*' | '/' | '\' | '^'
DELIMITER '.' | ':' | ',' | ';' | '"' | '''

indentifier = (ALPHA | '_') { [ALPHADIGIT | '_' ] } [SUFFIX].
hex_oct_bin = 'H' HEXDIG+
| 'O' OCTDIG+
| 'B' BINDIG+
float = DOT DIGIT { DIGIT } [FSUFFIX | { EXPCHAR [opadd] DIGIT DIGIT } } | ].
number = DIGIT dig_dot_nil i_fsufx_nil
| '.' float
| '&' hex_oct_bin
dig_dot_nil = DIGIT dig_dot_nil
| ('.'|EXPCHAR) float
| .
i_fsufx_nil = ISUFFIX # is integer
| FSUFFIX # is float
| . # is def### !!! context sensitive !!!
string = '"' { ANY_CHAR_BUT_QUOTE } '"'. # less quotes
MultiLineComment = '/' ''' . '/' '''

Context Free Grammar

Program = Line* EOF? .

Line = Label? Statement? Comment? EOL .

Label = NUM_LIT
| ID ':' .

Statement = STT_SEPARATOR? ( Declaration
| ProcCallOrAssign
| CompoundStmt
| QuirkStmt
| AsmBlock
| AssignmentOrPtrCall )?
(STT_SEPARATOR Statement)* .

SttSeparator = (STT_SEPARATOR | EOL)+ .

Declaration = ConstDecl | TypeDecl | VariableDecl | ProcDecl | DefDecl | OptDecl.

ConstDecl = CONST (AS SymbolType)? ConstAssign (DECL_SEPARATOR ConstAssign)*
ConstAssign = ID (AS SymbolType)? '=' ConstExpression

TypeDecl = (TYPE|UNION) ID (ALIAS LITSTR)? (FIELD '=' Expression)? Comment? SttSeparator TypeLine+ END (TYPE|UNION) .

VariableDecl = (REDIM PRESERVE?|DIM|COMMON) SHARED? SymbolDef
| EXTERN IMPORT? SymbolDef ALIAS STR_LIT
| STATIC SymbolDef .

ProcDecl = DECLARE ((SUB | FUNCTION) ProcHeader | OPERATOR OperatorHeader ) .

ProcHeader = ID CallConvention? OVERLOAD? (ALIAS LIT_STRING)?

CallConvention = (CDECL|STDCALL|PASCAL)?

OperatorHeader = Operator CallConvention? OVERLOAD? (ALIAS LIT_STRING)?

DefDecl = (DEFINT|DEFLNG|DEFSNG|DEFDBL|DEFSTR) (CHAR '-' CHAR ','?)* .

ProcCallOrAssign= CALL ID ('(' ProcParamList ')')?
| ID ProcParamList?
| (ID | FUNCTION | OPERATOR | PROPERTY) '=' Expression .

ProcParamList = ProcArg (DECL_SEPARATOR ProcArg)* .
ProcArg = BYVAL? (ID(('(' ')')? | Expression) .

CompoundStmt = IfStatement
| ForStatement
| DoStatement
| WhileStatement
| SelectStatement
| ExitStatement
| ContinueStatement
| EndStatement

IfStmtBegin = IF Expression THEN (BlockIfStatement | SingleIfStatement) .

IfStmtNext = ELSEIF Expression THEN
| ELSE .

IfStmtEnd = END IF | ENDIF .

ForStmtBegin = FOR ID (AS DataType)? '=' Expression TO Expression (STEP Expression)? .

ForStmtEnd = NEXT (ID (',' ID?))? .

DoStmtBegin = DO ((WHILE | UNTIL) Expression)? .

DoStmtEnd = LOOP ((WHILE | UNTIL) Expression)? .

WhileStmtBegin = WHILE Expression .

WhileStmtEnd = WEND

SelectStatement = SELECT CASE (AS CONST)? Expression .

SelectStmtNext = CASE (ELSE | (CaseExpression (',' CaseExpression)*)) .

CaseExpression = (Expression (TO Expression)?)?
| (IS REL_OP Expression)? .

SelectStmtEnd = END SELECT .

SelConstStmtBegin = SELECT CASE AS CONST Expression{int} .

cSelConstStmtNext = CASE (ELSE | (ConstExpression{int} (',' ConstExpression{int})*)) .

SelConstStmtEnd = END SELECT .

ExitStatement = EXIT (FOR | DO | WHILE | SELECT | SUB | FUNCTION)

ContinueStatement = CONTINUE (FOR | DO | WHILE)

CompoundEnd = END (IF | SELECT | SUB | FUNCTION | SCOPE | WITH | NAMESPACE | EXTERN)

QuirkStmt = GotoStmt
| ArrayStmt
| PrintStmt
| MidStmt
| DataStmt
| etc .

GotoStmt = GOTO LABEL
| GOSUB LABEL
| RETURN LABEL?
| RESUME NEXT? .

ArrayStmt = ERASE ID (',' ID)*;
| SWAP Variable, Variable .

PrintStmt = (PRINT|'?') ('#' Expression ',')? (USING Expression{str} ';')? (Expression? ';'|"," )*

MidStmt = MID '(' Expression{str}, Expression{int} (',' Expression{int}) ')' '=' Expression{str} .

DataStmt = RESTORE LABEL?
| READ Variable{int|flt|str} (',' Variable{int|flt|str})*
| DATA literal|constant (',' literal|constant)*

AsmBlock = ASM Comment? SttSeparator

Assignment = LET? Variable BOP? '=' Expression
| Variable{function ptr} '(' ProcParamList ')' .

''Comment = (COMMENT_CHAR | REM) ((DIRECTIVE_CHAR Directive)
'' | (any_char_but_EOL*)) .
''Directive = INCLUDE ONCE? ':' '\'' STR_LIT '\''
'' | DYNAMIC
'' | STATIC .

''ExternStmtBegin = EXTERN "mangling_spec" (LIB LITSTR)? .
''ExternStmtEnd = END EXTERN .
''NamespaceStmtBegin = NAMESPACE (ID (ALIAS LITSTR)?)? .
''NamespaceStmtEnd = END NAMESPACE .
''Usingtmt = USING ID (',' ID)*
''ScopeStmtBegin = SCOPE .
''ScopeStmtEnd = END SCOPE .
''WithStmtBegin = WITH Variable .
''WithStmtEnd = END WITH .
''EndStatement = END Expression? .
''EnumConstDecl = ID ('=' ConstExpression)? .
''EnumBody = (EnumDecl (',' EnumDecl)? Comment? SttSeparator)+ .
''EnumDecl = ENUM ID? (ALIAS LITSTR)? EXPLICIT? Comment? SttSeparator
''OptDecl = OPTION (EXPLICIT|BASE NUM_LIT|BYVAL|PRIVATE|ESCAPE|DYNAMIC|STATIC)
'' ParamDecl = (BYVAL|BYREF)? ID (('(' ')')? (AS SymbolType)?)? ('=" (NUM_LIT|STR_LIT))? .
''TypeProtoDecl = DECLARE ( CONSTRUCTOR Params
'' | DESTRUCTOR
'' | OPERATOR Op Params
'' | PROPERTY Params
'' | (STATIC|CONST)? SUB|FUNCTION Params ) .
''TypeEnumDecl = ENUM|CONST ...
''TypeMultElementDecl = AS SymbolType ID (ArrayDecl | ':' NUMLIT)? ('=' Expression)?
'' TypeElementDecl = ID (ArrayDecl| ':' NUMLIT)? AS SymbolType ('=' Expression)?
''TypeBody = ( (UNION|TYPE Comment? SttSeparator
'' | ElementDecl
'' | AS AsElementDecl )+ .
''SymbolType = CONST? UNSIGNED? (
'' | CHAR|BYTE
'' | SHORT|WORD
'' | INTEGER|LONG|DWORD
'' | SINGLE
'' | DOUBLE
'' | STRING ('*' NUM_LIT)?
'' | USERDEFTYPE
'' | (FUNCTION|SUB) ('(' args ')') (AS SymbolType)?
'' Typedef = TYPE ((symbol AS DataType (',')?)+
''VarDecl = ID ('(' ArrayDecl? ')')? (AS SymbolType)? ('=' VarInitializer)?
''ArrayDecl = '(' Expression (TO Expression)?
''ArrayDecl = '(' Expression (TO Expression)?
''AutoVarDecl = AUTO SHARED? SymbolDef '=' VarInitializer
''Atom = Constant | Function | QuirkFunction | Variable | Literal .
''Expression = LogExpression .
''LogExpression = LogOrExpression ( (XOR | EQV | IMP) LogOrExpression )* .
''LogOrExpression = LogAndExpression ( OR LogAndExpression )* .
''LogAndExpression = RelExpression ( AND RelExpression )* .
''RelExpression = CatExpression ( (EQ | GT | LT | NE | LE | GE) CatExpression )* .
''CatExpression = AddExpression ( & AddExpression )* .
''AddExpression = ShiftExpression ( ('+' | '-') ShiftExpression )* .
''ShiftExpression = ModExpression ( (SHL | SHR) ModExpression )* .
''ModExpression = IntDivExpression ( MOD IntDivExpression )* .
''MultExpression = ExpExpression ( ('*' | '/') ExpExpression )* .
''ExpExpression = NegNotExpression ( '^' NegNotExpression )* .
'' Constant = ID .
'' LitString = STR_LITERAL STR_LITERAL* .
''Literal = NUM_LITERAL
'' | STR_LITERAL STR_LITERAL* .
''Function = ID ('(' ProcParamList ')')? FuncPtrOrMemberDeref? .
'' | NOT RelExpression
'' | HighestPresExpr .
'' | ( DerefExpr
'' | CastingExpr
'' | PtrTypeCastingExpr
'' | ParentExpression
'' | AnonUDT
'' | Atom .
'' CastingExpr = CAST '(' DataType ',' Expression ')'
'' PtrTypeCastingExpr = CPTR '(' DataType ',' Expression{int|uint|ptr} ')'
''DerefExpression = DREF+ HighestPresExpr .
''AddrOfExpression = VARPTR '(' HighPrecExpr ')'
'' | PROCPTR '(' Proc ('('')')? ')'
'' | '@' (Proc ('('')')? | HighPrecExpr)
'' | SADD|STRPTR '(' Variable{str}|Const{str}|Literal{str} ')' .
''FieldArray = '(' Expression (',' Expression)* ')' .
'' MemberId = ID ArrayIdx?
'' UdtMember = MemberId ('.' MemberId)*
''MemberDeref = (('->' DREF* | '[' Expression ']' '.'?) UdtMember)* .
''FuncPtrOrDeref = FuncPtr '(' Args? ')'
'' | MemberDeref .
''DynArrayIdx = '(' Expression (',' Expression)* ')' .
''ArgArrayIdx = '(' Expression (',' Expression)* ')' .
''ArrayIdx = '(' Expression (',' Expression)* ')' .
''Variable = ID ArrayIdx? UdtMember? FuncPtrOrMemberDeref? .
''Variable = ID .
''WithVariable = '.' UdtMember FuncPtrOrMemberDeref? .
''Variable = '.'? ID ArrayIdx? UdtMember? FuncPtrOrMemberDeref? .
''ImplicitDataMember = UdtMember? FuncPtrOrMemberDeref? .
''cVarOrDeref = Deref | PtrTypeCasting | AddrOf | Variable
'' Identifier = (ID{namespace|class} '.')* ID
'' | ID ('.' ID)* .
'' ParentId = ID{namespace|class} ('.' ID{namespace|class})* .
''AsmCode = (Text !(END|Comment|NEWLINE))*
''PropHeader = ID CallConvention? OVERLOAD? (ALIAS LIT_STRING)?
''CtorHeader = CDECL? OVERLOAD? (ALIAS LIT_STRING)? Parameters? STATIC? EXPORT?
''ProcStmtBegin = (PRIVATE|PUBLIC)? STATIC?
''ProcStmtEnd = END (SUB | FUNCTION) .
''ProcArg = BYVAL? (ID(('(' ')')? | Expression) .
''ProcParam = BYVAL? (ID(('(' ')')? | Expression) .
''ProcArgList = ProcArg (DECL_SEPARATOR ProcArg)* .
''ProcArgList = ProcArg (DECL_SEPARATOR ProcArg)* .
'' | ID ProcParamList?
'' | (ID | FUNCTION | OPERATOR | PROPERTY) '=' Expression .
''cArrayFunct = (LBOUND|UBOUND) '(' ID (',' Expression)? ')' .
''TypeConvExpr = (C### '(' expression ')') .
'' AnonUDT = TYPE ('<' SymbolType '>')? '(' ... ')'
'' ViewStmt = VIEW (PRINT (Expression TO Expression)?) .
'' ScreenFunct = SCREEN '(' expr ',' expr ( ',' expr )? ')'
'' | SCREEN ( '(' ')' )? -- returns the current active/visible pages
''ErrorStmt = ERROR Expression
'' | ERR '=' Expression .
''cErrorFunct = ERR .
'' WriteStmt = WRITE ('#' Expression)? (Expression? "," )*
'' LineInputStmt = LINE INPUT ';'? ('#' Expression| Expression{str}?) (','|';')? Variable? .
'' InputStmt = INPUT ';'? (('#' Expression| STRING_LIT) (','|';'))? Variable (',' Variable)*
'' Put = PUT '#' Expression ',' Expression? ',' Expression{str|int|float|array} (',' Expression)?
'' Get = GET '#' Expression ',' Expression? ',' Variable{str|int|float|array} (',' Expression)?
'' FileOpen = OPEN Expression{str}
'' FileStmt = OPEN ...
'' | CLOSE ...
'' | SEEK ...
'' | LOCK ...
'' | ...
'' FileFunct = SEEK '(' Expression ')' |
'' GfxPset = PSET ( Expr ',' )? STEP? '(' Expr ',' Expr ')' (',' Expr )?
'' GfxCircle = CIRCLE ( Expr ',' )? STEP? '(' Expr ',' Expr ')' ',' Expr ((',' Expr? (',' Expr? (',' Expr? (',' Expr (',' Expr)? )? )?)?)?)?
'' GfxPaint = PAINT ( Expr ',' )? STEP? '(' expr ',' expr ')' (',' expr? (',' expr? ) )
'' GfxDrawString = DRAW STRING ( Expr ',' )? STEP? '(' Expr ',' Expr ')' ',' Expr ( ',' Expr ( ',' Expr ( ',' Expr ( ',' Expr )? )? )? )?
'' GfxDraw = DRAW ( Expr ',' )? Expr
'' GfxView = VIEW (SCREEN? '(' Expr ',' Expr ')' '-' '(' Expr ',' Expr ')' (',' Expr? (',' Expr)?)? )?
'' GfxPalette = PALETTE GET? ((USING Variable) | (Expr ',' Expr (',' Expr ',' Expr)?)?)
'' GfxPut = PUT ( Expr ',' )? STEP? '(' Expr ',' Expr ')' ',' ('(' Expr ',' Expr ')' '-' '(' Expr ',' Expr ')' ',')? Variable (',' Mode (',' Alpha)?)?
'' GfxGet = GET ( Expr ',' )? STEP? '(' Expr ',' Expr ')' '-' STEP? '(' Expr ',' Expr ')' ',' Variable
''
'' GfxScreen = SCREEN (num | ((expr (((',' expr)? ',' expr)? expr)? ',' expr))
'' GfxScreenRes = SCREENRES expr ',' expr (((',' expr)? ',' expr)? ',' expr)?
'' GfxPoint = POINT '(' Expr ( ',' ( Expr )? ( ',' Expr )? )? ')'
'' GfxImageCreate = IMAGECREATE '(' Expr ',' Expr ( ',' ( Expr )? ( ',' Expr )? )? ')'
''cIIFFunct = IIF '(' condexpr ',' truexpr ',' falsexpr ')' .
'' cMathFunct = ABS( Expression )
'' | SGN( Expression )
'' | FIX( Expression )
'' | INT( Expression )
'' | LEN( data type | Expression ) .
''cOperatorNew = NEW DataType|Constructor()
'' | NEW DataType[Expr] .
''cOperatorDelete = DELETE expr
'' | DELETE[] expr .
''OnStmt = ON LOCAL? (Keyword | Expression) (GOTO|GOSUB) Label .
''PokeStmt = POKE Expression, Expression .
'' PeekFunct = PEEK '(' (SymbolType ',')? Expression ')' .

'' LsetStmt = LSET String|UDT (','|'=') Expression|UDT
'' cCVXFunct = CVD '(' Expression{str} ')'
'' | CVS '(' Expression{str} ')'
'' | CVI '(' Expression{str} ')'
'' | CVL '(' Expression{str} ')'
'' | CVSHORT '(' Expression{str} ')'
'' | CVLONGINT '(' Expression{str} ')'
'' cMKXFunct = MKD '(' Expression{double} ')'
'' | MKS '(' Expression{float} ')'
'' | MKI '(' Expression{int} ')'
'' | MKL '(' Expression{long} ')'
'' | MKSHORT '(' Expression{short} ')'
'' | MKLONGINT '(' Expression{longint} ')'
'' cStringFunct = W|STR$ '(' Expression{int|float|double} ')'
'' | MID$ '(' Expression ',' Expression (',' Expression)? ')'
'' | W|STRING$ '(' Expression ',' Expression{int|str} ')' .
'' | INSTR '(' (Expression{int} ',')? Expression{str}, "ANY"? Expression{str} ')'
'' | INSTRREV '(' Expression{str}, "ANY"? Expression{str} (',' Expression{int})? ')'
'' | RTRIM$ '(' Expression{str} (, "ANY" Expression{str} )? ')'
''cVAFunct = VA_FIRST ('(' ')')? .
''QuirkFunction = QBFUNCTION ('(' ProcParamList ')')? .
''Expression = LogExpression .
''LogExpression = RelExpression ( (AND | OR) RelExpression )* .
''RelExpression = AddExpression ( (EQ | GT | LT | NE | LE | GE) AddExpression )* .
''AddExpression = MultExpression ( ('+' | '-') MultExpression )* .
''MultExpression = ParentExpr ( ('*' | '/' | '\') ParentExpr )* .
'' ParentExpr = '(' Expression ')'
'' | DEFINED'(' ID ')'
'' | LITERAL
'' | NOT RelExpression .
'' Define = DEFINE ID (!WHITESPC '(' ID (',' ID)* ')')? LITERAL+
'' | MACRO ID '(' ID (',' ID)* ')' Comment? EOL
'' Pragma = PRAGMA
'' | POP '(' symbol ')'
'' | symbol ('=' expression{int})?
'' PreProcess = '#'DEFINE ID (!WHITESPC '(' ID (',' ID)* ')')? LITERAL*
'' | '#'UNDEF ID
'' | '#'IFDEF ID
'' | '#'IFNDEF ID
'' | '#'IF Expression
'' | '#'ELSE
'' | '#'ELSEIF Expression
'' | '#'ENDIF
'' | '#'PRINT LITERAL*
'' | '#'INCLUDE ONCE? LIT_STR
'' | '#'INCLIB LIT_STR
'' | '#'LIBPATH LIT_STR
'' | '#'ERROR LIT_STR .
'' ppInclude = '#'INCLUDE ONCE? LIT_STR
'' ppIncLib = '#'INCLIB LIT_STR
'' ppLibPath = '#'LIBPATH LIT_STR
'' ppLine = '#'LINE LIT_NUM LIT_STR?
''EnumDecl = ENUM ID? (ALIAS LITSTR)? EXPLICIT? Comment? SttSeparator
'' EnumLine+
'' END ENUM .
''Parameters= '(' ParamDecl (',' ParamDecl)* ')' .
''TypeDecl = (TYPE|UNION) ID (ALIAS LITSTR)? (FIELD '=' Expression)? Comment? SttSeparator
'' TypeLine+
'' END (TYPE|UNION) .
''SymbolType = CONST? UNSIGNED? (
'' ANY
'' | CHAR|BYTE
'' | SHORT|WORD
'' | INTEGER|LONG|DWORD
'' | SINGLE
'' | DOUBLE
'' | STRING ('*' NUM_LIT)?
'' | USERDEFTYPE
'' | (FUNCTION|SUB) ('(' args ')') (AS SymbolType)?
'' (CONST? (PTR|POINTER))* .
'' Typedef = TYPE ((symbol AS DataType (',')?)+
'' AS DataType (symbol (',')?)+
''ArrayDecl = '(' Expression (TO Expression)?
'' (',' Expression (TO Expression)?)*
'' ')' .
''AutoVarDecl = AUTO SHARED? SymbolDef '=' VarInitializer
'' (',' SymbolDef)* .
''NegNotExpression= ('-'|'+'|) ExpExpression
'' | NOT RelExpression
'' | HighestPresExpr .
''AsmBlock = ASM Comment? SttSeparator
'' (AsmCode Comment? NewLine)+
'' END ASM .
''ProcHeader = ID CallConvention? OVERLOAD? (ALIAS LIT_STRING)?
'' Parameters? ((AS SymbolType)? | CONSTRUCTOR|DESTRUCTOR)?
'' Priority? STATIC? EXPORT?
''OperatorHeader = Operator CallConvention? OVERLOAD? (ALIAS LIT_STRING)?
'' Parameters? (AS SymbolType)? STATIC? EXPORT?
''PropHeader = ID CallConvention? OVERLOAD? (ALIAS LIT_STRING)?
'' Parameters? (AS SymbolType)? STATIC? EXPORT?
''ProcStmtBegin = (PRIVATE|PUBLIC)? STATIC?
(SUB|FUNCTION|CONSTRUCTOR|DESTRUCTOR|OPERATOR) ProcHeader .
'' FileFunct = SEEK '(' Expression ')' |
'' INPUT '(' Expr, (',' '#'? Expr)? ')'.

In case you are wondering what the

* PatternSearchMatcher[(['][']\s*\w+\s+[=]\s+.+)|(['][']\s*[|]\s*.+),true]

at the top is: it is the regular expression I used to extract the lines containing grammar. It's not very exact (I had to get some lines by hand) but it worked for most of the grammar.

[Edit 22/12/2008] Started changing the sequence of rules to make the grammer more readable.
Removed some garbage and one or two duplicate rules.
[Edit 23/12/2008] Changed sequence of rules. Last rule added to sequenced rules was compoundstatement. Removed some more garbage and whitelines from the grammar.
[Edit 24/12/2008] Finished adding compoundstatement. There are more CompoundStmts in the sourcecode that are not mentioned in the comment on CompoundStmt (With, Namespace, Using, Scope).
[Edit 26/12/2008] Added quirckstatement (incomplete), asmblock and assignment. Removed some duplicate lines.

stylin · Post by **stylin** » Dec 19, 2008 0:09

AGS, very nice. Now I guess the next step is to go through the lexer/parser modules and check the grammar -- and/or the procedures themselves -- for correctness (whether they match up, and also the [probably unlikely] chance that a parser procedure's grammar wasn't commented).

I think we'll also need to decide on a template to document the grammar with, ie., are we going to put everything on one page, or split it up ? It might be nice to make non-terminals links for easier navigation. It would probably be good to get all this in order so we don't end up with errors if something needs to be changed later on. Any ideas ? Again, good work !

edit: I guess we'll need to distinguish between the grammar of the different dialects somehow, too. Like how the "fb" dialect allows statements like Open to be used as a Function procedure and how DIM doesn't allow implicit type inference, for examples. It seems lots of this will be a duplication of the existing "syntax" in the KeyPg*s, particularly with the quirks. Hmm..

AGS · Post by **AGS** » Dec 20, 2008 22:19

stylin wrote:AGS, very nice. Now I guess the next step is to go through the lexer/parser modules and check the grammar -- and/or the procedures themselves -- for correctness (whether they match up, and also the [probably unlikely] chance that a parser procedure's grammar wasn't commented).

Parser procedures tend to have a name starting with a c.
Using geany (it has some support for FreeBASIC) I could find several procedures starting with a c that did not have a grammar comment.

For instance, cGfxStmt and cGfxFunct have no comment. Other parsing procedures have no comment either and I am not exactly sure whether they remained uncommented by mistake or because the developers found it more logical to not add grammar comment to certain parsing routines.

stylin wrote: I think we'll also need to decide on a template to document the grammar with, ie., are we going to put everything on one page, or split it up ? It might be nice to make non-terminals links for easier navigation. It would probably be good to get all this in order so we don't end up with errors if something needs to be changed later on. Any ideas ? Again, good work !

edit: I guess we'll need to distinguish between the grammar of the different dialects somehow, too. Like how the "fb" dialect allows statements like Open to be used as a Function procedure and how DIM doesn't allow implicit type inference, for examples. It seems lots of this will be a duplication of the existing "syntax" in the KeyPg*s, particularly with the quirks. Hmm..

The language reference contained within the FB manual is like the grammar + semantics and what I want to add is a recapitulation of the grammar in x pages.

Starting at the top rule

Program = Line* EOF?
Line = Label? Statement? Comment? EOL

going all the way down to the constants.

FB syntax only because otherwise you'd have to write three grammars (one for QB, one for Lite and one for FB) which are almost identical.

A grammar could look like this on a webpage (has to be LALR(1) if you want to generate it automagically with the goldparser)

http://www.devincook.com/GOLDParser/bui ... m#mult_exp

That's all the rules on one page, hyperlinked. In the case of FB this would yield a very lengthy page. I've got a VB6 grammar lying around (written for TXL, an interpreting parser generator) that goes on for about
13 pages (printed A4) with rules like

define access_modifier
'public | 'private | 'friend | 'static
end define

and containing quite a lot of statements (60+ including loops, choice etc...). To put 16 pages of printed A4 on one web page is a bit much. Better to start at the toplevel (rule program) and split it up at some arbitrary point (how far are people willing to scroll down a page?).

McLovin · Post by **McLovin** » Dec 21, 2008 0:18

Hi,

I was playing around with the Gold Parser grammar definitions. I got a bit of the FB grammar prepared for Gold but eventually ran into Reduce-Reduce errors on the Expressions definitions. I haven't got back to figuring out the reason for it.

I also translated the "Kessels" Gold template file. If you use Gold then you know that you can output the grammar definitions to include files that your program can read directly (rather than having to load and parse the grammar all the time from the .cgt compiled grammar file).

Here is the template. Save it with a .pgt extension and place it in the Templates subdirectory.

Code: Select all

##TEMPLATE-NAME 'FreeBASIC - McLovin engine grammar.bi'
##LANGUAGE 'FreeBASIC'
##ENGINE-NAME 'McLovin FreeBASIC engine'
##AUTHOR 'McLovin'
##FILE-EXTENSION 'bi'
##NOTES
This template creates a .bi file for use by the FreeBASIC-McLovin engine.
The complete CGT (Compiled Grammar Table) is hardcoded into the .bi
file, so initializing the engine is very fast (it does not have
to load the CGT file from disk). Based on Kessels engine.
##END-NOTES
##ID-SEPARATOR '_'
##ID-SYMBOL-PREFIX 'Symbol'
##ID-RULE-PREFIX 'Rule'
##DELIMITER ','
''
''
''  This file was generated by the "FreeBASIC - McLovin engine grammar.bi" template. 
''  (Based on the translation from "C - Kessels engine grammar.h")
''
''

#include once "\crt\wchar.bi"            ' wchar_t 

##CHAR-SET-TABLE
##CHAR-SETS
#define GrammarCharset%Index%CharCount %CharCount%
Dim GrammarCharset%Index%( %CharCount% ) As wchar_t => { _
##CHARS
  %UnicodeIndex%, _
##END-CHARS
  0}
##END-CHAR-SETS
##END-CHAR-SET-TABLE


''
''  struct DfaEdgeStruct {
''    int TargetState;
''    int CharCount;
''    wchar_t *CharacterSet;
''    };
''
##DFA-TABLE
##DFA-STATES
Dim GrammarDfaEdgeArray%Index%( %EdgeCount% ) As DfaEdgeStruct => { _
##DFA-EDGES
  (%Target%,GrammarCharset%CharSetIndex%CharCount,@GrammarCharset%CharSetIndex%(0)), _
##END-DFA-EDGES
  (-1,0,NULL)}
##END-DFA-STATES
##END-DFA-TABLE



##RULE-TABLE
##RULES
Dim GrammarRuleSymbolArray%Index%( %SymbolCount% ) As Integer => { _
##RULE-SYMBOLS
  %SymbolIndex%, _
##END-RULE-SYMBOLS
  -1}
##END-RULES
##END-RULE-TABLE


##LALR-TABLE
##LALR-STATES
Dim GrammarLalrActionArray%Index%( %ActionCount% ) As ActionStruct => { _
##LALR-ACTIONS
  (%SymbolIndex%,%Action%,%Value%), _
##END-LALR-ACTIONS
  (-1,-1,-1)}
##END-LALR-STATES
##END-LALR-TABLE



##SYMBOL-TABLE
''
''  struct SymbolStruct {
''    short Kind;
''    wchar_t *Name;
''    };
''
Dim GrammarSymbolArray( %Count%-1 ) As SymbolStruct => { _
##SYMBOLS
  /' %Value.Padded% '/   (%Kind%,@wstr("%Name%"))%Delimiter% _
##END-SYMBOLS
  }
##END-SYMBOL-TABLE



##DFA-TABLE
''
''  struct DfaEdgeStruct {
''    int TargetState;
''    wchar_t *CharacterSet;
''    };
''  struct DfaStateStruct {
''    int AcceptSymbol;
''    int EdgeCount;
''    struct DfaEdgeStruct *Edges;
''    };
''
Dim GrammarDfaStateArray ( %Count% ) As DfaStateStruct => { _
##DFA-STATES
  /' %Index% '/    (%AcceptIndex%,%EdgeCount%,@GrammarDfaEdgeArray%Index%(0)), _
##END-DFA-STATES
  (-1,-1,NULL)}
##END-DFA-TABLE



##RULE-TABLE
''
''  struct RuleStruct {
''    int Head;
''    int SymbolsCount;
''    int *Symbols;
''    wchar_t *Description;
''    };
''
Dim GrammarRuleArray( %Count%-1 ) As RuleStruct => { _
##RULES
  /' %Value.Padded% '/   (%NonterminalIndex%,%SymbolCount%,@GrammarRuleSymbolArray%Index%(0),@wstr("%Description%"))%Delimiter% _
##END-RULES
  }
##END-RULE-TABLE



##LALR-TABLE
''
''  struct ActionStruct {
''    int Entry;
''    short Action;
''    int Target;
''    };
''  struct LalrStateStruct {
''    int ActionCount;
''    struct ActionStruct *Actions;
''    };
''
Dim GrammarLalrStateArray( %Count% ) As LalrStateStruct => { _
##LALR-STATES
  /' %Index% '/   (%ActionCount%,@GrammarLalrActionArray%Index%(0)), _
##END-LALR-STATES
  (-1,NULL)}
##END-LALR-TABLE



''
''  struct GrammarStruct {
''    char CaseSensitive;
''    int InitialSymbol;
''    int InitialDfaState;
''    int InitialLalrState;
''    int SymbolCount;
''    struct SymbolStruct *SymbolArray;
''    int RuleCount;
''    struct RuleStruct *RuleArray;
''    int DfaStateCount;
''    struct DfaStateStruct *DfaArray;
''    int LalrStateCount;
''    struct LalrStateStruct *LalrArray;
''    };
''
Dim Shared Grammar As GrammarStruct 
With Grammar
##PARAMETERS
  .CaseSensitive    =    %CaseSensitive%
  .InitialSymbol    =    %StartSymbol%
##END-PARAMETERS
##DFA-TABLE
  .InitialDfaState  =    %InitialState%
##END-DFA-TABLE
##LALR-TABLE
  .InitialLalrState =    %InitialState%
##END-LALR-TABLE
##SYMBOL-TABLE
  .SymbolCount      =    %Count%
  .SymbolArray      =    @GrammarSymbolArray(0)
##END-SYMBOL-TABLE
##RULE-TABLE
  .RuleCount        =    %Count%
  .RuleArray        =    @GrammarRuleArray(0)
##END-RULE-TABLE
##DFA-TABLE
  .DfaStateCount    =    %Count%
  .DfaArray         =    @GrammarDfaStateArray(0)
##END-DFA-TABLE
##LALR-TABLE
  .LalrStateCount   =    %Count%
  .LalrArray        =    @GrammarLalrStateArray(0)
##END-LALR-TABLE
End With

Now, in order to read and actually work with the generated include file you would need to use an "engine" to read it. Here is the engine source code. Compile these files to engine.o and include it with your project when you compile it.

This is the "engine.bas" file. The "engine.bi" file follows after this one.

Code: Select all

''  
''  McLovin FreeBASIC engine 
''
''  Based on translation of C code from:
''    Kessels engine for GOLD.
''    Jeroen C. Kessels
''    Internet Engineer
''    http://www.kessels.com/
''  
''    Copyright:   Jeroen C. Kessels
''    Date:        22 march 2006
''    Version:     1.8
''  
''  


#include once "\crt\wchar.bi"            ' wchar_t 

#define CR     13
#define LF     10
#define CRLF   chr(13,10)

#include once "engine.bi"                ' Defines all the data structs. 


'' FIFO (first in first out) stack of Tokens. 
type TokenStackStruct 
  Token     as TokenStruct ptr
  LalrState as integer               ' Index into Grammar.LalrArray[]. 
  NextToken as TokenStackStruct ptr  ' Pointer to next item. 
end type



' Create a readable/printable string from the wchar_t pointer
function wPrint( byval pByte as ushort ptr ) as string
   
  dim st as string
  dim i  as integer
  
  if pByte = 0 then exit function
  
  i = 0 
  do until pByte[i] = 0
    st = st & chr(pByte[i]) 
    i = i + 1  
  loop
  
  return st
   
end function


'********* DFA Tokenizer *********************************************************


'' Read 'Length' characters from the InputBuf and return as a
'' malloc'ed string. The string is terminated by appending a zero,
'' although the routine is binary-safe.
'' Return NULL if out of memory. */
function ReadString( _
    InputBuf  as wchar_t ptr, _      ' The input data. 
    InputSize as long, _             ' Size of the input data. 
    InputHere as long ptr, _         ' Index into input data. 
    nline     as long ptr, _         ' Current line number. 
    nColumn   as long ptr, _         ' Current column number. 
    nLength   as long _              ' Number of characters to copy. 
    ) as wchar_t ptr

  dim wSt as wchar_t ptr
  dim i   as long

  wSt    = callocate((nLength + 1) * sizeof(wchar_t))
  if wSt = null then return(null)

  for i = 0 to nLength - 1
    
    if *InputHere < InputSize then
        if InputBuf[*InputHere] = CR then
          if (*InputHere + 1 < InputSize) and _
               (InputBuf[*InputHere + 1] <> LF) then
            *nline   = *nline + 1
            *nColumn = 0
          end if
        end if
        
        if InputBuf[*InputHere] = LF then
          *nline   = *nline + 1
          *nColumn = 0
        end if
        
        wSt[i] = InputBuf[*InputHere] 
        
        *InputHere = *InputHere + 1
        *nColumn   = *nColumn + 1
    else 
        wSt[i] = 0
    end if
  next
  
  wSt[i] = 0
  return wSt
  
end function




'' Search for a character in a characterset. Return 1 if found,
'' 0 if not found.
function FindChar( ThisChar     as wchar_t, _
                   CharacterSet as wchar_t ptr, _
                   Count        as long _
                   ) as integer
 
  dim Here     as long
  dim Interval as long

  ' Use wcschr() for charactersets with a length of up to 11 characters. 
  if (Count < 11) then
    if (wcschr(CharacterSet, ThisChar) <> null) then return 1
    return 0
  end if

  '   binary search the characterset for the character. This method is
  '   possible because GOLD always produces sorted charactersets.
  '   Measurements show that although the code is more complex, this
  '   binary search is faster than wcschr() for charactersets longer
  '   than 11 characters. At 100 characters it's 4 times faster. */
  Interval = 32768
  while (Interval > Count) 
     Interval = (Interval shr 1)
  wend   
  Here = Interval - 1
  Interval = (Interval shr 1)
  while (Interval > 0) 
    if (CharacterSet[Here] = ThisChar) then return 1
    if (CharacterSet[Here] > ThisChar) then
        Here = Here - Interval
    else  
        while (Here + Interval >= Count) 
          Interval = (Interval shr 1)
          if (Interval = 0) then return 0 
        wend
        Here = Here + Interval
    end if
    Interval = (Interval shr 1)
  wend
  
  if (CharacterSet[Here] = ThisChar) then return 1
  return 0 

end function





''  Parse a symbol from the InputBuf.
''  Symbols are lexical entities of 1 or more characters. The parser
''  uses the Grammar.DfaArray[] compiled by GOLD, which is basically
''  a tree of charactersets. The parser takes characters from the
''  input and descends the tree by looking for the character in the
''  characterset. At the last branch of the tree is an index into the
''  Grammar.SymbolsArray[], also compiled by GOLD.
''  Return the index into Grammar.SymbolsArray[], and a pointer to a
''  malloc'ed string with characters from the input.
''  Symbol 0 is 'SYMBOLEOF' (string is NULL)
''  Symbol 1 is 'SYMBOLERROR' (string is 1 character).
''
function RetrieveToken( _
    InputBuf  as wchar_t ptr, _   ' The input data. 
    InputSize as long, _          ' Size of the input data. 
    InputHere as long ptr, _      ' Index into input data. 
    nline     as long ptr, _      ' Current line number. 
    nColumn   as long ptr, _      ' Current column number. 
    Symbol    as integer ptr _
    ) as wchar_t ptr
    
  dim DfaIndex     as integer     ' Index into Grammar.DfaArray[]. 
  dim Length       as long        ' Number of processed characters from data->InputBuf. 
  dim AcceptIndex  as integer     ' Longest found symbol so far. 
  dim AcceptLength as long        ' Length of longest found symbol. 
  dim i            as long

  ' Sanity check (no input). 
  if ((InputBuf = null) or (InputBuf[0] = 0)) then
    *Symbol = 0
    return null
  end if

  ' If there are no more characters in the input then return SYMBOLEOF and NULL. 
  if (*InputHere >= InputSize) then
    *Symbol = 0
    return null
  end if  

  ' Compare characters from the input with the DFA charactersets until not found. 
  DfaIndex     = Grammar.InitialDfaState
  Length       = 0
  AcceptLength = 0
  AcceptIndex  = -1
  
  while (*InputHere + Length) < InputSize
    ' if this is a valid symbol-terminal then save it. We know the
    ' input matches the symbol, but there may be a longer symbol that
    ' matches so we have to keep scanning. 
    if Grammar.DfaArray[DfaIndex].AcceptSymbol >= 0 then
      AcceptIndex  = DfaIndex
      AcceptLength = Length
    end if

    ' Walk through the edges and scan the characterset of each edge for the current character. 
    for i = 0 to Grammar.DfaArray[DfaIndex].EdgeCount - 1
      if FindChar( InputBuf[*InputHere + Length], _
                   Grammar.DfaArray[DfaIndex].Edges[i].CharacterSet, _
                   Grammar.DfaArray[DfaIndex].Edges[i].CharCount) = 1 then 
          exit for
      end if    
    next

    ' If not found then exit the loop. 
    if (i >= Grammar.DfaArray[DfaIndex].EdgeCount) then exit while

    ' Jump to the TargetState, which points to another set of DFA edges
    ' describing the next character. 
    DfaIndex = Grammar.DfaArray[DfaIndex].Edges[i].TargetState

    ' Increment the Length, we have handled the character. 
    Length = Length + 1
  
  wend

  ' if the DFA is a terminal then return the Symbol, and Length characters from the input. 
  if Grammar.DfaArray[DfaIndex].AcceptSymbol >= 0 then
    *Symbol = Grammar.DfaArray[DfaIndex].AcceptSymbol
    return ReadString( InputBuf,InputSize,InputHere,nline,nColumn,Length )
  end if

  ' If we found a shorter terminal before, then return that Symbol, and it's characters.
  if AcceptIndex >= 0 then
    *Symbol = Grammar.DfaArray[AcceptIndex].AcceptSymbol
    return ReadString( InputBuf,InputSize,InputHere,nline,nColumn,AcceptLength )
  end if

  ' Return SYMBOLERROR and a string with 1 character from the input. 
  *Symbol = 1
  return ReadString( InputBuf,InputSize,InputHere,nline,nColumn,1 )

end function




'********* LALR Tokenizer ********************************************************


''  LALR state machine. Depending on the Token->Symbol the machine will
''  change it's state and perform actions, such as reduce the TokenStack and
''  iteratively call itself. 

#define LALRMEMORYERROR   0
#define LALRSYNTAXERROR   1
#define LALRACCEPT        2
#define LALRSHIFT         3
#define LALRGOTO          4

function ParseToken( _
    LalrState      as integer ptr, _
    TokenStack     as TokenStackStruct ptr ptr, _
    InputToken     as TokenStackStruct ptr, _
    TrimReductions as integer, _
    Debug          as integer _
    ) as integer
    
  dim PopToken  as TokenStackStruct ptr
  dim Reduction as TokenStackStruct ptr
  dim Action    as integer
  dim Rule      as integer
  dim i         as integer

  ' Find the Token->Symbol in the LALR table.
  Action = 0
  while Action < Grammar.LalrArray[*LalrState].ActionCount
    if Grammar.LalrArray[*LalrState].Actions[Action].Entry = InputToken->Token->Symbol then
      exit while
    end if
    Action += 1
  wend

  ' If not found then exit with SYNTAXERROR. The Token is not allowed in this context.
  if Action >= Grammar.LalrArray[*LalrState].ActionCount then
    if Debug > 0 then
      print "LALR Syntax error: symbol "; InputToken->Token->Symbol; " not found in LALR table "; *LalrState
    end if
    return LALRSYNTAXERROR
  end if

  ' ACTIONACCEPT: exit. We're finished parsing the input. 
  if Grammar.LalrArray[*LalrState].Actions[Action].Action = ACTIONACCEPT then
    if Debug > 0 then
      print "LALR Accept: Target="; Grammar.LalrArray[*LalrState].Actions[Action].Target
    end if
    return LALRACCEPT
  end if

  ' ACTIONSHIFT: switch the LALR state and return. We're ready to accept the next token. 
  if Grammar.LalrArray[*LalrState].Actions[Action].Action = ACTIONSHIFT then
    *LalrState = Grammar.LalrArray[*LalrState].Actions[Action].Target
    if Debug > 0 then
      print "LALR Shift: Lalr="; *LalrState
    end if
    return LALRSHIFT
  end if  

  ' ACTIONGOTO: switch the LALR state and return. We're ready to accept the next token.
  '   Note: In my implementation SHIFT and GOTO do the exact same thing. As far
  '   as I can tell GOTO only happens just after a reduction. Perhaps GOLD makes
  '   the difference to allow the program to perform special actions, which my
  '   implementation does not need. 
  if Grammar.LalrArray[*LalrState].Actions[Action].Action = ACTIONGOTO then
    *LalrState = Grammar.LalrArray[*LalrState].Actions[Action].Target
    if Debug > 0 then
      print "LALR Goto: Lalr="; *LalrState
    end if
    return LALRGOTO
  end if

  ' ACTIONREDUCE:
  '   Create a new Reduction according to the Rule that is specified by the action.
  '   - Create a new Reduction in the ReductionArray.
  '   - Pop tokens from the TokenStack and add them to the Reduction.
  '   - Push a new token on the TokenStack for the Reduction.
  '   - Iterate.
  ' 
  Rule = Grammar.LalrArray[*LalrState].Actions[Action].Target
  if Debug > 0 then
    print "LALR Reduce: Lalr="; *LalrState; " TargetRule="; _
           wPrint(Grammar.SymbolArray[Grammar.RuleArray[Rule].Head].wname); "["; _
           str(Grammar.RuleArray[Rule].Head); "] ==> "; wPrint(Grammar.RuleArray[Rule].Description)
  end if

  ' if TrimReductions is active, and the Rule contains a single non-terminal,
  '   then eleminate the unneeded reduction by modifying the Rule on the stack
  '   into this Rule.
  '
  if ((TrimReductions <> 0) and _
      (Grammar.RuleArray[Rule].SymbolsCount = 1) and _
      (Grammar.SymbolArray[Grammar.RuleArray[Rule].Symbols[0]].Kind = SYMBOLNONTERMINAL)) then
    if Debug > 0 then
      print "LALR TrimReduction." 
    end if

    ' Pop the Rule from the TokenStack. 
    PopToken = *TokenStack
    *TokenStack = PopToken->NextToken

    ' Rewind the LALR state. 
    *LalrState = PopToken->LalrState

    ' Change the Token into the Rule. 
    PopToken->Token->Symbol = Grammar.RuleArray[Rule].Head

    ' Feed the Token to the LALR state machine. 
    ParseToken LalrState,TokenStack,PopToken,TrimReductions,Debug

    ' Push the modified Token back onto the TokenStack. 
    PopToken->NextToken = *TokenStack
    *TokenStack = PopToken

    ' Save the new LALR state in the input token.
    InputToken->LalrState = *LalrState

    ' Feed the input Token to the LALR state machine and exit. 
    return ParseToken(LalrState,TokenStack,InputToken,TrimReductions,Debug)
  
  end if

  ' Allocate and initialize memory for the Reduction. 
  Reduction = callocate(sizeof(TokenStackStruct))
  if Reduction = null then return LALRMEMORYERROR
  Reduction->Token = callocate(sizeof(TokenStruct))
  if Reduction->Token = null then
    deallocate Reduction
    return LALRMEMORYERROR
  end if
  Reduction->Token->ReductionRule = Rule
  Reduction->Token->Tokens = callocate(sizeof(TokenStruct) * Grammar.RuleArray[Rule].SymbolsCount)
  if Reduction->Token->Tokens = null then
    deallocate Reduction->Token
    deallocate Reduction
    return LALRMEMORYERROR
  end if
  Reduction->Token->Symbol = Grammar.RuleArray[Rule].Head
  Reduction->Token->ndata = null
  Reduction->Token->nline = InputToken->Token->nline
  Reduction->Token->Column = InputToken->Token->Column
  Reduction->LalrState = *LalrState
  Reduction->NextToken = null

  ' Reduce tokens from the TokenStack by moving them to the Reduction.
  '   The Lalr state will be rewound to the state it was for the first
  '   symbol of the rule. 
  for i = Grammar.RuleArray[Rule].SymbolsCount to 1 step -1
    PopToken = *TokenStack
    *TokenStack = PopToken->NextToken
    PopToken->NextToken = null
    if Debug > 0 then
      if PopToken->Token->ndata <> null then
          print "  + Symbol="; wPrint(Grammar.SymbolArray[PopToken->Token->Symbol].wname); _
                "["; str(PopToken->Token->Symbol); "] RuleSymbol="; wPrint(Grammar.SymbolArray[Grammar.RuleArray[Rule].Symbols[i-1]].wname); _
                "["; str(Grammar.RuleArray[Rule].Symbols[i-1]); "] value='"; wPrint(PopToken->Token->ndata); "' Lalr="; PopToken->LalrState
      else
          print "  + Symbol="; wPrint(Grammar.SymbolArray[PopToken->Token->Symbol].wname); _
                "["; str(PopToken->Token->Symbol); "] RuleSymbol="; wPrint(Grammar.SymbolArray[Grammar.RuleArray[Rule].Symbols[i-1]].wname); _
                "["; str(Grammar.RuleArray[Rule].Symbols[i-1]); "] Lalr="; PopToken->LalrState
                  
      end if
    end if
    Reduction->Token->Tokens[i - 1] = PopToken->Token
    *LalrState = PopToken->LalrState
    Reduction->LalrState     = PopToken->LalrState
    Reduction->Token->nline  = PopToken->Token->nline
    Reduction->Token->Column = PopToken->Token->Column
    deallocate PopToken
  next

  ' Call the LALR state machine with the Symbol of the Rule. 
  if Debug > 0 then
    print "Calling Lalr 1: Lalr="; *LalrState; " Symbol="; _
            wPrint(Grammar.SymbolArray[Grammar.RuleArray[Rule].Head].wname); _
            "["; str(Grammar.RuleArray[Rule].Head); "]"
            
  end if
  ParseToken LalrState,TokenStack,Reduction,TrimReductions,Debug

  ' Push new Token on the TokenStack for the Reduction. 
  Reduction->NextToken = *TokenStack
  *TokenStack = Reduction

  ' Save the current LALR state in the InputToken. We need this to be
  ' able to rewind the state when reducing. 
  InputToken->LalrState = *LalrState

  ' call the LALR state machine with the InputToken. The state has
  ' changed because of the reduction, so we must accept the token again. 
  if Debug > 0 then
    print "Calling Lalr 2: Lalr="; *LalrState; " Symbol="; _
          wPrint(Grammar.SymbolArray[InputToken->Token->Symbol].wname); _
          "["; str(InputToken->Token->Symbol); "]"
  end if
  return ParseToken(LalrState,TokenStack,InputToken,TrimReductions,Debug)

end function




'********* Exported functions ****************************************************


' Delete a Token from memory, including all the reductions and data. 
sub DeleteTokens( Token as TokenStruct ptr ) 
  dim i as integer

  if Token = null then return
  if Token->ndata <> null then deallocate Token->ndata
  if Token->ReductionRule >= 0 then
    for i = 0 to Grammar.RuleArray[Token->ReductionRule].SymbolsCount - 1
      DeleteTokens Token->Tokens[i]
    next
    deallocate Token->Tokens
  end if

end sub


sub ParseCleanup( _
      pTop as TokenStackStruct ptr, _
      pNew as TokenStackStruct ptr, _
      pFirstToken as TokenStruct ptr ptr ) 

  dim pOldTop as TokenStackStruct ptr

  *pFirstToken = null
  if pTop <> null then
    *pFirstToken = pTop->Token
    pOldTop = pTop
    pTop = pTop->NextToken
    deallocate pOldTop
  end if
  if pNew <> null then
    DeleteTokens *pFirstToken
    *pFirstToken = pNew->Token
    deallocate pNew
  end if
  while pTop <> null
    DeleteTokens pTop->Token
    pOldTop = pTop
    pTop = pTop->NextToken
    deallocate pOldTop
  wend

end sub



'
' Parse the input data.
'   Returns a pointer to a ParserData struct, NULL if insufficient memory.
'   The Data->Result value will be one of these values:
'        PARSEACCEPT             Input parsed, no errors.
'        PARSELEXICALERROR       Input could not be tokenized.
'        PARSETOKENERROR         Input is an invalid token.
'        PARSESYNTAXERROR        Input does not match any rule.
'        PARSECOMMENTERROR       A comment was started but not finished.
'        PARSEMEMORYERROR        Insufficient memory.
'   
'
function Parse( _
    InputBuf       as wchar_t ptr, _          ' pointer to the input data. 
    InputSize      as long, _                 ' Number of characters in the input. 
    TrimReductions as integer, _              ' 0 = don't trim, 1 = trim reductions. 
    Debug          as integer, _              ' 0 = no debug, 1 = print debug info. 
    FirstToken     as TokenStruct ptr ptr _   ' output token. 
    ) as integer
    
  dim LalrState    as integer                 ' Index into Grammar.LalrArray[]. 
  dim TokenStack   as TokenStackStruct ptr    ' stack of Tokens. 
  dim Work         as TokenStackStruct ptr    ' Current token. 
  dim InputHere    as long                    ' Index into input. 
  dim nline        as long                    ' line number. 
  dim Column       as long                    ' Column number. 
  dim CommentLevel as integer                 ' Used when skipping comments, nested comment count. 
  dim Result       as integer                 ' Result from ParseToken(). 


  ' Initialize variables. 
  LalrState    = Grammar.InitialLalrState
  TokenStack   = null
  InputHere    = 0
  nline        = 1
  Column       = 1
  CommentLevel = 0
  *FirstToken  = null

  ' Sanity check. 
  if ((InputBuf = null) or (*InputBuf = 0) or (InputSize = 0)) then
    return PARSEACCEPT
  end if

  ' Accept tokens until finished. 
  while (1) 

    ' Create a new Token. Exit if out of memory. 
    Work = callocate(sizeof(TokenStackStruct))
    if Work = null then
      ParseCleanup TokenStack,null,FirstToken
      return PARSEMEMORYERROR
    end if
    
    Work->LalrState = LalrState
    Work->NextToken = null
    Work->Token     = callocate(sizeof(TokenStruct))
    
    if Work->Token = null then
      deallocate Work
      ParseCleanup TokenStack,null,FirstToken
      return PARSEMEMORYERROR
    end if
    
    Work->Token->ReductionRule = -1
    Work->Token->Tokens        = null
    Work->Token->nline         = nline
    Work->Token->Column        = Column

    ' Call the DFA tokenizer and parse a token from the input. 
    Work->Token->ndata = RetrieveToken( InputBuf,InputSize,@InputHere,@nline, _
                                       @Column,@Work->Token->Symbol )

    if (Work->Token->ndata = null) and (Work->Token->Symbol <> 0) then
      ParseCleanup TokenStack,Work,FirstToken
      return PARSEMEMORYERROR
    end if

    ' If we are inside a comment then ignore everything except the end
    ' of the comment, or the start of a nested comment. 
    if CommentLevel > 0 then
      ' Begin of nested comment: 
      if Grammar.SymbolArray[Work->Token->Symbol].Kind = SYMBOLCOMMENTSTART then
        ' Push the Token on the TokenStack to keep track of line+column. 
        Work->NextToken = TokenStack
        TokenStack = Work

        CommentLevel = CommentLevel + 1
        continue while
      end if

      ' End of comment: 
      if Grammar.SymbolArray[Work->Token->Symbol].Kind = SYMBOLCOMMENTEND then
        ' Delete the Token. 
        if Work->Token->ndata <> null then deallocate(Work->Token->ndata)
        deallocate Work->Token
        deallocate Work

        ' Pop the comment-start Token from the TokenStack and delete that as well. 
        Work = TokenStack
        TokenStack = Work->NextToken
        if Work->Token->ndata <> null then deallocate(Work->Token->ndata)
        deallocate Work->Token
        deallocate Work

        CommentLevel = CommentLevel - 1
        continue while
      end if

      ' End of file: Error exit. A comment was started but not finished. 
      if Grammar.SymbolArray[Work->Token->Symbol].Kind = SYMBOLEOF then
        if Work->Token->ndata <> null then deallocate(Work->Token->ndata)
        deallocate Work->Token
        deallocate Work
        ParseCleanup TokenStack,null,FirstToken
        return PARSECOMMENTERROR
      end if

      ' Any other Token: delete and loop. 
      if Work->Token->ndata <> null then deallocate(Work->Token->ndata)
      deallocate Work->Token
      deallocate Work

      continue while
    end if

    ' If the token is the start of a comment then increment the
    ' CommentLevel and loop. The routine will keep reading tokens
    ' until the end of the comment. 
    if Grammar.SymbolArray[Work->Token->Symbol].Kind = SYMBOLCOMMENTSTART then
      if Debug > 0 then print "Parse: skipping comment." 

      ' Push the Token on the TokenStack to keep track of line+column. 
      Work->NextToken = TokenStack
      TokenStack = Work

      CommentLevel = CommentLevel + 1
      continue while
    end if

    ' If the token is the start of a linecomment then skip the rest of the line. 
    if Grammar.SymbolArray[Work->Token->Symbol].Kind = SYMBOLCOMMENTLINE then
      if Work->Token->ndata <> null then deallocate(Work->Token->ndata)
      deallocate Work->Token
      deallocate Work
      while (InputHere < InputSize) and _
             (InputBuf[InputHere] <> CR) and _
             (InputBuf[InputHere] <> LF) 
        InputHere = InputHere + 1
      wend
      if ((InputHere < InputSize) and _
          (InputBuf[InputHere] = CR)) then
        InputHere = InputHere + 1
      end if
      if ((InputHere < InputSize) and _
          (InputBuf[InputHere] = LF)) then
        InputHere = InputHere + 1
      end if
      nline = nline + 1
      Column = 1
      continue while
    end if

    ' If parse error then exit. 
    if Grammar.SymbolArray[Work->Token->Symbol].Kind = SYMBOLERROR then
      ParseCleanup TokenStack,Work,FirstToken
      return PARSELEXICALERROR
    end if

    ' Ignore whitespace. 
    if Grammar.SymbolArray[Work->Token->Symbol].Kind = SYMBOLWHITESPACE then
      if Work->Token->ndata <> null then deallocate(Work->Token->ndata)
      deallocate Work->Token
      deallocate Work
      continue while
    end if

    ' The tokenizer should never return a non-terminal symbol. 
    if Grammar.SymbolArray[Work->Token->Symbol].Kind = SYMBOLNONTERMINAL then
      if Debug > 0 then
        print "Error: tokenizer returned SYMBOLNONTERMINAL '"; wPrint(Work->Token->ndata); "'."
      end if
      ParseCleanup TokenStack,Work,FirstToken
      return PARSETOKENERROR
    end if

    if Debug > 0 then
      print "Token Read: Lalr="; LalrState; " Symbol="; _
            wPrint(Grammar.SymbolArray[Work->Token->Symbol].wname); _
            "["; str(Work->Token->Symbol); "] value='"; wPrint(Work->Token->ndata); "'"
    end if

    ' Feed the Symbol to the LALR state machine. It can do several
    ' things, such as wind back and iteratively call itself. 
    Result = ParseToken(@LalrState,@TokenStack,Work,TrimReductions,Debug)

    ' If out of memory then exit. 
    if Result = LALRMEMORYERROR then
      ParseCleanup TokenStack,Work,FirstToken
      return PARSEMEMORYERROR
    end if

    ' If syntax error then exit. 
    if Result = LALRSYNTAXERROR then
      ' Return LALR state in the Token->Symbol. 
      Work->Token->Symbol = LalrState
      ParseCleanup TokenStack,Work,FirstToken
      return PARSESYNTAXERROR
    end if

    ' Exit if the LALR state machine says it has reached it's exit. 
    if Result = LALRACCEPT then
      if Grammar.SymbolArray[Work->Token->Symbol].Kind = SYMBOLEOF then
        if Work->Token->ndata <> null then free Work->Token->ndata
        free Work->Token
        free Work
      end if
      ParseCleanup TokenStack,null,FirstToken
      return PARSEACCEPT
    end if

    ' Push the token onto the TokenStack. 
    Work->NextToken = TokenStack
    TokenStack = Work
  
  wend

  ' Should never get here. 
end function

"engine.bi" include file:

Code: Select all

''  
''  McLovin FreeBASIC engine header include (engine.bi)
''
''  Based on translation of C code from:
''    Kessels engine for GOLD.
''    Jeroen C. Kessels
''    Internet Engineer
''    http://www.kessels.com/
''  
''    Copyright:   Jeroen C. Kessels
''    Date:        22 march 2006
''    Version:     1.8
''  
''  



''  Return values of the Parse() function.
#define PARSEACCEPT             0   ' input parsed, no errors. 
#define PARSELEXICALERROR       1   ' input could not be tokenized. 
#define PARSETOKENERROR         2   ' input is an invalid token. 
#define PARSESYNTAXERROR        3   ' input does not match any rule. 
#define PARSECOMMENTERROR       4   ' a comment was started but not finished. 
#define PARSEMEMORYERROR        5   ' insufficient memory. 


''  SymbolStruct types (defined by GOLD). 
#define SYMBOLNONTERMINAL       0
#define SYMBOLTERMINAL          1
#define SYMBOLWHITESPACE        2
#define SYMBOLEOF               3
#define SYMBOLCOMMENTSTART      4
#define SYMBOLCOMMENTEND        5
#define SYMBOLCOMMENTLINE       6
#define SYMBOLERROR             7


''  ActionStruct types (defined by GOLD). 
#define ACTIONSHIFT             1
#define ACTIONREDUCE            2
#define ACTIONGOTO              3
#define ACTIONACCEPT            4


''  CaseSensitive values (defined by GOLD). 
#define False 0
#define True  1

#include once "\crt\wchar.bi"            ' wchar_t 


''  Grammar table and sub-tables. 
type SymbolStruct                ' Grammar.SymbolArray[]  
  Kind  as short                      ' 0...7, See SYMBOL defines.  
  wname as wchar_t ptr                ' String with name of symbol.  
end type

type DfaEdgeStruct               ' Grammar.DfaArray[].Edges[]  
  TargetState  as integer             ' Index into Grammar.DfaArray[].  
  CharCount    as integer             ' Number of characters in the charset.  
  CharacterSet as wchar_t ptr         ' String with characters.  
end type

type DfaStateStruct              ' Grammar.DfaArray[]  
  AcceptSymbol as integer             ' -1 (Terminal), or index into Grammar.SymbolArray[].  
  EdgeCount    as integer             ' Number of items in Edges[] array.  
  Edges        as DfaEdgeStruct ptr   ' Array of DfaEdgeStruct.  
end type

type RuleStruct                  ' Grammar.RuleArray[]  
  Head         as integer             ' Index into Grammar.SymbolArray[].  
  SymbolsCount as integer             ' Number of items in Symbols[] array.  
  Symbols      as integer ptr         ' Array of indexes into Grammar.SymbolArray[].  
  Description  as wchar_t ptr         ' String with BNF of the rule.  
end type

type ActionStruct                ' Grammar.LalrArray[].Actions[]  
  Entry as integer                    ' Index into Grammar.SymbolArray[].  
  Action as short                     ' 1...4, see ACTION defines.  
  Target as integer                   ' If Action=SHIFT then index into Grammar.LalrArray[].  
                                      ' If Action=REDUCE then index into Grammar.RuleArray[].  
                                      ' If Action=GOTO then index into Grammar.LalrArray[].  
end type

type LalrStateStruct             ' Grammar.LalrArray[]  
  ActionCount as integer             ' Number of items in Actions[] array.  
  Actions     as ActionStruct ptr    ' Array of ActionStruct.  
end type

type GrammarStruct               ' Grammar  
  CaseSensitive    as byte                 ' 'True' or 'False'.  
  InitialSymbol    as integer              ' Index into Grammar.SymbolArray[].  
  InitialDfaState  as integer              ' Index into Grammar.DfaArray[].  
  InitialLalrState as integer              ' Index into Grammar.LalrArray[].  
  SymbolCount      as integer              ' Number of items in Grammar.SymbolArray[].  
  SymbolArray      as SymbolStruct ptr
  RuleCount        as integer              ' Number of items in Grammar.RuleArray[].  
  RuleArray        as RuleStruct ptr
  DfaStateCount    as integer              ' Number of items in Grammar.DfaArray[].  
  DfaArray         as DfaStateStruct ptr
  LalrStateCount   as integer             ' Number of items in Grammar.LalrArray[].  
  LalrArray        as LalrStateStruct ptr
end type 
extern Grammar as GrammarStruct

'' Output from the parser.  
type TokenStruct 
  ReductionRule as integer                 ' Index into Grammar.RuleArray[].  
  Tokens        as TokenStruct ptr ptr     ' Array of reduction Tokens.  
  Symbol        as integer                 ' Index into Grammar.SymbolArray[].  
  ndata         as wchar_t ptr             ' String with data from the input.  
  nline         as long                    ' Line number in the input.  
  Column        as long                    ' Column in the input.  
end type


'' Exported functions.  
declare function Parse( _
  InputBuf       as wchar_t ptr, _         ' Pointer to the input data.  
  InputSize      as long, _                ' Number of characters in the input.  
  TrimReductions as integer, _             ' 0 = don't trim, 1 = trim reductions.  
  Debug          as integer, _             ' 0 = no debug, 1 = print debug info.  
  Token          as TokenStruct ptr ptr _  ' Output, the first Token.  
  ) as integer
  
declare sub DeleteTokens( Token as TokenStruct ptr )

declare function RetrieveToken( _
  InputBuf       as wchar_t ptr, _         ' The input data.  
  InputSize      as long, _                ' Size of the input data.  
  InputHere      as long ptr, _            ' Index into input data.  
  nine           as long ptr, _            ' Current line number.  
  Column         as long ptr, _            ' Current column number.  
  Symbol         as integer ptr _
  ) as wchar_t ptr
  
declare function wPrint( byval pByte as ushort ptr ) as string

Here is a translation of the sample code that reads and displays the "simple" language test grammar.

Code: Select all

''  
''  
''  Example of how to run the McLovin FreeBASIC engine and show the results.
''  Output is similar to the parse tree in the GOLD test window.
''  (based on translation of Kessels C engine)
''  
''  

'
' Link this example with engine.o
' e.g.   fbc example1.bas engine.o
'
#include once "engine.bi"                 ' The McLovin FB engine. 
#Include Once "simple.bi"                ' Generated by GOLD. 


#define TRIMREDUCTIONS 1            ' 0=off, 1=on 
#define DEBUG          1            ' 0=off, 1=on 



' Load input file from disk into memory. 
function LoadInputFile( FileName as zstring ptr ) as wchar_t ptr
  
  dim Fin      as integer
  dim filesize as uinteger
  dim Buf1     as string 
  dim Buf2     as wstring ptr
  
  ' Sanity check. 
  if ((FileName = null) or (*FileName = "")) then return(null)

  ' Open the file. 
  Fin = freefile
  if open( *FileName for binary access read as #fin ) <> 0 then
    print "Could not open input file: " & *FileName
    return null
  end if


  ' Get the size of the file. 
  filesize = lof( Fin )

  ' Allocate memory for the input. 
  Buf1 = string(filesize, 0)
  Buf2 = callocate(sizeof(wchar_t) * (filesize + 1))

  ' Load the file into memory. 
  get #Fin, , Buf1

  ' Close the file. 
  close #Fin

  ' Convert from ASCII to Unicode. 
  *Buf2 = wStr(Buf1)

  return Buf2

end function



''  Make a readable copy of a string. All characters outside 32...128 are
''  displayed as a HEX number in square brackets, for example "[0A]". 
sub ReadableString( byval wInChars as wchar_t ptr, wOutChars as zstring ptr, nWidth as long )
  
  dim i1 as long
  dim i2 as long
  
'  *wOutChars = wPrint(wInChars)
'  return
  
  ' Sanity check. 
  if ((wOutChars = null) or (nWidth < 1)) then return
  wOutChars[0] = 0
  if (wInChars = null) then return
  
  i1 = 0
  i2 = 0
  while wInChars[i1] <> 0
    if ((wInChars[i1] >= 32) and (wInChars[i1] <= 127)) then
        if nWidth > 1 then 
           wOutChars[i2] = wInChars[i1]
           i2 = i2 + 1
        end if   
    else
        if nWidth > 4 then
           woutchars[i2] = asc("[")
           woutchars[i2] = asc(hex(*wInChars,2),1)  
           woutchars[i2] = asc(hex(*wInChars,2),2)  
           woutchars[i2] = asc("]")
           i2 = i2 + 1
        end if
    end if
    i1 = i1 + 1
  wend
  
  woutchars[i2] = 0
'  print
  
end sub




sub ShowErrorMessage( Token as TokenStruct ptr, Result as integer ) 
  dim Symbol as integer
  dim i  as integer
  dim s1 as zstring * BUFSIZ

  select case Result
    case PARSELEXICALERROR
      print "Lexical error";
    case PARSECOMMENTERROR
      print "Comment error"; 
    case PARSETOKENERROR
      print "Tokenizer error"; 
    case PARSESYNTAXERROR
      print "Syntax error"; 
    case PARSEMEMORYERROR
      print "Out of memory"; 
  end select
  
  if Token <> null then print " at line "; str(Token->nLine); " column "; str(Token->Column)

  select case Result 
    
    case PARSELEXICALERROR 
       if Token->nData <> null then
           'ReadableString Token->nData, @s1, BUFSIZ
           print "The grammar does not specify what to do with '"; wPrint(Token->nData); "'"
           'print "The grammar does not specify what to do with '"; s1; "'"
       else
           print "The grammar does not specify what to do." 
       end if

    case PARSETOKENERROR 
       print "The tokenizer returned a non-terminal."

    case PARSECOMMENTERROR 
       print "The comment has no end, it was started but not finished." 

    case PARSESYNTAXERROR 
       if Token->nData <> null then
           'ReadableString Token->nData,@s1,BUFSIZ
           print "Encountered '"; wPrint(Token->nData); "', but expected ";
           'print "Encountered '"; s1; "', but expected ";
       else
           print "Expected ";
       end if
       for i = 0 to Grammar.LalrArray[Token->Symbol].ActionCount - 1
         Symbol = Grammar.LalrArray[Token->Symbol].Actions[i].Entry
         if Grammar.SymbolArray[Symbol].Kind = SYMBOLTERMINAL then
           if i > 0 then
             print ", ";
             if i >= Grammar.LalrArray[Token->Symbol].ActionCount - 2 then print "or ";
           end if
           print "'"; wPrint(Grammar.SymbolArray[Symbol].wName); "'"
         end if
       next
       
       print "." 
  
  end select

end sub




' Display all the Tokens, including reductions. 
sub ShowTokens( Token as TokenStruct ptr, Indent as integer ) 
  dim s1 as zstring * BUFSIZ
  dim i  as integer

  ' Sanity check.
  if Token = null then return

  ' Show line and column numbers. 
  print str(Token->nLine); " "; str(Token->Column); " ";
  
  ' Indent. 
  for i = 0 to Indent - 1
     print "  ";
  next   

  ' Show the Token. 
  if Token->ReductionRule < 0 then
      ' It's a symbol. 
      if Token->nData = null then
          print wPrint(Grammar.SymbolArray[Token->Symbol].wName)
      else  
          'ReadableString Token->nData,@s1,BUFSIZ
          print wPrint(Grammar.SymbolArray[Token->Symbol].wName); " = '"; wPrint(Token->nData); "'"
      end if    
          
  else
      ' It's a reduction. 
      print wPrint(Grammar.RuleArray[Token->ReductionRule].Description)
      for i = 0 to Grammar.RuleArray[Token->ReductionRule].SymbolsCount - 1
        ShowTokens Token->Tokens[i],Indent+1
      next
  end if
  
end sub
  



  dim InputBuf as wchar_t ptr
  dim Token    as TokenStruct ptr
  dim Result   as integer

  ' Load the inputfile into memory. 
  InputBuf = LoadInputFile("Example.input")
  'InputBuf = LoadInputFile("vbnet_sample.input")
  if InputBuf = null then end(1)

  ' run the Parser. 
  Result = Parse(InputBuf,wcslen(InputBuf),TRIMREDUCTIONS,DEBUG,@Token)

  ' Interpret the results. 
  if Result <> PARSEACCEPT then
      ShowErrorMessage Token,Result
  else
      ShowTokens Token,0
  end if

  ' Cleanup. 
  DeleteTokens Token
  free InputBuf

Hope this stuff is of use to someone.

:-)

AGS · Post by **AGS** » Dec 22, 2008 7:39

Great work on the Gold Parser engine, McLovin! Writing a complete grammer for a language like FB might not be something easily done with the Gold Parser but it is a nice tool to use for writing little languages/all sorts of parsing tasks.

Have a look at JS/CC

http://jscc.jmksf.com/

if you want to experiment some more with parsing.

It has some features that might help in avoiding those nasty reduce - reduce and shift - reduce conflicts. It is also (like the Gold Parser) a visual compiler compiler. JS/CC (JS/CC stands for JavaScript Compiler Compiler), uses the webbrowser to visualize things and uses a syntax more geared towards Yacc/Bison.

Perhaps you could get the site admin to post your gold engine in the library repository. I think it might be something more people would like to have because using a visual compiler compiler like the Gold Parser is a perfect way to get into the whole compiler writing/parsing 'thing'.

AGS · Post by **AGS** » Dec 22, 2008 7:52

The grammer I posted earlier contains some garbage. Some rules are defined twice, other rules are not rules. That's what you get when using a single regular expression to extract info from a file.

There is more in the source files about the grammer than what has been post ed. I will be back to update the grammer.

TheMG · Post by **TheMG** » Dec 22, 2008 14:41

Not just that, but the syntax used to define them varies throughout, replacing x? with [x], x* with {x}, etc. I am attempting to clean it up.

McLovin · Post by **McLovin** » Dec 22, 2008 15:55

Thanks AGS, I worked on an FB grammar for a week or two using the VB.Net, VBScript and C grammars as guides. I had never written a grammar in GOLD before. I got a fair amount of stuff defined and working until I got lost in the "expressions" - that's where the reduce/reduce errors occurred. It was a very nice learning exercise and I will take up the challenge again after Christmas. It is certainly not a one-to-one translation of the current FB syntax but it is amazing just how easy it is to take various FB syntaxes and make them into a GOLD grammar (e.g. the various way to declare a variable).

I started jamming the C expressions definitions into my grammar and totally screwed it up. Here is the current state of the grammar. If anyone wants to use it then certainly go ahead and make changes. As I learn more about defining a language then I will fix it up. It is a little bit of a hack here and hack there at the moment because I was learning as I went along.

Code: Select all

!================================================
!
!     FreeBASIC 2 (GOLD Edition) Grammar
!================================================

"Name"            = 'FreeBASIC Version 2 (GOLD Edition)'
"Author"          = 'McLovin: Fogell'
"Version"         = '2.00'
"About"           = 'A specification for the FreeBASIC language'
"Case Sensitive"  = False
"Start Symbol"    = <Program>



{Hex Digit}      = {Digit} + [abcdefABCDEF]
{Oct Digit}      = [01234567]

{String Ch}      = {Printable} - ["]
{Char Ch}        = {Printable} - ['']
{WS}             = {Whitespace} - {CR} - {LF}

DecLiteral       = [123456789]{digit}*
HexLiteral       = '&H' {Hex Digit}+ '&'?
OctLiteral       = '&' {Oct Digit}+ '&'?
FloatLiteral     = {digit}* '.' {digit}+ ( 'E' [+-]? {Digit}+ )? [FR]?
                 | {digit}+ 'E' [+-]? {Digit}+  [FR]?

StringLiteral    = '"'( {String Ch} | '\'{Printable} )* '"'
CharLiteral      = '' ( {Char Ch} | '\'{Printable} )''

Id              = [_]?{Letter}{ID Tail}*    
{Id Tail}       = {Alphanumeric} + [_]

! ----------------------------------------------------------------- Terminals

NewLine        = {CR} {LF}
               | {CR}
               | {LF}
!               | ':'


! Special white space definition. Whitespace is either space or tab, which
! can be followed by continuation symbol '_' followed by new line character
Whitespace     = {WS}+
               | '_' {WS}* {CR}? {LF}?


! ===================================================================
! Comments
! ===================================================================

! Special comment definitions
Comment Line   = '' | REM
Comment Start  = '/'''
Comment End    = '''/'
 

<NL>          ::= NewLine <NL>
                | NewLine

<NLopt>       ::= <NL>
                |

<Program>     ::= <NLopt> <GlobalStmtList>
              
<GlobalStmt>     ::= <Declare>
                   | <NameSpace>
                   | <ConstDecl>
                   | <SubFunction>
                   | <Structure>
                   | <Enumeration>
                   | <BlockStmt>

<GlobalStmtList> ::= <GlobalStmt> <GlobalStmtList>
                   |

<MethodStmt>     ::= <ConstDecl>
                   | <BlockStmt>

<MethodStmtList> ::= <MethodStmt> <MethodStmtList>
                   |

<BlockStmtList>  ::= <BlockStmt> <BlockStmtList>
                   |

<BlockStmt>     ::= <VarDecl>
!                  | <WithStmt>
!                  | <RedimStmt>
!                  | <LoopStmt>
!                  | <ForStmt>
!                  | <IfStmt>                 
!                  | <SelectStmt> 
                  | <InlineStmt> <NL>

<InlineStmt>    ::= <Stm List>
!                  | <MethodCall>
                  

! -------------------------------------------------------------------
! Variable Declaration
! -------------------------------------------------------------------

<IntLiteral>           ::= IntLiteral
                         | HexLiteral
                         | OctLiteral

<VarDimSharedOpt>      ::= 'Shared' 
                         |                    

<VarDim>               ::= 'Dim'
                         | 'Static'

<ArrayBounds>          ::= <Expr> ',' <ArrayBounds> 
                         | <Expr> 'To' <Expr> ',' <ArrayBounds>
                         | <Expr>
                         | <Expr> 'To' <Expr> 

<VarName>              ::= ID '(' <ArrayBounds> ')'    ! handle arrays
                         | ID

<VarNames> ::= <VarName> <VarInit> ',' <VarNames>
             | <VarName> <DataType> <VarInit> ',' <VarNames>
             | <VarName> <VarInit>
             | <VarName> <DataType> <VarInit>


! TODO: Need to modify in order to handle initializing UDT's with embedded arrays and arrays of UDT's with embedded arrays
! { (100,{1,2,3,4,5}) }

<VarInitValueOpt>      ::= ',' <Value> <VarInitValueOpt>
                         |                 

<VarInitValue>         ::= <Value>
                         | 'Any'
                         | '{' <Value> <VarInitValueOpt> '}'
   
<VarInit>              ::= '=' <VarInitValue>
                         | '=>' <VarInitValue>
                         |                 
 
<VarDecl> ::= <VarDim> <VarDimSharedOpt> <DataType> <VarNames> <NL>   ! Dim Shared As Integer a=1, b=2, c=3
            | <VarDim> <VarDimSharedOpt> <VarNames> <NL>              ! Dim Shared a As Integer=1, b As String="p", c As Byte=3



! -------------------------------------------------------------------
! CONST Declaration
! -------------------------------------------------------------------

<ConstDecl>            ::= 'Const' <ConstList> <NL>

<ConstList>            ::= ID '=' <Value> ',' <ConstList>
                         | ID '=' <Value>



! -------------------------------------------------------------------
! Parameters
! -------------------------------------------------------------------

<ParamListOpt>  ::= <ParamList>
                  |

<ParamList>     ::= '(' <ParamItems> ')'
                  | '(' ')'
 
<ParamItems>    ::= <ParamItem> ',' <ParamItems>
                  | <ParamItem>

<ParamStatic>   ::= 'Static'
                  |              

<ParamItem>     ::= <ParamPassing> ID <DataType> 


<ParamPassing>  ::= 'ByVal'
                  | 'ByRef'
                  | 'Optional' 
                  |


! -------------------------------------------------------------------
! Declares (External Procedures)   
! -------------------------------------------------------------------

<Declare>       ::= 'Declare' 'Sub'      ID <DeclareLib> <DeclareAlias> <ParamListOpt> <NL>
                  | 'Declare' 'Function' ID <DeclareLib> <DeclareAlias> <ParamListOpt> <DataType> <NL>

<DeclareLib>    ::= 'Lib' StringLiteral
                  |                         ! allow to be optional
<DeclareAlias>  ::= 'Alias' StringLiteral
                  |                         ! allow to be optional



! -------------------------------------------------------------------
! Sub/Function Methods
! -------------------------------------------------------------------

<AccessOpt>             ::= 'Public'
                          | 'Private'
                          |               

<SubFunction>           ::= <AccessOpt> 'Sub' ID <ParamListOpt> <ParamStatic> <NL> <MethodStmtList> 'End' 'Sub' <NL>
                          | <AccessOpt> 'Function' ID <ParamListOpt> <DataType> <ParamStatic> <NL> <MethodStmtList> 'End' 'Function' <NL>
                

! ------------------------------------------------------------------- 
! Enumerations
! ------------------------------------------------------------------- 

<Enumeration>   ::= 'Enum' ID <NL> <EnumList> 'End' 'Enum' <NL>
                  | 'Enum' ID 'Explicit' <NL> <EnumList> 'End' 'Enum' <NL>

<EnumList>      ::= <EnumItem> <EnumList> 
                  | 
                  
<EnumItem>      ::= ID '=' <IntLiteral> <EnumItemOpt> <NL>
                  | ID <EnumItemOpt> <NL>

<EnumItemOpt>   ::= ',' ID '=' <IntLiteral> <EnumItemOpt>
                  | ',' ID <EnumItemOpt>
                  |                  


! -------------------------------------------------------------------
! Type Structures
! -------------------------------------------------------------------

<StructurePadding>  ::= 'Field' '=' <IntLiteral>
                      |                           ! allow to be optional

<Structure>         ::= 'Type' ID <StructurePadding> <NL> <StructureList> 'End' 'Type' <NL>

<StructureList>     ::= <StructureItem> <StructureList>
                      | 

<StructureItem>     ::= <VarNames> <NL>
                      | <DataType> <VarNames> <NL>
                      | <Enumeration>


! -------------------------------------------------------------------
! NameSpace Declaration
! -------------------------------------------------------------------

<NameSpaceAlias>  ::= 'Alias' StringLiteral
                    |

<NameSpace>       ::= 'NameSpace' ID <NameSpaceAlias> <NL> <NameSpaceItems> 'End' 'NameSpace' <NL>

<NameSpaceItems>  ::= <NameSpaceItem> <NameSpaceItems>                    
                    | 

<NameSpaceItem>   ::= <Declare>
                    | <NameSpace>
                    | <ConstDecl>
                    | <VarDecl>
                    | <SubFunction>
                    | <Structure>
                    | <Enumeration>


! ===================================================================
! Function  Declaration
!! ===================================================================
!
!<Func Proto> ::= <Func ID> '(' <Types>  ')' ';'
!               | <Func ID> '(' <Params> ')' ';'
!               | <Func ID> '(' ')' ';'
!
!<Func Decl>  ::= <Func ID> '(' <Params>  ')' <Block>
!               | <Func ID> '(' <Id List> ')' <Struct Def> <Block>
!               | <Func ID> '(' ')' <Block>
!
!
!<Params>     ::= <Param> ',' <Params>
!               | <Param>
!               
!<Param>      ::= const <Type> ID
!               |       <Type> ID
!               
!<Types>      ::= <Type>  ',' <Types>
!               | <Type> 
!   
!<Id List>    ::= Id ',' <Id List>
!               | Id
!
!<Func ID>    ::= <Type> ID
!               | ID

! ===================================================================
! Type Declaration
!! ===================================================================
!
!<Typedef Decl> ::= typedef <Type> ID ';'
!
!<Struct Decl>  ::= struct Id '{' <Struct Def> '}'  ';' 
!
!<Union Decl>   ::= union Id '{' <Struct Def> '}'  ';' 
!
!
!<Struct Def>   ::= <Var Decl> <Struct Def>
!                 | <Var Decl>

! ===================================================================
! Variable Declaration
! ===================================================================

<Var Decl>     ::= <Mod> <Type> <Var> <Var List>  ';'
                 |       <Type> <Var> <Var List>  ';'
                 | <Mod>        <Var> <Var List>  ';'
             
<Var>      ::= ID <Array>
             | ID <Array> '=' <Op If> 

<Array>    ::= '[' <Expr> ']'
             | '[' ']'
             |
             
<Var List> ::=  ',' <Var Item> <Var List>
             | 

<Var Item> ::= <Pointers> <Var>

             
<Mod>      ::= extern 
             | static
             | register
             | auto
             | volatile
             | const   

! ===================================================================
! Enumerations
! ===================================================================

<Enum Decl>    ::= enum Id '{' <Enum Def> '}'  ';'
 
<Enum Def>     ::= <Enum Val> ',' <Enum Def>
                 | <Enum Val>

<Enum Val>     ::= Id
                 | Id '=' OctLiteral
                 | Id '=' HexLiteral
                 | Id '=' DecLiteral  


! -------------------------------------------------------------------
! Data Types
! -------------------------------------------------------------------

<Type>     ::= <Base> <Pointers> 

<Base>     ::= <Sign> <Scalar>
!             | struct Id 
!             | struct '{' <Struct Def> '}' 
!             | union Id
!             | union '{' <Struct Def> '}' 
!             | enum Id  


<Sign>     ::= signed 
             | unsigned
             |

<Scalar>   ::= 'Byte'
            |  'uByte'
            |  'Short'
            |  'uShort'
            |  'Integer'
            |  'uInteger'
            |  'Long'
            |  'LongInt'
            |  'uLongInt'
            |  'Single'
            |  'Double'
            |  'String'
            |  'zString'
            |  'wString'
            

<Pointers> ::= '*' <Pointers>
             |

<DataFixedString>      ::= <Scalar> '*' <IntLiteral>

<DataTypePointer>      ::= <Scalar> 'Pointer'
                        |  <Scalar> 'Ptr'

<DataType> ::= 'As' <Scalar>
             | 'As' <DataFixedString>
             | 'As' <DataTypePointer>
             | 'As' ID


! ===================================================================
! Statements
! ===================================================================

<Stm>        ::= <Var Decl>
               | Id ':'                            !Label
               | if '(' <Expr> ')' <Stm>          
               | if '(' <Expr> ')' <Then Stm> else <Stm>         
               | while '(' <Expr> ')' <Stm> 
               | for '(' <Arg> ';' <Arg> ';' <Arg> ')' <Stm>
               | <Normal Stm>

<Then Stm>   ::= if '(' <Expr> ')' <Then Stm> else <Then Stm> 
               | while '(' <Expr> ')' <Then Stm> 
               | for '(' <Arg> ';' <Arg> ';' <Arg> ')' <Then Stm>
               | <Normal Stm>

<Normal Stm> ::= do <Stm> while '(' <Expr> ')'
 !              | switch '(' <Expr> ')' '{' <Case Stms> '}'
 !              | <Block>
 !              | <Expr> ';'               
 !              | goto Id ';'
 !              | break ';'
 !              | continue ';'
 !              | return <Expr> ';'
 !              | ';'              !Null statement


<Arg>       ::= <Expr> 
              | 

<Case Stms> ::= case <Value> ':' <Stm List> <Case Stms>
              | default ':' <Stm List>                  
              |

<Block>     ::= '{' <Stm List> '}' 

<Stm List>  ::=  <Stm> <Stm List> 
              | 


! ===================================================================
! Here begins the C's 15 levels of operator precedence.
! ===================================================================

!<Expr>       ::= <Expr> ',' <Op Assign>   
!               | <Op Assign>

<Expr>       ::= <Op Assign>

<Op Assign>  ::= <Op If> '='   <Op Assign>
               | <Op If> '+='  <Op Assign>
               | <Op If> '-='  <Op Assign>
               | <Op If> '*='  <Op Assign>
               | <Op If> '/='  <Op Assign>
               | <Op If> '^='  <Op Assign>
               | <Op If> '&='  <Op Assign>
               | <Op If> '|='  <Op Assign>
               | <Op If> '>>=' <Op Assign>
               | <Op If> '<<=' <Op Assign>
               | <Op If>

<Op If>      ::= <Op Or> '?' <Op If> ':' <Op If>
               | <Op Or>

<Op Or>      ::= <Op Or> '||' <Op And>
               | <Op And>

<Op And>     ::= <Op And> '&&' <Op BinOR>
               | <Op BinOR>

<Op BinOR>   ::= <Op BinOr> '|' <Op BinXOR>
               | <Op BinXOR>

<Op BinXOR>  ::= <Op BinXOR> '^' <Op BinAND>
               | <Op BinAND>

<Op BinAND>  ::= <Op BinAND> '&' <Op Equate>
               | <Op Equate>

<Op Equate>  ::= <Op Equate> '==' <Op Compare>
               | <Op Equate> '!=' <Op Compare>
               | <Op Compare>

<Op Compare> ::= <Op Compare> '<'  <Op Shift>
               | <Op Compare> '>'  <Op Shift>
               | <Op Compare> '<=' <Op Shift>
               | <Op Compare> '>=' <Op Shift>
               | <Op Shift>

<Op Shift>   ::= <Op Shift> '<<' <Op Add>
               | <Op Shift> '>>' <Op Add>
               | <Op Add>

<Op Add>     ::= <Op Add> '+' <Op Mult>
               | <Op Add> '-' <Op Mult>
               | <Op Mult>

<Op Mult>    ::= <Op Mult> '*' <Op Unary>
               | <Op Mult> '/' <Op Unary>
               | <Op Mult> '%' <Op Unary>
               | <Op Unary>

<Op Unary>   ::= '!'    <Op Unary>
               | '~'    <Op Unary>   
               | '-'    <Op Unary>
               | '*'    <Op Unary>
               | '&'    <Op Unary>               
               | '++'   <Op Unary>
               | '--'   <Op Unary>
               | <Op Pointer> '++'
               | <Op Pointer> '--'
               | '(' <Type> ')' <Op Unary>   !CAST
               | sizeof '(' <Type> ')'
               | sizeof '(' ID <Pointers> ')'
               | <Op Pointer>

<Op Pointer> ::= <Op Pointer> '.' <Value>
               | <Op Pointer> '->' <Value>
               | <Op Pointer> '[' <Expr> ']'
               | <Value>

<Value>      ::= OctLiteral
               | HexLiteral
               | DecLiteral  
               | StringLiteral
               | CharLiteral
               | FloatLiteral
               | Id '(' <Expr> ')'
               | Id '(' ')'          

               | Id
               | '(' <Expr> ')'



! ------------------------------------------------------------------- 
! If Statement
! -------------------------------------------------------------------

<IfStmt>               ::= 'If' <Expr> 'Then' <NL> <BlockStmtList> <ElseStmtList> 'End' 'If' <NL>
                         | 'If' <Expr> 'Then' <InlineStmt> <ElseOpt> <EndIfOpt> <NL>

<ElseStmtList>         ::= 'ElseIf' <Expr> 'Then' <NL> <BlockStmtList> <ElseStmtList>
                         | 'ElseIf' <Expr> 'Then' <InlineStmt> <NL> <ElseStmtList>
                         | 'Else' <InlineStmt> <NL>
                         | 'Else' <NL> <BlockStmtList>
                         |

<ElseOpt>              ::= 'Else' <InlineStmt>
                         |

<EndIfOpt>             ::= 'End' 'If'
 

! ------------------------------------------------------------------- 
! Loop Statement
! -------------------------------------------------------------------

<LoopStmt>             ::= 'Do' <LoopType> <Expr> <NL> <BlockStmtList> 'Loop' <NL>
                         | 'Do' <NL> <BlockStmtList> 'Loop' <LoopType> <Expr> <NL>
                         | 'Do' <NL> <BlockStmtList> 'Loop' <NL>
                         | 'While' <Expr> <NL> <BlockStmtList> 'Wend' <NL>

<LoopType>             ::= 'While'
                         | 'Until'

! ------------------------------------------------------------------- 
! For Statement
! -------------------------------------------------------------------

<ForStmt>              ::= 'For' ID '=' <Expr> 'To' <Expr> <StepOpt> <NL> <BlockStmtList> 'Next' <NL>
                         | 'For' 'Each' ID 'In' <Expr> <NL> <BlockStmtList> 'Next' <NL>

<StepOpt>              ::= 'Step' <Expr>
                         |

! ------------------------------------------------------------------- 
! Select Statement
! -------------------------------------------------------------------

<SelectStmt>           ::= 'Select' 'Case' <Expr> <NL> <CaseStmtList> 'End' 'Select' <NL>

<CaseStmtList>         ::= 'Case' <ExprList> <NLOpt> <BlockStmtList> <CaseStmtList>
                         | 'Case' 'Else' <NLOpt> <BlockStmtList>
                         |

<ExprList>             ::= <Expr> ',' <ExprList>
                         | <Expr>


! ------------------------------------------------------------------- 
! END OF GRAMMAR DEFINITION
! -------------------------------------------------------------------

Good luck!

AGS · Post by **AGS** » Dec 23, 2008 5:07

McLovin wrote:Thanks AGS, I worked on an FB grammar for a week or two using the VB.Net, VBScript and C grammars as guides. I had never written a grammar in GOLD before. I got a fair amount of stuff defined and working until I got lost in the "expressions" - that's where the reduce/reduce errors occurred.

When you use a Lex/Yacc clone (like JS/CC or JFlex/CUP or Flex/Bison or...) you can define operator priorities. Otherwise you have to split the expression up and have one expression refer to the other one. This splitting has to be done to get the priorities right (otherwise the parser will not apply the right operator precedence). If you do it like that (like in the grammar you've posted) you get a lot of rules.

In JS/CC (or Flex/Bison, JFlex/CUP, Flex/Lemon or pick your own Lex/Yacc clone) you get (after defining priorities):

expr: expr '+' expr
| expr '-' expr
| expr '*' expr
| expr '/' expr
| '(' expr ')'
| INT
| FLOAT

Of course you'd have to put a lot more expressions in there to get where you want to get (loadsa operators in FB) and define priorities on them to get the kind of evaluation on expressions you want.

And you have to deal with the ambiguities of the FB language. Bison comes with GLR parsing power nowadays which should take care of many a problem though some cannot be solved with a formal grammar alone. You'd have to take care of those ambiguities some other way.

Nice GOLD parser grammar you've put together there. Pity it doesn't work yet.

AGS · Post by **AGS** » Dec 23, 2008 5:14

TheMG wrote:Not just that, but the syntax used to define them varies throughout, replacing x? with [x], x* with {x}, etc. I am attempting to clean it up.

It's comment inside source files, nothing more, nothing less. I'm sure the developers never intended the comments to form a formal grammar of the FB language. It'll need rewriting before it's acceptable to an LL(K) (if not a LALR(1) or a GLR) compiler compiler package.

AGS · Post by **AGS** » Dec 26, 2008 18:58

I'm moving the development of the FreeBASIC grammar to a project called FreeBASIC Grammar Part I. So no more grammar updates on this page.