FreeBASIC grammar (extracted from the FreeBASIC sources)
FreeBASIC grammar (extracted from the FreeBASIC sources)
The FreeBASIC grammar is spread out across many pages of the manual. So I thought it would be nice to get the grammar on one page. And to get the grammar I took the FreeBASIC sources and extracted the lines containing FreeBASIC grammar rules. So here it is, the FreeBASIC grammar according to the FreeBASIC developers.
* PatternSearchMatcher[(['][']\s*\w+\s+[=]\s+.+)|(['][']\s*[|]\s*.+),true]
Lexical Grammar
''char classes:
ALPHA 'A' - 'Z'
DIGIT '0' - '9'
HEXDIG 'A' - 'F' | DIGIT
OCTDIG '0' - '7'
BINDIG '0' | '1'
ALPHADIGIT ALPHA | DIGIT
ISUFFIX '%' | '&'
FSUFFIX '!' | '#'
SUFFIX ISUFFIX | FSUFFIX | '$'
EXPCHAR 'D' | 'E'
OPERATOR '=' | '<' | '>' | '+' | '-' | '*' | '/' | '\' | '^'
DELIMITER '.' | ':' | ',' | ';' | '"' | '''
indentifier = (ALPHA | '_') { [ALPHADIGIT | '_' ] } [SUFFIX].
hex_oct_bin = 'H' HEXDIG+
| 'O' OCTDIG+
| 'B' BINDIG+
float = DOT DIGIT { DIGIT } [FSUFFIX | { EXPCHAR [opadd] DIGIT DIGIT } } | ].
number = DIGIT dig_dot_nil i_fsufx_nil
| '.' float
| '&' hex_oct_bin
dig_dot_nil = DIGIT dig_dot_nil
| ('.'|EXPCHAR) float
| .
i_fsufx_nil = ISUFFIX # is integer
| FSUFFIX # is float
| . # is def### !!! context sensitive !!!
string = '"' { ANY_CHAR_BUT_QUOTE } '"'. # less quotes
MultiLineComment = '/' ''' . '/' '''
Context Free Grammar
Program = Line* EOF? .
Line = Label? Statement? Comment? EOL .
Label = NUM_LIT
| ID ':' .
Statement = STT_SEPARATOR? ( Declaration
| ProcCallOrAssign
| CompoundStmt
| QuirkStmt
| AsmBlock
| AssignmentOrPtrCall )?
(STT_SEPARATOR Statement)* .
SttSeparator = (STT_SEPARATOR | EOL)+ .
Declaration = ConstDecl | TypeDecl | VariableDecl | ProcDecl | DefDecl | OptDecl.
ConstDecl = CONST (AS SymbolType)? ConstAssign (DECL_SEPARATOR ConstAssign)*
ConstAssign = ID (AS SymbolType)? '=' ConstExpression
TypeDecl = (TYPE|UNION) ID (ALIAS LITSTR)? (FIELD '=' Expression)? Comment? SttSeparator TypeLine+ END (TYPE|UNION) .
VariableDecl = (REDIM PRESERVE?|DIM|COMMON) SHARED? SymbolDef
| EXTERN IMPORT? SymbolDef ALIAS STR_LIT
| STATIC SymbolDef .
ProcDecl = DECLARE ((SUB | FUNCTION) ProcHeader | OPERATOR OperatorHeader ) .
ProcHeader = ID CallConvention? OVERLOAD? (ALIAS LIT_STRING)?
CallConvention = (CDECL|STDCALL|PASCAL)?
OperatorHeader = Operator CallConvention? OVERLOAD? (ALIAS LIT_STRING)?
DefDecl = (DEFINT|DEFLNG|DEFSNG|DEFDBL|DEFSTR) (CHAR '-' CHAR ','?)* .
ProcCallOrAssign= CALL ID ('(' ProcParamList ')')?
| ID ProcParamList?
| (ID | FUNCTION | OPERATOR | PROPERTY) '=' Expression .
ProcParamList = ProcArg (DECL_SEPARATOR ProcArg)* .
ProcArg = BYVAL? (ID(('(' ')')? | Expression) .
CompoundStmt = IfStatement
| ForStatement
| DoStatement
| WhileStatement
| SelectStatement
| ExitStatement
| ContinueStatement
| EndStatement
IfStmtBegin = IF Expression THEN (BlockIfStatement | SingleIfStatement) .
IfStmtNext = ELSEIF Expression THEN
| ELSE .
IfStmtEnd = END IF | ENDIF .
ForStmtBegin = FOR ID (AS DataType)? '=' Expression TO Expression (STEP Expression)? .
ForStmtEnd = NEXT (ID (',' ID?))? .
DoStmtBegin = DO ((WHILE | UNTIL) Expression)? .
DoStmtEnd = LOOP ((WHILE | UNTIL) Expression)? .
WhileStmtBegin = WHILE Expression .
WhileStmtEnd = WEND
SelectStatement = SELECT CASE (AS CONST)? Expression .
SelectStmtNext = CASE (ELSE | (CaseExpression (',' CaseExpression)*)) .
CaseExpression = (Expression (TO Expression)?)?
| (IS REL_OP Expression)? .
SelectStmtEnd = END SELECT .
SelConstStmtBegin = SELECT CASE AS CONST Expression{int} .
cSelConstStmtNext = CASE (ELSE | (ConstExpression{int} (',' ConstExpression{int})*)) .
SelConstStmtEnd = END SELECT .
ExitStatement = EXIT (FOR | DO | WHILE | SELECT | SUB | FUNCTION)
ContinueStatement = CONTINUE (FOR | DO | WHILE)
CompoundEnd = END (IF | SELECT | SUB | FUNCTION | SCOPE | WITH | NAMESPACE | EXTERN)
QuirkStmt = GotoStmt
| ArrayStmt
| PrintStmt
| MidStmt
| DataStmt
| etc .
GotoStmt = GOTO LABEL
| GOSUB LABEL
| RETURN LABEL?
| RESUME NEXT? .
ArrayStmt = ERASE ID (',' ID)*;
| SWAP Variable, Variable .
PrintStmt = (PRINT|'?') ('#' Expression ',')? (USING Expression{str} ';')? (Expression? ';'|"," )*
MidStmt = MID '(' Expression{str}, Expression{int} (',' Expression{int}) ')' '=' Expression{str} .
DataStmt = RESTORE LABEL?
| READ Variable{int|flt|str} (',' Variable{int|flt|str})*
| DATA literal|constant (',' literal|constant)*
AsmBlock = ASM Comment? SttSeparator
Assignment = LET? Variable BOP? '=' Expression
| Variable{function ptr} '(' ProcParamList ')' .
''Comment = (COMMENT_CHAR | REM) ((DIRECTIVE_CHAR Directive)
'' | (any_char_but_EOL*)) .
''Directive = INCLUDE ONCE? ':' '\'' STR_LIT '\''
'' | DYNAMIC
'' | STATIC .
''ExternStmtBegin = EXTERN "mangling_spec" (LIB LITSTR)? .
''ExternStmtEnd = END EXTERN .
''NamespaceStmtBegin = NAMESPACE (ID (ALIAS LITSTR)?)? .
''NamespaceStmtEnd = END NAMESPACE .
''Usingtmt = USING ID (',' ID)*
''ScopeStmtBegin = SCOPE .
''ScopeStmtEnd = END SCOPE .
''WithStmtBegin = WITH Variable .
''WithStmtEnd = END WITH .
''EndStatement = END Expression? .
''EnumConstDecl = ID ('=' ConstExpression)? .
''EnumBody = (EnumDecl (',' EnumDecl)? Comment? SttSeparator)+ .
''EnumDecl = ENUM ID? (ALIAS LITSTR)? EXPLICIT? Comment? SttSeparator
''OptDecl = OPTION (EXPLICIT|BASE NUM_LIT|BYVAL|PRIVATE|ESCAPE|DYNAMIC|STATIC)
'' ParamDecl = (BYVAL|BYREF)? ID (('(' ')')? (AS SymbolType)?)? ('=" (NUM_LIT|STR_LIT))? .
''TypeProtoDecl = DECLARE ( CONSTRUCTOR Params
'' | DESTRUCTOR
'' | OPERATOR Op Params
'' | PROPERTY Params
'' | (STATIC|CONST)? SUB|FUNCTION Params ) .
''TypeEnumDecl = ENUM|CONST ...
''TypeMultElementDecl = AS SymbolType ID (ArrayDecl | ':' NUMLIT)? ('=' Expression)?
'' TypeElementDecl = ID (ArrayDecl| ':' NUMLIT)? AS SymbolType ('=' Expression)?
''TypeBody = ( (UNION|TYPE Comment? SttSeparator
'' | ElementDecl
'' | AS AsElementDecl )+ .
''SymbolType = CONST? UNSIGNED? (
'' | CHAR|BYTE
'' | SHORT|WORD
'' | INTEGER|LONG|DWORD
'' | SINGLE
'' | DOUBLE
'' | STRING ('*' NUM_LIT)?
'' | USERDEFTYPE
'' | (FUNCTION|SUB) ('(' args ')') (AS SymbolType)?
'' Typedef = TYPE ((symbol AS DataType (',')?)+
''VarDecl = ID ('(' ArrayDecl? ')')? (AS SymbolType)? ('=' VarInitializer)?
''ArrayDecl = '(' Expression (TO Expression)?
''ArrayDecl = '(' Expression (TO Expression)?
''AutoVarDecl = AUTO SHARED? SymbolDef '=' VarInitializer
''Atom = Constant | Function | QuirkFunction | Variable | Literal .
''Expression = LogExpression .
''LogExpression = LogOrExpression ( (XOR | EQV | IMP) LogOrExpression )* .
''LogOrExpression = LogAndExpression ( OR LogAndExpression )* .
''LogAndExpression = RelExpression ( AND RelExpression )* .
''RelExpression = CatExpression ( (EQ | GT | LT | NE | LE | GE) CatExpression )* .
''CatExpression = AddExpression ( & AddExpression )* .
''AddExpression = ShiftExpression ( ('+' | '-') ShiftExpression )* .
''ShiftExpression = ModExpression ( (SHL | SHR) ModExpression )* .
''ModExpression = IntDivExpression ( MOD IntDivExpression )* .
''MultExpression = ExpExpression ( ('*' | '/') ExpExpression )* .
''ExpExpression = NegNotExpression ( '^' NegNotExpression )* .
'' Constant = ID .
'' LitString = STR_LITERAL STR_LITERAL* .
''Literal = NUM_LITERAL
'' | STR_LITERAL STR_LITERAL* .
''Function = ID ('(' ProcParamList ')')? FuncPtrOrMemberDeref? .
'' | NOT RelExpression
'' | HighestPresExpr .
'' | ( DerefExpr
'' | CastingExpr
'' | PtrTypeCastingExpr
'' | ParentExpression
'' | AnonUDT
'' | Atom .
'' CastingExpr = CAST '(' DataType ',' Expression ')'
'' PtrTypeCastingExpr = CPTR '(' DataType ',' Expression{int|uint|ptr} ')'
''DerefExpression = DREF+ HighestPresExpr .
''AddrOfExpression = VARPTR '(' HighPrecExpr ')'
'' | PROCPTR '(' Proc ('('')')? ')'
'' | '@' (Proc ('('')')? | HighPrecExpr)
'' | SADD|STRPTR '(' Variable{str}|Const{str}|Literal{str} ')' .
''FieldArray = '(' Expression (',' Expression)* ')' .
'' MemberId = ID ArrayIdx?
'' UdtMember = MemberId ('.' MemberId)*
''MemberDeref = (('->' DREF* | '[' Expression ']' '.'?) UdtMember)* .
''FuncPtrOrDeref = FuncPtr '(' Args? ')'
'' | MemberDeref .
''DynArrayIdx = '(' Expression (',' Expression)* ')' .
''ArgArrayIdx = '(' Expression (',' Expression)* ')' .
''ArrayIdx = '(' Expression (',' Expression)* ')' .
''Variable = ID ArrayIdx? UdtMember? FuncPtrOrMemberDeref? .
''Variable = ID .
''WithVariable = '.' UdtMember FuncPtrOrMemberDeref? .
''Variable = '.'? ID ArrayIdx? UdtMember? FuncPtrOrMemberDeref? .
''ImplicitDataMember = UdtMember? FuncPtrOrMemberDeref? .
''cVarOrDeref = Deref | PtrTypeCasting | AddrOf | Variable
'' Identifier = (ID{namespace|class} '.')* ID
'' | ID ('.' ID)* .
'' ParentId = ID{namespace|class} ('.' ID{namespace|class})* .
''AsmCode = (Text !(END|Comment|NEWLINE))*
''PropHeader = ID CallConvention? OVERLOAD? (ALIAS LIT_STRING)?
''CtorHeader = CDECL? OVERLOAD? (ALIAS LIT_STRING)? Parameters? STATIC? EXPORT?
''ProcStmtBegin = (PRIVATE|PUBLIC)? STATIC?
''ProcStmtEnd = END (SUB | FUNCTION) .
''ProcArg = BYVAL? (ID(('(' ')')? | Expression) .
''ProcParam = BYVAL? (ID(('(' ')')? | Expression) .
''ProcArgList = ProcArg (DECL_SEPARATOR ProcArg)* .
''ProcArgList = ProcArg (DECL_SEPARATOR ProcArg)* .
'' | ID ProcParamList?
'' | (ID | FUNCTION | OPERATOR | PROPERTY) '=' Expression .
''cArrayFunct = (LBOUND|UBOUND) '(' ID (',' Expression)? ')' .
''TypeConvExpr = (C### '(' expression ')') .
'' AnonUDT = TYPE ('<' SymbolType '>')? '(' ... ')'
'' ViewStmt = VIEW (PRINT (Expression TO Expression)?) .
'' ScreenFunct = SCREEN '(' expr ',' expr ( ',' expr )? ')'
'' | SCREEN ( '(' ')' )? -- returns the current active/visible pages
''ErrorStmt = ERROR Expression
'' | ERR '=' Expression .
''cErrorFunct = ERR .
'' WriteStmt = WRITE ('#' Expression)? (Expression? "," )*
'' LineInputStmt = LINE INPUT ';'? ('#' Expression| Expression{str}?) (','|';')? Variable? .
'' InputStmt = INPUT ';'? (('#' Expression| STRING_LIT) (','|';'))? Variable (',' Variable)*
'' Put = PUT '#' Expression ',' Expression? ',' Expression{str|int|float|array} (',' Expression)?
'' Get = GET '#' Expression ',' Expression? ',' Variable{str|int|float|array} (',' Expression)?
'' FileOpen = OPEN Expression{str}
'' FileStmt = OPEN ...
'' | CLOSE ...
'' | SEEK ...
'' | LOCK ...
'' | ...
'' FileFunct = SEEK '(' Expression ')' |
'' GfxPset = PSET ( Expr ',' )? STEP? '(' Expr ',' Expr ')' (',' Expr )?
'' GfxCircle = CIRCLE ( Expr ',' )? STEP? '(' Expr ',' Expr ')' ',' Expr ((',' Expr? (',' Expr? (',' Expr? (',' Expr (',' Expr)? )? )?)?)?)?
'' GfxPaint = PAINT ( Expr ',' )? STEP? '(' expr ',' expr ')' (',' expr? (',' expr? ) )
'' GfxDrawString = DRAW STRING ( Expr ',' )? STEP? '(' Expr ',' Expr ')' ',' Expr ( ',' Expr ( ',' Expr ( ',' Expr ( ',' Expr )? )? )? )?
'' GfxDraw = DRAW ( Expr ',' )? Expr
'' GfxView = VIEW (SCREEN? '(' Expr ',' Expr ')' '-' '(' Expr ',' Expr ')' (',' Expr? (',' Expr)?)? )?
'' GfxPalette = PALETTE GET? ((USING Variable) | (Expr ',' Expr (',' Expr ',' Expr)?)?)
'' GfxPut = PUT ( Expr ',' )? STEP? '(' Expr ',' Expr ')' ',' ('(' Expr ',' Expr ')' '-' '(' Expr ',' Expr ')' ',')? Variable (',' Mode (',' Alpha)?)?
'' GfxGet = GET ( Expr ',' )? STEP? '(' Expr ',' Expr ')' '-' STEP? '(' Expr ',' Expr ')' ',' Variable
''
'' GfxScreen = SCREEN (num | ((expr (((',' expr)? ',' expr)? expr)? ',' expr))
'' GfxScreenRes = SCREENRES expr ',' expr (((',' expr)? ',' expr)? ',' expr)?
'' GfxPoint = POINT '(' Expr ( ',' ( Expr )? ( ',' Expr )? )? ')'
'' GfxImageCreate = IMAGECREATE '(' Expr ',' Expr ( ',' ( Expr )? ( ',' Expr )? )? ')'
''cIIFFunct = IIF '(' condexpr ',' truexpr ',' falsexpr ')' .
'' cMathFunct = ABS( Expression )
'' | SGN( Expression )
'' | FIX( Expression )
'' | INT( Expression )
'' | LEN( data type | Expression ) .
''cOperatorNew = NEW DataType|Constructor()
'' | NEW DataType[Expr] .
''cOperatorDelete = DELETE expr
'' | DELETE[] expr .
''OnStmt = ON LOCAL? (Keyword | Expression) (GOTO|GOSUB) Label .
''PokeStmt = POKE Expression, Expression .
'' PeekFunct = PEEK '(' (SymbolType ',')? Expression ')' .
'' LsetStmt = LSET String|UDT (','|'=') Expression|UDT
'' cCVXFunct = CVD '(' Expression{str} ')'
'' | CVS '(' Expression{str} ')'
'' | CVI '(' Expression{str} ')'
'' | CVL '(' Expression{str} ')'
'' | CVSHORT '(' Expression{str} ')'
'' | CVLONGINT '(' Expression{str} ')'
'' cMKXFunct = MKD '(' Expression{double} ')'
'' | MKS '(' Expression{float} ')'
'' | MKI '(' Expression{int} ')'
'' | MKL '(' Expression{long} ')'
'' | MKSHORT '(' Expression{short} ')'
'' | MKLONGINT '(' Expression{longint} ')'
'' cStringFunct = W|STR$ '(' Expression{int|float|double} ')'
'' | MID$ '(' Expression ',' Expression (',' Expression)? ')'
'' | W|STRING$ '(' Expression ',' Expression{int|str} ')' .
'' | INSTR '(' (Expression{int} ',')? Expression{str}, "ANY"? Expression{str} ')'
'' | INSTRREV '(' Expression{str}, "ANY"? Expression{str} (',' Expression{int})? ')'
'' | RTRIM$ '(' Expression{str} (, "ANY" Expression{str} )? ')'
''cVAFunct = VA_FIRST ('(' ')')? .
''QuirkFunction = QBFUNCTION ('(' ProcParamList ')')? .
''Expression = LogExpression .
''LogExpression = RelExpression ( (AND | OR) RelExpression )* .
''RelExpression = AddExpression ( (EQ | GT | LT | NE | LE | GE) AddExpression )* .
''AddExpression = MultExpression ( ('+' | '-') MultExpression )* .
''MultExpression = ParentExpr ( ('*' | '/' | '\') ParentExpr )* .
'' ParentExpr = '(' Expression ')'
'' | DEFINED'(' ID ')'
'' | LITERAL
'' | NOT RelExpression .
'' Define = DEFINE ID (!WHITESPC '(' ID (',' ID)* ')')? LITERAL+
'' | MACRO ID '(' ID (',' ID)* ')' Comment? EOL
'' Pragma = PRAGMA
'' | POP '(' symbol ')'
'' | symbol ('=' expression{int})?
'' PreProcess = '#'DEFINE ID (!WHITESPC '(' ID (',' ID)* ')')? LITERAL*
'' | '#'UNDEF ID
'' | '#'IFDEF ID
'' | '#'IFNDEF ID
'' | '#'IF Expression
'' | '#'ELSE
'' | '#'ELSEIF Expression
'' | '#'ENDIF
'' | '#'PRINT LITERAL*
'' | '#'INCLUDE ONCE? LIT_STR
'' | '#'INCLIB LIT_STR
'' | '#'LIBPATH LIT_STR
'' | '#'ERROR LIT_STR .
'' ppInclude = '#'INCLUDE ONCE? LIT_STR
'' ppIncLib = '#'INCLIB LIT_STR
'' ppLibPath = '#'LIBPATH LIT_STR
'' ppLine = '#'LINE LIT_NUM LIT_STR?
''EnumDecl = ENUM ID? (ALIAS LITSTR)? EXPLICIT? Comment? SttSeparator
'' EnumLine+
'' END ENUM .
''Parameters= '(' ParamDecl (',' ParamDecl)* ')' .
''TypeDecl = (TYPE|UNION) ID (ALIAS LITSTR)? (FIELD '=' Expression)? Comment? SttSeparator
'' TypeLine+
'' END (TYPE|UNION) .
''SymbolType = CONST? UNSIGNED? (
'' ANY
'' | CHAR|BYTE
'' | SHORT|WORD
'' | INTEGER|LONG|DWORD
'' | SINGLE
'' | DOUBLE
'' | STRING ('*' NUM_LIT)?
'' | USERDEFTYPE
'' | (FUNCTION|SUB) ('(' args ')') (AS SymbolType)?
'' (CONST? (PTR|POINTER))* .
'' Typedef = TYPE ((symbol AS DataType (',')?)+
'' AS DataType (symbol (',')?)+
''ArrayDecl = '(' Expression (TO Expression)?
'' (',' Expression (TO Expression)?)*
'' ')' .
''AutoVarDecl = AUTO SHARED? SymbolDef '=' VarInitializer
'' (',' SymbolDef)* .
''NegNotExpression= ('-'|'+'|) ExpExpression
'' | NOT RelExpression
'' | HighestPresExpr .
''AsmBlock = ASM Comment? SttSeparator
'' (AsmCode Comment? NewLine)+
'' END ASM .
''ProcHeader = ID CallConvention? OVERLOAD? (ALIAS LIT_STRING)?
'' Parameters? ((AS SymbolType)? | CONSTRUCTOR|DESTRUCTOR)?
'' Priority? STATIC? EXPORT?
''OperatorHeader = Operator CallConvention? OVERLOAD? (ALIAS LIT_STRING)?
'' Parameters? (AS SymbolType)? STATIC? EXPORT?
''PropHeader = ID CallConvention? OVERLOAD? (ALIAS LIT_STRING)?
'' Parameters? (AS SymbolType)? STATIC? EXPORT?
''ProcStmtBegin = (PRIVATE|PUBLIC)? STATIC?
(SUB|FUNCTION|CONSTRUCTOR|DESTRUCTOR|OPERATOR) ProcHeader .
'' FileFunct = SEEK '(' Expression ')' |
'' INPUT '(' Expr, (',' '#'? Expr)? ')'.
In case you are wondering what the
* PatternSearchMatcher[(['][']\s*\w+\s+[=]\s+.+)|(['][']\s*[|]\s*.+),true]
at the top is: it is the regular expression I used to extract the lines containing grammar. It's not very exact (I had to get some lines by hand) but it worked for most of the grammar.
[Edit 22/12/2008] Started changing the sequence of rules to make the grammer more readable.
Removed some garbage and one or two duplicate rules.
[Edit 23/12/2008] Changed sequence of rules. Last rule added to sequenced rules was compoundstatement. Removed some more garbage and whitelines from the grammar.
[Edit 24/12/2008] Finished adding compoundstatement. There are more CompoundStmts in the sourcecode that are not mentioned in the comment on CompoundStmt (With, Namespace, Using, Scope).
[Edit 26/12/2008] Added quirckstatement (incomplete), asmblock and assignment. Removed some duplicate lines.
* PatternSearchMatcher[(['][']\s*\w+\s+[=]\s+.+)|(['][']\s*[|]\s*.+),true]
Lexical Grammar
''char classes:
ALPHA 'A' - 'Z'
DIGIT '0' - '9'
HEXDIG 'A' - 'F' | DIGIT
OCTDIG '0' - '7'
BINDIG '0' | '1'
ALPHADIGIT ALPHA | DIGIT
ISUFFIX '%' | '&'
FSUFFIX '!' | '#'
SUFFIX ISUFFIX | FSUFFIX | '$'
EXPCHAR 'D' | 'E'
OPERATOR '=' | '<' | '>' | '+' | '-' | '*' | '/' | '\' | '^'
DELIMITER '.' | ':' | ',' | ';' | '"' | '''
indentifier = (ALPHA | '_') { [ALPHADIGIT | '_' ] } [SUFFIX].
hex_oct_bin = 'H' HEXDIG+
| 'O' OCTDIG+
| 'B' BINDIG+
float = DOT DIGIT { DIGIT } [FSUFFIX | { EXPCHAR [opadd] DIGIT DIGIT } } | ].
number = DIGIT dig_dot_nil i_fsufx_nil
| '.' float
| '&' hex_oct_bin
dig_dot_nil = DIGIT dig_dot_nil
| ('.'|EXPCHAR) float
| .
i_fsufx_nil = ISUFFIX # is integer
| FSUFFIX # is float
| . # is def### !!! context sensitive !!!
string = '"' { ANY_CHAR_BUT_QUOTE } '"'. # less quotes
MultiLineComment = '/' ''' . '/' '''
Context Free Grammar
Program = Line* EOF? .
Line = Label? Statement? Comment? EOL .
Label = NUM_LIT
| ID ':' .
Statement = STT_SEPARATOR? ( Declaration
| ProcCallOrAssign
| CompoundStmt
| QuirkStmt
| AsmBlock
| AssignmentOrPtrCall )?
(STT_SEPARATOR Statement)* .
SttSeparator = (STT_SEPARATOR | EOL)+ .
Declaration = ConstDecl | TypeDecl | VariableDecl | ProcDecl | DefDecl | OptDecl.
ConstDecl = CONST (AS SymbolType)? ConstAssign (DECL_SEPARATOR ConstAssign)*
ConstAssign = ID (AS SymbolType)? '=' ConstExpression
TypeDecl = (TYPE|UNION) ID (ALIAS LITSTR)? (FIELD '=' Expression)? Comment? SttSeparator TypeLine+ END (TYPE|UNION) .
VariableDecl = (REDIM PRESERVE?|DIM|COMMON) SHARED? SymbolDef
| EXTERN IMPORT? SymbolDef ALIAS STR_LIT
| STATIC SymbolDef .
ProcDecl = DECLARE ((SUB | FUNCTION) ProcHeader | OPERATOR OperatorHeader ) .
ProcHeader = ID CallConvention? OVERLOAD? (ALIAS LIT_STRING)?
CallConvention = (CDECL|STDCALL|PASCAL)?
OperatorHeader = Operator CallConvention? OVERLOAD? (ALIAS LIT_STRING)?
DefDecl = (DEFINT|DEFLNG|DEFSNG|DEFDBL|DEFSTR) (CHAR '-' CHAR ','?)* .
ProcCallOrAssign= CALL ID ('(' ProcParamList ')')?
| ID ProcParamList?
| (ID | FUNCTION | OPERATOR | PROPERTY) '=' Expression .
ProcParamList = ProcArg (DECL_SEPARATOR ProcArg)* .
ProcArg = BYVAL? (ID(('(' ')')? | Expression) .
CompoundStmt = IfStatement
| ForStatement
| DoStatement
| WhileStatement
| SelectStatement
| ExitStatement
| ContinueStatement
| EndStatement
IfStmtBegin = IF Expression THEN (BlockIfStatement | SingleIfStatement) .
IfStmtNext = ELSEIF Expression THEN
| ELSE .
IfStmtEnd = END IF | ENDIF .
ForStmtBegin = FOR ID (AS DataType)? '=' Expression TO Expression (STEP Expression)? .
ForStmtEnd = NEXT (ID (',' ID?))? .
DoStmtBegin = DO ((WHILE | UNTIL) Expression)? .
DoStmtEnd = LOOP ((WHILE | UNTIL) Expression)? .
WhileStmtBegin = WHILE Expression .
WhileStmtEnd = WEND
SelectStatement = SELECT CASE (AS CONST)? Expression .
SelectStmtNext = CASE (ELSE | (CaseExpression (',' CaseExpression)*)) .
CaseExpression = (Expression (TO Expression)?)?
| (IS REL_OP Expression)? .
SelectStmtEnd = END SELECT .
SelConstStmtBegin = SELECT CASE AS CONST Expression{int} .
cSelConstStmtNext = CASE (ELSE | (ConstExpression{int} (',' ConstExpression{int})*)) .
SelConstStmtEnd = END SELECT .
ExitStatement = EXIT (FOR | DO | WHILE | SELECT | SUB | FUNCTION)
ContinueStatement = CONTINUE (FOR | DO | WHILE)
CompoundEnd = END (IF | SELECT | SUB | FUNCTION | SCOPE | WITH | NAMESPACE | EXTERN)
QuirkStmt = GotoStmt
| ArrayStmt
| PrintStmt
| MidStmt
| DataStmt
| etc .
GotoStmt = GOTO LABEL
| GOSUB LABEL
| RETURN LABEL?
| RESUME NEXT? .
ArrayStmt = ERASE ID (',' ID)*;
| SWAP Variable, Variable .
PrintStmt = (PRINT|'?') ('#' Expression ',')? (USING Expression{str} ';')? (Expression? ';'|"," )*
MidStmt = MID '(' Expression{str}, Expression{int} (',' Expression{int}) ')' '=' Expression{str} .
DataStmt = RESTORE LABEL?
| READ Variable{int|flt|str} (',' Variable{int|flt|str})*
| DATA literal|constant (',' literal|constant)*
AsmBlock = ASM Comment? SttSeparator
Assignment = LET? Variable BOP? '=' Expression
| Variable{function ptr} '(' ProcParamList ')' .
''Comment = (COMMENT_CHAR | REM) ((DIRECTIVE_CHAR Directive)
'' | (any_char_but_EOL*)) .
''Directive = INCLUDE ONCE? ':' '\'' STR_LIT '\''
'' | DYNAMIC
'' | STATIC .
''ExternStmtBegin = EXTERN "mangling_spec" (LIB LITSTR)? .
''ExternStmtEnd = END EXTERN .
''NamespaceStmtBegin = NAMESPACE (ID (ALIAS LITSTR)?)? .
''NamespaceStmtEnd = END NAMESPACE .
''Usingtmt = USING ID (',' ID)*
''ScopeStmtBegin = SCOPE .
''ScopeStmtEnd = END SCOPE .
''WithStmtBegin = WITH Variable .
''WithStmtEnd = END WITH .
''EndStatement = END Expression? .
''EnumConstDecl = ID ('=' ConstExpression)? .
''EnumBody = (EnumDecl (',' EnumDecl)? Comment? SttSeparator)+ .
''EnumDecl = ENUM ID? (ALIAS LITSTR)? EXPLICIT? Comment? SttSeparator
''OptDecl = OPTION (EXPLICIT|BASE NUM_LIT|BYVAL|PRIVATE|ESCAPE|DYNAMIC|STATIC)
'' ParamDecl = (BYVAL|BYREF)? ID (('(' ')')? (AS SymbolType)?)? ('=" (NUM_LIT|STR_LIT))? .
''TypeProtoDecl = DECLARE ( CONSTRUCTOR Params
'' | DESTRUCTOR
'' | OPERATOR Op Params
'' | PROPERTY Params
'' | (STATIC|CONST)? SUB|FUNCTION Params ) .
''TypeEnumDecl = ENUM|CONST ...
''TypeMultElementDecl = AS SymbolType ID (ArrayDecl | ':' NUMLIT)? ('=' Expression)?
'' TypeElementDecl = ID (ArrayDecl| ':' NUMLIT)? AS SymbolType ('=' Expression)?
''TypeBody = ( (UNION|TYPE Comment? SttSeparator
'' | ElementDecl
'' | AS AsElementDecl )+ .
''SymbolType = CONST? UNSIGNED? (
'' | CHAR|BYTE
'' | SHORT|WORD
'' | INTEGER|LONG|DWORD
'' | SINGLE
'' | DOUBLE
'' | STRING ('*' NUM_LIT)?
'' | USERDEFTYPE
'' | (FUNCTION|SUB) ('(' args ')') (AS SymbolType)?
'' Typedef = TYPE ((symbol AS DataType (',')?)+
''VarDecl = ID ('(' ArrayDecl? ')')? (AS SymbolType)? ('=' VarInitializer)?
''ArrayDecl = '(' Expression (TO Expression)?
''ArrayDecl = '(' Expression (TO Expression)?
''AutoVarDecl = AUTO SHARED? SymbolDef '=' VarInitializer
''Atom = Constant | Function | QuirkFunction | Variable | Literal .
''Expression = LogExpression .
''LogExpression = LogOrExpression ( (XOR | EQV | IMP) LogOrExpression )* .
''LogOrExpression = LogAndExpression ( OR LogAndExpression )* .
''LogAndExpression = RelExpression ( AND RelExpression )* .
''RelExpression = CatExpression ( (EQ | GT | LT | NE | LE | GE) CatExpression )* .
''CatExpression = AddExpression ( & AddExpression )* .
''AddExpression = ShiftExpression ( ('+' | '-') ShiftExpression )* .
''ShiftExpression = ModExpression ( (SHL | SHR) ModExpression )* .
''ModExpression = IntDivExpression ( MOD IntDivExpression )* .
''MultExpression = ExpExpression ( ('*' | '/') ExpExpression )* .
''ExpExpression = NegNotExpression ( '^' NegNotExpression )* .
'' Constant = ID .
'' LitString = STR_LITERAL STR_LITERAL* .
''Literal = NUM_LITERAL
'' | STR_LITERAL STR_LITERAL* .
''Function = ID ('(' ProcParamList ')')? FuncPtrOrMemberDeref? .
'' | NOT RelExpression
'' | HighestPresExpr .
'' | ( DerefExpr
'' | CastingExpr
'' | PtrTypeCastingExpr
'' | ParentExpression
'' | AnonUDT
'' | Atom .
'' CastingExpr = CAST '(' DataType ',' Expression ')'
'' PtrTypeCastingExpr = CPTR '(' DataType ',' Expression{int|uint|ptr} ')'
''DerefExpression = DREF+ HighestPresExpr .
''AddrOfExpression = VARPTR '(' HighPrecExpr ')'
'' | PROCPTR '(' Proc ('('')')? ')'
'' | '@' (Proc ('('')')? | HighPrecExpr)
'' | SADD|STRPTR '(' Variable{str}|Const{str}|Literal{str} ')' .
''FieldArray = '(' Expression (',' Expression)* ')' .
'' MemberId = ID ArrayIdx?
'' UdtMember = MemberId ('.' MemberId)*
''MemberDeref = (('->' DREF* | '[' Expression ']' '.'?) UdtMember)* .
''FuncPtrOrDeref = FuncPtr '(' Args? ')'
'' | MemberDeref .
''DynArrayIdx = '(' Expression (',' Expression)* ')' .
''ArgArrayIdx = '(' Expression (',' Expression)* ')' .
''ArrayIdx = '(' Expression (',' Expression)* ')' .
''Variable = ID ArrayIdx? UdtMember? FuncPtrOrMemberDeref? .
''Variable = ID .
''WithVariable = '.' UdtMember FuncPtrOrMemberDeref? .
''Variable = '.'? ID ArrayIdx? UdtMember? FuncPtrOrMemberDeref? .
''ImplicitDataMember = UdtMember? FuncPtrOrMemberDeref? .
''cVarOrDeref = Deref | PtrTypeCasting | AddrOf | Variable
'' Identifier = (ID{namespace|class} '.')* ID
'' | ID ('.' ID)* .
'' ParentId = ID{namespace|class} ('.' ID{namespace|class})* .
''AsmCode = (Text !(END|Comment|NEWLINE))*
''PropHeader = ID CallConvention? OVERLOAD? (ALIAS LIT_STRING)?
''CtorHeader = CDECL? OVERLOAD? (ALIAS LIT_STRING)? Parameters? STATIC? EXPORT?
''ProcStmtBegin = (PRIVATE|PUBLIC)? STATIC?
''ProcStmtEnd = END (SUB | FUNCTION) .
''ProcArg = BYVAL? (ID(('(' ')')? | Expression) .
''ProcParam = BYVAL? (ID(('(' ')')? | Expression) .
''ProcArgList = ProcArg (DECL_SEPARATOR ProcArg)* .
''ProcArgList = ProcArg (DECL_SEPARATOR ProcArg)* .
'' | ID ProcParamList?
'' | (ID | FUNCTION | OPERATOR | PROPERTY) '=' Expression .
''cArrayFunct = (LBOUND|UBOUND) '(' ID (',' Expression)? ')' .
''TypeConvExpr = (C### '(' expression ')') .
'' AnonUDT = TYPE ('<' SymbolType '>')? '(' ... ')'
'' ViewStmt = VIEW (PRINT (Expression TO Expression)?) .
'' ScreenFunct = SCREEN '(' expr ',' expr ( ',' expr )? ')'
'' | SCREEN ( '(' ')' )? -- returns the current active/visible pages
''ErrorStmt = ERROR Expression
'' | ERR '=' Expression .
''cErrorFunct = ERR .
'' WriteStmt = WRITE ('#' Expression)? (Expression? "," )*
'' LineInputStmt = LINE INPUT ';'? ('#' Expression| Expression{str}?) (','|';')? Variable? .
'' InputStmt = INPUT ';'? (('#' Expression| STRING_LIT) (','|';'))? Variable (',' Variable)*
'' Put = PUT '#' Expression ',' Expression? ',' Expression{str|int|float|array} (',' Expression)?
'' Get = GET '#' Expression ',' Expression? ',' Variable{str|int|float|array} (',' Expression)?
'' FileOpen = OPEN Expression{str}
'' FileStmt = OPEN ...
'' | CLOSE ...
'' | SEEK ...
'' | LOCK ...
'' | ...
'' FileFunct = SEEK '(' Expression ')' |
'' GfxPset = PSET ( Expr ',' )? STEP? '(' Expr ',' Expr ')' (',' Expr )?
'' GfxCircle = CIRCLE ( Expr ',' )? STEP? '(' Expr ',' Expr ')' ',' Expr ((',' Expr? (',' Expr? (',' Expr? (',' Expr (',' Expr)? )? )?)?)?)?
'' GfxPaint = PAINT ( Expr ',' )? STEP? '(' expr ',' expr ')' (',' expr? (',' expr? ) )
'' GfxDrawString = DRAW STRING ( Expr ',' )? STEP? '(' Expr ',' Expr ')' ',' Expr ( ',' Expr ( ',' Expr ( ',' Expr ( ',' Expr )? )? )? )?
'' GfxDraw = DRAW ( Expr ',' )? Expr
'' GfxView = VIEW (SCREEN? '(' Expr ',' Expr ')' '-' '(' Expr ',' Expr ')' (',' Expr? (',' Expr)?)? )?
'' GfxPalette = PALETTE GET? ((USING Variable) | (Expr ',' Expr (',' Expr ',' Expr)?)?)
'' GfxPut = PUT ( Expr ',' )? STEP? '(' Expr ',' Expr ')' ',' ('(' Expr ',' Expr ')' '-' '(' Expr ',' Expr ')' ',')? Variable (',' Mode (',' Alpha)?)?
'' GfxGet = GET ( Expr ',' )? STEP? '(' Expr ',' Expr ')' '-' STEP? '(' Expr ',' Expr ')' ',' Variable
''
'' GfxScreen = SCREEN (num | ((expr (((',' expr)? ',' expr)? expr)? ',' expr))
'' GfxScreenRes = SCREENRES expr ',' expr (((',' expr)? ',' expr)? ',' expr)?
'' GfxPoint = POINT '(' Expr ( ',' ( Expr )? ( ',' Expr )? )? ')'
'' GfxImageCreate = IMAGECREATE '(' Expr ',' Expr ( ',' ( Expr )? ( ',' Expr )? )? ')'
''cIIFFunct = IIF '(' condexpr ',' truexpr ',' falsexpr ')' .
'' cMathFunct = ABS( Expression )
'' | SGN( Expression )
'' | FIX( Expression )
'' | INT( Expression )
'' | LEN( data type | Expression ) .
''cOperatorNew = NEW DataType|Constructor()
'' | NEW DataType[Expr] .
''cOperatorDelete = DELETE expr
'' | DELETE[] expr .
''OnStmt = ON LOCAL? (Keyword | Expression) (GOTO|GOSUB) Label .
''PokeStmt = POKE Expression, Expression .
'' PeekFunct = PEEK '(' (SymbolType ',')? Expression ')' .
'' LsetStmt = LSET String|UDT (','|'=') Expression|UDT
'' cCVXFunct = CVD '(' Expression{str} ')'
'' | CVS '(' Expression{str} ')'
'' | CVI '(' Expression{str} ')'
'' | CVL '(' Expression{str} ')'
'' | CVSHORT '(' Expression{str} ')'
'' | CVLONGINT '(' Expression{str} ')'
'' cMKXFunct = MKD '(' Expression{double} ')'
'' | MKS '(' Expression{float} ')'
'' | MKI '(' Expression{int} ')'
'' | MKL '(' Expression{long} ')'
'' | MKSHORT '(' Expression{short} ')'
'' | MKLONGINT '(' Expression{longint} ')'
'' cStringFunct = W|STR$ '(' Expression{int|float|double} ')'
'' | MID$ '(' Expression ',' Expression (',' Expression)? ')'
'' | W|STRING$ '(' Expression ',' Expression{int|str} ')' .
'' | INSTR '(' (Expression{int} ',')? Expression{str}, "ANY"? Expression{str} ')'
'' | INSTRREV '(' Expression{str}, "ANY"? Expression{str} (',' Expression{int})? ')'
'' | RTRIM$ '(' Expression{str} (, "ANY" Expression{str} )? ')'
''cVAFunct = VA_FIRST ('(' ')')? .
''QuirkFunction = QBFUNCTION ('(' ProcParamList ')')? .
''Expression = LogExpression .
''LogExpression = RelExpression ( (AND | OR) RelExpression )* .
''RelExpression = AddExpression ( (EQ | GT | LT | NE | LE | GE) AddExpression )* .
''AddExpression = MultExpression ( ('+' | '-') MultExpression )* .
''MultExpression = ParentExpr ( ('*' | '/' | '\') ParentExpr )* .
'' ParentExpr = '(' Expression ')'
'' | DEFINED'(' ID ')'
'' | LITERAL
'' | NOT RelExpression .
'' Define = DEFINE ID (!WHITESPC '(' ID (',' ID)* ')')? LITERAL+
'' | MACRO ID '(' ID (',' ID)* ')' Comment? EOL
'' Pragma = PRAGMA
'' | POP '(' symbol ')'
'' | symbol ('=' expression{int})?
'' PreProcess = '#'DEFINE ID (!WHITESPC '(' ID (',' ID)* ')')? LITERAL*
'' | '#'UNDEF ID
'' | '#'IFDEF ID
'' | '#'IFNDEF ID
'' | '#'IF Expression
'' | '#'ELSE
'' | '#'ELSEIF Expression
'' | '#'ENDIF
'' | '#'PRINT LITERAL*
'' | '#'INCLUDE ONCE? LIT_STR
'' | '#'INCLIB LIT_STR
'' | '#'LIBPATH LIT_STR
'' | '#'ERROR LIT_STR .
'' ppInclude = '#'INCLUDE ONCE? LIT_STR
'' ppIncLib = '#'INCLIB LIT_STR
'' ppLibPath = '#'LIBPATH LIT_STR
'' ppLine = '#'LINE LIT_NUM LIT_STR?
''EnumDecl = ENUM ID? (ALIAS LITSTR)? EXPLICIT? Comment? SttSeparator
'' EnumLine+
'' END ENUM .
''Parameters= '(' ParamDecl (',' ParamDecl)* ')' .
''TypeDecl = (TYPE|UNION) ID (ALIAS LITSTR)? (FIELD '=' Expression)? Comment? SttSeparator
'' TypeLine+
'' END (TYPE|UNION) .
''SymbolType = CONST? UNSIGNED? (
'' ANY
'' | CHAR|BYTE
'' | SHORT|WORD
'' | INTEGER|LONG|DWORD
'' | SINGLE
'' | DOUBLE
'' | STRING ('*' NUM_LIT)?
'' | USERDEFTYPE
'' | (FUNCTION|SUB) ('(' args ')') (AS SymbolType)?
'' (CONST? (PTR|POINTER))* .
'' Typedef = TYPE ((symbol AS DataType (',')?)+
'' AS DataType (symbol (',')?)+
''ArrayDecl = '(' Expression (TO Expression)?
'' (',' Expression (TO Expression)?)*
'' ')' .
''AutoVarDecl = AUTO SHARED? SymbolDef '=' VarInitializer
'' (',' SymbolDef)* .
''NegNotExpression= ('-'|'+'|) ExpExpression
'' | NOT RelExpression
'' | HighestPresExpr .
''AsmBlock = ASM Comment? SttSeparator
'' (AsmCode Comment? NewLine)+
'' END ASM .
''ProcHeader = ID CallConvention? OVERLOAD? (ALIAS LIT_STRING)?
'' Parameters? ((AS SymbolType)? | CONSTRUCTOR|DESTRUCTOR)?
'' Priority? STATIC? EXPORT?
''OperatorHeader = Operator CallConvention? OVERLOAD? (ALIAS LIT_STRING)?
'' Parameters? (AS SymbolType)? STATIC? EXPORT?
''PropHeader = ID CallConvention? OVERLOAD? (ALIAS LIT_STRING)?
'' Parameters? (AS SymbolType)? STATIC? EXPORT?
''ProcStmtBegin = (PRIVATE|PUBLIC)? STATIC?
(SUB|FUNCTION|CONSTRUCTOR|DESTRUCTOR|OPERATOR) ProcHeader .
'' FileFunct = SEEK '(' Expression ')' |
'' INPUT '(' Expr, (',' '#'? Expr)? ')'.
In case you are wondering what the
* PatternSearchMatcher[(['][']\s*\w+\s+[=]\s+.+)|(['][']\s*[|]\s*.+),true]
at the top is: it is the regular expression I used to extract the lines containing grammar. It's not very exact (I had to get some lines by hand) but it worked for most of the grammar.
[Edit 22/12/2008] Started changing the sequence of rules to make the grammer more readable.
Removed some garbage and one or two duplicate rules.
[Edit 23/12/2008] Changed sequence of rules. Last rule added to sequenced rules was compoundstatement. Removed some more garbage and whitelines from the grammar.
[Edit 24/12/2008] Finished adding compoundstatement. There are more CompoundStmts in the sourcecode that are not mentioned in the comment on CompoundStmt (With, Namespace, Using, Scope).
[Edit 26/12/2008] Added quirckstatement (incomplete), asmblock and assignment. Removed some duplicate lines.
Last edited by AGS on Jan 01, 2009 21:17, edited 9 times in total.
AGS, very nice. Now I guess the next step is to go through the lexer/parser modules and check the grammar -- and/or the procedures themselves -- for correctness (whether they match up, and also the [probably unlikely] chance that a parser procedure's grammar wasn't commented).
I think we'll also need to decide on a template to document the grammar with, ie., are we going to put everything on one page, or split it up ? It might be nice to make non-terminals links for easier navigation. It would probably be good to get all this in order so we don't end up with errors if something needs to be changed later on. Any ideas ? Again, good work !
edit: I guess we'll need to distinguish between the grammar of the different dialects somehow, too. Like how the "fb" dialect allows statements like Open to be used as a Function procedure and how DIM doesn't allow implicit type inference, for examples. It seems lots of this will be a duplication of the existing "syntax" in the KeyPg*s, particularly with the quirks. Hmm..
I think we'll also need to decide on a template to document the grammar with, ie., are we going to put everything on one page, or split it up ? It might be nice to make non-terminals links for easier navigation. It would probably be good to get all this in order so we don't end up with errors if something needs to be changed later on. Any ideas ? Again, good work !
edit: I guess we'll need to distinguish between the grammar of the different dialects somehow, too. Like how the "fb" dialect allows statements like Open to be used as a Function procedure and how DIM doesn't allow implicit type inference, for examples. It seems lots of this will be a duplication of the existing "syntax" in the KeyPg*s, particularly with the quirks. Hmm..
Parser procedures tend to have a name starting with a c.stylin wrote:AGS, very nice. Now I guess the next step is to go through the lexer/parser modules and check the grammar -- and/or the procedures themselves -- for correctness (whether they match up, and also the [probably unlikely] chance that a parser procedure's grammar wasn't commented).
Using geany (it has some support for FreeBASIC) I could find several procedures starting with a c that did not have a grammar comment.
For instance, cGfxStmt and cGfxFunct have no comment. Other parsing procedures have no comment either and I am not exactly sure whether they remained uncommented by mistake or because the developers found it more logical to not add grammar comment to certain parsing routines.
stylin wrote: I think we'll also need to decide on a template to document the grammar with, ie., are we going to put everything on one page, or split it up ? It might be nice to make non-terminals links for easier navigation. It would probably be good to get all this in order so we don't end up with errors if something needs to be changed later on. Any ideas ? Again, good work !
edit: I guess we'll need to distinguish between the grammar of the different dialects somehow, too. Like how the "fb" dialect allows statements like Open to be used as a Function procedure and how DIM doesn't allow implicit type inference, for examples. It seems lots of this will be a duplication of the existing "syntax" in the KeyPg*s, particularly with the quirks. Hmm..
The language reference contained within the FB manual is like the grammar + semantics and what I want to add is a recapitulation of the grammar in x pages.
Starting at the top rule
Program = Line* EOF?
Line = Label? Statement? Comment? EOL
going all the way down to the constants.
FB syntax only because otherwise you'd have to write three grammars (one for QB, one for Lite and one for FB) which are almost identical.
A grammar could look like this on a webpage (has to be LALR(1) if you want to generate it automagically with the goldparser)
http://www.devincook.com/GOLDParser/bui ... m#mult_exp
That's all the rules on one page, hyperlinked. In the case of FB this would yield a very lengthy page. I've got a VB6 grammar lying around (written for TXL, an interpreting parser generator) that goes on for about
13 pages (printed A4) with rules like
define access_modifier
'public | 'private | 'friend | 'static
end define
and containing quite a lot of statements (60+ including loops, choice etc...). To put 16 pages of printed A4 on one web page is a bit much. Better to start at the toplevel (rule program) and split it up at some arbitrary point (how far are people willing to scroll down a page?).
Hi,
I was playing around with the Gold Parser grammar definitions. I got a bit of the FB grammar prepared for Gold but eventually ran into Reduce-Reduce errors on the Expressions definitions. I haven't got back to figuring out the reason for it.
I also translated the "Kessels" Gold template file. If you use Gold then you know that you can output the grammar definitions to include files that your program can read directly (rather than having to load and parse the grammar all the time from the .cgt compiled grammar file).
Here is the template. Save it with a .pgt extension and place it in the Templates subdirectory.
Now, in order to read and actually work with the generated include file you would need to use an "engine" to read it. Here is the engine source code. Compile these files to engine.o and include it with your project when you compile it.
This is the "engine.bas" file. The "engine.bi" file follows after this one.
"engine.bi" include file:
Here is a translation of the sample code that reads and displays the "simple" language test grammar.
Hope this stuff is of use to someone.
:-)
I was playing around with the Gold Parser grammar definitions. I got a bit of the FB grammar prepared for Gold but eventually ran into Reduce-Reduce errors on the Expressions definitions. I haven't got back to figuring out the reason for it.
I also translated the "Kessels" Gold template file. If you use Gold then you know that you can output the grammar definitions to include files that your program can read directly (rather than having to load and parse the grammar all the time from the .cgt compiled grammar file).
Here is the template. Save it with a .pgt extension and place it in the Templates subdirectory.
Code: Select all
##TEMPLATE-NAME 'FreeBASIC - McLovin engine grammar.bi'
##LANGUAGE 'FreeBASIC'
##ENGINE-NAME 'McLovin FreeBASIC engine'
##AUTHOR 'McLovin'
##FILE-EXTENSION 'bi'
##NOTES
This template creates a .bi file for use by the FreeBASIC-McLovin engine.
The complete CGT (Compiled Grammar Table) is hardcoded into the .bi
file, so initializing the engine is very fast (it does not have
to load the CGT file from disk). Based on Kessels engine.
##END-NOTES
##ID-SEPARATOR '_'
##ID-SYMBOL-PREFIX 'Symbol'
##ID-RULE-PREFIX 'Rule'
##DELIMITER ','
''
''
'' This file was generated by the "FreeBASIC - McLovin engine grammar.bi" template.
'' (Based on the translation from "C - Kessels engine grammar.h")
''
''
#include once "\crt\wchar.bi" ' wchar_t
##CHAR-SET-TABLE
##CHAR-SETS
#define GrammarCharset%Index%CharCount %CharCount%
Dim GrammarCharset%Index%( %CharCount% ) As wchar_t => { _
##CHARS
%UnicodeIndex%, _
##END-CHARS
0}
##END-CHAR-SETS
##END-CHAR-SET-TABLE
''
'' struct DfaEdgeStruct {
'' int TargetState;
'' int CharCount;
'' wchar_t *CharacterSet;
'' };
''
##DFA-TABLE
##DFA-STATES
Dim GrammarDfaEdgeArray%Index%( %EdgeCount% ) As DfaEdgeStruct => { _
##DFA-EDGES
(%Target%,GrammarCharset%CharSetIndex%CharCount,@GrammarCharset%CharSetIndex%(0)), _
##END-DFA-EDGES
(-1,0,NULL)}
##END-DFA-STATES
##END-DFA-TABLE
##RULE-TABLE
##RULES
Dim GrammarRuleSymbolArray%Index%( %SymbolCount% ) As Integer => { _
##RULE-SYMBOLS
%SymbolIndex%, _
##END-RULE-SYMBOLS
-1}
##END-RULES
##END-RULE-TABLE
##LALR-TABLE
##LALR-STATES
Dim GrammarLalrActionArray%Index%( %ActionCount% ) As ActionStruct => { _
##LALR-ACTIONS
(%SymbolIndex%,%Action%,%Value%), _
##END-LALR-ACTIONS
(-1,-1,-1)}
##END-LALR-STATES
##END-LALR-TABLE
##SYMBOL-TABLE
''
'' struct SymbolStruct {
'' short Kind;
'' wchar_t *Name;
'' };
''
Dim GrammarSymbolArray( %Count%-1 ) As SymbolStruct => { _
##SYMBOLS
/' %Value.Padded% '/ (%Kind%,@wstr("%Name%"))%Delimiter% _
##END-SYMBOLS
}
##END-SYMBOL-TABLE
##DFA-TABLE
''
'' struct DfaEdgeStruct {
'' int TargetState;
'' wchar_t *CharacterSet;
'' };
'' struct DfaStateStruct {
'' int AcceptSymbol;
'' int EdgeCount;
'' struct DfaEdgeStruct *Edges;
'' };
''
Dim GrammarDfaStateArray ( %Count% ) As DfaStateStruct => { _
##DFA-STATES
/' %Index% '/ (%AcceptIndex%,%EdgeCount%,@GrammarDfaEdgeArray%Index%(0)), _
##END-DFA-STATES
(-1,-1,NULL)}
##END-DFA-TABLE
##RULE-TABLE
''
'' struct RuleStruct {
'' int Head;
'' int SymbolsCount;
'' int *Symbols;
'' wchar_t *Description;
'' };
''
Dim GrammarRuleArray( %Count%-1 ) As RuleStruct => { _
##RULES
/' %Value.Padded% '/ (%NonterminalIndex%,%SymbolCount%,@GrammarRuleSymbolArray%Index%(0),@wstr("%Description%"))%Delimiter% _
##END-RULES
}
##END-RULE-TABLE
##LALR-TABLE
''
'' struct ActionStruct {
'' int Entry;
'' short Action;
'' int Target;
'' };
'' struct LalrStateStruct {
'' int ActionCount;
'' struct ActionStruct *Actions;
'' };
''
Dim GrammarLalrStateArray( %Count% ) As LalrStateStruct => { _
##LALR-STATES
/' %Index% '/ (%ActionCount%,@GrammarLalrActionArray%Index%(0)), _
##END-LALR-STATES
(-1,NULL)}
##END-LALR-TABLE
''
'' struct GrammarStruct {
'' char CaseSensitive;
'' int InitialSymbol;
'' int InitialDfaState;
'' int InitialLalrState;
'' int SymbolCount;
'' struct SymbolStruct *SymbolArray;
'' int RuleCount;
'' struct RuleStruct *RuleArray;
'' int DfaStateCount;
'' struct DfaStateStruct *DfaArray;
'' int LalrStateCount;
'' struct LalrStateStruct *LalrArray;
'' };
''
Dim Shared Grammar As GrammarStruct
With Grammar
##PARAMETERS
.CaseSensitive = %CaseSensitive%
.InitialSymbol = %StartSymbol%
##END-PARAMETERS
##DFA-TABLE
.InitialDfaState = %InitialState%
##END-DFA-TABLE
##LALR-TABLE
.InitialLalrState = %InitialState%
##END-LALR-TABLE
##SYMBOL-TABLE
.SymbolCount = %Count%
.SymbolArray = @GrammarSymbolArray(0)
##END-SYMBOL-TABLE
##RULE-TABLE
.RuleCount = %Count%
.RuleArray = @GrammarRuleArray(0)
##END-RULE-TABLE
##DFA-TABLE
.DfaStateCount = %Count%
.DfaArray = @GrammarDfaStateArray(0)
##END-DFA-TABLE
##LALR-TABLE
.LalrStateCount = %Count%
.LalrArray = @GrammarLalrStateArray(0)
##END-LALR-TABLE
End With
This is the "engine.bas" file. The "engine.bi" file follows after this one.
Code: Select all
''
'' McLovin FreeBASIC engine
''
'' Based on translation of C code from:
'' Kessels engine for GOLD.
'' Jeroen C. Kessels
'' Internet Engineer
'' http://www.kessels.com/
''
'' Copyright: Jeroen C. Kessels
'' Date: 22 march 2006
'' Version: 1.8
''
''
#include once "\crt\wchar.bi" ' wchar_t
#define CR 13
#define LF 10
#define CRLF chr(13,10)
#include once "engine.bi" ' Defines all the data structs.
'' FIFO (first in first out) stack of Tokens.
type TokenStackStruct
Token as TokenStruct ptr
LalrState as integer ' Index into Grammar.LalrArray[].
NextToken as TokenStackStruct ptr ' Pointer to next item.
end type
' Create a readable/printable string from the wchar_t pointer
function wPrint( byval pByte as ushort ptr ) as string
dim st as string
dim i as integer
if pByte = 0 then exit function
i = 0
do until pByte[i] = 0
st = st & chr(pByte[i])
i = i + 1
loop
return st
end function
'********* DFA Tokenizer *********************************************************
'' Read 'Length' characters from the InputBuf and return as a
'' malloc'ed string. The string is terminated by appending a zero,
'' although the routine is binary-safe.
'' Return NULL if out of memory. */
function ReadString( _
InputBuf as wchar_t ptr, _ ' The input data.
InputSize as long, _ ' Size of the input data.
InputHere as long ptr, _ ' Index into input data.
nline as long ptr, _ ' Current line number.
nColumn as long ptr, _ ' Current column number.
nLength as long _ ' Number of characters to copy.
) as wchar_t ptr
dim wSt as wchar_t ptr
dim i as long
wSt = callocate((nLength + 1) * sizeof(wchar_t))
if wSt = null then return(null)
for i = 0 to nLength - 1
if *InputHere < InputSize then
if InputBuf[*InputHere] = CR then
if (*InputHere + 1 < InputSize) and _
(InputBuf[*InputHere + 1] <> LF) then
*nline = *nline + 1
*nColumn = 0
end if
end if
if InputBuf[*InputHere] = LF then
*nline = *nline + 1
*nColumn = 0
end if
wSt[i] = InputBuf[*InputHere]
*InputHere = *InputHere + 1
*nColumn = *nColumn + 1
else
wSt[i] = 0
end if
next
wSt[i] = 0
return wSt
end function
'' Search for a character in a characterset. Return 1 if found,
'' 0 if not found.
function FindChar( ThisChar as wchar_t, _
CharacterSet as wchar_t ptr, _
Count as long _
) as integer
dim Here as long
dim Interval as long
' Use wcschr() for charactersets with a length of up to 11 characters.
if (Count < 11) then
if (wcschr(CharacterSet, ThisChar) <> null) then return 1
return 0
end if
' binary search the characterset for the character. This method is
' possible because GOLD always produces sorted charactersets.
' Measurements show that although the code is more complex, this
' binary search is faster than wcschr() for charactersets longer
' than 11 characters. At 100 characters it's 4 times faster. */
Interval = 32768
while (Interval > Count)
Interval = (Interval shr 1)
wend
Here = Interval - 1
Interval = (Interval shr 1)
while (Interval > 0)
if (CharacterSet[Here] = ThisChar) then return 1
if (CharacterSet[Here] > ThisChar) then
Here = Here - Interval
else
while (Here + Interval >= Count)
Interval = (Interval shr 1)
if (Interval = 0) then return 0
wend
Here = Here + Interval
end if
Interval = (Interval shr 1)
wend
if (CharacterSet[Here] = ThisChar) then return 1
return 0
end function
'' Parse a symbol from the InputBuf.
'' Symbols are lexical entities of 1 or more characters. The parser
'' uses the Grammar.DfaArray[] compiled by GOLD, which is basically
'' a tree of charactersets. The parser takes characters from the
'' input and descends the tree by looking for the character in the
'' characterset. At the last branch of the tree is an index into the
'' Grammar.SymbolsArray[], also compiled by GOLD.
'' Return the index into Grammar.SymbolsArray[], and a pointer to a
'' malloc'ed string with characters from the input.
'' Symbol 0 is 'SYMBOLEOF' (string is NULL)
'' Symbol 1 is 'SYMBOLERROR' (string is 1 character).
''
function RetrieveToken( _
InputBuf as wchar_t ptr, _ ' The input data.
InputSize as long, _ ' Size of the input data.
InputHere as long ptr, _ ' Index into input data.
nline as long ptr, _ ' Current line number.
nColumn as long ptr, _ ' Current column number.
Symbol as integer ptr _
) as wchar_t ptr
dim DfaIndex as integer ' Index into Grammar.DfaArray[].
dim Length as long ' Number of processed characters from data->InputBuf.
dim AcceptIndex as integer ' Longest found symbol so far.
dim AcceptLength as long ' Length of longest found symbol.
dim i as long
' Sanity check (no input).
if ((InputBuf = null) or (InputBuf[0] = 0)) then
*Symbol = 0
return null
end if
' If there are no more characters in the input then return SYMBOLEOF and NULL.
if (*InputHere >= InputSize) then
*Symbol = 0
return null
end if
' Compare characters from the input with the DFA charactersets until not found.
DfaIndex = Grammar.InitialDfaState
Length = 0
AcceptLength = 0
AcceptIndex = -1
while (*InputHere + Length) < InputSize
' if this is a valid symbol-terminal then save it. We know the
' input matches the symbol, but there may be a longer symbol that
' matches so we have to keep scanning.
if Grammar.DfaArray[DfaIndex].AcceptSymbol >= 0 then
AcceptIndex = DfaIndex
AcceptLength = Length
end if
' Walk through the edges and scan the characterset of each edge for the current character.
for i = 0 to Grammar.DfaArray[DfaIndex].EdgeCount - 1
if FindChar( InputBuf[*InputHere + Length], _
Grammar.DfaArray[DfaIndex].Edges[i].CharacterSet, _
Grammar.DfaArray[DfaIndex].Edges[i].CharCount) = 1 then
exit for
end if
next
' If not found then exit the loop.
if (i >= Grammar.DfaArray[DfaIndex].EdgeCount) then exit while
' Jump to the TargetState, which points to another set of DFA edges
' describing the next character.
DfaIndex = Grammar.DfaArray[DfaIndex].Edges[i].TargetState
' Increment the Length, we have handled the character.
Length = Length + 1
wend
' if the DFA is a terminal then return the Symbol, and Length characters from the input.
if Grammar.DfaArray[DfaIndex].AcceptSymbol >= 0 then
*Symbol = Grammar.DfaArray[DfaIndex].AcceptSymbol
return ReadString( InputBuf,InputSize,InputHere,nline,nColumn,Length )
end if
' If we found a shorter terminal before, then return that Symbol, and it's characters.
if AcceptIndex >= 0 then
*Symbol = Grammar.DfaArray[AcceptIndex].AcceptSymbol
return ReadString( InputBuf,InputSize,InputHere,nline,nColumn,AcceptLength )
end if
' Return SYMBOLERROR and a string with 1 character from the input.
*Symbol = 1
return ReadString( InputBuf,InputSize,InputHere,nline,nColumn,1 )
end function
'********* LALR Tokenizer ********************************************************
'' LALR state machine. Depending on the Token->Symbol the machine will
'' change it's state and perform actions, such as reduce the TokenStack and
'' iteratively call itself.
#define LALRMEMORYERROR 0
#define LALRSYNTAXERROR 1
#define LALRACCEPT 2
#define LALRSHIFT 3
#define LALRGOTO 4
function ParseToken( _
LalrState as integer ptr, _
TokenStack as TokenStackStruct ptr ptr, _
InputToken as TokenStackStruct ptr, _
TrimReductions as integer, _
Debug as integer _
) as integer
dim PopToken as TokenStackStruct ptr
dim Reduction as TokenStackStruct ptr
dim Action as integer
dim Rule as integer
dim i as integer
' Find the Token->Symbol in the LALR table.
Action = 0
while Action < Grammar.LalrArray[*LalrState].ActionCount
if Grammar.LalrArray[*LalrState].Actions[Action].Entry = InputToken->Token->Symbol then
exit while
end if
Action += 1
wend
' If not found then exit with SYNTAXERROR. The Token is not allowed in this context.
if Action >= Grammar.LalrArray[*LalrState].ActionCount then
if Debug > 0 then
print "LALR Syntax error: symbol "; InputToken->Token->Symbol; " not found in LALR table "; *LalrState
end if
return LALRSYNTAXERROR
end if
' ACTIONACCEPT: exit. We're finished parsing the input.
if Grammar.LalrArray[*LalrState].Actions[Action].Action = ACTIONACCEPT then
if Debug > 0 then
print "LALR Accept: Target="; Grammar.LalrArray[*LalrState].Actions[Action].Target
end if
return LALRACCEPT
end if
' ACTIONSHIFT: switch the LALR state and return. We're ready to accept the next token.
if Grammar.LalrArray[*LalrState].Actions[Action].Action = ACTIONSHIFT then
*LalrState = Grammar.LalrArray[*LalrState].Actions[Action].Target
if Debug > 0 then
print "LALR Shift: Lalr="; *LalrState
end if
return LALRSHIFT
end if
' ACTIONGOTO: switch the LALR state and return. We're ready to accept the next token.
' Note: In my implementation SHIFT and GOTO do the exact same thing. As far
' as I can tell GOTO only happens just after a reduction. Perhaps GOLD makes
' the difference to allow the program to perform special actions, which my
' implementation does not need.
if Grammar.LalrArray[*LalrState].Actions[Action].Action = ACTIONGOTO then
*LalrState = Grammar.LalrArray[*LalrState].Actions[Action].Target
if Debug > 0 then
print "LALR Goto: Lalr="; *LalrState
end if
return LALRGOTO
end if
' ACTIONREDUCE:
' Create a new Reduction according to the Rule that is specified by the action.
' - Create a new Reduction in the ReductionArray.
' - Pop tokens from the TokenStack and add them to the Reduction.
' - Push a new token on the TokenStack for the Reduction.
' - Iterate.
'
Rule = Grammar.LalrArray[*LalrState].Actions[Action].Target
if Debug > 0 then
print "LALR Reduce: Lalr="; *LalrState; " TargetRule="; _
wPrint(Grammar.SymbolArray[Grammar.RuleArray[Rule].Head].wname); "["; _
str(Grammar.RuleArray[Rule].Head); "] ==> "; wPrint(Grammar.RuleArray[Rule].Description)
end if
' if TrimReductions is active, and the Rule contains a single non-terminal,
' then eleminate the unneeded reduction by modifying the Rule on the stack
' into this Rule.
'
if ((TrimReductions <> 0) and _
(Grammar.RuleArray[Rule].SymbolsCount = 1) and _
(Grammar.SymbolArray[Grammar.RuleArray[Rule].Symbols[0]].Kind = SYMBOLNONTERMINAL)) then
if Debug > 0 then
print "LALR TrimReduction."
end if
' Pop the Rule from the TokenStack.
PopToken = *TokenStack
*TokenStack = PopToken->NextToken
' Rewind the LALR state.
*LalrState = PopToken->LalrState
' Change the Token into the Rule.
PopToken->Token->Symbol = Grammar.RuleArray[Rule].Head
' Feed the Token to the LALR state machine.
ParseToken LalrState,TokenStack,PopToken,TrimReductions,Debug
' Push the modified Token back onto the TokenStack.
PopToken->NextToken = *TokenStack
*TokenStack = PopToken
' Save the new LALR state in the input token.
InputToken->LalrState = *LalrState
' Feed the input Token to the LALR state machine and exit.
return ParseToken(LalrState,TokenStack,InputToken,TrimReductions,Debug)
end if
' Allocate and initialize memory for the Reduction.
Reduction = callocate(sizeof(TokenStackStruct))
if Reduction = null then return LALRMEMORYERROR
Reduction->Token = callocate(sizeof(TokenStruct))
if Reduction->Token = null then
deallocate Reduction
return LALRMEMORYERROR
end if
Reduction->Token->ReductionRule = Rule
Reduction->Token->Tokens = callocate(sizeof(TokenStruct) * Grammar.RuleArray[Rule].SymbolsCount)
if Reduction->Token->Tokens = null then
deallocate Reduction->Token
deallocate Reduction
return LALRMEMORYERROR
end if
Reduction->Token->Symbol = Grammar.RuleArray[Rule].Head
Reduction->Token->ndata = null
Reduction->Token->nline = InputToken->Token->nline
Reduction->Token->Column = InputToken->Token->Column
Reduction->LalrState = *LalrState
Reduction->NextToken = null
' Reduce tokens from the TokenStack by moving them to the Reduction.
' The Lalr state will be rewound to the state it was for the first
' symbol of the rule.
for i = Grammar.RuleArray[Rule].SymbolsCount to 1 step -1
PopToken = *TokenStack
*TokenStack = PopToken->NextToken
PopToken->NextToken = null
if Debug > 0 then
if PopToken->Token->ndata <> null then
print " + Symbol="; wPrint(Grammar.SymbolArray[PopToken->Token->Symbol].wname); _
"["; str(PopToken->Token->Symbol); "] RuleSymbol="; wPrint(Grammar.SymbolArray[Grammar.RuleArray[Rule].Symbols[i-1]].wname); _
"["; str(Grammar.RuleArray[Rule].Symbols[i-1]); "] value='"; wPrint(PopToken->Token->ndata); "' Lalr="; PopToken->LalrState
else
print " + Symbol="; wPrint(Grammar.SymbolArray[PopToken->Token->Symbol].wname); _
"["; str(PopToken->Token->Symbol); "] RuleSymbol="; wPrint(Grammar.SymbolArray[Grammar.RuleArray[Rule].Symbols[i-1]].wname); _
"["; str(Grammar.RuleArray[Rule].Symbols[i-1]); "] Lalr="; PopToken->LalrState
end if
end if
Reduction->Token->Tokens[i - 1] = PopToken->Token
*LalrState = PopToken->LalrState
Reduction->LalrState = PopToken->LalrState
Reduction->Token->nline = PopToken->Token->nline
Reduction->Token->Column = PopToken->Token->Column
deallocate PopToken
next
' Call the LALR state machine with the Symbol of the Rule.
if Debug > 0 then
print "Calling Lalr 1: Lalr="; *LalrState; " Symbol="; _
wPrint(Grammar.SymbolArray[Grammar.RuleArray[Rule].Head].wname); _
"["; str(Grammar.RuleArray[Rule].Head); "]"
end if
ParseToken LalrState,TokenStack,Reduction,TrimReductions,Debug
' Push new Token on the TokenStack for the Reduction.
Reduction->NextToken = *TokenStack
*TokenStack = Reduction
' Save the current LALR state in the InputToken. We need this to be
' able to rewind the state when reducing.
InputToken->LalrState = *LalrState
' call the LALR state machine with the InputToken. The state has
' changed because of the reduction, so we must accept the token again.
if Debug > 0 then
print "Calling Lalr 2: Lalr="; *LalrState; " Symbol="; _
wPrint(Grammar.SymbolArray[InputToken->Token->Symbol].wname); _
"["; str(InputToken->Token->Symbol); "]"
end if
return ParseToken(LalrState,TokenStack,InputToken,TrimReductions,Debug)
end function
'********* Exported functions ****************************************************
' Delete a Token from memory, including all the reductions and data.
sub DeleteTokens( Token as TokenStruct ptr )
dim i as integer
if Token = null then return
if Token->ndata <> null then deallocate Token->ndata
if Token->ReductionRule >= 0 then
for i = 0 to Grammar.RuleArray[Token->ReductionRule].SymbolsCount - 1
DeleteTokens Token->Tokens[i]
next
deallocate Token->Tokens
end if
end sub
sub ParseCleanup( _
pTop as TokenStackStruct ptr, _
pNew as TokenStackStruct ptr, _
pFirstToken as TokenStruct ptr ptr )
dim pOldTop as TokenStackStruct ptr
*pFirstToken = null
if pTop <> null then
*pFirstToken = pTop->Token
pOldTop = pTop
pTop = pTop->NextToken
deallocate pOldTop
end if
if pNew <> null then
DeleteTokens *pFirstToken
*pFirstToken = pNew->Token
deallocate pNew
end if
while pTop <> null
DeleteTokens pTop->Token
pOldTop = pTop
pTop = pTop->NextToken
deallocate pOldTop
wend
end sub
'
' Parse the input data.
' Returns a pointer to a ParserData struct, NULL if insufficient memory.
' The Data->Result value will be one of these values:
' PARSEACCEPT Input parsed, no errors.
' PARSELEXICALERROR Input could not be tokenized.
' PARSETOKENERROR Input is an invalid token.
' PARSESYNTAXERROR Input does not match any rule.
' PARSECOMMENTERROR A comment was started but not finished.
' PARSEMEMORYERROR Insufficient memory.
'
'
function Parse( _
InputBuf as wchar_t ptr, _ ' pointer to the input data.
InputSize as long, _ ' Number of characters in the input.
TrimReductions as integer, _ ' 0 = don't trim, 1 = trim reductions.
Debug as integer, _ ' 0 = no debug, 1 = print debug info.
FirstToken as TokenStruct ptr ptr _ ' output token.
) as integer
dim LalrState as integer ' Index into Grammar.LalrArray[].
dim TokenStack as TokenStackStruct ptr ' stack of Tokens.
dim Work as TokenStackStruct ptr ' Current token.
dim InputHere as long ' Index into input.
dim nline as long ' line number.
dim Column as long ' Column number.
dim CommentLevel as integer ' Used when skipping comments, nested comment count.
dim Result as integer ' Result from ParseToken().
' Initialize variables.
LalrState = Grammar.InitialLalrState
TokenStack = null
InputHere = 0
nline = 1
Column = 1
CommentLevel = 0
*FirstToken = null
' Sanity check.
if ((InputBuf = null) or (*InputBuf = 0) or (InputSize = 0)) then
return PARSEACCEPT
end if
' Accept tokens until finished.
while (1)
' Create a new Token. Exit if out of memory.
Work = callocate(sizeof(TokenStackStruct))
if Work = null then
ParseCleanup TokenStack,null,FirstToken
return PARSEMEMORYERROR
end if
Work->LalrState = LalrState
Work->NextToken = null
Work->Token = callocate(sizeof(TokenStruct))
if Work->Token = null then
deallocate Work
ParseCleanup TokenStack,null,FirstToken
return PARSEMEMORYERROR
end if
Work->Token->ReductionRule = -1
Work->Token->Tokens = null
Work->Token->nline = nline
Work->Token->Column = Column
' Call the DFA tokenizer and parse a token from the input.
Work->Token->ndata = RetrieveToken( InputBuf,InputSize,@InputHere,@nline, _
@Column,@Work->Token->Symbol )
if (Work->Token->ndata = null) and (Work->Token->Symbol <> 0) then
ParseCleanup TokenStack,Work,FirstToken
return PARSEMEMORYERROR
end if
' If we are inside a comment then ignore everything except the end
' of the comment, or the start of a nested comment.
if CommentLevel > 0 then
' Begin of nested comment:
if Grammar.SymbolArray[Work->Token->Symbol].Kind = SYMBOLCOMMENTSTART then
' Push the Token on the TokenStack to keep track of line+column.
Work->NextToken = TokenStack
TokenStack = Work
CommentLevel = CommentLevel + 1
continue while
end if
' End of comment:
if Grammar.SymbolArray[Work->Token->Symbol].Kind = SYMBOLCOMMENTEND then
' Delete the Token.
if Work->Token->ndata <> null then deallocate(Work->Token->ndata)
deallocate Work->Token
deallocate Work
' Pop the comment-start Token from the TokenStack and delete that as well.
Work = TokenStack
TokenStack = Work->NextToken
if Work->Token->ndata <> null then deallocate(Work->Token->ndata)
deallocate Work->Token
deallocate Work
CommentLevel = CommentLevel - 1
continue while
end if
' End of file: Error exit. A comment was started but not finished.
if Grammar.SymbolArray[Work->Token->Symbol].Kind = SYMBOLEOF then
if Work->Token->ndata <> null then deallocate(Work->Token->ndata)
deallocate Work->Token
deallocate Work
ParseCleanup TokenStack,null,FirstToken
return PARSECOMMENTERROR
end if
' Any other Token: delete and loop.
if Work->Token->ndata <> null then deallocate(Work->Token->ndata)
deallocate Work->Token
deallocate Work
continue while
end if
' If the token is the start of a comment then increment the
' CommentLevel and loop. The routine will keep reading tokens
' until the end of the comment.
if Grammar.SymbolArray[Work->Token->Symbol].Kind = SYMBOLCOMMENTSTART then
if Debug > 0 then print "Parse: skipping comment."
' Push the Token on the TokenStack to keep track of line+column.
Work->NextToken = TokenStack
TokenStack = Work
CommentLevel = CommentLevel + 1
continue while
end if
' If the token is the start of a linecomment then skip the rest of the line.
if Grammar.SymbolArray[Work->Token->Symbol].Kind = SYMBOLCOMMENTLINE then
if Work->Token->ndata <> null then deallocate(Work->Token->ndata)
deallocate Work->Token
deallocate Work
while (InputHere < InputSize) and _
(InputBuf[InputHere] <> CR) and _
(InputBuf[InputHere] <> LF)
InputHere = InputHere + 1
wend
if ((InputHere < InputSize) and _
(InputBuf[InputHere] = CR)) then
InputHere = InputHere + 1
end if
if ((InputHere < InputSize) and _
(InputBuf[InputHere] = LF)) then
InputHere = InputHere + 1
end if
nline = nline + 1
Column = 1
continue while
end if
' If parse error then exit.
if Grammar.SymbolArray[Work->Token->Symbol].Kind = SYMBOLERROR then
ParseCleanup TokenStack,Work,FirstToken
return PARSELEXICALERROR
end if
' Ignore whitespace.
if Grammar.SymbolArray[Work->Token->Symbol].Kind = SYMBOLWHITESPACE then
if Work->Token->ndata <> null then deallocate(Work->Token->ndata)
deallocate Work->Token
deallocate Work
continue while
end if
' The tokenizer should never return a non-terminal symbol.
if Grammar.SymbolArray[Work->Token->Symbol].Kind = SYMBOLNONTERMINAL then
if Debug > 0 then
print "Error: tokenizer returned SYMBOLNONTERMINAL '"; wPrint(Work->Token->ndata); "'."
end if
ParseCleanup TokenStack,Work,FirstToken
return PARSETOKENERROR
end if
if Debug > 0 then
print "Token Read: Lalr="; LalrState; " Symbol="; _
wPrint(Grammar.SymbolArray[Work->Token->Symbol].wname); _
"["; str(Work->Token->Symbol); "] value='"; wPrint(Work->Token->ndata); "'"
end if
' Feed the Symbol to the LALR state machine. It can do several
' things, such as wind back and iteratively call itself.
Result = ParseToken(@LalrState,@TokenStack,Work,TrimReductions,Debug)
' If out of memory then exit.
if Result = LALRMEMORYERROR then
ParseCleanup TokenStack,Work,FirstToken
return PARSEMEMORYERROR
end if
' If syntax error then exit.
if Result = LALRSYNTAXERROR then
' Return LALR state in the Token->Symbol.
Work->Token->Symbol = LalrState
ParseCleanup TokenStack,Work,FirstToken
return PARSESYNTAXERROR
end if
' Exit if the LALR state machine says it has reached it's exit.
if Result = LALRACCEPT then
if Grammar.SymbolArray[Work->Token->Symbol].Kind = SYMBOLEOF then
if Work->Token->ndata <> null then free Work->Token->ndata
free Work->Token
free Work
end if
ParseCleanup TokenStack,null,FirstToken
return PARSEACCEPT
end if
' Push the token onto the TokenStack.
Work->NextToken = TokenStack
TokenStack = Work
wend
' Should never get here.
end function
"engine.bi" include file:
Code: Select all
''
'' McLovin FreeBASIC engine header include (engine.bi)
''
'' Based on translation of C code from:
'' Kessels engine for GOLD.
'' Jeroen C. Kessels
'' Internet Engineer
'' http://www.kessels.com/
''
'' Copyright: Jeroen C. Kessels
'' Date: 22 march 2006
'' Version: 1.8
''
''
'' Return values of the Parse() function.
#define PARSEACCEPT 0 ' input parsed, no errors.
#define PARSELEXICALERROR 1 ' input could not be tokenized.
#define PARSETOKENERROR 2 ' input is an invalid token.
#define PARSESYNTAXERROR 3 ' input does not match any rule.
#define PARSECOMMENTERROR 4 ' a comment was started but not finished.
#define PARSEMEMORYERROR 5 ' insufficient memory.
'' SymbolStruct types (defined by GOLD).
#define SYMBOLNONTERMINAL 0
#define SYMBOLTERMINAL 1
#define SYMBOLWHITESPACE 2
#define SYMBOLEOF 3
#define SYMBOLCOMMENTSTART 4
#define SYMBOLCOMMENTEND 5
#define SYMBOLCOMMENTLINE 6
#define SYMBOLERROR 7
'' ActionStruct types (defined by GOLD).
#define ACTIONSHIFT 1
#define ACTIONREDUCE 2
#define ACTIONGOTO 3
#define ACTIONACCEPT 4
'' CaseSensitive values (defined by GOLD).
#define False 0
#define True 1
#include once "\crt\wchar.bi" ' wchar_t
'' Grammar table and sub-tables.
type SymbolStruct ' Grammar.SymbolArray[]
Kind as short ' 0...7, See SYMBOL defines.
wname as wchar_t ptr ' String with name of symbol.
end type
type DfaEdgeStruct ' Grammar.DfaArray[].Edges[]
TargetState as integer ' Index into Grammar.DfaArray[].
CharCount as integer ' Number of characters in the charset.
CharacterSet as wchar_t ptr ' String with characters.
end type
type DfaStateStruct ' Grammar.DfaArray[]
AcceptSymbol as integer ' -1 (Terminal), or index into Grammar.SymbolArray[].
EdgeCount as integer ' Number of items in Edges[] array.
Edges as DfaEdgeStruct ptr ' Array of DfaEdgeStruct.
end type
type RuleStruct ' Grammar.RuleArray[]
Head as integer ' Index into Grammar.SymbolArray[].
SymbolsCount as integer ' Number of items in Symbols[] array.
Symbols as integer ptr ' Array of indexes into Grammar.SymbolArray[].
Description as wchar_t ptr ' String with BNF of the rule.
end type
type ActionStruct ' Grammar.LalrArray[].Actions[]
Entry as integer ' Index into Grammar.SymbolArray[].
Action as short ' 1...4, see ACTION defines.
Target as integer ' If Action=SHIFT then index into Grammar.LalrArray[].
' If Action=REDUCE then index into Grammar.RuleArray[].
' If Action=GOTO then index into Grammar.LalrArray[].
end type
type LalrStateStruct ' Grammar.LalrArray[]
ActionCount as integer ' Number of items in Actions[] array.
Actions as ActionStruct ptr ' Array of ActionStruct.
end type
type GrammarStruct ' Grammar
CaseSensitive as byte ' 'True' or 'False'.
InitialSymbol as integer ' Index into Grammar.SymbolArray[].
InitialDfaState as integer ' Index into Grammar.DfaArray[].
InitialLalrState as integer ' Index into Grammar.LalrArray[].
SymbolCount as integer ' Number of items in Grammar.SymbolArray[].
SymbolArray as SymbolStruct ptr
RuleCount as integer ' Number of items in Grammar.RuleArray[].
RuleArray as RuleStruct ptr
DfaStateCount as integer ' Number of items in Grammar.DfaArray[].
DfaArray as DfaStateStruct ptr
LalrStateCount as integer ' Number of items in Grammar.LalrArray[].
LalrArray as LalrStateStruct ptr
end type
extern Grammar as GrammarStruct
'' Output from the parser.
type TokenStruct
ReductionRule as integer ' Index into Grammar.RuleArray[].
Tokens as TokenStruct ptr ptr ' Array of reduction Tokens.
Symbol as integer ' Index into Grammar.SymbolArray[].
ndata as wchar_t ptr ' String with data from the input.
nline as long ' Line number in the input.
Column as long ' Column in the input.
end type
'' Exported functions.
declare function Parse( _
InputBuf as wchar_t ptr, _ ' Pointer to the input data.
InputSize as long, _ ' Number of characters in the input.
TrimReductions as integer, _ ' 0 = don't trim, 1 = trim reductions.
Debug as integer, _ ' 0 = no debug, 1 = print debug info.
Token as TokenStruct ptr ptr _ ' Output, the first Token.
) as integer
declare sub DeleteTokens( Token as TokenStruct ptr )
declare function RetrieveToken( _
InputBuf as wchar_t ptr, _ ' The input data.
InputSize as long, _ ' Size of the input data.
InputHere as long ptr, _ ' Index into input data.
nine as long ptr, _ ' Current line number.
Column as long ptr, _ ' Current column number.
Symbol as integer ptr _
) as wchar_t ptr
declare function wPrint( byval pByte as ushort ptr ) as string
Code: Select all
''
''
'' Example of how to run the McLovin FreeBASIC engine and show the results.
'' Output is similar to the parse tree in the GOLD test window.
'' (based on translation of Kessels C engine)
''
''
'
' Link this example with engine.o
' e.g. fbc example1.bas engine.o
'
#include once "engine.bi" ' The McLovin FB engine.
#Include Once "simple.bi" ' Generated by GOLD.
#define TRIMREDUCTIONS 1 ' 0=off, 1=on
#define DEBUG 1 ' 0=off, 1=on
' Load input file from disk into memory.
function LoadInputFile( FileName as zstring ptr ) as wchar_t ptr
dim Fin as integer
dim filesize as uinteger
dim Buf1 as string
dim Buf2 as wstring ptr
' Sanity check.
if ((FileName = null) or (*FileName = "")) then return(null)
' Open the file.
Fin = freefile
if open( *FileName for binary access read as #fin ) <> 0 then
print "Could not open input file: " & *FileName
return null
end if
' Get the size of the file.
filesize = lof( Fin )
' Allocate memory for the input.
Buf1 = string(filesize, 0)
Buf2 = callocate(sizeof(wchar_t) * (filesize + 1))
' Load the file into memory.
get #Fin, , Buf1
' Close the file.
close #Fin
' Convert from ASCII to Unicode.
*Buf2 = wStr(Buf1)
return Buf2
end function
'' Make a readable copy of a string. All characters outside 32...128 are
'' displayed as a HEX number in square brackets, for example "[0A]".
sub ReadableString( byval wInChars as wchar_t ptr, wOutChars as zstring ptr, nWidth as long )
dim i1 as long
dim i2 as long
' *wOutChars = wPrint(wInChars)
' return
' Sanity check.
if ((wOutChars = null) or (nWidth < 1)) then return
wOutChars[0] = 0
if (wInChars = null) then return
i1 = 0
i2 = 0
while wInChars[i1] <> 0
if ((wInChars[i1] >= 32) and (wInChars[i1] <= 127)) then
if nWidth > 1 then
wOutChars[i2] = wInChars[i1]
i2 = i2 + 1
end if
else
if nWidth > 4 then
woutchars[i2] = asc("[")
woutchars[i2] = asc(hex(*wInChars,2),1)
woutchars[i2] = asc(hex(*wInChars,2),2)
woutchars[i2] = asc("]")
i2 = i2 + 1
end if
end if
i1 = i1 + 1
wend
woutchars[i2] = 0
' print
end sub
sub ShowErrorMessage( Token as TokenStruct ptr, Result as integer )
dim Symbol as integer
dim i as integer
dim s1 as zstring * BUFSIZ
select case Result
case PARSELEXICALERROR
print "Lexical error";
case PARSECOMMENTERROR
print "Comment error";
case PARSETOKENERROR
print "Tokenizer error";
case PARSESYNTAXERROR
print "Syntax error";
case PARSEMEMORYERROR
print "Out of memory";
end select
if Token <> null then print " at line "; str(Token->nLine); " column "; str(Token->Column)
select case Result
case PARSELEXICALERROR
if Token->nData <> null then
'ReadableString Token->nData, @s1, BUFSIZ
print "The grammar does not specify what to do with '"; wPrint(Token->nData); "'"
'print "The grammar does not specify what to do with '"; s1; "'"
else
print "The grammar does not specify what to do."
end if
case PARSETOKENERROR
print "The tokenizer returned a non-terminal."
case PARSECOMMENTERROR
print "The comment has no end, it was started but not finished."
case PARSESYNTAXERROR
if Token->nData <> null then
'ReadableString Token->nData,@s1,BUFSIZ
print "Encountered '"; wPrint(Token->nData); "', but expected ";
'print "Encountered '"; s1; "', but expected ";
else
print "Expected ";
end if
for i = 0 to Grammar.LalrArray[Token->Symbol].ActionCount - 1
Symbol = Grammar.LalrArray[Token->Symbol].Actions[i].Entry
if Grammar.SymbolArray[Symbol].Kind = SYMBOLTERMINAL then
if i > 0 then
print ", ";
if i >= Grammar.LalrArray[Token->Symbol].ActionCount - 2 then print "or ";
end if
print "'"; wPrint(Grammar.SymbolArray[Symbol].wName); "'"
end if
next
print "."
end select
end sub
' Display all the Tokens, including reductions.
sub ShowTokens( Token as TokenStruct ptr, Indent as integer )
dim s1 as zstring * BUFSIZ
dim i as integer
' Sanity check.
if Token = null then return
' Show line and column numbers.
print str(Token->nLine); " "; str(Token->Column); " ";
' Indent.
for i = 0 to Indent - 1
print " ";
next
' Show the Token.
if Token->ReductionRule < 0 then
' It's a symbol.
if Token->nData = null then
print wPrint(Grammar.SymbolArray[Token->Symbol].wName)
else
'ReadableString Token->nData,@s1,BUFSIZ
print wPrint(Grammar.SymbolArray[Token->Symbol].wName); " = '"; wPrint(Token->nData); "'"
end if
else
' It's a reduction.
print wPrint(Grammar.RuleArray[Token->ReductionRule].Description)
for i = 0 to Grammar.RuleArray[Token->ReductionRule].SymbolsCount - 1
ShowTokens Token->Tokens[i],Indent+1
next
end if
end sub
dim InputBuf as wchar_t ptr
dim Token as TokenStruct ptr
dim Result as integer
' Load the inputfile into memory.
InputBuf = LoadInputFile("Example.input")
'InputBuf = LoadInputFile("vbnet_sample.input")
if InputBuf = null then end(1)
' run the Parser.
Result = Parse(InputBuf,wcslen(InputBuf),TRIMREDUCTIONS,DEBUG,@Token)
' Interpret the results.
if Result <> PARSEACCEPT then
ShowErrorMessage Token,Result
else
ShowTokens Token,0
end if
' Cleanup.
DeleteTokens Token
free InputBuf
:-)
Great work on the Gold Parser engine, McLovin! Writing a complete grammer for a language like FB might not be something easily done with the Gold Parser but it is a nice tool to use for writing little languages/all sorts of parsing tasks.
Have a look at JS/CC
http://jscc.jmksf.com/
if you want to experiment some more with parsing.
It has some features that might help in avoiding those nasty reduce - reduce and shift - reduce conflicts. It is also (like the Gold Parser) a visual compiler compiler. JS/CC (JS/CC stands for JavaScript Compiler Compiler), uses the webbrowser to visualize things and uses a syntax more geared towards Yacc/Bison.
Perhaps you could get the site admin to post your gold engine in the library repository. I think it might be something more people would like to have because using a visual compiler compiler like the Gold Parser is a perfect way to get into the whole compiler writing/parsing 'thing'.
Have a look at JS/CC
http://jscc.jmksf.com/
if you want to experiment some more with parsing.
It has some features that might help in avoiding those nasty reduce - reduce and shift - reduce conflicts. It is also (like the Gold Parser) a visual compiler compiler. JS/CC (JS/CC stands for JavaScript Compiler Compiler), uses the webbrowser to visualize things and uses a syntax more geared towards Yacc/Bison.
Perhaps you could get the site admin to post your gold engine in the library repository. I think it might be something more people would like to have because using a visual compiler compiler like the Gold Parser is a perfect way to get into the whole compiler writing/parsing 'thing'.
Last edited by AGS on Dec 23, 2008 5:08, edited 1 time in total.
The grammer I posted earlier contains some garbage. Some rules are defined twice, other rules are not rules. That's what you get when using a single regular expression to extract info from a file.
There is more in the source files about the grammer than what has been post ed. I will be back to update the grammer.
There is more in the source files about the grammer than what has been post ed. I will be back to update the grammer.
Thanks AGS, I worked on an FB grammar for a week or two using the VB.Net, VBScript and C grammars as guides. I had never written a grammar in GOLD before. I got a fair amount of stuff defined and working until I got lost in the "expressions" - that's where the reduce/reduce errors occurred. It was a very nice learning exercise and I will take up the challenge again after Christmas. It is certainly not a one-to-one translation of the current FB syntax but it is amazing just how easy it is to take various FB syntaxes and make them into a GOLD grammar (e.g. the various way to declare a variable).
I started jamming the C expressions definitions into my grammar and totally screwed it up. Here is the current state of the grammar. If anyone wants to use it then certainly go ahead and make changes. As I learn more about defining a language then I will fix it up. It is a little bit of a hack here and hack there at the moment because I was learning as I went along.
Good luck!
I started jamming the C expressions definitions into my grammar and totally screwed it up. Here is the current state of the grammar. If anyone wants to use it then certainly go ahead and make changes. As I learn more about defining a language then I will fix it up. It is a little bit of a hack here and hack there at the moment because I was learning as I went along.
Code: Select all
!================================================
!
! FreeBASIC 2 (GOLD Edition) Grammar
!================================================
"Name" = 'FreeBASIC Version 2 (GOLD Edition)'
"Author" = 'McLovin: Fogell'
"Version" = '2.00'
"About" = 'A specification for the FreeBASIC language'
"Case Sensitive" = False
"Start Symbol" = <Program>
{Hex Digit} = {Digit} + [abcdefABCDEF]
{Oct Digit} = [01234567]
{String Ch} = {Printable} - ["]
{Char Ch} = {Printable} - ['']
{WS} = {Whitespace} - {CR} - {LF}
DecLiteral = [123456789]{digit}*
HexLiteral = '&H' {Hex Digit}+ '&'?
OctLiteral = '&' {Oct Digit}+ '&'?
FloatLiteral = {digit}* '.' {digit}+ ( 'E' [+-]? {Digit}+ )? [FR]?
| {digit}+ 'E' [+-]? {Digit}+ [FR]?
StringLiteral = '"'( {String Ch} | '\'{Printable} )* '"'
CharLiteral = '' ( {Char Ch} | '\'{Printable} )''
Id = [_]?{Letter}{ID Tail}*
{Id Tail} = {Alphanumeric} + [_]
! ----------------------------------------------------------------- Terminals
NewLine = {CR} {LF}
| {CR}
| {LF}
! | ':'
! Special white space definition. Whitespace is either space or tab, which
! can be followed by continuation symbol '_' followed by new line character
Whitespace = {WS}+
| '_' {WS}* {CR}? {LF}?
! ===================================================================
! Comments
! ===================================================================
! Special comment definitions
Comment Line = '' | REM
Comment Start = '/'''
Comment End = '''/'
<NL> ::= NewLine <NL>
| NewLine
<NLopt> ::= <NL>
|
<Program> ::= <NLopt> <GlobalStmtList>
<GlobalStmt> ::= <Declare>
| <NameSpace>
| <ConstDecl>
| <SubFunction>
| <Structure>
| <Enumeration>
| <BlockStmt>
<GlobalStmtList> ::= <GlobalStmt> <GlobalStmtList>
|
<MethodStmt> ::= <ConstDecl>
| <BlockStmt>
<MethodStmtList> ::= <MethodStmt> <MethodStmtList>
|
<BlockStmtList> ::= <BlockStmt> <BlockStmtList>
|
<BlockStmt> ::= <VarDecl>
! | <WithStmt>
! | <RedimStmt>
! | <LoopStmt>
! | <ForStmt>
! | <IfStmt>
! | <SelectStmt>
| <InlineStmt> <NL>
<InlineStmt> ::= <Stm List>
! | <MethodCall>
! -------------------------------------------------------------------
! Variable Declaration
! -------------------------------------------------------------------
<IntLiteral> ::= IntLiteral
| HexLiteral
| OctLiteral
<VarDimSharedOpt> ::= 'Shared'
|
<VarDim> ::= 'Dim'
| 'Static'
<ArrayBounds> ::= <Expr> ',' <ArrayBounds>
| <Expr> 'To' <Expr> ',' <ArrayBounds>
| <Expr>
| <Expr> 'To' <Expr>
<VarName> ::= ID '(' <ArrayBounds> ')' ! handle arrays
| ID
<VarNames> ::= <VarName> <VarInit> ',' <VarNames>
| <VarName> <DataType> <VarInit> ',' <VarNames>
| <VarName> <VarInit>
| <VarName> <DataType> <VarInit>
! TODO: Need to modify in order to handle initializing UDT's with embedded arrays and arrays of UDT's with embedded arrays
! { (100,{1,2,3,4,5}) }
<VarInitValueOpt> ::= ',' <Value> <VarInitValueOpt>
|
<VarInitValue> ::= <Value>
| 'Any'
| '{' <Value> <VarInitValueOpt> '}'
<VarInit> ::= '=' <VarInitValue>
| '=>' <VarInitValue>
|
<VarDecl> ::= <VarDim> <VarDimSharedOpt> <DataType> <VarNames> <NL> ! Dim Shared As Integer a=1, b=2, c=3
| <VarDim> <VarDimSharedOpt> <VarNames> <NL> ! Dim Shared a As Integer=1, b As String="p", c As Byte=3
! -------------------------------------------------------------------
! CONST Declaration
! -------------------------------------------------------------------
<ConstDecl> ::= 'Const' <ConstList> <NL>
<ConstList> ::= ID '=' <Value> ',' <ConstList>
| ID '=' <Value>
! -------------------------------------------------------------------
! Parameters
! -------------------------------------------------------------------
<ParamListOpt> ::= <ParamList>
|
<ParamList> ::= '(' <ParamItems> ')'
| '(' ')'
<ParamItems> ::= <ParamItem> ',' <ParamItems>
| <ParamItem>
<ParamStatic> ::= 'Static'
|
<ParamItem> ::= <ParamPassing> ID <DataType>
<ParamPassing> ::= 'ByVal'
| 'ByRef'
| 'Optional'
|
! -------------------------------------------------------------------
! Declares (External Procedures)
! -------------------------------------------------------------------
<Declare> ::= 'Declare' 'Sub' ID <DeclareLib> <DeclareAlias> <ParamListOpt> <NL>
| 'Declare' 'Function' ID <DeclareLib> <DeclareAlias> <ParamListOpt> <DataType> <NL>
<DeclareLib> ::= 'Lib' StringLiteral
| ! allow to be optional
<DeclareAlias> ::= 'Alias' StringLiteral
| ! allow to be optional
! -------------------------------------------------------------------
! Sub/Function Methods
! -------------------------------------------------------------------
<AccessOpt> ::= 'Public'
| 'Private'
|
<SubFunction> ::= <AccessOpt> 'Sub' ID <ParamListOpt> <ParamStatic> <NL> <MethodStmtList> 'End' 'Sub' <NL>
| <AccessOpt> 'Function' ID <ParamListOpt> <DataType> <ParamStatic> <NL> <MethodStmtList> 'End' 'Function' <NL>
! -------------------------------------------------------------------
! Enumerations
! -------------------------------------------------------------------
<Enumeration> ::= 'Enum' ID <NL> <EnumList> 'End' 'Enum' <NL>
| 'Enum' ID 'Explicit' <NL> <EnumList> 'End' 'Enum' <NL>
<EnumList> ::= <EnumItem> <EnumList>
|
<EnumItem> ::= ID '=' <IntLiteral> <EnumItemOpt> <NL>
| ID <EnumItemOpt> <NL>
<EnumItemOpt> ::= ',' ID '=' <IntLiteral> <EnumItemOpt>
| ',' ID <EnumItemOpt>
|
! -------------------------------------------------------------------
! Type Structures
! -------------------------------------------------------------------
<StructurePadding> ::= 'Field' '=' <IntLiteral>
| ! allow to be optional
<Structure> ::= 'Type' ID <StructurePadding> <NL> <StructureList> 'End' 'Type' <NL>
<StructureList> ::= <StructureItem> <StructureList>
|
<StructureItem> ::= <VarNames> <NL>
| <DataType> <VarNames> <NL>
| <Enumeration>
! -------------------------------------------------------------------
! NameSpace Declaration
! -------------------------------------------------------------------
<NameSpaceAlias> ::= 'Alias' StringLiteral
|
<NameSpace> ::= 'NameSpace' ID <NameSpaceAlias> <NL> <NameSpaceItems> 'End' 'NameSpace' <NL>
<NameSpaceItems> ::= <NameSpaceItem> <NameSpaceItems>
|
<NameSpaceItem> ::= <Declare>
| <NameSpace>
| <ConstDecl>
| <VarDecl>
| <SubFunction>
| <Structure>
| <Enumeration>
! ===================================================================
! Function Declaration
!! ===================================================================
!
!<Func Proto> ::= <Func ID> '(' <Types> ')' ';'
! | <Func ID> '(' <Params> ')' ';'
! | <Func ID> '(' ')' ';'
!
!<Func Decl> ::= <Func ID> '(' <Params> ')' <Block>
! | <Func ID> '(' <Id List> ')' <Struct Def> <Block>
! | <Func ID> '(' ')' <Block>
!
!
!<Params> ::= <Param> ',' <Params>
! | <Param>
!
!<Param> ::= const <Type> ID
! | <Type> ID
!
!<Types> ::= <Type> ',' <Types>
! | <Type>
!
!<Id List> ::= Id ',' <Id List>
! | Id
!
!<Func ID> ::= <Type> ID
! | ID
! ===================================================================
! Type Declaration
!! ===================================================================
!
!<Typedef Decl> ::= typedef <Type> ID ';'
!
!<Struct Decl> ::= struct Id '{' <Struct Def> '}' ';'
!
!<Union Decl> ::= union Id '{' <Struct Def> '}' ';'
!
!
!<Struct Def> ::= <Var Decl> <Struct Def>
! | <Var Decl>
! ===================================================================
! Variable Declaration
! ===================================================================
<Var Decl> ::= <Mod> <Type> <Var> <Var List> ';'
| <Type> <Var> <Var List> ';'
| <Mod> <Var> <Var List> ';'
<Var> ::= ID <Array>
| ID <Array> '=' <Op If>
<Array> ::= '[' <Expr> ']'
| '[' ']'
|
<Var List> ::= ',' <Var Item> <Var List>
|
<Var Item> ::= <Pointers> <Var>
<Mod> ::= extern
| static
| register
| auto
| volatile
| const
! ===================================================================
! Enumerations
! ===================================================================
<Enum Decl> ::= enum Id '{' <Enum Def> '}' ';'
<Enum Def> ::= <Enum Val> ',' <Enum Def>
| <Enum Val>
<Enum Val> ::= Id
| Id '=' OctLiteral
| Id '=' HexLiteral
| Id '=' DecLiteral
! -------------------------------------------------------------------
! Data Types
! -------------------------------------------------------------------
<Type> ::= <Base> <Pointers>
<Base> ::= <Sign> <Scalar>
! | struct Id
! | struct '{' <Struct Def> '}'
! | union Id
! | union '{' <Struct Def> '}'
! | enum Id
<Sign> ::= signed
| unsigned
|
<Scalar> ::= 'Byte'
| 'uByte'
| 'Short'
| 'uShort'
| 'Integer'
| 'uInteger'
| 'Long'
| 'LongInt'
| 'uLongInt'
| 'Single'
| 'Double'
| 'String'
| 'zString'
| 'wString'
<Pointers> ::= '*' <Pointers>
|
<DataFixedString> ::= <Scalar> '*' <IntLiteral>
<DataTypePointer> ::= <Scalar> 'Pointer'
| <Scalar> 'Ptr'
<DataType> ::= 'As' <Scalar>
| 'As' <DataFixedString>
| 'As' <DataTypePointer>
| 'As' ID
! ===================================================================
! Statements
! ===================================================================
<Stm> ::= <Var Decl>
| Id ':' !Label
| if '(' <Expr> ')' <Stm>
| if '(' <Expr> ')' <Then Stm> else <Stm>
| while '(' <Expr> ')' <Stm>
| for '(' <Arg> ';' <Arg> ';' <Arg> ')' <Stm>
| <Normal Stm>
<Then Stm> ::= if '(' <Expr> ')' <Then Stm> else <Then Stm>
| while '(' <Expr> ')' <Then Stm>
| for '(' <Arg> ';' <Arg> ';' <Arg> ')' <Then Stm>
| <Normal Stm>
<Normal Stm> ::= do <Stm> while '(' <Expr> ')'
! | switch '(' <Expr> ')' '{' <Case Stms> '}'
! | <Block>
! | <Expr> ';'
! | goto Id ';'
! | break ';'
! | continue ';'
! | return <Expr> ';'
! | ';' !Null statement
<Arg> ::= <Expr>
|
<Case Stms> ::= case <Value> ':' <Stm List> <Case Stms>
| default ':' <Stm List>
|
<Block> ::= '{' <Stm List> '}'
<Stm List> ::= <Stm> <Stm List>
|
! ===================================================================
! Here begins the C's 15 levels of operator precedence.
! ===================================================================
!<Expr> ::= <Expr> ',' <Op Assign>
! | <Op Assign>
<Expr> ::= <Op Assign>
<Op Assign> ::= <Op If> '=' <Op Assign>
| <Op If> '+=' <Op Assign>
| <Op If> '-=' <Op Assign>
| <Op If> '*=' <Op Assign>
| <Op If> '/=' <Op Assign>
| <Op If> '^=' <Op Assign>
| <Op If> '&=' <Op Assign>
| <Op If> '|=' <Op Assign>
| <Op If> '>>=' <Op Assign>
| <Op If> '<<=' <Op Assign>
| <Op If>
<Op If> ::= <Op Or> '?' <Op If> ':' <Op If>
| <Op Or>
<Op Or> ::= <Op Or> '||' <Op And>
| <Op And>
<Op And> ::= <Op And> '&&' <Op BinOR>
| <Op BinOR>
<Op BinOR> ::= <Op BinOr> '|' <Op BinXOR>
| <Op BinXOR>
<Op BinXOR> ::= <Op BinXOR> '^' <Op BinAND>
| <Op BinAND>
<Op BinAND> ::= <Op BinAND> '&' <Op Equate>
| <Op Equate>
<Op Equate> ::= <Op Equate> '==' <Op Compare>
| <Op Equate> '!=' <Op Compare>
| <Op Compare>
<Op Compare> ::= <Op Compare> '<' <Op Shift>
| <Op Compare> '>' <Op Shift>
| <Op Compare> '<=' <Op Shift>
| <Op Compare> '>=' <Op Shift>
| <Op Shift>
<Op Shift> ::= <Op Shift> '<<' <Op Add>
| <Op Shift> '>>' <Op Add>
| <Op Add>
<Op Add> ::= <Op Add> '+' <Op Mult>
| <Op Add> '-' <Op Mult>
| <Op Mult>
<Op Mult> ::= <Op Mult> '*' <Op Unary>
| <Op Mult> '/' <Op Unary>
| <Op Mult> '%' <Op Unary>
| <Op Unary>
<Op Unary> ::= '!' <Op Unary>
| '~' <Op Unary>
| '-' <Op Unary>
| '*' <Op Unary>
| '&' <Op Unary>
| '++' <Op Unary>
| '--' <Op Unary>
| <Op Pointer> '++'
| <Op Pointer> '--'
| '(' <Type> ')' <Op Unary> !CAST
| sizeof '(' <Type> ')'
| sizeof '(' ID <Pointers> ')'
| <Op Pointer>
<Op Pointer> ::= <Op Pointer> '.' <Value>
| <Op Pointer> '->' <Value>
| <Op Pointer> '[' <Expr> ']'
| <Value>
<Value> ::= OctLiteral
| HexLiteral
| DecLiteral
| StringLiteral
| CharLiteral
| FloatLiteral
| Id '(' <Expr> ')'
| Id '(' ')'
| Id
| '(' <Expr> ')'
! -------------------------------------------------------------------
! If Statement
! -------------------------------------------------------------------
<IfStmt> ::= 'If' <Expr> 'Then' <NL> <BlockStmtList> <ElseStmtList> 'End' 'If' <NL>
| 'If' <Expr> 'Then' <InlineStmt> <ElseOpt> <EndIfOpt> <NL>
<ElseStmtList> ::= 'ElseIf' <Expr> 'Then' <NL> <BlockStmtList> <ElseStmtList>
| 'ElseIf' <Expr> 'Then' <InlineStmt> <NL> <ElseStmtList>
| 'Else' <InlineStmt> <NL>
| 'Else' <NL> <BlockStmtList>
|
<ElseOpt> ::= 'Else' <InlineStmt>
|
<EndIfOpt> ::= 'End' 'If'
! -------------------------------------------------------------------
! Loop Statement
! -------------------------------------------------------------------
<LoopStmt> ::= 'Do' <LoopType> <Expr> <NL> <BlockStmtList> 'Loop' <NL>
| 'Do' <NL> <BlockStmtList> 'Loop' <LoopType> <Expr> <NL>
| 'Do' <NL> <BlockStmtList> 'Loop' <NL>
| 'While' <Expr> <NL> <BlockStmtList> 'Wend' <NL>
<LoopType> ::= 'While'
| 'Until'
! -------------------------------------------------------------------
! For Statement
! -------------------------------------------------------------------
<ForStmt> ::= 'For' ID '=' <Expr> 'To' <Expr> <StepOpt> <NL> <BlockStmtList> 'Next' <NL>
| 'For' 'Each' ID 'In' <Expr> <NL> <BlockStmtList> 'Next' <NL>
<StepOpt> ::= 'Step' <Expr>
|
! -------------------------------------------------------------------
! Select Statement
! -------------------------------------------------------------------
<SelectStmt> ::= 'Select' 'Case' <Expr> <NL> <CaseStmtList> 'End' 'Select' <NL>
<CaseStmtList> ::= 'Case' <ExprList> <NLOpt> <BlockStmtList> <CaseStmtList>
| 'Case' 'Else' <NLOpt> <BlockStmtList>
|
<ExprList> ::= <Expr> ',' <ExprList>
| <Expr>
! -------------------------------------------------------------------
! END OF GRAMMAR DEFINITION
! -------------------------------------------------------------------
When you use a Lex/Yacc clone (like JS/CC or JFlex/CUP or Flex/Bison or...) you can define operator priorities. Otherwise you have to split the expression up and have one expression refer to the other one. This splitting has to be done to get the priorities right (otherwise the parser will not apply the right operator precedence). If you do it like that (like in the grammar you've posted) you get a lot of rules.McLovin wrote:Thanks AGS, I worked on an FB grammar for a week or two using the VB.Net, VBScript and C grammars as guides. I had never written a grammar in GOLD before. I got a fair amount of stuff defined and working until I got lost in the "expressions" - that's where the reduce/reduce errors occurred.
In JS/CC (or Flex/Bison, JFlex/CUP, Flex/Lemon or pick your own Lex/Yacc clone) you get (after defining priorities):
expr: expr '+' expr
| expr '-' expr
| expr '*' expr
| expr '/' expr
| '(' expr ')'
| INT
| FLOAT
Of course you'd have to put a lot more expressions in there to get where you want to get (loadsa operators in FB) and define priorities on them to get the kind of evaluation on expressions you want.
And you have to deal with the ambiguities of the FB language. Bison comes with GLR parsing power nowadays which should take care of many a problem though some cannot be solved with a formal grammar alone. You'd have to take care of those ambiguities some other way.
Nice GOLD parser grammar you've put together there. Pity it doesn't work yet.
It's comment inside source files, nothing more, nothing less. I'm sure the developers never intended the comments to form a formal grammar of the FB language. It'll need rewriting before it's acceptable to an LL(K) (if not a LALR(1) or a GLR) compiler compiler package.TheMG wrote:Not just that, but the syntax used to define them varies throughout, replacing x? with [x], x* with {x}, etc. I am attempting to clean it up.