a compiler compiler.
This example is an example of an operator precedence parser. Operator precedence parsers are a kind of bottom - up (shift - reduce) parsers you can implement by hand.
http://en.wikipedia.org/wiki/Operator-precedence_parser
An operator precedence parser (or at least the one I'm presenting here) uses two stacks, an operator precedence table and a couple of routines to parse an expression.
One stack is used for storing operands while the other stack is used for storing operators. The precedence table is used by the parsing routine to figure out (given the next token and the token at the top of the operator stack) what action to undertake (either SHIFT, REDUCE, ACCEPT or print a syntax error).
The precedence table is a two dimensional array sized (number_of_operators,number_of_operators)
A short description of possible table entries (x and y are operators):
table(x,y) = S and x unequal to y ==> precedence(y) > precedence(x)
table(x,y) = S and x = y ==> x is right associative
table(x,y) = R and x unequal to y ==> precedence(x) > precedence(y)
table(x,y) = R and x = y ==> x is left associative
table(x,y) = En ==> SYNTAX ERROR
table(x,y) = A ==> ACCEPT (parsing completed succesfully)
S stands for SHIFT
R stands for REDUCE
A stands for ACCEPT
At each step during the parsing of an expression the parsing routine (parse) uses the precedence table to decide what action to undertake. Depending on what's on top of the operator stack and the next input token the action can be:
- SHIFT (S) ==> push input token on the operator stack;
- SHIFT (S) ==> push input token on the value stack;
- REDUCE (R) ==>l remove one or more tokens from the operator stack. The operator that's removed from the operator stack can be applied to the topmost entrie(s) of the
value stack. This will lead to the removal of tokens from the value stack. The result of the operation is pushed back onto the value stack.
- ACCEPT (A) ==> end of parsing (finished evaluating expression)
- display an error message (En) as there is some error in the expression.
operators defined:
+ - * / ^ unary minus , ( ) f(actorial) p(ermutation) c(ombination)
format of a number (notated as a regular expression): [0-9]+(\.[0-9]+)?
Code: Select all
'uncommenting the DEBUGGING_ macro makes subroutine split print tokens on stdout
'#define DEBUGGING_
' actions
enum ActionEnum
S ' shift
R ' reduce
A ' accept
E1 ' error: missing right parenthesis
E2 ' error: missing operator
E3 ' error: unbalanced right parenthesis
E4 ' error: invalid function argument
end enum
' tokens
enum tokEnum
' operators
tAdd ' +
tSub ' -
tMul ' *
tDiv ' /
tPow ' ^ (power)
tUmi ' - (unary minus)
tFact ' f(x): factorial
tPerm ' p(n,r): permutations, n objects, r at a time
tComb ' c(n,r): combinations, n objects, r at a time
tComa ' comma
tLpr ' (
tRpr ' )
tEof ' end of string
tMaxOp ' maximum number of operators
' non-operators
tVal ' value
end enum
dim shared Tok as tokEnum ' token
dim shared Tokval as double ' token value
const MaxOpr as integer = 50
const MaxV as integer = 50
dim shared Opr(MaxOpr) as integer ' operator stack
dim shared V(MaxV) as double ' value stack
dim shared OprTop as integer ' top of operator stack
dim shared VTop as integer ' top of value stack
redim shared Term(0 to 0) as string ' array of terms
dim shared TermIndex as integer ' current term
dim shared failure as integer = 0 ' set to 1 when evaluation of expression fails
dim shared parseTbl(tMaxOp,tMaxOp) as byte = { _
/' stk ------------------ input ------------------------ '/ _
/' + - * / ^ M f p c , ( ) $ '/ _
/' -- -- -- -- -- -- -- -- -- -- -- -- -- '/ _
/' + '/ { R, R, S, S, S, S, S, S, S, R, S, R, R },_
/' - '/ { R, R, S, S, S, S, S, S, S, R, S, R, R },_
/' * '/ { R, R, R, R, S, S, S, S, S, R, S, R, R },_
/' / '/ { R, R, R, R, S, S, S, S, S, R, S, R, R },_
/' ^ '/ { R, R, R, R, S, S, S, S, S, R, S, R, R },_
/' M '/ { R, R, R, R, R, S, S, S, S, R, S, R, R },_
/' f '/ { R, R, R, R, R, R, R, R, R, R, S, R, R },_
/' p '/ { R, R, R, R, R, R, R, R, R, R, S, R, R },_
/' c '/ { R, R, R, R, R, R, R, R, R, R, S, R, R },_
/' , '/ { R, R, R, R, R, R, R, R, R, R, R, R, E4},_
/' ( '/ { S, S, S, S, S, S, S, S, S, S, S, S, E1},_
/' ) '/ { R, R, R, R, R, R, E3, E3, E3, R, E2, R, R },_
/' $ '/ { S, S, S, S, S, S, S, S, S, E4, S, E3, A }_
}
declare function ErrorMsg(byref msg as string) as integer
'splits expression into tokens
'resulting tokens are placed in dynamic array terms
'returns: 0 on succes
' failure is set to 1 when an illegal token
' is found (failure is a global variable)
function split(byref expression as string) as integer
'To make the parser a bit more solid split wraps numbers in parens
'That way the parser will report an error when
'two consecutive numbers without an operator in between are found.
'An example of how this works ($ = top of stack at start of parse)
'expression ==> 8 5 +
'parse:
'input = 8
'input = 5 action: shift 8 on the operand stack operand stack: 58$
'input = + action: reduce
'The reason for this erroneous behaviour of the parser has to do with the
'way binary operators are reduced. On a reduce a binary operator simply looks
'at the value stack. if there are 2 values on the stack it applies the operator
'to those values and puts the result on the value stack.
'Wrapping a number into parens leads to different parser behaviour as ( and )
'are operators that lead to a reduction. So parsing of 8 5 + (after adding parens)
'now looks like
'expression ==> (8) (5) +
'parse:
'input = ( action: push ( on the operator stack operator stack: ($
'input = 8 action: push 8 on the value stack value stack : 8$
'input = ) error as parsetabl('(',')' equals E2 (missing operator).
'expression ==> (8) f((5))
'parse:
'input = ( action: push ( on the operator stack operator stack: ($
'input = 8 action: push 8 on the value stack value stack : 8$
'input = ) error as parsetbl('(','f') equals E3 (unbalanced right parens)
'E3 is a 'bad' error message as there is no problem with the parens.
'Error handling could be better at the expense of a more complicated lexer/parser.
dim mark as integer
dim ct as integer
dim lexp as integer = len(expression)
function = 0
if (len(expression) = 0) then
errormsg("empty expression")
failure = 1
exit function
end if
while (expression[ct] = asc(" ") orelse expression[ct] = asc(!" \t") andalso ct < lexp)
ct += 1
wend
'expression contained nothing but space
if (ct = lexp) then
errormsg("empty expression")
failure = 1
exit function
end if
var upper = 0
var lower = 0
mark = ct
while (ct < lexp)
select case as const expression[ct]
'operator or single char function
case asc("+"),asc("-"),asc("*"),asc("/"),asc("^"),asc("("),asc(")"),asc(","),_
asc("f"),asc("p"),asc("c")
if (lower = upper andalso term(0) = "") then
term(0) = chr(expression[ct])
#ifdef DEBUGGING_
print chr(expression[ct])
#endif
else
redim preserve term(0 to upper+1)
upper += 1
term(upper) = chr(expression[ct])
#ifdef DEBUGGING_
print chr(expression[ct])
#endif
end if
ct += 1
'number
case asc("0"),asc("1"),asc("2"),asc("3"),asc("4"),_
asc("5"),asc("6"),asc("7"),asc("8"),asc("9")
number_loop:
'at end of expression?
if (ct = lexp - 1) then
if (lower = upper andalso term(0) = "") then
redim preserve term(0 to upper+2)
'parenthesize numbers to evoke errors from the parser
upper += 2
term(upper-2) = "("
term(upper-1) = mid(expression,mark+1,iif(ct - mark=0,1,ct+1 - mark))
term(upper) = ")"
#ifdef DEBUGGING_
print mid(expression,mark+1,iif(ct - mark=0,1,ct+1 - mark))
#endif
else
redim preserve term(0 to upper + 3)
'parenthesize numbers to evoke errors from the parser
upper += 3
term(upper-2) = "("
term(upper-1) = mid(expression,mark+1,iif(ct - mark=0,1,ct+1 - mark))
term(upper) = ")"
#ifdef DEBUGGING_
print mid(expression,mark+1,iif(ct - mark=0,1,ct+1 - mark))
#endif
end if
return 0
end if
ct += 1
select case as const expression[ct]
case asc("0"),asc("1"),asc("2"),asc("3"),asc("4"),_
asc("5"),asc("6"),asc("7"),asc("8"),asc("9"),asc(".")
goto number_loop
case else'asc(" "),asc(!"\t")
if (lower = upper andalso term(0) = "") then
redim preserve term(0 to upper + 2)
upper += 2
'parenthesize numbers to evoke errors from the parser
term(upper-2) = "("
term(upper-1) = mid(expression,mark+1,ct - mark)
term(upper) = ")"
#ifdef DEBUGGING_
print mid(expression,mark+1,ct - mark)
#endif
else
redim preserve term(0 to upper + 3)
upper += 3
'parenthesize numbers to evoke errors from the parser
term(upper-2) = "("
term(upper-1) = mid(expression,mark+1,ct - mark)
term(upper) = ")"
#ifdef DEBUGGING_
print mid(expression,mark+1,ct - mark)
#endif
end if
end select
'illegal character in expression (expression will evaluate to 0)
case else
if (expression[ct] <> Asc(" ") andalso expression[ct] <> asc(!"\t")) then
errormsg("syntax error at character " & ct + 1 & " (" & chr(expression[ct]) & ")")
print expression
print space(ct);"^"
failure = 1
end
end if
end select
'skip whitespace
while (expression[ct] = asc(" ") orelse expression[ct] = asc(!"\t") andalso ct < lexp)
ct += 1
wend
mark = ct
wend
end function
function ErrorMsg(byref msg as string) as integer
print "Error: " & msg
return 1
end function
function GetTok() as integer
dim TokStr as string
static PrevTok as tokEnum
' get next token
GetTok = 0
TermIndex = TermIndex + 1
if TermIndex > UBound(Term) then
Tok = tEof
exit function
end if
TokStr = Term(TermIndex)
' convert symbol to token
select case TokStr
case "+": Tok = tAdd
case "-": Tok = tSub
case "*": Tok = tMul
case "/": Tok = tDiv
case "^": Tok = tPow
case "(": Tok = tLpr
case ")": Tok = tRpr
case ",": Tok = tComa
case "f": Tok = tFact
case "p": Tok = tPerm
case "c": Tok = tComb
case else
'check whether token is a valid (possibly floating point) number
var dot = 0
for i as integer = 0 to len(TokStr) - 1
select case as const TokStr[i]
case asc("0"),asc("1"),asc("2"),asc("3"),_
asc("4"),asc("5"),asc("6"),asc("7"),_
asc("8"),asc("9")
case asc(".")
dot += 1
if (dot > 1) then
dot = 0
goto case_else
end if
case else
case_else:
print "token not numeric (" & TokStr & "), use spaces as separators"
failure = 1
function = 1
end select
next i
Tokval = Val(TokStr)
Tok = tVal
end select
' check for unary minus
if Tok = tSub then
if TermIndex > 0 then
if PrevTok <> tVal And PrevTok <> tRpr then
Tok = tUmi
end if
'uminus at start of expression
ElseIf TermIndex = 0 then
Tok = tUmi
end if
end if
PrevTok = Tok
end function
function Shift() as integer
if Tok = tVal then
VTop = VTop + 1
if VTop >= MaxV then
Shift = ErrorMsg("V stack exhausted")
exit function
end if
V(VTop) = Tokval
else
OprTop = OprTop + 1
if OprTop >= MaxOpr then
Shift = ErrorMsg("Opr stack exhausted")
failure = 1
exit function
end if
Opr(OprTop) = Tok
end if
if GetTok() <> 0 then
Shift = 1
exit function
end if
Shift = 0
end function
function Fact(byval N as double) as double
dim i as double
dim t as double
t = 1:i=1
while (i <= N)
t *= i
i += 1
wend
return t
end function
'comb is used for debugging purposes (to check whether
'parse yields a mathematically correct result)
function comb(byval N as double,byval M as double) as double
return Fact(n) / (Fact(m) * Fact(n - m))
end function
'perm is used for debugging purposes (to check whether
'parse yields a mathematically correct result)
function perm(byval n as double,byval m as double) as double
return Fact(n) / Fact(n - m)
end function
function Reduce() as integer
select case Opr(OprTop)
case tAdd
' apply E := E + E
if VTop < 1 then
Reduce = ErrorMsg("Syntax error")
failure = 1
exit function
end if
V(VTop - 1) = V(VTop - 1) + V(VTop)
VTop = VTop - 1
case tSub
' apply E := E - E
if VTop < 1 then
failure = 1
Reduce = ErrorMsg("Syntax error")
exit function
end if
V(VTop - 1) = V(VTop - 1) - V(VTop)
VTop = VTop - 1
case tMul
' apply E := E * E
if VTop < 1 then
failure = 1
Reduce = ErrorMsg("Syntax error")
exit function
end if
V(VTop - 1) = V(VTop - 1) * V(VTop)
VTop = VTop - 1
case tDiv
' apply E := E / E
if VTop < 1 then
failure = 1
Reduce = ErrorMsg("Syntax error")
exit function
end if
V(VTop - 1) = V(VTop - 1) / V(VTop)
VTop = VTop - 1
case tUmi
' apply E := -E
if VTop < 0 then
failure = 1
Reduce = ErrorMsg("Syntax error")
exit function
end if
V(VTop) = -V(VTop)
case tPow
' apply E := E ^ E
if VTop < 1 then
failure = 1
Reduce = ErrorMsg("Syntax error")
exit function
end if
if (V(VTop - 1) < 0) then
V(VTop - 1) = -(V(VTop - 1) ^ V(VTop))
else
V(VTop - 1) = V(VTop - 1) ^ V(VTop)
end if
VTop = VTop - 1
case tFact
' apply E := f(E)
if VTop < 0 then
failure = 1
Reduce = ErrorMsg("Syntax error")
exit function
end if
V(VTop) = Fact(V(VTop))
case tPerm
' apply E := p(N,R)
if VTop < 1 then
failure = 1
Reduce = ErrorMsg("Syntax error")
exit function
end if
V(VTop - 1) = Fact(V(VTop - 1)) / Fact(V(VTop - 1) - V(VTop))
VTop = VTop - 1
case tComb
' apply E := c(N,R)
if VTop < 1 then
failure = 1
Reduce = ErrorMsg("Syntax error")
exit function
end if
V(VTop - 1) = Fact(V(VTop - 1)) / _
(Fact(V(VTop)) * Fact(V(VTop - 1) - V(VTop)))
VTop = VTop - 1
case tRpr
' pop () off stack
OprTop = OprTop - 1
end select
OprTop = OprTop - 1
Reduce = 0
end function
function Parse(byref Expr as string) as double
' initialize for next expression
OprTop = 0
VTop = -1
Opr(OprTop) = tEof
TermIndex = -1
'split expression into terms (put terms in global array term)
Split(Expr)
if GetTok() <> 0 then exit function
do
' input is Value (operand)
if Tok = tVal then
' shift token to value stack
if Shift() <> 0 then exit function
else
' input is operator
select case ParseTbl(Opr(OprTop), Tok)
case R
if Reduce() <> 0 then exit function
case S
if Shift() <> 0 then exit function
case A
' accept
if VTop = 0 then
'print "value = " & V(0)
return V(0)
else
failure = 1
ErrorMsg("Syntax error")
end if
exit function
case E1
ErrorMsg("Missing right parenthesis")
failure = 1
exit function
case E2
failure = 1
ErrorMsg("Missing operator")
exit function
case E3
failure = 1
ErrorMsg("Unbalanced right parenthesis")
exit function
case E4
failure = 1
ErrorMsg("Invalid function argument")
exit function
end select
end if
loop
end function
function Test(byref Expr as string) as double
return(Parse(Expr))
end function
'some testing code
dim t as double
dim s as string = "f(8) + f(6) + f(5) + c(f(2),4)"
var t1 = timer()
t = test(s)
var t2 = timer()
print using "#######.#######";t2 - t1
if (failure) then
print "eval ==> ";s;" ==> error"
else
print "eval ==> ";s;" = ";t
end if
t1 = timer()
t = fact(8) + fact(6) + fact(5) + comb(fact(2),4)
t2 = timer()
print using "#######.#######";t2 - t1
print "fb ==> ";s;" = ";t
redim term(0 to 0)