- From: Sandro Hawke <sandro@w3.org>
- Date: Sat, 08 Dec 2012 10:53:04 -0500
- To: Eric Prud'hommeaux <eric@w3.org>, W3C RDF WG <public-rdf-wg@w3.org>
I believe I've written a strict LALR(1) (aka yacc/bison) grammar for
Turtle. It's left-recursive (that is, it doesn't grow the stack) and
unambiguous. I haven't done the lexer or a test harness, so I haven't
actually tested it on the real test suite. Instead, the lexer just
uses "words", and I test it with things like "a b c;d e,f". I hope to
do the lexer and pass the positive and negative syntax tests, but I
think what I've got is enough to settle the question we need for
ACTION-217 and going to CR at the next meeting.
A bit of commentary, while I'm thinking about grammars in specs.
Basically, there are lots of possible grammars for a language. In many
cases, one could write a left-recursive version of a production, which
would work well in an LALR (bottom-up) parser, or a right-recursive
version, which would work well in a recursive descent (top-down)
parser. Which should we put in a spec? In either case, there will be
some parser generators who can't really handle the grammar in the spec.
I implemented this without looking at the spec, so I could keep my brain
in LALR land.
Jison code below. Oh, it doesn't do lists, or directives, or a few
other things that seem irrelevant to ACTION-217.
-- Sandro
%lex
%%
\s+ /* skip whitespace */
[a-zA-Z_]+[0-9a-zA-Z_]* return 'WORD'
"." return 'DOT'
";" return 'SEMI'
"," return 'COMMA'
"[" return 'LBRACKET'
"]" return 'RBRACKET'
<<EOF>> return 'EOF'
. return 'INVALID'
/lex
%start doc
%%
doc
: slist EOF
{ console.log('ended with trailing dot'); }
| slist stmt EOF
{ console.log('ended with NO trailing dot'); }
;
slist
: /* nothing */
| slist stmt DOT
;
stmt
: subject polist
;
polist
: predicate olist
| polist SEMI
| polist SEMI predicate olist
;
olist
: object
| olist COMMA object
;
subject
: term
{ yy.turtle_subject = $1; }
;
predicate
: term
{ yy.turtle_predicate = $1; }
;
object
: term
{ console.log('got triple: ', yy.turtle_subject, yy.turtle_predicate, $1); }
| bracket_expr
// | list_expr
;
bracket_expr
: start_bracket_expr polist end_bracket_expr
;
start_bracket_expr
: LBRACKET
{ if (!yy.bracket_stack) yy.bracket_stack = [];
yy.bracket_stack.push( [yy.turtle_subject, yy.turtle_predicate] );
console.log('pushed to make:', yy.bracket_stack);
if (yy.bnode_counter) {
yy.bnode_counter++;
} else {
yy.bnode_counter = 1
}
yy.turtle_subject = '_:u' + yy.bnode_counter;
yy.turtle_predicate = undefined;
}
;
end_bracket_expr
: RBRACKET
{
var value = yy.turtle_subject;
sp = yy.bracket_stack.pop();
yy.turtle_subject = sp[0];
yy.turtle_predicate = sp[1];
console.log('got triple: ', yy.turtle_subject, yy.turtle_predicate, value);
}
;
term
: WORD // for now we're not worrying about lexing.
;
Received on Saturday, 8 December 2012 15:53:07 UTC