- From: Sandro Hawke <sandro@w3.org>
- Date: Sat, 08 Dec 2012 10:53:04 -0500
- To: Eric Prud'hommeaux <eric@w3.org>, W3C RDF WG <public-rdf-wg@w3.org>
I believe I've written a strict LALR(1) (aka yacc/bison) grammar for Turtle. It's left-recursive (that is, it doesn't grow the stack) and unambiguous. I haven't done the lexer or a test harness, so I haven't actually tested it on the real test suite. Instead, the lexer just uses "words", and I test it with things like "a b c;d e,f". I hope to do the lexer and pass the positive and negative syntax tests, but I think what I've got is enough to settle the question we need for ACTION-217 and going to CR at the next meeting. A bit of commentary, while I'm thinking about grammars in specs. Basically, there are lots of possible grammars for a language. In many cases, one could write a left-recursive version of a production, which would work well in an LALR (bottom-up) parser, or a right-recursive version, which would work well in a recursive descent (top-down) parser. Which should we put in a spec? In either case, there will be some parser generators who can't really handle the grammar in the spec. I implemented this without looking at the spec, so I could keep my brain in LALR land. Jison code below. Oh, it doesn't do lists, or directives, or a few other things that seem irrelevant to ACTION-217. -- Sandro %lex %% \s+ /* skip whitespace */ [a-zA-Z_]+[0-9a-zA-Z_]* return 'WORD' "." return 'DOT' ";" return 'SEMI' "," return 'COMMA' "[" return 'LBRACKET' "]" return 'RBRACKET' <<EOF>> return 'EOF' . return 'INVALID' /lex %start doc %% doc : slist EOF { console.log('ended with trailing dot'); } | slist stmt EOF { console.log('ended with NO trailing dot'); } ; slist : /* nothing */ | slist stmt DOT ; stmt : subject polist ; polist : predicate olist | polist SEMI | polist SEMI predicate olist ; olist : object | olist COMMA object ; subject : term { yy.turtle_subject = $1; } ; predicate : term { yy.turtle_predicate = $1; } ; object : term { console.log('got triple: ', yy.turtle_subject, yy.turtle_predicate, $1); } | bracket_expr // | list_expr ; bracket_expr : start_bracket_expr polist end_bracket_expr ; start_bracket_expr : LBRACKET { if (!yy.bracket_stack) yy.bracket_stack = []; yy.bracket_stack.push( [yy.turtle_subject, yy.turtle_predicate] ); console.log('pushed to make:', yy.bracket_stack); if (yy.bnode_counter) { yy.bnode_counter++; } else { yy.bnode_counter = 1 } yy.turtle_subject = '_:u' + yy.bnode_counter; yy.turtle_predicate = undefined; } ; end_bracket_expr : RBRACKET { var value = yy.turtle_subject; sp = yy.bracket_stack.pop(); yy.turtle_subject = sp[0]; yy.turtle_predicate = sp[1]; console.log('got triple: ', yy.turtle_subject, yy.turtle_predicate, value); } ; term : WORD // for now we're not worrying about lexing. ;
Received on Saturday, 8 December 2012 15:53:07 UTC