- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Tue, 26 Apr 2005 18:05:28 +0100
- To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Details on how the HTML for the grammar is produced, with links to sources.

The grammar production system is in:
http://cvs.sourceforge.net/viewcvs.py/jena/ARQ/Grammar/

The SPARQL grammar for javacc [1] is:
http://cvs.sourceforge.net/viewcvs.py/jena/ARQ/Grammar/sparql.jj?view=markup

The Java code it generates is specific to the ARQ engine. If you remove all the Java code blocks from sparql.jj, you will get a parser that checks the language but produces no parse tree.

The process of producing the HTML for rq23 is:

1/ Ensure all files are up to date - make the parser and lexer with the script "grammar", which produces both the code (javacc) and the material needed for the HTML form (jjdoc - part of the javacc package). As the full javacc process is run, any errors and warnings can be dealt with.

2/ Run the syntax test suite.

3/ Ensure tokens.txt is correct. The text output only includes the parser rules, not the lexer rules. The lexer tokens for display are produced manually because the javacc format has to be converted to BNF, and some control is needed over inlining tokens for clarity. There is support code for this process but it isn't perfect.

4/ Run a Perl script, jj2html, to take the text from jjdoc and produce an HTML table.

5/ Validate the HTML; run "tidy -e" on the table.

6/ rq23 has markers "GRAMMAR" in comments - delete everything between the markers and insert the new HTML table.

Documentation (no tokens) is produced by jjdoc (part of javacc):
http://cvs.sourceforge.net/viewcvs.py/jena/ARQ/Grammar/sparql.txt?view=markup
and is checked in each time.

Tokens:
http://cvs.sourceforge.net/viewcvs.py/jena/ARQ/Grammar/tokens.txt?view=markup
It uses [] round a token to indicate that it should be inlined. jj2tokens helps produce this file but it still needs hand editing.

[Aside: Stepping back, the file sparql.jj is itself produced from master.jj, which is a grammar for both SPARQL and ARQ. It is just javacc with C preprocessor directives to put in the relevant part for SPARQL or ARQ.
When the grammar for rq23 is finished, I will be undoing this linkage. Then ARQ will acquire more experimental features. ]

Andy

[1] Javacc: http://javacc.dev.java.net/
I use "Java Compiler Compiler Version 3.2", which outputs Java compatible with Java 1.4 and Java 1.5. BSD license.

---------------------------------------

javacc supports running a check for LA parsers (javacc produces an LL parser). Currently the warnings are all commented in sparql.jj:

Warning: Choice conflict involving two expansions at line 92, column 5 and line 96, column 5 respectively.
A common prefix is: "select" "distinct"
Consider using a lookahead of 3 or more for earlier expansion.

There are different sub rules for SELECT [DISTINCT] * and SELECT [DISTINCT] ?var.

Warning: Choice conflict involving two expansions at line 106, column 5 and line 111, column 5 respectively.
A common prefix is: "describe"
Consider using a lookahead of 2 for earlier expansion.

Same for DESCRIBE.

Warning: Choice conflict in [...] construct at line 132, column 3.
Expansion nested within construct and expansion following construct have common prefixes, one of which is: "from"
Consider using a lookahead of 2 or more for nested expansion.

This is "FROM" and "FROM NAMED".

Warning: Choice conflict involving two expansions at line 277, column 4 and line 282, column 4 respectively.
A common prefix is: "{" <VAR1>
Consider using a lookahead of 3 or more for earlier expansion.

Because UNION is an infix keyword, seeing { .. } is not enough until the parser can see whether it is followed by UNION or not.

Warning: Choice conflict in [...] construct at line 352, column 17.
Expansion nested within construct and expansion following construct have common prefixes, one of which is: "."
Consider using a lookahead of 2 or more for nested expansion.

DOTs are optional at the end of a bunch of triples (as per N3). Need to distinguish between "<DOT> more triples" and something like "<DOT> GRAPH ..."
or even "<DOT> }"

Warning: Choice conflict involving two expansions at line 357, column 3 and line 362, column 3 respectively.
A common prefix is: "[" "]"
Consider using a lookahead of 3 or more for earlier expansion.

[] is a blank node, [:p :v] is a blank node that also generates some triples. The grammar ensures that [] is not legal on its own. "[:p :v] ." is legal -- "[]. " is not. In cwm, "[]." and "<a> . <b> . <c> ." parse. These are ruled out in SPARQL.

Warning: Choice conflict involving two expansions at line 422, column 5 and line 424, column 5 respectively.
A common prefix is: "[" "]"
Consider using a lookahead of 3 or more for earlier expansion.

Ditto for objects. Consequence of differentiating [] and [:p :v] earlier on.

Warning: Choice conflict involving two expansions at line 484, column 3 and line 487, column 3 respectively.
A common prefix is: "[" "]"
Consider using a lookahead of 3 or more for earlier expansion.

Ditto for subjects and list items. Consequence of differentiating [] and [:p :v] earlier on.

Warning: Choice conflict involving two expansions at line 661, column 5 and line 663, column 5 respectively.
A common prefix is: <Q_URIref>
Consider using a lookahead of 2 for earlier expansion.

Differentiate "q:name()" and "q:name" in expressions.
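
As an illustration of that last warning, here is a minimal sketch of why one token of lookahead is not enough to tell "q:name()" from "q:name". It is not taken from sparql.jj: the function name, the token representation (a list of already-lexed strings) and the crude qname test are all assumptions made for the example. In javacc itself the equivalent would presumably be a local LOOKAHEAD(2) hint on the earlier expansion.

```python
def classify_qname_use(tokens, pos):
    """Classify the expression starting at tokens[pos], peeking two tokens.

    Hypothetical helper for illustration only; 'tokens' is a list of
    already-lexed token strings, not the token stream used by ARQ.
    """
    first = tokens[pos]
    # The second token may not exist if the qname ends the input.
    second = tokens[pos + 1] if pos + 1 < len(tokens) else None

    # Crude stand-in for the <Q_URIref> token: anything containing a colon.
    if ":" not in first:
        return "other"

    # Both alternatives start with the same qname token, so the decision
    # can only be made by looking at the token that follows it.
    if second == "(":
        return "function-call"
    return "qname"
```

For example, classify_qname_use(["q:name", "(", ")"], 0) yields "function-call", while classify_qname_use(["q:name", "+", "1"], 0) yields "qname". The other warnings follow the same pattern, only with longer common prefixes.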
Received on Tuesday, 26 April 2005 17:06:38 UTC