Producing the rq23 grammar from Seaborne, Andy on 2005-04-26 (public-rdf-dawg@w3.org from April to June 2005)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Tue, 26 Apr 2005 18:05:28 +0100
To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <426E74D8.5060308@hp.com>
Details on how the HTML for the grammar is produced, with links to sources.

The grammar production system is in:

http://cvs.sourceforge.net/viewcvs.py/jena/ARQ/Grammar/

The SPARQL grammar for javacc [1] is:

http://cvs.sourceforge.net/viewcvs.py/jena/ARQ/Grammar/sparql.jj?view=markup

The Java code it generates is specific to the ARQ engine. If you remove all the 
Java code blocks from sparql.jj, you will get a parser that checks the language 
but produces no parse tree.


The process of producing the HTML for rq23 is:

1/ Ensure all files are up to date: build the parser and lexer with the script 
"grammar", which produces both the code (javacc) and the material needed for the 
HTML form (jjdoc, part of the javacc package).  Because the full javacc process 
is run, any errors and warnings can be dealt with at this point.

2/ Run the syntax test suite.

3/ Ensure tokens.txt is correct

The text output from jjdoc only includes the parser rules, not the lexer rules. 
The lexer tokens for display are produced manually, because the javacc format 
has to be converted to BNF and because some control is needed over inlining 
tokens for clarity.  There is support code for this process but it isn't perfect.

4/ Run a Perl script, jj2html, to turn the text from jjdoc into an HTML table.

5/ Validate the HTML: run "tidy -e" on the table.

6/ rq23 has markers "GRAMMAR" in comments - delete everything between the 
markers and insert the new HTML table.
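
Step 6 can be sketched in a few lines of Python.  This is only an illustration, 
not the procedure actually used: the exact spelling of the marker comment 
("<!-- GRAMMAR -->" below) and the function name are assumptions, since the 
message only says that rq23 has "GRAMMAR" markers in comments.

```python
def replace_between_markers(document, table, marker="<!-- GRAMMAR -->"):
    """Replace everything between the first two markers with `table`.

    The marker text is an assumed spelling, not taken from rq23 itself.
    """
    first = document.find(marker)
    second = document.find(marker, first + len(marker)) if first != -1 else -1
    if first == -1 or second == -1:
        raise ValueError("expected two GRAMMAR markers in the document")
    head = document[: first + len(marker)]   # up to and including marker 1
    tail = document[second:]                 # marker 2 and everything after
    return head + "\n" + table.strip("\n") + "\n" + tail


if __name__ == "__main__":
    page = "<p>intro</p>\n<!-- GRAMMAR -->\nold table\n<!-- GRAMMAR -->\n<p>end</p>\n"
    print(replace_between_markers(page, "<table>new</table>"))
```

Both markers are kept in place, so the same replacement can be repeated the 
next time the grammar changes.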


Documentation (no tokens) is produced by jjdoc (part of javacc):
http://cvs.sourceforge.net/viewcvs.py/jena/ARQ/Grammar/sparql.txt?view=markup
and is checked in each time.
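
As a rough picture of what a converter like jj2html has to do (the real one is 
a Perl script; this Python sketch is not a port of it), the following turns 
rule text into HTML table rows.  The input layout is an assumption: one rule 
per "Name := expansion" line, with any following line that lacks ":=" treated 
as a continuation of the previous rule.  The real sparql.txt may differ.

```python
import html


def rules_to_html(text):
    """Convert 'Name := expansion' lines (assumed layout) to an HTML table."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if ":=" in line:
            name, expansion = (part.strip() for part in line.split(":=", 1))
            rows.append([name, expansion])
        elif rows:
            # Continuation line, e.g. a further "| alternative".
            rows[-1][1] += " " + line
    out = ["<table>"]
    for number, (name, expansion) in enumerate(rows, start=1):
        out.append(
            "<tr><td>[%d]</td><td>%s</td><td>::=</td><td>%s</td></tr>"
            % (number, html.escape(name), html.escape(expansion))
        )
    out.append("</table>")
    return "\n".join(out)


if __name__ == "__main__":
    sample = "Query := Prolog SelectQuery\n  | ConstructQuery\nProlog := ( <BASE> )?"
    print(rules_to_html(sample))
```

Escaping matters here because token references such as <BASE> would otherwise 
be read as HTML tags, which is also the kind of problem "tidy -e" would flag.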

Tokens:
http://cvs.sourceforge.net/viewcvs.py/jena/ARQ/Grammar/tokens.txt?view=markup

It uses [] round a token to indicate that it should be inlined.
jj2tokens helps produce this file but it still needs hand editing.
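
To illustrate the inlining idea, here is a small Python sketch.  The file 
layout it reads is assumed, not copied from tokens.txt: each line is taken to be 
"<NAME> ::= definition", with "[<NAME>] ::= definition" marking a token whose 
definition should replace its name wherever a rule refers to it.

```python
import re


def parse_tokens(text):
    """Split token lines into (inline, display) dictionaries.

    Assumed layout: '[<NAME>] ::= body' means inline, '<NAME> ::= body'
    means the token is displayed as a named rule of its own.
    """
    inline, display = {}, {}
    for line in text.splitlines():
        if "::=" not in line:
            continue
        name, body = (part.strip() for part in line.split("::=", 1))
        if name.startswith("[") and name.endswith("]"):
            inline[name[1:-1].strip()] = body
        else:
            display[name] = body
    return inline, display


def inline_tokens(rule, inline):
    """Replace references to inlined tokens by their definitions."""
    return re.sub(r"<\w+>", lambda m: inline.get(m.group(0), m.group(0)), rule)


if __name__ == "__main__":
    tokens = "[<SELECT>] ::= 'SELECT'\n<VAR> ::= '?' <NAME>"
    inline, display = parse_tokens(tokens)
    print(inline_tokens("<SELECT> <VAR>+", inline))
```

Keywords read better inlined, while tokens with a longer definition are 
easier to follow as named rules, which is the control over display mentioned 
earlier.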


[Aside:
Stepping back, the file sparql.jj is itself produced from master.jj, which is a 
grammar for both SPARQL and ARQ.  It is just javacc with C preprocessor 
directives to put in the relevant part for SPARQL or ARQ.

When the grammar for rq23 is finished, I will be undoing this linkage.  Then ARQ 
will acquire more experimental features.
]
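
The master.jj arrangement in the aside can be pictured with a toy version of 
conditional inclusion.  The real work is done by the C preprocessor; this 
Python sketch only handles #ifdef / #else / #endif, and the macro names and 
grammar lines in the example are made up for illustration.

```python
def preprocess(lines, defined):
    """Keep only the lines selected by #ifdef/#else/#endif.

    A toy stand-in for the C preprocessor; `defined` is the set of
    macro names treated as defined.
    """
    out = []
    stack = []  # one boolean per open #ifdef: is its current branch active?
    for line in lines:
        words = line.split()
        directive = words[0] if words else ""
        if directive == "#ifdef":
            stack.append(words[1] in defined)
        elif directive == "#else":
            stack[-1] = not stack[-1]
        elif directive == "#endif":
            stack.pop()
        elif all(stack):
            # Emit the line only if every enclosing branch is active.
            out.append(line)
    return out


if __name__ == "__main__":
    master = [
        "void Query() : {}",
        "#ifdef ARQ",
        "  ArqExtension()",      # hypothetical ARQ-only production
        "#else",
        "  SparqlOnly()",        # hypothetical SPARQL-only production
        "#endif",
        "{ }",
    ]
    print("\n".join(preprocess(master, {"SPARQL"})))
```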

 Andy


[1] Javacc:
http://javacc.dev.java.net/
I use Java Compiler Compiler Version 3.2, which outputs Java compatible 
with Java 1.4 and Java 1.5.  BSD license.

---------------------------------------

javacc supports running a check for LA parsers (javacc produces an LL parser):


Currently the warnings are all commented in sparql.jj:

Warning: Choice conflict involving two expansions at
          line 92, column 5 and line 96, column 5 respectively.
          A common prefix is: "select" "distinct"
          Consider using a lookahead of 3 or more for earlier expansion.

There are different sub rules for SELECT [DISTINCT] *
and SELECT [DISTINCT] ?var.
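
The lookahead figure in such a warning follows from the length of the common 
prefix.  A small Python sketch, using representative token sequences rather 
than the actual productions in sparql.jj:

```python
def lookahead_needed(alt_a, alt_b):
    """Return the smallest k at which the two token sequences differ.

    Returns None if one sequence is a prefix of the other, in which
    case no fixed amount of lookahead over these tokens separates them.
    """
    for k, (a, b) in enumerate(zip(alt_a, alt_b), start=1):
        if a != b:
            return k
    return None


if __name__ == "__main__":
    star = ["select", "distinct", "*"]
    var = ["select", "distinct", "?var"]
    print(lookahead_needed(star, var))
```

With "distinct" present the two forms only differ at the third token, which 
matches the suggested lookahead of 3; without it they differ at the second.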

Warning: Choice conflict involving two expansions at
          line 106, column 5 and line 111, column 5 respectively.
          A common prefix is: "describe"
          Consider using a lookahead of 2 for earlier expansion.

Same for DESCRIBE

Warning: Choice conflict in [...] construct at line 132, column 3.
          Expansion nested within construct and expansion following construct
          have common prefixes, one of which is: "from"
          Consider using a lookahead of 2 or more for nested expansion.

This is "FROM" and "FROM NAMED"

Warning: Choice conflict involving two expansions at
          line 277, column 4 and line 282, column 4 respectively.
          A common prefix is: "{" <VAR1>
          Consider using a lookahead of 3 or more for earlier expansion.

Because UNION is an infix keyword, seeing { .. } is not enough until the parser 
can see if it is followed by UNION or not.
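
Put differently, no fixed number of tokens is enough here: the decision needs 
the token after the matching "}".  The Python sketch below shows that check on 
a plain token list; it illustrates the problem and is not how sparql.jj 
resolves it.

```python
def group_followed_by_union(tokens, start):
    """Return True if the { ... } group opening at `start` is followed by UNION."""
    if tokens[start] != "{":
        raise ValueError("expected '{' at start")
    depth = 0
    for i in range(start, len(tokens)):
        if tokens[i] == "{":
            depth += 1
        elif tokens[i] == "}":
            depth -= 1
            if depth == 0:
                # Found the matching close brace; look at the next token.
                return i + 1 < len(tokens) and tokens[i + 1].upper() == "UNION"
    raise ValueError("unbalanced braces")


if __name__ == "__main__":
    query = ["{", "?s", "?p", "?o", "}", "UNION", "{", "?s", "?q", "?o", "}"]
    print(group_followed_by_union(query, 0))
```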

Warning: Choice conflict in [...] construct at line 352, column 17.
          Expansion nested within construct and expansion following construct
          have common prefixes, one of which is: "."
          Consider using a lookahead of 2 or more for nested expansion.

DOTs are optional at the end of a bunch of triples (as per N3).
Need to distinguish between "<DOT> more triples" and
something like "<DOT> GRAPH ..." or even "<DOT> }"
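
The decision after a DOT comes down to one more token of lookahead.  A minimal 
Python sketch, where the set of tokens that end a run of triples contains only 
the two examples mentioned above and is certainly not exhaustive:

```python
# Tokens that cannot start another triple; examples only, not the full set.
STOP_AFTER_DOT = {"GRAPH", "}"}


def dot_continues_triples(tokens, dot_index):
    """Return True if the DOT at `dot_index` is followed by more triples."""
    if tokens[dot_index] != ".":
        raise ValueError("expected '.' at dot_index")
    nxt = dot_index + 1
    return nxt < len(tokens) and tokens[nxt] not in STOP_AFTER_DOT


if __name__ == "__main__":
    pattern = ["?s", "?p", "?o", ".", "?s", "?q", "?v", ".", "}"]
    print(dot_continues_triples(pattern, 3), dot_continues_triples(pattern, 7))
```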

Warning: Choice conflict involving two expansions at
          line 357, column 3 and line 362, column 3 respectively.
          A common prefix is: "[" "]"
          Consider using a lookahead of 3 or more for earlier expansion.

[] is a blank node, [:p :v] is a blank node that also generates some triples.
The grammar ensures that [] is not legal on its own.

"[:p :v] ." is legal -- "[]. " is not.
In cwm, "[]." and "<a> . <b> . <c> ." parse.
These are ruled out in SPARQL.


Warning: Choice conflict involving two expansions at
          line 422, column 5 and line 424, column 5 respectively.
          A common prefix is: "[" "]"
          Consider using a lookahead of 3 or more for earlier expansion.

Ditto for objects.
Consequence of differentiating [] and [:p :v] earlier on.

Warning: Choice conflict involving two expansions at
          line 484, column 3 and line 487, column 3 respectively.
          A common prefix is: "[" "]"
          Consider using a lookahead of 3 or more for earlier expansion.

Ditto for subjects and list items.
Consequence of differentiating [] and [:p :v] earlier on.


Warning: Choice conflict involving two expansions at
          line 661, column 5 and line 663, column 5 respectively.
          A common prefix is: <Q_URIref>
          Consider using a lookahead of 2 for earlier expansion.

Differentiate "q:name()"  and "q:name" in expressions.
Received on Tuesday, 26 April 2005 17:06:38 UTC