- From: <stan.devitt@agfa.com>
- Date: Mon, 22 Aug 2005 04:03:34 -0400
- To: andy.seaborne@hp.com
- Cc: Richard Newman <holygoat@gmail.com>, public-rdf-dawg-comments@w3.org, public-rdf-dawg-comments-request@w3.org, Yosi Scharf <syosi@mit.edu>, Tim Berners-Lee <timbl@w3.org>
This too is just a comment on the auto-generation of the parser. For work with Euler Jos and I have been using a representation of the n3 grammar that is written in an XML document structure we have developed to automate the construction and validation of a jjtree grammar. There are additional structures in the XML representation of the grammar for automatically substituting groups of tokens or productions and for excluding individual entries from such a group. It also automatically generates code for visitors to the nodes of the parse tree, and produces very readable sorted jjtree grammars. This has approach been used to develop a latex to XML parser, for example, and was originally adopted as the hand coding of that grammar had become unmanageable. XSLT is used to transform this XML grammar into an expanded and sorted jjtree grammar which is in turn transformed into a javacc grammar finally into a java parser complete with a visitor that generates XML documents. Hyperlinked html previews of the grammar are also produced with undefined references color coded. The resulting parser with the default visitor produces an XML representation of the parse tree of the n3 document . There is provision for automatically applying an XSLT stylesheet to restructure the output (one of which turns it back into n3) The grammar we use differs slightly from the tail-recursive master grammar in that we use repetition with semantic lookahead in place of the tail recursive parts of the grammar. We find the resulting parse tree much more useful. It avoids the deeply nested structures caused by the tail-recursions of the original grammar and which usually need to be unraveled anyway before having a usable structure. (Of course we could stick with the tail recursion and unravel later it in XSLT :-) ) Our heavy reliance on semantic lookahead has had only a modest effect on the performance. It turns out all the lookahead expressions tend to fail early except where we are looking for ( for example) an optional "." at the end of a group such as {a b c. d e f. } . Even this has been manageable. Auto-generation of the parser from n3 is certainly in our plans. We would generate our XML form of the grammar directly from the n3 grammar, systematically transforming the tail recursion into our structure either using n3 rules or xslt as is appropriate. Stan Devitt "Seaborne, Andy" <andy.seaborne@hp.com> Sent by: public-rdf-dawg-comments-request@w3.org 08/21/2005 12:38 PM To: Richard Newman <holygoat@gmail.com> cc: Tim Berners-Lee <timbl@w3.org>, public-rdf-dawg-comments@w3.org, Yosi Scharf <syosi@mit.edu>, (bcc: Stan Devitt/AWKCT/CAN/AGFA/CA/BAYER) Subject: Re: Please make sure the grammar is directly machine consumable. Richard Newman wrote: > > > On 19 Aug 2005, at 04:12, Tim Berners-Lee wrote: > >> Richard, >> >> I didn't realize the grammar in the spec is machine-generated. >> Maybe it should be hand-edited and everything else >> generated from it. > > > I think that would be a good idea from one point of view (mine and > yours, certainly!), but we'd have to see what the current maintainers > of the SPARQL grammar think. > >> Yosi (on vacation right now) has generated (with a small hand tweak) >> the CFG grammar in RDF from the spec. (See sparql* in >> http://www.w3.org/2000/10/swap/grammar/ >> ) This is in plain BNF ( cfg:mustBeOneSequence properties >> with nested RDF collections ) >> >> See the bnf.n3 ontology in that directory as well as >> the bnf-rules.n3 which go from some forms of ebnf to bnf, >> also in that directory. > > > Very handy (and pretty cool!). As it seems the tools are in place, it > would be nice to have a machine-readable 'spec' grammar that could be > re-purposed into presentation EBNF, JavaCC, plain BNF, etc. -- this > would certainly save me a lot of work whenever the grammar changes! > > It is also nice, in an "eating one's own dog food" way, to have the > grammar itself in RDF. > > -R > This is not a response to the comment - just a description of some details in case it helps. The grammar is written using JavaCC, which, while an LL parser generator, also provides tools to do LA checking. JavaCC also provides a text output format. The JavaCC text output is converted to the HTML for the document by a script although the tokens have to be manually described. The process is converting javacc syntax to the EBNF syntax as described in http://www.w3.org/TR/2004/REC-xml11-20040204/#sec-notation. The grammar in javacc is not quite LL(1) (there is a 2 state lookahead at the Triples production - related to the optional dots Richard commented on). The document grammar is also fed into yacker (a W3C tool) which checks for conversion to bison/flex (LALR(1)). There are trade-off between readability by humans and processable by machines in the current grammar. Some people find the weighting towards a machine-processable grammar makes the grammar unclear (e.g. the use of recursive rules use rather than repetition). Andy
Received on Monday, 22 August 2005 08:04:18 UTC