
Re: Please make sure the grammar is directly machine consumable.

From: Yosi Scharf <syosi@MIT.EDU>
Date: Mon, 22 Aug 2005 11:27:37 -0400
Message-ID: <4309EEE9.3020001@mit.edu>
To: "Seaborne, Andy" <andy.seaborne@hp.com>
CC: Tim Berners-Lee <timbl@w3.org>, public-rdf-dawg-comments@w3.org

Seaborne, Andy wrote:

> Yosi,
>
> Could you let me know what the "small hand tweak" is to the grammar? 
> Was it to the SPARQL grammar or the output of the conversion?
>
> Was it one of the diffs you sent in July (I never found out what the
> diffs were diff'ed against - some of them are already in the LC
> grammar but I didn't get the diffs until after the text was frozen for
> LC).
>
>     Andy
>
> Tim Berners-Lee wrote:
>
>>
>> Richard,
>>
>> I didn't realize the grammar in the spec is machine-generated.
>> Maybe it should be hand-edited and everything else
>> generated from it.
>>
>> Yosi (on vacation right now) has generated (with a small hand tweak)
>> the CFG grammar in RDF from the spec.   (See sparql* in
>> http://www.w3.org/2000/10/swap/grammar/
>> )  This is in plain BNF (  cfg:mustBeOneSequence properties
>> with nested RDF collections )
>>
>> See the bnf.n3 ontology in that directory as well as
>> the bnf-rules.n3 which go from some forms of ebnf to bnf,
>> also in that directory.
>>
>> Tim
>>
>> On Aug 18, 2005, at 16:26, Richard Newman wrote:
>>
>>> As I recall from discussions with Andy Seaborne while I was 
>>> implementing twinql[1], the grammar in the SPARQL docs are directly 
>>> generated from a JavaCC grammar file. The source, therefore, is 
>>> machine-consumable -- at least, if you're using JavaCC!
>>>
>>> However, the output is not a particular friendly grammar to work 
>>> with -- optional dots after productions, for example, tripped up my 
>>> tool (so twinql makes them compulsory), and it took a bit of work 
>>> to get it into a usable state (as I detailed in a previous email[2]).
>>>
>>> -R
>>>
>>> [1] <http://www.holygoat.co.uk/projects/twinql/>
>>> [2] <http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/
>>> 2005Aug/0055.html>
>>> On 18 Aug 2005, at 11:28, Tim Berners-Lee wrote:
>>>
>>>
>>>>
>>>> This is a followup from a discussion between Yosi Scharf, 
>>>> implementer of SPARQL in cwm, currently on vacation,  and Eric 
>>>> P'dH, co-editor of the spec, several weeks ago.
>>>>
>>>> Yosi has built his implementation of SPARQL from a file which is 
>>>> almost the one generated from the TR, but with a slight tweak to 
>>>> make the file grammar able to be parsed by a predictive parser [1] 
>>>> a simple form of LL(1) recursive descent parser.  I understood 
>>>> that the tweak was editorial in that the it didn't change the 
>>>> language, just the way it was expressed as a context-free grammar.
>>>>
>>>> A situation in which code can be generated directly from the spec 
>>>> is a very strong position to be in.  I am not aware of any time 
>>>> this has previously happened for a W3C language, but I may be 
>>>> wrong.  As it is demonstrably simple to make the step here I would 
>>>> request it be done at last call stage before the call for 
>>>> implementation at CR.
>>>>
>>>> [1] http://www.inf.ed.ac.uk/teaching/courses/cs2/LectureNotes/
>>>> CS2Ah/LangProc/lp10.pdf
>>>>
>>>> Tim Berners-Lee
>>>> MIT/CSAIL/DIG
>>>>
>>>>
>>>>
>>>
>>
>>

The diff I sent you (attached) was written entirely by hand. I was
describing how I went from
http://www.w3.org/2005/01/yacker/uploads/s20050622 , as it was at the
time, to http://www.w3.org/2005/01/yacker/uploads/yosiJune28a . I
believe I tried to express it in terms of the grammar that was then
current in rq23; I don't know how well I succeeded. The content of the
changes I made, and the reasoning behind them, still stands. The first
change makes things recursive enough to avoid the optional dot problem
in all cases. The other changes are needed to make the grammar LL(1).
I then take the grammar in
http://www.w3.org/2005/01/yacker/uploads/yosiJune28a and run it through
yacker to get output which has been slightly hand edited into
http://www.w3.org/2000/10/swap/grammar/sparql.n3 . The hand edits are
things I could have changed yacker to do for me, if I knew Perl and
yacker.

The first change (the rest are readability nightmares, and I can easily
understand your not wanting those in the grammar) makes the grammar
LALR(1), and I don't think it is so bad. I change a GraphPatternList
into
http://www.w3.org/2005/01/yacker/uploads/yosiJune28a?lang=perl&markup=html#prod-yosiJune28a-GraphPatternList
 , a set of things that may or may not be triples. A triple is followed
either by a dot and a GraphPatternList, or directly by something that is
not a triple. That gives us a direct statement of how dots work, and no
optDot problem.

Yosi

Changes for my version of the grammar (my numbers are meaningless):

First, making it so GraphPatterns work with regular dots
<<<<<<
[20]       GraphPattern       ::=       ( Triples '.'? )? (
GraphPatternNotTriples '.'? GraphPattern )?
[21]       GraphPatternNotTriples       ::=       OptionalGraphPattern |
GroupOrUnionGraphPattern | GraphGraphPattern | Constraint
------------
[20] GraphPattern       ::=               ( Triples1
GraphPatternListTail | GraphPatternNotTriples GraphPatternNotTriplesTail)?
[21] GraphPatternListTail ::=                   ( Dot GraphPattern)? |
GraphPatternNotTriplesList
[28] GraphPatternNotTriplesTail  ::=            ( Dot? GraphPattern)
[281] GraphPatternNotTriplesList ::=            GraphPatternNotTriples
GraphPatternNotTriplesTail
[22]       GraphPatternNotTriples       ::=       OptionalGraphPattern |
GroupOrUnionGraphPattern | GraphGraphPattern | Constraint
>>>>>>>>>>
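
To illustrate how the rewritten productions above can be followed with a
single token of lookahead, here is a minimal Python sketch of a
recognizer. It is not the cwm or yacker code. The token names are
assumptions made for the example: "T" stands for a whole Triples1, "N"
for a whole GraphPatternNotTriples, and "." for Dot, and the end of the
token list stands in for whatever would follow a GraphPattern in a real
query.

```python
def accepts(tokens):
    """Recognize the rewritten GraphPattern over simplified tokens.

    Assumed tokens: "T" = one Triples1, "N" = one GraphPatternNotTriples,
    "." = Dot. End of input plays the role of the closing context.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r} at position {pos}")
        pos += 1

    def graph_pattern():
        # ( Triples1 GraphPatternListTail
        # | GraphPatternNotTriples GraphPatternNotTriplesTail )?
        if peek() == "T":
            eat("T")
            list_tail()
        elif peek() == "N":
            eat("N")
            not_triples_tail()
        # any other lookahead: take the empty alternative

    def list_tail():
        # ( Dot GraphPattern )? | GraphPatternNotTriplesList
        if peek() == ".":
            eat(".")
            graph_pattern()
        elif peek() == "N":
            eat("N")
            not_triples_tail()
        # any other lookahead: take the empty alternative

    def not_triples_tail():
        # Dot? GraphPattern
        if peek() == ".":
            eat(".")
        graph_pattern()

    try:
        graph_pattern()
    except SyntaxError:
        return False
    return pos == len(tokens)
```

In this sketch two triples in a row are accepted only with a dot between
them, while a non-triples pattern may follow a triple directly, which is
the behaviour described in the text above.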

Next, I needed to be able to predict where a '[' and '(' went. For
subjects, the change was this
<<<<<<<<<<
[28]       Triples1       ::=       VarOrTerm PropertyListNotEmpty |
TriplesNode PropertyList
-----------
[29]    Triples1       ::=       VarOrTerm PropertyListNotEmpty | '['
Triples2 | '(' Triples3
[29a]        Triples2          ::=           ']' PropertyListNotEmpty |
PropertyListNotEmpty ']' PropertyList
[29b]   Triples3      ::=        ')'  PropertyListNotEmpty |  GraphNode+
')' PropertyList
>>>>>>>>>>>

This means that a blank node with a property list can only appear as an
object
<<<<<<<<<<<
[35]       BlankNodePropertyList       ::=       '['
PropertyListNotEmpty ']'
[36]       Collection       ::=       '(' GraphNode+ ')'
------------
[38]       BlankNodePropertyList       ::=       '[' PropertyList ']'
[39]       Collection       ::=       '(' GraphNode* ')'
>>>>>>>>>>>

Verbs are now the only place where ``[]'' has meaning in and of
itself
<<<<<<<<<<<
[40]       VarOrBlankNodeOrIRIref       ::=       Var | BlankNode | IRIref
[65]       BlankNode       ::=       BNODE_LABEL | '[' ']'
-----------
[43]       VarOrBlankNodeOrIRIref       ::=       Var | BlankNode |
IRIref | NamelessBlank
[68]       BlankNode       ::=       BNODE_LABEL
[68a]   NamelessBlank     ::=           '[' ']'
>>>>>>>>>>>

I've merged empty lists with non-empty ones
<<<<<<<<<<<
[42]       GraphTerm       ::=       RDFTerm | '(' ')'
-----------
[45]       GraphTerm       ::=       RDFTerm
>>>>>>>>>>>

The last modification is the ugliest. I need to support both ``FILTER
(q:name = ?x)'' and ``FILTER(q:name() = ?x)'', even though both start
with a qname that means something different in each case.
<<<<<<<<<<
[57]       PrimaryExpression       ::=       BrackettedExpression |
CallExpression | Var | RDFTerm
[52]       CallExpression       ::=         'STR' '(' Expression ')'
| 'LANG' '(' Expression ')'
| 'DATATYPE' '(' Expression ')'
| 'BOUND' '(' Var ')'
| 'isURI' '(' Expression ')'
| 'isBLANK' '(' Expression ')'
| 'isLITERAL' '(' Expression ')'
| RegexExpression
| FunctionCall
------------
[60]       PrimaryExpression       ::=       BrackettedExpression |
BuiltinCallExpression | Var | RDFTermOrFunc
[55]    CallExpression    ::=    BuiltinCallExpression  |
FunctionCall        
[55]       BuiltinCallExpression       ::=         'STR' '(' Expression ')'
| 'LANG' '(' Expression ')'
| 'DATATYPE' '(' Expression ')'
| 'BOUND' '(' Var ')'
| 'isURI' '(' Expression ')'
| 'isBLANK' '(' Expression ')'
| 'isLITERAL' '(' Expression ')'
| RegexExpression
[61a]   RDFTermOrFunc      ::=        IRIrefOrFunc | RDFLiteral |
NumericLiteral | BooleanLiteral | BlankNode
[61b]   IRIrefOrFunc       ::=           IRIref ArgList?
>>>>>>>>>>>>>>
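
The prediction problem behind this change can be checked mechanically.
The sketch below is only an illustration: it computes FIRST sets for two
toy grammars, BEFORE and AFTER, which are heavily simplified stand-ins
for the productions above and not the real SPARQL grammar. The names
EPS, first_sets, first_of_seq and conflicts are made up for the example,
and only FIRST/FIRST clashes are reported (FIRST/FOLLOW conflicts are
not checked).

```python
EPS = ""  # marker for the empty string in FIRST sets


def first_of_seq(seq, first, grammar):
    """FIRST set of a sequence of symbols, given the FIRST sets so far."""
    out = set()
    for sym in seq:
        if sym not in grammar:          # terminal symbol
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return out
    out.add(EPS)                        # every symbol was nullable
    return out


def first_sets(grammar):
    """Iterate to a fixed point to get FIRST for each nonterminal."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                f = first_of_seq(alt, first, grammar)
                if not f <= first[nt]:
                    first[nt] |= f
                    changed = True
    return first


def conflicts(grammar):
    """Map each nonterminal to the tokens shared by two alternatives."""
    first = first_sets(grammar)
    found = {}
    for nt, alts in grammar.items():
        seen, clash = set(), set()
        for alt in alts:
            f = first_of_seq(alt, first, grammar)
            clash |= seen & f
            seen |= f
        if clash:
            found[nt] = clash
    return found


# Toy version of the original shape: a function call and a plain term
# can both start with an IRIref.
BEFORE = {
    "PrimaryExpression": [["FunctionCall"], ["Var"], ["RDFTerm"]],
    "FunctionCall": [["IRIref", "ArgList"]],
    "RDFTerm": [["IRIref"], ["Literal"]],
    "ArgList": [["(", ")"]],
}

# Toy version of the rewritten shape: the IRIref is read once and the
# argument list becomes an optional suffix.
AFTER = {
    "PrimaryExpression": [["Var"], ["RDFTermOrFunc"]],
    "RDFTermOrFunc": [["IRIrefOrFunc"], ["Literal"]],
    "IRIrefOrFunc": [["IRIref", "OptArgList"]],
    "OptArgList": [["ArgList"], []],
    "ArgList": [["(", ")"]],
}
```

With these toy grammars, conflicts(BEFORE) reports IRIref as a clash in
PrimaryExpression, and conflicts(AFTER) reports nothing, which is the
effect the IRIrefOrFunc production is meant to have.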

ArgLists had a similar prediction problem
<<<<<<<<<<<<<<
[55]       ArgList       ::=       ( '(' ')' | '(' Expression ( ','
Expression )* ')' )
----------
[58]       ArgList       ::=       '(' ( Expression ( ',' Expression )*
)? ')'
>>>>>>>>>>>>>>
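
In the original ArgList both alternatives begin with '(', so one token
of lookahead cannot choose between them. In the factored form the '(' is
read once and the next token decides. A minimal Python sketch of a
predictive parse for the factored production follows. The function name
parse_arg_list is made up for the example, and Expression is reduced to
a single non-punctuation token, which is an assumption made only to keep
the sketch short.

```python
PUNCTUATION = {"(", ")", ","}


def parse_arg_list(tokens, pos=0):
    """ArgList ::= '(' ( Expression ( ',' Expression )* )? ')'

    Returns (list of argument tokens, position after the closing ')').
    Expression is a stand-in here: exactly one non-punctuation token.
    """

    def expect(tok):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}")
        pos += 1

    def parse_expression():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in PUNCTUATION:
            raise SyntaxError(f"expected an expression at position {pos}")
        tok = tokens[pos]
        pos += 1
        return tok

    args = []
    expect("(")
    # One token of lookahead: ')' means the optional part is absent.
    if pos < len(tokens) and tokens[pos] != ")":
        args.append(parse_expression())
        while pos < len(tokens) and tokens[pos] == ",":
            pos += 1
            args.append(parse_expression())
    expect(")")
    return args, pos
```

For example, parse_arg_list(["(", ")"]) yields an empty argument list,
and a trailing comma before ')' is rejected with a SyntaxError.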

Those are all of the changes I remember.
Received on Monday, 22 August 2005 15:44:35 GMT
