Re: SPARQL and Turtle Prefix Placement

* Gavin Carothers <gavin@carothers.name> [2012-06-15 10:44-0700]
> On Fri, Jun 15, 2012 at 9:48 AM, Eric Prud'hommeaux <eric@w3.org> wrote:
> > +[20]   LANGTAG         ::=     BASE | PREFIX | '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
> 
> 
> No, reverting back to the PREFIX BASE terminals is not acceptable.
> This was already the subject of review by Andy and Peter.
> 
> Please see thread
> http://lists.w3.org/Archives/Public/public-rdf-wg/2012May/0347.html
> for discussion on the change from PREFIX BASE to a simpler LANGTAG.

But that thread didn't terminate in consensus.
Andy's point
[[
    (to the casual reader : BASE is '@base' and PREFIX is '@prefix'
    
    Which is ambiguous - as it says:
    
    LANGTAG ::= ('@base' | '@prefix' | '@' ([a-zA-Z])+ ('-' ([a-zA-Z0-9])+)
    
    so the string "@base" matches two ways.
    
    But even if sorted out ... it means a tokenizer may well generate the 
    token LANGTAG ... and then:
    
    [5]  base  ::=  BASE IRIREF
    
    does not match as the token is LANGTAG, not BASE.  Oops.
]]

is addressed by moving the "BASE | PREFIX | " from LANGTAG to RDFLiteral:

  RDFLiteral ::= String (BASE | PREFIX | LANGTAG | '^^' iri)?

Turtle doesn't talk about parsing rules (perhaps it should); SPARQL's note 3 says [[
When tokenizing the input and choosing grammar rules, the longest match is chosen.
]] — <http://www.w3.org/2009/sparql/docs/query-1.1/rq25.xml#sparqlGrammar>

This doesn't establish a relative order between the terminals implied by quoted strings in the productions and explicit terminals like "LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*": for the input "@base", both match with the same length. After failing a few tests, people would likely add an ordering so that "@base" and "@prefix" parse as implicit terminals and never as language tags. We can be much more explicit if we use the above production for RDFLiteral. An aesthetic option would be to break it up for semantic clarity:

  RDFLiteral  ::= String (LanguageTag | '^^' iri)?
  LanguageTag ::= BASE | PREFIX | LANGTAG

I've committed that for everyone's viewing pleasure.
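
To make the tie concrete, here is a minimal sketch in Python (not part of the Turtle or SPARQL documents; the names TERMINALS, longest_matches and language_tag are hypothetical and exist only for illustration). It assumes the keyword terminals are the case-sensitive strings '@base' and '@prefix', as in Andy's note, and shows that "longest match" alone leaves "@base" matching two terminals, while a LanguageTag production that accepts all three token types works whichever one the tokenizer picks.

```python
import re

# Terminals from the thread. The list order is arbitrary on purpose:
# the grammar text does not define a priority between them.
TERMINALS = [
    ("BASE", re.compile(r"@base")),
    ("PREFIX", re.compile(r"@prefix")),
    # LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
    ("LANGTAG", re.compile(r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*")),
]


def longest_matches(text):
    """Return every terminal name tied for the longest match at the start of text."""
    lengths = {}
    for name, pattern in TERMINALS:
        match = pattern.match(text)
        if match:
            lengths[name] = match.end()
    if not lengths:
        return []
    best = max(lengths.values())
    return [name for name, length in lengths.items() if length == best]


def language_tag(token_name):
    """LanguageTag ::= BASE | PREFIX | LANGTAG, applied after a String."""
    return token_name in ("BASE", "PREFIX", "LANGTAG")


if __name__ == "__main__":
    for sample in ("@base", "@prefix", "@en-US", "@based"):
        candidates = longest_matches(sample)
        # Whichever candidate a tokenizer picks, the parser rule accepts it.
        print(sample, candidates, all(language_tag(c) for c in candidates))
```

With this shape the tokenizer is free to resolve the tie in favour of BASE and PREFIX, so "[5] base ::= BASE IRIREF" still sees a BASE token, and "o5"@base is still read as a literal with a language tag.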

I also found some errors in STRING_LITERAL (the "s and 's were reversed, so 's were not allowed within a ""-quoted string). I'm now validating with this text (note the long quotes):
[[
[] <p> <o1>, "o2", [ <p2> _:o3 ] ;
   <p3> ( <o4> "o5"@base "o5"@prefix _:o6 [ <p4> <o8> ] ), <o9> .
[ <p5> """o10
""line"" '''2'''""", '''o11
''line'' """3"""'''^^<integer> ;
  <p6> 12, +12, -12,                   # [+-]? [0-9]+
       13.0, +13.0, -13.0,             # [+-]? [0-9]* '.' [0-9]+ with *=2
       .0, +.0, -.0,                   # [+-]? [0-9]* '.' [0-9]+ with *=0
       14.E0, +14.E0, -14.E0,          # [+-]? [0-9]+ '.' [0-9]* EXPONENT with *=0
       14.0E0, +14.0E0,                # [+-]? [0-9]+ '.' [0-9]* EXPONENT with *=1
       .14E2, +.14E2, -.14E2, -14.0E0, # [+-]? '.' [0-9]+ EXPONENT
       1.4E1, +1.4E1, -1.4E1,          # [+-]? [0-9]+ EXPONENT)
       14e0, 14e+0, 14e-0              # [eE] [+-]? [0-9]+
].
]]
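
For anyone who wants to check the numeric lines by hand, the patterns in the comments of the test text can be transcribed into regular expressions. This is only a sketch in Python: the terminal names INTEGER, DECIMAL and DOUBLE and the helper classify are my assumptions about how the patterns group, not text copied from turtle.bnf.

```python
import re

# EXPONENT ::= [eE] [+-]? [0-9]+
EXPONENT = r"[eE][+-]?[0-9]+"

# [+-]? [0-9]+
INTEGER = re.compile(r"[+-]?[0-9]+")

# [+-]? [0-9]* '.' [0-9]+
DECIMAL = re.compile(r"[+-]?[0-9]*\.[0-9]+")

# [+-]? ( [0-9]+ '.' [0-9]* EXPONENT | '.' [0-9]+ EXPONENT | [0-9]+ EXPONENT )
DOUBLE = re.compile(
    r"[+-]?(?:[0-9]+\.[0-9]*" + EXPONENT
    + r"|\.[0-9]+" + EXPONENT
    + r"|[0-9]+" + EXPONENT + r")"
)


def classify(lexeme):
    """Return the name of the numeric terminal matching the whole lexeme, or None."""
    for name, pattern in (("DOUBLE", DOUBLE), ("DECIMAL", DECIMAL), ("INTEGER", INTEGER)):
        if pattern.fullmatch(lexeme):
            return name
    return None


if __name__ == "__main__":
    for lexeme in ("12", "-13.0", ".0", "14.E0", ".14E2", "1.4E1", "14e-0", "14."):
        print(lexeme, classify(lexeme))
```

Note that "14." on its own matches none of the three patterns, since the decimal pattern requires at least one digit after the '.'.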


> Also please make sure updates to the grammar are also checked into the
> http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-turtle/turtle.bnf not
> only the HTML.

will do.
-- 
-ericP

Received on Friday, 15 June 2012 18:57:22 UTC