Re: SPARQL and Turtle Prefix Placement from Eric Prud'hommeaux on 2012-06-15 (public-rdf-wg@w3.org from June 2012)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Fri, 15 Jun 2012 16:29:23 -0400
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: Gavin Carothers <gavin@carothers.name>, public-rdf-wg@w3.org
Message-ID: <20120615202922.GC27073@w3.org>
* Andy Seaborne <andy.seaborne@epimorphics.com> [2012-06-15 20:35+0100]
> Eric:
> 
> The problem with your way is that
> 
> [22]  LANGTAG  ::=  '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
> 
> includes "@base" and "@prefix" already

In order to match [[ "@base" IRIREF "." ]], I need to create some lexical token. Suppose I implement it like so:
  __BASE= '@' 'b' 'a' 's' 'e'
and put that before LANGTAG in a lex file. Then "@base" always lexes as __BASE instead of LANGTAG, and I'll never parse "a"@base . If I reorder them, "@base" always lexes as LANGTAG and the parser never sees a __BASE token.
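A minimal sketch of that ordering problem, assuming a lex-style scanner (the longest match wins, and among equally long matches the rule listed first wins). lex_one, base_first and langtag_first are hypothetical names used only for illustration:

```python
import re

# LANGTAG as in production [22] quoted below, plus a hypothetical explicit
# token for the "@base" keyword (the __BASE rule above).
LANGTAG = r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*"
BASE = r"@base"


def lex_one(text, rules):
    """Name the rule a lex-style scanner picks at the start of text:
    the longest match wins, and ties go to the rule listed first."""
    best_name, best_len = None, -1
    for name, pattern in rules:
        m = re.match(pattern, text)
        if m and len(m.group(0)) > best_len:  # strict '>' keeps the earlier rule on ties
            best_name, best_len = name, len(m.group(0))
    return best_name


base_first = [("__BASE", BASE), ("LANGTAG", LANGTAG)]
langtag_first = [("LANGTAG", LANGTAG), ("__BASE", BASE)]

print(lex_one("@base", base_first))      # __BASE: "a"@base is never a language-tagged literal
print(lex_one("@base", langtag_first))   # LANGTAG: the directive rule never sees __BASE
print(lex_one("@basement", base_first))  # LANGTAG: longer match, whatever the rule order
```

Whichever order is chosen, one of the two readings of "@base" becomes unreachable at the scanner level.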


>  Andy
> 
> On 15/06/12 20:13, Andy Seaborne wrote:
> >I prefer Gavin's approach.
> >
> >No BASE or PREFIX terminals; put '@base' and '@prefix' directly in the directive productions.
> >
> >http://lists.w3.org/Archives/Public/public-rdf-wg/2012May/0353.html
> >
> >(and it works in parser generators I have used)
> >
> >Andy
> >
> >On 15/06/12 19:56, Eric Prud'hommeaux wrote:
> >>* Gavin Carothers<gavin@carothers.name> [2012-06-15 10:44-0700]
> >>>On Fri, Jun 15, 2012 at 9:48 AM, Eric Prud'hommeaux<eric@w3.org> wrote:
> >>>>+[20] LANGTAG ::= BASE | PREFIX | '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
> >>>
> >>>
> >>>No, reverting to the PREFIX and BASE terminals is not acceptable.
> >>>This was already the subject of review by Andy and Peter.
> >>>
> >>>Please see the thread
> >>>http://lists.w3.org/Archives/Public/public-rdf-wg/2012May/0347.html
> >>>for discussion of the change from PREFIX and BASE to a simpler LANGTAG.
> >>
> >>But that thread didn't terminate in consensus.
> >>Andy's point
> >>[[
> >>(to the casual reader: BASE is '@base' and PREFIX is '@prefix')
> >>
> >>Which is ambiguous - as it says:
> >>
> >>LANGTAG ::= ('@base' | '@prefix' | '@' ([a-zA-Z])+ ('-' ([a-zA-Z0-9])+)*)
> >>
> >>so the string "@base" matches two ways.
> >>
> >>But even if sorted out ... it means a tokenizer may well generate the
> >>token LANGTAG ... and then:
> >>
> >>[5] base ::= BASE IRIREF
> >>
> >>does not match as the token is LANGTAG, not BASE. Oops.
> >>]]
> >>
> >>is addressed by moving the "BASE | PREFIX | " from LANGTAG to RDFLiteral:
> >>
> >>RDFLiteral ::= String (BASE | PREFIX | LANGTAG | '^^' iri)?
> >>
> >>Turtle doesn't talk about parsing rules (perhaps it should); SPARQL's
> >>note 3 says [[
> >>When tokenizing the input and choosing grammar rules, the longest
> >>match is chosen.
> >>]] —<http://www.w3.org/2009/sparql/docs/query-1.1/rq25.xml#sparqlGrammar>
> >>
> >>This doesn't establish a relative order between terminals implied by
> >>quoted strings in the productions vs. explicit terminals like "LANGTAG
> >>::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*". After failing a few tests,
> >>people would likely add an order to make "@base" and "@prefix" parse
> >>as implicit terminals and never parse them as language tags. We can be
> >>much more explicit if we use the above production for RDFLiteral. An
> >>aesthetic option would be to break it up for semantic clarity:
> >>
> >>RDFLiteral ::= String (LanguageTag | '^^' iri)?
> >>LanguageTag ::= BASE | PREFIX | LANGTAG
> >>
> >>I've committed that for everyone's viewing pleasure.
> >>
> >>I also found some errors in STRING_LITERAL (" and ' were swapped, so
> >>' was not allowed within a "-delimited string). I'm now validating with
> >>this text (note the long quotes):
> >>[[
> >>[]<p> <o1>, "o2", [<p2> _:o3 ] ;
> >><p3> (<o4> "o5"@base "o5"@prefix _:o6 [<p4> <o8> ] ),<o9> .
> >>[<p5> """o10
> >>""line"" '''2'''""", '''o11
> >>''line'' """3"""'''^^<integer> ;
> >><p6> 12, +12, -12, # [+-]? [0-9]+
> >>13.0, +13.0, -13.0, # [+-]? [0-9]* '.' [0-9]+ with *=2
> >>.0, +.0, -.0, # [+-]? [0-9]* '.' [0-9]+ with *=0
> >>14.E0, +14.E0, -14.E0, # [+-]? [0-9]+ '.' [0-9]* EXPONENT with *=0
> >>14.0E0, +14.0E0, # [+-]? [0-9]+ '.' [0-9]* EXPONENT with *=1
> >>.14E2, +.14E2, -.14E2, -14.0E0, # [+-]? '.' [0-9]+ EXPONENT
> >>1.4E1, +1.4E1, -1.4E1, # [+-]? [0-9]+ EXPONENT)
> >>14e0, 14e+0, 14e-0 # [eE] [+-]? [0-9]+
> >>].
> >>]]
> >>
> >>
> >>>Also please make sure updates to the grammar are checked into
> >>>http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-turtle/turtle.bnf, not
> >>>only into the HTML.
> >>
> >>will do.
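The RDFLiteral / LanguageTag change quoted above can be sketched as a toy scanner plus parser. This is a sketch under stated assumptions, not the Turtle grammar: the scanner is assumed to be lex-style with the keyword rules listed ahead of LANGTAG, and the token names WS and DOT, the simplified STRING rule, and the two-statement toy grammar are all hypothetical simplifications:

```python
import re

# Scanner rules in priority order. Longest match wins and ties go to the
# earlier rule, so "@base"/"@prefix" always become BASE/PREFIX.
TOKEN_RULES = [
    ("WS", re.compile(r"\s+")),
    ("BASE", re.compile(r"@base")),
    ("PREFIX", re.compile(r"@prefix")),
    ("LANGTAG", re.compile(r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*")),
    ("IRIREF", re.compile(r"<[^<>\"{}|^`\\\s]*>")),
    ("STRING", re.compile(r'"[^"\\\n\r]*"')),  # simplified: no escapes or long strings
    ("DOT", re.compile(r"\.")),
]

# LanguageTag ::= BASE | PREFIX | LANGTAG
LANGUAGE_TAG = {"BASE", "PREFIX", "LANGTAG"}


def tokenize(text):
    """Split text into (kind, lexeme) pairs, dropping whitespace."""
    pos, tokens = 0, []
    while pos < len(text):
        best = None
        for kind, pattern in TOKEN_RULES:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches are equally long.
            if m and (best is None or m.end() > best[1].end()):
                best = (kind, m)
        if best is None:
            raise SyntaxError(f"no token at offset {pos}")
        kind, m = best
        if kind != "WS":
            tokens.append((kind, m.group(0)))
        pos = m.end()
    return tokens


def expect(tokens, i, kind):
    if i >= len(tokens) or tokens[i][0] != kind:
        raise SyntaxError(f"expected {kind} at token {i}")


def parse(tokens):
    """Toy grammar: base ::= BASE IRIREF '.' ; stmt ::= STRING LanguageTag? '.'"""
    out, i = [], 0
    while i < len(tokens):
        kind, text = tokens[i]
        if kind == "BASE":  # BASE in statement position is the directive
            expect(tokens, i + 1, "IRIREF")
            expect(tokens, i + 2, "DOT")
            out.append(("base", tokens[i + 1][1]))
            i += 3
        elif kind == "STRING":  # BASE/PREFIX/LANGTAG after a string is a language tag
            lang = None
            if i + 1 < len(tokens) and tokens[i + 1][0] in LANGUAGE_TAG:
                i += 1
                lang = tokens[i][1][1:]  # drop the leading '@'
            expect(tokens, i + 1, "DOT")
            out.append(("literal", text[1:-1], lang))
            i += 2
        else:
            raise SyntaxError(f"unexpected {kind} {text!r}")
    return out


doc = '@base <http://example/> . "o5"@base . "o5"@prefix . "chat"@fr .'
print(parse(tokenize(doc)))
```

With the keyword rules ahead of LANGTAG the scanner is deterministic, and it is the parser, not the scanner, that decides whether a BASE token is a directive or a language tag.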

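The validation text quoted above exercises every numeric form and quotes nested inside long strings. As a rough cross-check, the sketch below transcribes the numeric productions named in that text's comments into Python regexes. The long-string patterns are an assumption: they are modelled on STRING_LITERAL_LONG_QUOTE / STRING_LITERAL_LONG_SINGLE_QUOTE as they appear in the later Turtle Recommendation, not on the draft under discussion. numeric_kind, o10 and o11 are illustrative names:

```python
import re

# Numeric terminals, transcribed from the comments in the validation text.
EXPONENT = r"[eE][+-]?[0-9]+"
NUMERIC = {
    "INTEGER": re.compile(r"[+-]?[0-9]+"),
    "DECIMAL": re.compile(r"[+-]?[0-9]*\.[0-9]+"),
    "DOUBLE": re.compile(
        rf"[+-]?(?:[0-9]+\.[0-9]*{EXPONENT}|\.[0-9]+{EXPONENT}|[0-9]+{EXPONENT})"
    ),
}

# Long-string terminals (assumption: shape taken from the Turtle Recommendation).
# One or two quote characters may occur inside only when a non-quote char follows.
ECHAR = r"""\\[tbnrf"'\\]"""
UCHAR = r"\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}"
LONG_QUOTE = re.compile(rf'"""(?:(?:"|"")?(?:[^"\\]|{ECHAR}|{UCHAR}))*"""')
LONG_SINGLE_QUOTE = re.compile(rf"'''(?:(?:'|'')?(?:[^'\\]|{ECHAR}|{UCHAR}))*'''")


def numeric_kind(token):
    """Return the numeric terminal that matches the whole token, or None."""
    for name, pattern in NUMERIC.items():
        if pattern.fullmatch(token):
            return name
    return None


# o10 and o11 from the validation text, written as Python string literals.
o10 = '"""o10\n""line"" ' + "'''2'''" + '"""'
o11 = "'''o11\n''line'' " + '"""3"""' + "'''"

for token in ["12", "-13.0", ".0", "14.E0", ".14E2", "1.4E1", "14e-0", "14."]:
    print(token, numeric_kind(token))
print(bool(LONG_QUOTE.fullmatch(o10)), bool(LONG_SINGLE_QUOTE.fullmatch(o11)))
```

numeric_kind("14.") comes back None: DECIMAL needs a digit after the dot, so a trailing "." is left over as the statement terminator.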
-- 
-ericP
Received on Friday, 15 June 2012 20:29:54 UTC