Re: SPARQL and Turtle Prefix Placement from Andy Seaborne on 2012-06-15 (public-rdf-wg@w3.org from June 2012)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Fri, 15 Jun 2012 20:35:14 +0100
To: Eric Prud'hommeaux <eric@w3.org>
CC: Gavin Carothers <gavin@carothers.name>, public-rdf-wg@w3.org
Message-ID: <4FDB8E72.8020502@epimorphics.com>
Eric:

The problem with your way is that

[22]  LANGTAG  ::=  '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*

includes "@base" and "@prefix" already

 Andy
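
To make the overlap concrete, here is a minimal sketch that transcribes production [22] into a Python regular expression and checks a few candidate strings against it. The helper name `matches_langtag` is made up for illustration and is not part of any Turtle or SPARQL tooling.

```python
import re

# Production [22] from the message above, transcribed as a regex:
#   LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
LANGTAG = re.compile(r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*")


def matches_langtag(token: str) -> bool:
    """Return True if the entire token matches the LANGTAG terminal."""
    return LANGTAG.fullmatch(token) is not None


if __name__ == "__main__":
    # Ordinary language tags and the two directive keywords alike.
    for token in ("@en", "@en-GB", "@base", "@prefix"):
        print(token, matches_langtag(token))
```

Under this transcription, "@base" and "@prefix" are accepted just like "@en", which is why a tokenizer that sees only LANGTAG cannot tell a directive keyword from a language tag without extra rules.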

On 15/06/12 20:13, Andy Seaborne wrote:
> I prefer Gavin's approach.
>
> No BASE PREFIX; Put '@base' and '@prefix' in the directives.
>
> http://lists.w3.org/Archives/Public/public-rdf-wg/2012May/0353.html
>
> (and it works in parser generators I have used)
>
> Andy
>
> On 15/06/12 19:56, Eric Prud'hommeaux wrote:
>> * Gavin Carothers<gavin@carothers.name> [2012-06-15 10:44-0700]
>>> On Fri, Jun 15, 2012 at 9:48 AM, Eric Prud'hommeaux<eric@w3.org> wrote:
>>>> +[20] LANGTAG ::= BASE | PREFIX | '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
>>>
>>>
>>> No, reverting back to the PREFIX BASE terminals is not acceptable.
>>> This was already the subject of review by Andy and Peter.
>>>
>>> Please see thread
>>> http://lists.w3.org/Archives/Public/public-rdf-wg/2012May/0347.html
>>> for discussion on the change from PREFIX BASE to a simpler LANGTAG.
>>
>> But that thread didn't terminate in consensus.
>> Andy's point
>> [[
>> (to the casual reader : BASE is '@base' and PREFIX is '@prefix'
>>
>> Which is ambiguous - as it says:
>>
>> LANGTAG ::= ('@base' | '@prefix' | '@' ([a-zA-Z])+ ('-' ([a-zA-Z0-9])+)
>>
>> so the string "@base" matches two ways.
>>
>> But even if sorted out ... it means a tokenizer may well generate the
>> token LANGTAG ... and then:
>>
>> [5] base ::= BASE IRIREF
>>
>> does not match as the token is LANGTAG, not BASE. Oops.
>> ]]
>>
>> is addressed by moving the "BASE | PREFIX | " from LANGTAG to RDFLiteral:
>>
>> RDFLiteral ::= String (BASE | PREFIX | LANGTAG | '^^' iri)?
>>
>> Turtle doesn't talk about parsing rules (perhaps it should); SPARQL's
>> note 3 says [[
>> When tokenizing the input and choosing grammar rules, the longest
>> match is chosen.
>> ]] —<http://www.w3.org/2009/sparql/docs/query-1.1/rq25.xml#sparqlGrammar>
>>
>> This doesn't establish a relative order between terminals implied by
>> ""'d strings in the productions vs. explicit terminals like "LANGTAG
>> ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*". After failing a few tests,
>> people would likely add an order to make "@base" and "@prefix" parse
>> as implicit terminals and never parse them as language tags. We can be
>> much more explicit if we use the above production for RDFLiteral. An
>> aesthetic option would be to break it up for semantic clarity:
>>
>> RDFLiteral ::= String (LanguageTag | '^^' iri)?
>> LanguageTag ::= BASE | PREFIX | LANGTAG
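
As an illustration of the ordering question raised in the quoted text, the following sketch implements a toy longest-match tokenizer in which ties are broken by the position of the terminal in a list. The terminal list, its order, and the function `next_token` are assumptions made for this example; neither the Turtle nor the SPARQL grammar prescribes this tie-break.

```python
import re

# Hypothetical terminal table: earlier entries win when match lengths tie.
TERMINALS = [
    ("BASE", re.compile(r"@base")),
    ("PREFIX", re.compile(r"@prefix")),
    ("LANGTAG", re.compile(r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*")),
]


def next_token(text, pos=0, terminals=TERMINALS):
    """Return (name, lexeme) for the longest match at pos, or None.

    On equal length the terminal listed first is kept, because a later
    candidate only replaces the current best if it is strictly longer.
    """
    best = None
    for name, pattern in terminals:
        m = pattern.match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


if __name__ == "__main__":
    for text in ("@base", "@prefix", "@en-GB", "@based"):
        print(text, next_token(text))
    # Same rule, opposite order: "@base" now comes out as LANGTAG.
    print(next_token("@base", terminals=list(reversed(TERMINALS))))
```

With the order shown, "@base" tokenizes as BASE, while "@based" still tokenizes as LANGTAG because the longer match wins. Reversing the list makes "@base" a LANGTAG, which is the kind of implementation-dependent outcome that longest match alone leaves open.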
>>
>> I've committed that for everyone's viewing pleasure.
>>
>> I also found some errors in STRING_LITERAL ("s vs. 's reversed, so 's
>> were not allowed within a "" string). I'm now validating with this text (note
>> the long quotes):
>> [[
>> []<p> <o1>, "o2", [<p2> _:o3 ] ;
>> <p3> (<o4> "o5"@base "o5"@prefix _:o6 [<p4> <o8> ] ),<o9> .
>> [<p5> """o10
>> ""line"" '''2'''""", '''o11
>> ''line'' """3"""'''^^<integer> ;
>> <p6> 12, +12, -12, # [+-]? [0-9]+
>> 13.0, +13.0, -13.0, # [+-]? [0-9]* '.' [0-9]+ with *=2
>> .0, +.0, -.0, # [+-]? [0-9]* '.' [0-9]+ with *=0
>> 14.E0, +14.E0, -14.E0, # [+-]? [0-9]+ '.' [0-9]* EXPONENT with *=0
>> 14.0E0, +14.0E0, # [+-]? [0-9]+ '.' [0-9]* EXPONENT with *=1
>> .14E2, +.14E2, -.14E2, -14.0E0, # [+-]? '.' [0-9]+ EXPONENT
>> 1.4E1, +1.4E1, -1.4E1, # [+-]? [0-9]+ EXPONENT)
>> 14e0, 14e+0, 14e-0 # [eE] [+-]? [0-9]+
>> ].
>> ]]
>>
>>
>>> Also please make sure updates to the grammar are also checked into the
>>> http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-turtle/turtle.bnf not
>>> only the HTML.
>>
>> will do.
Received on Friday, 15 June 2012 19:35:44 UTC