Oddity in the SPARQL grammar. And Turtle. from Seaborne, Andy on 2005-11-14 (public-rdf-dawg@w3.org from October to December 2005)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Mon, 14 Nov 2005 14:16:20 +0000
To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <43789C34.70806@hp.com>

I thought I should bring this to the WG's attention.

The SPARQL grammar allows integers in triple patterns to be written in the
abbreviated form, without quotes or an explicit datatype: 2, for example.

But the grammar uses NumericLiteral in GraphTerm for this:

[42] GraphTerm ::= ... | NumericLiteral | ...
[59] NumericLiteral ::= INTEGER | FLOATING_POINT
[73] INTEGER ::= [0-9]+

That's wrong: INTEGER has no sign, so what about negative numbers?
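A quick Python check of rule [73] taken on its own shows the gap. This is a sketch that treats the token pattern in isolation, which is a simplification of how a real tokenizer applies it: "2" is a complete INTEGER token, but "-2" is not.

```python
import re

# Rule [73] as quoted above: INTEGER ::= [0-9]+
INTEGER = re.compile(r"[0-9]+")


def is_integer_token(text: str) -> bool:
    """True if the whole of `text` is a single INTEGER token."""
    return INTEGER.fullmatch(text) is not None


print(is_integer_token("2"))   # True
print(is_integer_token("-2"))  # False: the token has no room for a sign
```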

Turtle is the same, BTW:
http://www.dajobe.org/2004/01/turtle/#integer

N3 does the more natural thing and allows a sign.

SPARQL also allows expressions, so it isn't simply a matter of allowing signed
numbers as tokens. "+" and "-" are overloaded as unary and binary operators in
things like  2-4  and  2-+-4, which is why number tokens are unsigned and it is
the grammar that sorts out the binary/unary choice.
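To make the unary/binary point concrete, here is a minimal Python sketch over a toy expression language containing only integers, "+" and "-". The rule names in the docstrings are illustrative and are not the SPARQL grammar's actual productions. The tokenizer only ever emits unsigned numbers, and the parser decides from position alone whether a sign is unary or binary.

```python
import re
from typing import List

# Unsigned numbers plus separate sign characters: the tokenizer never glues
# a sign onto a number, mirroring INTEGER ::= [0-9]+.
TOKEN = re.compile(r"\s*(?:([0-9]+)|([+-]))")


def tokenize(src: str) -> List[str]:
    tokens: List[str] = []
    src = src.rstrip()
    pos = 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1) or m.group(2))
        pos = m.end()
    return tokens


def parse_additive(tokens: List[str]) -> int:
    """Additive ::= Unary (('+' | '-') Unary)*  : these signs are binary."""
    pos = 0

    def unary() -> int:
        """Unary ::= ('+' | '-') Unary | INTEGER  : a leading sign is unary."""
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "+":
            return unary()
        if tok == "-":
            return -unary()
        return int(tok)

    value = unary()
    while pos < len(tokens):
        op = tokens[pos]
        pos += 1
        if op not in ("+", "-"):
            raise SyntaxError(f"expected '+' or '-', got {op!r}")
        rhs = unary()
        value = value + rhs if op == "+" else value - rhs
    return value


print(tokenize("2-+-4"))                  # ['2', '-', '+', '-', '4']
print(parse_additive(tokenize("2-+-4")))  # 6, i.e. 2 - (+(-4))
```

The first "-" in 2-+-4 follows an operand, so it is parsed as subtraction; the "+" and second "-" follow an operator, so they are parsed as unary signs.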

Possibilities:

1/ Adding (<PLUS>|<MINUS>) to the GraphTerm rules works, but it allows
whitespace between the sign and the number:  - 2

2/ Adding tokens NEG_INTEGER and POS_INTEGER, made of <PLUS> or <MINUS>
followed by the INTEGER token (and likewise for the floating-point tokens),
might work with a little effort: 2-4 becomes the two tokens INTEGER NEG_INTEGER.

I haven't tried (2) out yet, but I mention it because of an interaction of (1)
with property paths (which are not currently an issue before the WG). Consider
the sequence of characters  <x><p>+2.  Is the "+" part of the property path or
of the number?  Under (1), with the sign handled in the grammar, it's ambiguous.
Under (2) it's not, because of greedy tokenization: it's the same sequence of
tokens as  <x><p>  +2.  Property paths might need explicit support such as
<x>(<p>+)2  anyway.
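A minimal Python sketch of the tokenizer for option (2), assuming the hypothetical token names NEG_INTEGER and POS_INTEGER and a deliberately simplified IRI_REF pattern (not the real SPARQL IRI rule). It shows 2-4 becoming INTEGER NEG_INTEGER, a spaced "- 2" staying as two tokens, and  <x><p>+2  producing the same tokens as  <x><p>  +2.

```python
import re
from typing import List, Tuple

# Hypothetical token set for option (2). Python's regex alternation takes the
# first alternative that matches, so the signed forms are listed before the
# bare signs to emulate a lexer's longest-match rule for this token set.
TOKEN_SPEC = [
    ("NEG_INTEGER", r"-[0-9]+"),
    ("POS_INTEGER", r"\+[0-9]+"),
    ("INTEGER", r"[0-9]+"),
    ("MINUS", r"-"),
    ("PLUS", r"\+"),
    ("IRI_REF", r"<[^<>\s]*>"),  # simplified; not the real SPARQL IRI rule
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(src: str) -> List[Tuple[str, str]]:
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(src):
        m = MASTER.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        if m.lastgroup != "WS":  # whitespace separates tokens but is dropped
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("2-4"))  # [('INTEGER', '2'), ('NEG_INTEGER', '-4')]
print(tokenize("- 2"))  # [('MINUS', '-'), ('INTEGER', '2')]
# The "+" is absorbed into the number, with or without the spaces:
assert tokenize("<x><p>+2") == tokenize("<x><p>  +2")
```

The "little effort" would then presumably be in the expression rules, which have to accept INTEGER NEG_INTEGER as a subtraction.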

3/ The other way is to use slightly different token sets for triples and
expressions: triples have signed number tokens, expressions don't.  A lot of
tokenizers support this, but I don't think basic lex does.  n3.n3 uses a
context-sensitive technique to tell @prefix from the langtag literal ""@prefix.
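A rough Python sketch of option (3), assuming a hypothetical `in_expression` flag that the parser would pass to the tokenizer. This is not the n3.n3 technique, just one way to select a token set by context. Only numbers and signs are handled, and a whole string is tokenized in one mode for brevity; a real parser would switch modes as it enters and leaves expressions.

```python
import re
from typing import List

# Hypothetical option (3): the parser tells the tokenizer which token set to
# use. Signs attach to numbers in triple patterns but not in expressions.
TRIPLE_NUMBER = re.compile(r"\s*([+-]?[0-9]+)")
EXPR_TOKEN = re.compile(r"\s*([0-9]+|[+-])")


def tokenize(src: str, in_expression: bool) -> List[str]:
    pattern = EXPR_TOKEN if in_expression else TRIPLE_NUMBER
    tokens: List[str] = []
    src = src.rstrip()
    pos = 0
    while pos < len(src):
        m = pattern.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


print(tokenize("2 -4", in_expression=False))  # ['2', '-4']: two signed numbers
print(tokenize("2-4", in_expression=True))    # ['2', '-', '4']: a subtraction
```

In triple mode a spaced "- 2" is rejected, since the sign must be attached to the digits.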


Still thinking about it - advice welcome,

 Andy

(I found this on Friday when I cut the Turtle part out of the SPARQL grammar
for a Turtle parser and ran it against an N3 test suite.)

Received on Monday, 14 November 2005 14:18:51 UTC