- From: Sandro Hawke <sandro@w3.org>
- Date: Wed, 30 Mar 2011 13:46:17 -0400
- To: nathan@webr3.org
- Cc: RDF Working Group WG <public-rdf-wg@w3.org>
On Wed, 2011-03-30 at 17:02 +0100, Nathan wrote:
> RDF Working Group Issue Tracker wrote:
> > ISSUE-18: How do we parse "18." in Turtle?
>
> Is there a case where it's ambiguous?
>
> must be an integer (else not a valid turtle doc):
>   <e:f> <b:f> 18.
>   <e:f> <b:f> "e".

I'm just getting up to speed on this, but what I'm figuring out is
that's not a valid turtle document. So our problem is not ambiguity,
it's a counter-intuitive and ugly syntax (sometimes requiring spaces
before the period).

The key point is that the BNF alone does not define the grammar; there
is also a comment, "When choosing a rule to match, the longest match is
chosen", which settles the ambiguity by requiring greedy lexers. Then
DECIMAL is defined like this [1][2][3]:

    DECIMAL ::= [0-9]+ '.' [0-9]*

So, this means the dot is part of the number, not a statement
terminator, and the document isn't turtle (or a SPARQL BGP). Sad, but
true (I'm learning)...

As far as I can tell, this is trivial to fix in the spec by changing
that * to a + (as in doubles in N3):

    DECIMAL ::= [0-9]+ '.' [0-9]+     # require 1+ digit after dot

Andy said this is out of scope for SPARQL to fix, but I'm wondering if
it isn't just an erratum, and thus in scope. How many folks write
SPARQL decimal numbers without trailing digits, relying on this
behavior, before the "}"?

> must be an integer (else not a valid turtle doc):
>   <e:f> <b:f> 18.

Also not a valid turtle doc, for the same reason.

If we took out the requirement for greedy lexing, this would be
ambiguous in a SPARQL BGP, where the trailing dot is optional. In
Turtle, where the dot is required, it would then parse as an integer
(but you'd need a more sophisticated parser).

Anyway, this seems like a very easy fix to the grammar, as long as it
doesn't have some deeper consequences I'm missing.
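To make the longest-match point concrete, here is a minimal sketch of a greedy tokenizer in Python. It is not taken from any real Turtle or SPARQL parser: the token set is drastically simplified, and the names make_lexer, lex_current and lex_proposed are made up for this illustration. Only the two DECIMAL patterns come from the grammar discussion above.

```python
import re

def make_lexer(decimal_pattern):
    """Build a longest-match tokenizer over a tiny, simplified token set.

    Assumption: these rules are a toy subset for illustration, not the
    real Turtle/SPARQL terminals (no signs, prefixes, escapes, etc.).
    """
    rules = [
        ("IRIREF", re.compile(r"<[^>]*>")),
        ("STRING", re.compile(r'"[^"]*"')),
        ("DECIMAL", re.compile(decimal_pattern)),
        ("INTEGER", re.compile(r"[0-9]+")),
        ("DOT", re.compile(r"\.")),
        ("COMMA", re.compile(r",")),
    ]

    def tokenize(text):
        tokens = []
        pos = 0
        while pos < len(text):
            if text[pos].isspace():
                pos += 1
                continue
            # Try every rule at this position and keep the longest match.
            best = None
            for name, regex in rules:
                m = regex.match(text, pos)
                if m and (best is None or m.end() > best[1].end()):
                    best = (name, m)
            if best is None:
                raise ValueError(f"no token matches at offset {pos}")
            tokens.append((best[0], best[1].group()))
            pos = best[1].end()
        return tokens

    return tokenize

# DECIMAL as currently specified: digits after the dot are optional.
lex_current = make_lexer(r"[0-9]+\.[0-9]*")
# DECIMAL with the proposed change: at least one digit after the dot.
lex_proposed = make_lexer(r"[0-9]+\.[0-9]+")

line = "<e:f> <b:f> 18."
print(lex_current(line))   # the dot is swallowed by DECIMAL "18."
print(lex_proposed(line))  # INTEGER "18" followed by a separate DOT
```

With the current rule the line ends in a DECIMAL token and there is no DOT left over to terminate the statement, which is why the document is rejected. With the proposed rule the same input lexes as INTEGER then DOT, and "18.5" would still come out as a single DECIMAL.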
    -- Sandro

[1] http://www.w3.org/TR/rdf-sparql-query/#rDECIMAL
[2] http://www.dajobe.org/2004/01/turtle/2006-12-04/
[3] http://www.w3.org/2010/01/Turtle/#sec-grammar-grammar

> must be an decimal (else not a valid turtle doc):
>   <e:f> <b:f> 18., "e".
>
> must be an decimal (else not a valid turtle doc):
>   <e:f> <b:f> 18..
>   <e:f> <b:f> "e".
>
> must be an decimal (else not a valid turtle doc):
>   <e:f> <b:f> 18..
>
> Which case is under question?
>
Received on Wednesday, 30 March 2011 17:46:27 UTC