Re: ISSUE-18: How do we parse "18." in Turtle?

From: Sandro Hawke <sandro@w3.org>
Date: Wed, 30 Mar 2011 13:46:17 -0400
To: nathan@webr3.org
Cc: RDF Working Group WG <public-rdf-wg@w3.org>
Message-ID: <1301507177.2139.863.camel@waldron>

On Wed, 2011-03-30 at 17:02 +0100, Nathan wrote:
> RDF Working Group Issue Tracker wrote:
> > ISSUE-18: How do we parse "18." in Turtle?
> 
> Is there a case where it's ambiguous?
> 
> must be an integer (else not a valid turtle doc):
>    <e:f> <b:f> 18.
>    <e:f> <b:f> "e".

I'm just getting up to speed on this, but what I'm figuring out is
that's not a valid turtle document.  So our problem is not ambiguity,
it's a counter-intuitive and ugly syntax (sometimes requiring spaces
before period).

The key point is that the BNF alone does not define the grammar; there
is also a comment, "When choosing a rule to match, the longest match is
chosen", which settles the ambiguity by requiring greedy lexers.  Then
DECIMAL is defined like this [1][2][3]:

        DECIMAL   ::=   [0-9]+ '.' [0-9]*

So, this means the dot is part of the number, not a statement
terminator, and the document isn't turtle (or a SPARQL BGP).  Sad, but
true (I'm learning)...
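
To make the longest-match point concrete, here is a minimal sketch of a
greedy lexer in Python.  Only the DECIMAL pattern is taken from the
production quoted above; the other token patterns (IRI, STRING, INTEGER,
DOT) and the function name tokenize are simplified assumptions for
illustration, not the real Turtle or SPARQL token definitions.

```python
import re

# Simplified, hypothetical token set. Only DECIMAL mirrors the production
# quoted in this message; the rest are rough stand-ins.
TOKENS = [
    ("IRI", re.compile(r"<[^>]*>")),
    ("STRING", re.compile(r'"[^"]*"')),
    ("DECIMAL", re.compile(r"[0-9]+\.[0-9]*")),
    ("INTEGER", re.compile(r"[0-9]+")),
    ("DOT", re.compile(r"\.")),
]


def tokenize(text):
    """Split text into (name, lexeme) pairs, always taking the longest match."""
    pos, out = 0, []
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best = None
        for name, pattern in TOKENS:
            m = pattern.match(text, pos)
            # Keep the candidate with the longest lexeme (greedy lexing).
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"unexpected character at offset {pos}")
        out.append(best)
        pos += len(best[1])
    return out


# The dot is swallowed by DECIMAL, so no DOT token is left to end the statement.
print(tokenize("<e:f> <b:f> 18."))
# With a space before the period, the lexer yields INTEGER followed by DOT.
print(tokenize("<e:f> <b:f> 18 ."))
```

Under these assumptions the first input ends in a DECIMAL token with no
statement terminator, which is why a space before the period is needed.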

As far as I can tell, this is trivial to fix in the spec by changing
that * to a + (as in doubles in N3):

        DECIMAL   ::=   [0-9]+ '.' [0-9]+   # require 1+ digit after dot
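
As a rough check of the proposed change, the sketch below lexes the same
input under both DECIMAL productions with a longest-match rule.  The
names OLD_DECIMAL, NEW_DECIMAL and lex_numbers are hypothetical, and the
token set is cut down to numbers and the dot, so this is an illustration
rather than a real Turtle lexer.

```python
import re

OLD_DECIMAL = r"[0-9]+\.[0-9]*"  # production as currently specified
NEW_DECIMAL = r"[0-9]+\.[0-9]+"  # proposed: at least one digit after the dot


def lex_numbers(text, decimal_pattern):
    """Longest-match lexing over a toy token set of DECIMAL, INTEGER and DOT."""
    rules = [
        ("DECIMAL", re.compile(decimal_pattern)),
        ("INTEGER", re.compile(r"[0-9]+")),
        ("DOT", re.compile(r"\.")),
    ]
    pos, out = 0, []
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best = None
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # Prefer the rule that consumes the most characters.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"unexpected character at offset {pos}")
        out.append(best)
        pos += len(best[1])
    return out


print(lex_numbers("18.", OLD_DECIMAL))   # dot absorbed into the number
print(lex_numbers("18.", NEW_DECIMAL))   # integer followed by a terminator
print(lex_numbers("18.5 .", NEW_DECIMAL))  # ordinary decimals still lex as DECIMAL
```

With the "+" form the trailing dot is left over as its own token, so
"18." reads as an integer followed by a statement terminator.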

Andy said this is out of scope for SPARQL to fix, but I'm wondering if
it isn't just an erratum, and thus in scope.   How many folks write
SPARQL decimal numbers without trailing digits, relying on this
behavior, before the "}"?

> must be an integer (else not a valid turtle doc):
>    <e:f> <b:f> 18.

Also not a valid turtle doc, for the same reason.   If we took out the
requirement for greedy lexing, this would be ambiguous in a SPARQL BGP,
where the trailing dot is optional.  In Turtle, where the dot is
required, it would then parse as integer (but you'd need a more
sophisticated parser).

Anyway, this seems like a very easy fix to the grammar, as long as it
doesn't have some deeper consequences I'm missing.

    -- Sandro

[1] http://www.w3.org/TR/rdf-sparql-query/#rDECIMAL
[2] http://www.dajobe.org/2004/01/turtle/2006-12-04/
[3] http://www.w3.org/2010/01/Turtle/#sec-grammar-grammar

> must be a decimal (else not a valid turtle doc):
>    <e:f> <b:f> 18., "e".
> 
> must be a decimal (else not a valid turtle doc):
>    <e:f> <b:f> 18..
>    <e:f> <b:f> "e".
> 
> must be a decimal (else not a valid turtle doc):
>    <e:f> <b:f> 18..
> 
> Which case is under question?
> 
> 
Received on Wednesday, 30 March 2011 17:46:27 UTC