Re: ISSUE-18: How do we parse "18." in Turtle? from Peter Frederick Patel-Schneider on 2011-03-30 (public-rdf-wg@w3.org from March 2011)

From: Peter Frederick Patel-Schneider <pfps@research.bell-labs.com>
Date: Wed, 30 Mar 2011 13:02:56 -0400
To: <Yves.Raimond@bbc.co.uk>
CC: <nathan@webr3.org>, <public-rdf-wg@w3.org>
Message-ID: <20110330.130256.2004468220051810277.pfps@research.bell-labs.com>

There is indeed some ambiguity (if you don't consider the whitespace
pragmatics), but maybe not where you expect. Collections contain
sequences of objects with no separator, so 
<a:f><b:f>(0.0).
is ambiguous, unless you realize that an emitter cannot emit 
"0.0" for an itemlist of 0. and 0.  :-)
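A small brute-force sketch makes the collection case concrete. It lists every way the characters 0.0 inside ( ) could be cut into numeric tokens when list items need no separator. The token patterns are simplified assumptions (no signs or exponents). They are modelled on the decimal form under discussion, in which a trailing dot such as 18. is allowed, and they also admit a leading-dot form such as .5. The names `splits`, `INTEGER` and `DECIMAL` are hypothetical.

```python
import re

# Simplified, assumed numeric token patterns (signs and exponents omitted).
# DECIMAL allows a trailing dot ("18.") and a leading dot (".5").
INTEGER = re.compile(r"[0-9]+")
DECIMAL = re.compile(r"[0-9]+\.[0-9]*|\.[0-9]+")


def is_numeric_token(s):
    """True if the whole string is one INTEGER or DECIMAL token."""
    return INTEGER.fullmatch(s) is not None or DECIMAL.fullmatch(s) is not None


def splits(text):
    """Return every way to cut `text` into a sequence of numeric tokens."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if is_numeric_token(head):
            # Recurse on the remainder and prepend this token to each split.
            for rest in splits(text[end:]):
                results.append([head] + rest)
    return results


print(splits("0.0"))
```

Under these assumed patterns it finds three readings: the integer 0 followed by the decimal .0, the decimal 0. followed by the integer 0, and the single decimal 0.0. Only the emitter-side argument rules out the split readings.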

However, the problem is that when using the standard cheat (which is to
solve this problem with a greedy lexer) you also view 18. as a single
token whenever it appears, so on  
<e:f><b:f>18.<e:f><b:f>"e".
a greedy lexer will (irretrievably) attach the . to the 18 and cause a 
parsing error.   This is not correct behaviour because there is no
possible ambiguity here, and thus the white space after 18 is not
needed.   (Yes, you could recover with some fancy parsing footwork, but
I don't think anyone would do so.)
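The greedy-lexer failure can be reproduced with a maximal-munch tokenizer over a tiny, hypothetical subset of Turtle (IRIs, short strings, numbers and the terminating dot). `greedy_lex`, `parse_triples` and the token names are illustrative only, not taken from any real Turtle parser, and the parser accepts nothing but plain subject predicate object . statements.

```python
import re

# Hypothetical token table for a tiny Turtle subset. The lexer always takes
# the LONGEST match (maximal munch); on a tie the earlier entry wins.
TOKEN_PATTERNS = [
    ("IRI", re.compile(r"<[^>]*>")),
    ("STRING", re.compile(r'"[^"]*"')),
    ("DECIMAL", re.compile(r"[0-9]+\.[0-9]*|\.[0-9]+")),  # assumed: "18." allowed
    ("INTEGER", re.compile(r"[0-9]+")),
    ("DOT", re.compile(r"\.")),
]
WHITESPACE = re.compile(r"\s+")


def greedy_lex(text):
    """Split `text` into (kind, lexeme) pairs, always taking the longest match."""
    tokens, pos = [], 0
    while pos < len(text):
        ws = WHITESPACE.match(text, pos)
        if ws:
            pos = ws.end()
            continue
        best = None
        for kind, pattern in TOKEN_PATTERNS:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (kind, m)
        if best is None:
            raise SyntaxError(f"no token at offset {pos}")
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens


def parse_triples(tokens):
    """Accept only 'subject predicate object .' statements (no ; or , lists)."""
    triples = []
    for i in range(0, len(tokens), 4):
        chunk = tokens[i:i + 4]
        if len(chunk) != 4 or chunk[3][0] != "DOT":
            raise SyntaxError(f"expected 'subject predicate object .' at token {i}")
        triples.append(tuple(lexeme for _, lexeme in chunk[:3]))
    return triples


print(greedy_lex('<e:f><b:f>18.<e:f><b:f>"e".'))
```

On the example above the lexer returns DECIMAL 18. followed directly by the next IRI, so the parser finds no terminator after the first object and rejects the input. With a space, 18 . lexes as INTEGER then DOT and both statements parse.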

In my view the problem really is that usually "use whitespace as
necessary" ends up conforming to intuition because it results in aa
being a single variable name, and 12 being twelve.  Everyone agrees that
you always need to separate names with whitespace, even in languages
that have operators that look like names.  (Consider aplusb, for
example.  Does anyone think that this could be interpreted as a+b?)

The problem in Turtle is that intuitions are breaking down a bit.
Sandro, in particular, doesn't like to see a space before the . as in
<a:f><b:f><c:f> .
I think that the problem is that . is both part of larger tokens and
also a structural token in its own right.  It's as if ( were allowed in
variable names, and thus A( B(C) were valid, unambiguous syntax for a
function call. 

So, what's my solution?  I think that the Turtle document does not need
to be changed here, and, moreover, that it *really* says that 18. is
always a decimal, because token recognition is not the same as parsing!
So integers at the end of statements need to have white space before the
statement-ending ".".

I do suggest that an example would be useful.   It would say that
   <e:f> <b:f> 18.
is not a valid Turtle document because emitting 18. for an integer
followed by a . would cause the 18. to be recognized as the decimal
token 18.
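The same rule can be read from the emitter's side. The sketch below is a hypothetical serializer helper (`serialize_statement` is not from any real library) that takes already-formatted lexemes and inserts a space before the terminating dot only when the object is an integer lexeme, since writing 18. would otherwise be recognized as the decimal 18. The integer pattern is a simplified assumption.

```python
import re

# Assumed, simplified INTEGER lexeme: optional sign followed by digits.
INTEGER = re.compile(r"[+-]?[0-9]+")


def serialize_statement(subject, predicate, obj):
    """Write one 'subject predicate object .' statement from lexemes.

    An integer object written directly before the terminating dot would be
    recognized as a decimal token (18. rather than 18 followed by .), so a
    space is emitted in that case only.
    """
    separator = " " if INTEGER.fullmatch(obj) else ""
    return f"{subject} {predicate} {obj}{separator}."


print(serialize_statement("<e:f>", "<b:f>", "18"))    # <e:f> <b:f> 18 .
print(serialize_statement("<e:f>", "<b:f>", "18."))   # <e:f> <b:f> 18..
print(serialize_statement("<e:f>", "<b:f>", '"e"'))   # <e:f> <b:f> "e".
```

Always writing a space before the final dot is the simpler design and avoids the special case entirely; the conditional version is shown only to make the failing case explicit.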

It would also be nice to explicitly say what part of the grammar is for
tokenization.  

peter

From: Yves Raimond <Yves.Raimond@bbc.co.uk>
Subject: Re: ISSUE-18: How do we parse "18." in Turtle?
Date: Wed, 30 Mar 2011 11:20:07 -0500

> On Wed, Mar 30, 2011 at 05:02:18PM +0100, Nathan wrote:
>> RDF Working Group Issue Tracker wrote:
>> >ISSUE-18: How do we parse "18." in Turtle?
>> 
>> Is there a case where it's ambiguous?
> 
> Apparently it is - all Turtle parsers I have tried interpret it
> differently. However we end up fixing it, that definitely proves this
> part of the grammar needs to be more specific.
> 
> Best,
> y
> 
>> 
>> must be an integer (else not a valid turtle doc):
>>   <e:f> <b:f> 18.
>>   <e:f> <b:f> "e".
>> 
>> must be an integer (else not a valid turtle doc):
>>   <e:f> <b:f> 18.
>> 
>> must be a decimal (else not a valid turtle doc):
>>   <e:f> <b:f> 18., "e".
>> 
>> must be a decimal (else not a valid turtle doc):
>>   <e:f> <b:f> 18..
>>   <e:f> <b:f> "e".
>> 
>> must be a decimal (else not a valid turtle doc):
>>   <e:f> <b:f> 18..
>> 
>> Which case is under question?
>> 
>

Received on Wednesday, 30 March 2011 17:03:48 UTC