Re: ISSUE-18: How do we parse "18." in Turtle?

On Mar 30, 2011, at 12:02 PM, Peter Frederick Patel-Schneider wrote:

> There is indeed some ambiguity (if you don't consider the whitespace
> pragmatics), but maybe not where you expect. Collections contain
> sequences of objects with no separator, so 
> <a:f><b:f>(0.0).
> is ambiguous, unless you realize that an emitter cannot emit 
> "0.0" for an itemlist of 0. and 0.  :-)
> 
> However, the problem is that when using the standard cheat (which is to
> solve this problem with a greedy lexer) you also view 18. as a single
> token whenever it appears, so on  
> <e:f><b:f>18.<e:f><b:f>"e".
> a greedy lexer will (irretrievably) attach the . to the 18 and cause a
> parsing error.   This is not correct behaviour because there is no
> possible ambiguity here, and thus the white space after 18 is not
> needed.   (Yes, you could recover with some fancy parsing footwork, but
> I don't think anyone would do so.)
> 
> In my view the problem really is that usually "use whitespace as
> necessary" ends up conforming with intuitions because it results in aa
> being a single variable name, and 12 being twelve.  Everyone agrees that
> you always need to separate names with whitespace, even in languages
> that have operators that look like names.  (Consider aplusb, for
> example.  Does anyone think that this could be interpreted as a+b?)
> 
> The problem in Turtle is that intuitions are breaking down a bit.
> Sandro, in particular, doesn't like to see space before . as in
> <a:f><b:f><c:f> .
> I think that the problem is that . is both part of larger tokens and
> also a structural token in its own right.  It's as if ( was allowed in
> variable names, and thus A( B(C) was valid, unambiguous syntax for a
> function call. 
> 
> 
> So, what's my solution?  I think that the Turtle document does not need
> to be changed here, and, moreover, that it *really* says that 18. is
> always a decimal, because recognition is not the same as parsing!  So
> integers at the end of statements need to have white space before the
> statement-ending .. 

+1  I feel Sandro's pain, but the advantages of fast greedy lexers have to outweigh visual aesthetics. And in any case, I kind of like the spaces; they help my mental lexer when reading.

We could say that the real statement end was <white>., but that the white can be omitted when it's not needed, i.e. when the lexer would work without it. Then there is an obvious safe strategy, but Sandro can still avoid the worst headaches.
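
To make that concrete, here is a minimal sketch of an emitter that follows this rule. The names statement_end and emit_triple are hypothetical, and it assumes that an integer literal is the only final token that a greedy lexer would misread when a . is glued onto it; a real serializer would need to check the rest of the token grammar too.

```python
import re

# Assumption: an integer literal is the only token that changes meaning when a
# "." is glued to it (a greedy lexer would read "18." as a decimal).
INTEGER = re.compile(r"[+-]?[0-9]+")


def statement_end(last_token, always_space=False):
    """Return the text that terminates a statement ending in last_token."""
    # always_space=True is the "obvious safe strategy": never omit the white.
    if always_space or INTEGER.fullmatch(last_token):
        return " ."
    return "."


def emit_triple(s, p, o, always_space=False):
    """Serialize one triple, adding white space before "." only when needed."""
    return f"{s} {p} {o}{statement_end(o, always_space)}"


print(emit_triple("<e:f>", "<b:f>", "18"))
print(emit_triple("<e:f>", "<b:f>", '"e"'))
```

Passing always_space=True gives the safe strategy; the default gives the compact output wherever the lexer would cope without the space.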

> 
> I do suggest that an example would be useful.   It would say that
>   <e:f> <b:f> 18.
> is not a valid Turtle document because emitting 18. for an integer
> followed by a . would cause the 18. to be recognized as the decimal
> token 18.
> 
> It would also be nice to explicitly say what part of the grammar is for
> tokenization.  

And doing so would set this in stone, since <e:f><b:f>18.<e:f><b:f>"e". would indeed be unambiguously a *parse* error once the lexer has chewed it. (Reflecting my prejudice that productions used inside the lexer are not really part of the grammar.)
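
A rough sketch of what I mean, not a real Turtle lexer: the token set below is a simplified assumption (IRIs, unsigned numbers, plain strings, and .), the decimal pattern follows the "digits, dot, optional digits" shape under discussion, and lex and parse are hypothetical names. The lexer tries the decimal pattern before the integer and the bare dot, so it behaves greedily on 18.

```python
import re

# Simplified, assumed token set. Alternatives are tried in order, so DECIMAL
# wins over INTEGER followed by DOT whenever a "." directly follows the digits.
TOKEN = re.compile(r'''
    (?P<IRI><[^>]*>)
  | (?P<DECIMAL>[0-9]+\.[0-9]*|\.[0-9]+)
  | (?P<INTEGER>[0-9]+)
  | (?P<STRING>"[^"]*")
  | (?P<DOT>\.)
  | (?P<WS>\s+)
''', re.VERBOSE)

OBJECTS = {"IRI", "DECIMAL", "INTEGER", "STRING"}


def lex(text):
    """Split text into (kind, lexeme) pairs, discarding white space."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse(tokens):
    """Parse a flat run of 'IRI IRI object DOT' statements into triples."""
    triples = []
    for i in range(0, len(tokens), 4):
        chunk = tokens[i:i + 4]
        kinds = [kind for kind, _ in chunk]
        if (len(chunk) != 4 or kinds[0] != "IRI" or kinds[1] != "IRI"
                or kinds[2] not in OBJECTS or kinds[3] != "DOT"):
            raise SyntaxError(f"malformed statement starting at token {i}")
        triples.append(tuple(chunk[:3]))
    return triples


print(lex('<e:f><b:f>18.<e:f><b:f>"e".'))
print(parse(lex('<e:f><b:f>18 .<e:f><b:f>"e".')))
```

With this split, the first input lexes without complaint (the 18. comes out as one DECIMAL token) and only the parser rejects it, because no DOT follows the object. The second input, with white space before the ., parses as two statements.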

Pat


> 
> peter
> 
> 
> 
> From: Yves Raimond <Yves.Raimond@bbc.co.uk>
> Subject: Re: ISSUE-18: How do we parse "18." in Turtle?
> Date: Wed, 30 Mar 2011 11:20:07 -0500
> 
>> On Wed, Mar 30, 2011 at 05:02:18PM +0100, Nathan wrote:
>>> RDF Working Group Issue Tracker wrote:
>>>> ISSUE-18: How do we parse "18." in Turtle?
>>> 
>>> Is there a case where it's ambiguous?
>> 
>> Apparently it is - all Turtle parsers have tried to interpret it
>> differently. However we end up fixing it, that definitely proves this
>> part of the grammar needs to be more specific.
>> 
>> Best,
>> y
>> 
>>> 
>>> must be an integer (else not a valid turtle doc):
>>>  <e:f> <b:f> 18.
>>>  <e:f> <b:f> "e".
>>> 
>>> must be an integer (else not a valid turtle doc):
>>>  <e:f> <b:f> 18.
>>> 
>>> must be a decimal (else not a valid turtle doc):
>>>  <e:f> <b:f> 18., "e".
>>> 
>>> must be a decimal (else not a valid turtle doc):
>>>  <e:f> <b:f> 18..
>>>  <e:f> <b:f> "e".
>>> 
>>> must be a decimal (else not a valid turtle doc):
>>>  <e:f> <b:f> 18..
>>> 
>>> Which case is under question?
>>> 
>> 
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes

Received on Wednesday, 30 March 2011 20:12:24 UTC