- From: Peter Frederick Patel-Schneider <pfps@research.bell-labs.com>
- Date: Wed, 30 Mar 2011 13:02:56 -0400
- To: <Yves.Raimond@bbc.co.uk>
- CC: <nathan@webr3.org>, <public-rdf-wg@w3.org>
There is indeed some ambiguity (if you don't consider the whitespace pragmatics), but maybe not where you expect. Collections contain sequences of objects with no separator, so
    <a:f><b:f>(0.0).
is ambiguous, unless you realize that an emitter cannot emit "0.0" for an itemlist of 0. and 0. :-)

However, the problem is that when using the standard cheat (which is to solve this problem with a greedy lexer) you also view 18. as a single token whenever it appears, so on
    <e:f><b:f>18.<e:f><b:f>"e".
a greedy lexer will (irretrievably) attach the . to the 18 and cause a parsing error. This is not correct behaviour because there is no possible ambiguity here, and thus the white space after 18 is not needed. (Yes, you could recover with some fancy parsing footwork, but I don't think anyone would do so.)

In my view the problem really is that usually "use whitespace as necessary" ends up conforming with intuitions because it results in aa being a single variable name, and 12 being twelve. Everyone agrees that you always need to separate names with whitespace, even in languages that have operators that look like names. (Consider aplusb, for example. Does anyone think that this could be interpreted as a+b?)

The problem in Turtle is that intuitions are breaking down a bit. Sandro, in particular, doesn't like to see space before . as in
    <a:f><b:f><c:f> .

I think that the problem is that . is both part of larger tokens and also a structural token in its own right. It's as if ( was allowed in variable names, and thus A( B(C) was valid, unambiguous syntax for a function call.

So, what's my solution? I think that the Turtle document does not need to be changed here, and, moreover, that it *really* says that 18. is always a decimal, because recognition is not the same as parsing! So integers at the end of statements need to have white space before the statement-ending ..

I do suggest that an example would be useful. It would say that
    <e:f> <b:f> 18.
is not a valid Turtle document because emitting 18. for an integer followed by a . would cause the 18. to be recognized as the decimal token 18.

It would also be nice to explicitly say what part of the grammar is for tokenization.

peter


From: Yves Raimond <Yves.Raimond@bbc.co.uk>
Subject: Re: ISSUE-18: How do we parse "18." in Turtle?
Date: Wed, 30 Mar 2011 11:20:07 -0500

> On Wed, Mar 30, 2011 at 05:02:18PM +0100, Nathan wrote:
>> RDF Working Group Issue Tracker wrote:
>> >ISSUE-18: How do we parse "18." in Turtle?
>>
>> Is there a case where it's ambiguous?
>
> Apparently it is - all Turtle parsers have tried to interpret it
> differently. However we end up fixing it, that definitely proves this
> part of the grammar needs to be more specific.
>
> Best,
> y
>
>>
>> must be an integer (else not a valid turtle doc):
>> <e:f> <b:f> 18.
>> <e:f> <b:f> "e".
>>
>> must be an integer (else not a valid turtle doc):
>> <e:f> <b:f> 18.
>>
>> must be a decimal (else not a valid turtle doc):
>> <e:f> <b:f> 18., "e".
>>
>> must be a decimal (else not a valid turtle doc):
>> <e:f> <b:f> 18..
>> <e:f> <b:f> "e".
>>
>> must be a decimal (else not a valid turtle doc):
>> <e:f> <b:f> 18..
>>
>> Which case is under question?
>>
>
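
For readers who want to see the greedy-lexer behaviour discussed in this message, here is a minimal sketch in Python. It is not taken from any Turtle parser: the token names, the `tokenize` function, and the simplified patterns are all assumptions made for illustration. In particular, the DECIMAL pattern assumes that digits followed by a bare "." form a decimal token, which is the reading argued for above.

```python
import re

# Hypothetical, simplified token patterns (not the full Turtle grammar).
# Alternatives are tried in order, so DECIMAL is listed before INTEGER to
# make the lexer greedy about a trailing "." after digits.
TOKEN_SPEC = [
    ("IRI", r"<[^>]*>"),
    ("STRING", r'"[^"]*"'),
    ("DECIMAL", r"[0-9]+\.[0-9]*|\.[0-9]+"),  # assumed: "18." is a decimal
    ("INTEGER", r"[0-9]+"),
    ("PUNCT", r"[.,;()]"),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Split text into (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}: {text[pos]!r}")
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


# The "." is absorbed into the number, so no statement terminator remains.
print(tokenize("<e:f> <b:f> 18."))
# [('IRI', '<e:f>'), ('IRI', '<b:f>'), ('DECIMAL', '18.')]

# With white space before the ".", the integer and the terminator separate.
print(tokenize("<e:f> <b:f> 18 ."))
# [('IRI', '<e:f>'), ('IRI', '<b:f>'), ('INTEGER', '18'), ('PUNCT', '.')]
```

Under these assumptions, tokenization alone decides that 18. is a decimal, before any parsing happens. A parser fed the first token stream would then fail for lack of a statement-ending ".", which is the sense in which recognition is not the same as parsing.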
Received on Wednesday, 30 March 2011 17:03:48 UTC