Re: [Turtle] the Turtle Grammar in the revised editor's draft does not allow comments in Turtledoc from Andy Seaborne on 2011-03-08 (public-rdf-wg@w3.org from March 2011)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Tue, 08 Mar 2011 18:32:43 +0000
To: Alex Hall <alexhall@revelytix.com>
CC: public-rdf-wg <public-rdf-wg@w3.org>
Message-ID: <4D76764B.1090902@epimorphics.com>

On 08/03/11 17:57, Alex Hall wrote:
> On Tue, Mar 8, 2011 at 12:28 PM, Antoine Zimmermann
> <antoine.zimmermann@insa-lyon.fr
> <mailto:antoine.zimmermann@insa-lyon.fr>> wrote:
>
>     The grammar at http://www.w3.org/2010/01/Turtle/#prod-turtle2-WS has
>     a token called "PASSED TOKENS" which defines comments in Turtle, but
>     it cannot be reached from the root "turtleDoc".
>     It should be included in the <WS> token definition, I guess.
>
>
> I interpret that to mean that comments are recognized as tokens, but
> skipped by the lexer (i.e. not passed to the parser).  Of course that
> assumes an implementation that splits recognition into lexing and
> parsing stages -- I'm not aware of other types of recognizers but that
> doesn't mean they aren't out there.
>
> -Alex

yes - [[Section 4.2 Comments
...
Comments are treated as white space.
]]

Like SPARQL, it's assumed they are removed at a low level, as tokens are 
formed.  Tools such as javacc, and many others, can skip or hide comments.

[[
White space (production ws) is used to separate two tokens which would 
otherwise be (mis-)recognized as one token.
]]

So the parser grammar itself does not specify whitespace directly,

e.g.
[6]    triples    ::=    subject predicateObjectList

does not say <WS>* after 'subject'.  Otherwise there would be a lot of 
<WS>* padding, you would still have to talk about misrecognized tokens, 
and it would not fit many tool chains.
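
To illustrate the split described above, here is a minimal sketch in 
Python of a tokenizer that drops comments and whitespace before the 
parser sees anything.  The token names, the regular expressions and the 
one-rule "parser" are all assumptions made up for this example; they 
cover only a tiny Turtle-like subset and are not taken from the draft 
grammar or from any particular tool.

```python
import re

# Hypothetical token set for a tiny Turtle-like subset, not the full grammar.
TOKEN_SPEC = [
    ("IRIREF",  r"<[^<>\s]*>"),                  # '#' inside an IRI is not a comment
    ("STRING",  r'"[^"\\\n]*"'),                 # simplified: no escape sequences
    ("PNAME",   r"(?:[A-Za-z][\w-]*)?:[\w-]*"),  # prefixed name such as ex:a
    ("PUNCT",   r"[.;,]"),
    ("COMMENT", r"#[^\n]*"),                     # recognized, then skipped
    ("WS",      r"[ \t\r\n]+"),                  # recognized, then skipped
]
TOKEN_RE = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))
SKIPPED = {"COMMENT", "WS"}


def tokenize(text):
    """Yield (kind, value) pairs; comments and whitespace never reach the parser."""
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = m.end()
        if m.lastgroup not in SKIPPED:
            yield m.lastgroup, m.group()


def parse_triples(tokens):
    """Simplified rule: triples ::= subject predicate object '.'
    Note that the rule itself never mentions whitespace or comments."""
    tokens = list(tokens)
    triples = []
    for i in range(0, len(tokens), 4):
        chunk = tokens[i:i + 4]
        if len(chunk) != 4 or chunk[3] != ("PUNCT", "."):
            raise SyntaxError("expected: subject predicate object '.'")
        triples.append(tuple(value for _, value in chunk[:3]))
    return triples


doc = 'ex:a ex:b <http://example/#c> . # trailing comment\n# full line\nex:a ex:p "x#y" .'
triples = parse_triples(tokenize(doc))
```

Because IRIREF and STRING are tried before COMMENT, a '#' inside an IRI 
or a string stays part of that token, while a '#' between tokens starts 
a comment that is discarded along with the whitespace.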

I think "PASSED TOKENS" is a reflection of the tool chain Eric was using, 
as indicated by its rule name of [-].

	Andy

Received on Tuesday, 8 March 2011 18:33:21 UTC