Re: whitespace in Turtle

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Tue, 15 May 2012 16:17:21 +0100
Message-ID: <4FB27381.9010903@epimorphics.com>
To: public-rdf-wg@w3.org

On 15/05/12 16:01, Gavin Carothers wrote:
> On Tue, May 15, 2012 at 7:46 AM, Peter F. Patel-Schneider
> <pfpschneider@gmail.com>  wrote:
>> The Turtle editor's draft says that WS is needed to prevent
>> mis-recognition of tokens, but doesn't explicitly define a token.
> token, terminal, whatever ;)
>> If parsing Turtle depends on tokenizing, then there needs to be
>> an explicit definition of what a token is, and, further, what
>> mis-recognizing a token means.
> Proposed new language:
> White space (production WS) is used to separate two terminals which
> would otherwise be (mis-)recognized as one terminal. Rule names
> below in capitals indicate where white space is significant; these
> form a possible choice of terminals for constructing a Turtle
> parser.
> White space is significant in terminal IRIREF and the production
> String.
> ---
> Also, split that grammar table and identify all terminals with more
> than just all caps.
>> For example, is
>> @prefixprefix:<foo>.
>> a valid Turtle statement?
> Yes.

Err - that is somewhat tricky to integrate with common tokenizer/parser
approaches, given the wonders of language tags. Please ban it ("it" = a
lack of WS after @prefix).

[disclosure - my parsers don't care but, for speed, they are handwritten,
and that means context-sensitive tokenizing or messing with pushback
onto the input stream is doable - using tokenizer/parser toolkits may
make this messy]

I note that the Turtle submission bans it.

[4] prefixID ::= '@prefix' ws+ prefixName? ':' uriref
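
As an illustration only, the rule above can be approximated with a regular
expression to see what it does to Peter's example. This is a rough Python
sketch, not code from any actual Turtle parser. PREFIX_NAME and URIREF are
simplified stand-ins for the submission's prefixName and uriref, and the
optional whitespace before the IRI is an added assumption that is not in
the rule as quoted.

```python
import re

# Simplified stand-ins (assumptions) for the submission's prefixName and uriref.
PREFIX_NAME = r"[A-Za-z][A-Za-z0-9_-]*"
URIREF = r"<[^<>\s]*>"

# Approximates: '@prefix' ws+ prefixName? ':' uriref
# The \s* before the IRI is an extra allowance, not part of the quoted rule.
PREFIX_ID = re.compile(
    r"@prefix\s+(?P<name>" + PREFIX_NAME + r")?:\s*(?P<iri>" + URIREF + r")"
)


def parse_prefix_id(text):
    """Return (prefix, iri) if text starts with a prefixID, else None."""
    m = PREFIX_ID.match(text)
    if m is None:
        return None
    # An omitted prefixName is reported as the empty prefix.
    return (m.group("name") or "", m.group("iri"))


print(parse_prefix_id("@prefix prefix:<foo>."))
print(parse_prefix_id("@prefixprefix:<foo>."))
```

Because the rule demands ws+ right after '@prefix', the second input is
rejected, while the first is accepted.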

>> Things would get even worse if the @ was allowed to be dropped,
>> which is a good reason to vote against allowing dropping of @.

Not true!  The presence of the ":" is enough to distinguish bareword
keywords from prefixes.
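
A minimal Python sketch of that check, under the assumption that barewords
look like simple names. BAREWORD and classify_bareword are hypothetical
names for illustration, the name pattern is much narrower than Turtle's
real one, and the sketch does not check that a "keyword" is a known one.

```python
import re

# Simplified bareword pattern (an assumption, narrower than Turtle's names).
BAREWORD = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")


def classify_bareword(text, pos=0):
    """Classify the bareword starting at pos by looking one char past it."""
    m = BAREWORD.match(text, pos)
    if m is None:
        return None
    word = m.group(0)
    end = m.end()
    # A ':' immediately after the word marks it as a prefix label.
    if end < len(text) and text[end] == ":":
        return ("PREFIX", word)
    # Otherwise treat it as a candidate keyword.
    return ("KEYWORD", word)


print(classify_bareword("prefix ex: <foo>"))
print(classify_bareword("prefix: <foo>"))
```

One character of lookahead past the word is all the decision needs in this
sketch: "prefix" followed by a space is a keyword candidate, and "prefix"
followed by ":" is a prefix label.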

If @ were not also used for language tags, a single leading char might
make a parser writer's life easier.  But we have language tags starting
with @, and @prefix is a legal language tag (just unregistered).

Pragmatically, require WS after @prefix and @base.  Then a tokenizer can
have a single token type that is "@alpha-alphanumericsanddash".
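
A rough Python sketch of that single token type, for illustration only. The
token names (AT_WORD, PNAME and so on), the simplified patterns, and the
classify_at_word helper are all assumptions made for this example and are
not taken from the Turtle grammar or from any existing parser.

```python
import re

# One token type covers both directives and language tags:
# '@', then a letter, then letters, digits or dashes.
TOKEN = re.compile(r'''
    (?P<AT_WORD>@[A-Za-z][A-Za-z0-9-]*)
  | (?P<IRIREF><[^<>\s]*>)
  | (?P<STRING>"[^"\\\n]*")
  | (?P<PNAME>(?:[A-Za-z][A-Za-z0-9_-]*)?:[A-Za-z0-9_-]*)
  | (?P<DOT>\.)
  | (?P<WS>\s+)
''', re.VERBOSE)


def tokenize(text):
    """Split text into (kind, value) pairs, dropping white space."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}: {text[pos]!r}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group(0)))
        pos = m.end()
    return tokens


def classify_at_word(tokens, i):
    """Let the parser, not the tokenizer, decide what an AT_WORD means."""
    _kind, value = tokens[i]
    if i > 0 and tokens[i - 1][0] == "STRING":
        return "LANGTAG"      # e.g. "chat"@fr, and also "x"@prefix
    if value in ("@prefix", "@base"):
        return "DIRECTIVE"
    return "ERROR"


print(tokenize("@prefix prefix:<foo>."))
print(tokenize("@prefixprefix:<foo>."))
```

With WS required, "@prefix prefix:<foo>." yields the AT_WORD "@prefix" and
is read as a directive. Without it, the same tokenizer greedily produces the
AT_WORD "@prefixprefix", which is neither a directive nor in language tag
position, so the input is rejected without any pushback or
context-sensitive tokenizing.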


> Yes, while it is possible to create a grammar that allows this in
> general removing @ is likely to reduce human readability.

>> peter
Received on Tuesday, 15 May 2012 15:17:51 UTC