Re: whitespace in Turtle

On 15/05/12 16:01, Gavin Carothers wrote:
> On Tue, May 15, 2012 at 7:46 AM, Peter F. Patel-Schneider
> <pfpschneider@gmail.com>  wrote:
>> The Turtle editor's draft says that WS is needed to prevent
>> mis-recognition of tokens, but doesn't explicitly define a token.
>
> token, terminal, whatever ;)
>
>>
>> If parsing Turtle depends on tokenizing, then there needs to be
>> an explicit definition of what a token is, and, further, what
>> mis-recognizing a token means.
>
> Proposed new language:
>
> White space (production WS) is used to separate two terminals which
> would otherwise be (mis-)recognized as one terminal. Rule names
> below in capitals indicate where white space is significant; these
> form a possible choice of terminals for constructing a Turtle
> parser.
>
> White space is significant in terminal IRIREF and the production
> String.
>
> ---
>
> Also, split that grammar table and identify all terminals with more
> than just all caps.
>
>>
>> For example, is
>>
>> @prefixprefix:<foo>.
>>
>> a valid Turtle statement?
>
> Yes.

Err - that is somewhat tricky to integrate with common tokenizer/parser
approaches, given the wonders of language tags. Please ban it ("it" = a
lack of WS after @prefix).
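
A rough illustration of the problem, as a Python sketch of a longest-match
tokenizer. The pattern and token names here are made up for the example
and are not taken from the editor's draft; the LANGTAG pattern is only an
approximation of a language tag.

```python
import re

# Hypothetical token patterns, for illustration only.
KEYWORD = re.compile(r"@prefix|@base")
LANGTAG = re.compile(r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*")

def first_token(text):
    """Return (kind, lexeme) for the longest match at the start of text.

    Ties go to the pattern listed first, as in lex-style tools.
    """
    candidates = []
    for kind, pattern in (("KEYWORD", KEYWORD), ("LANGTAG", LANGTAG)):
        m = pattern.match(text)
        if m:
            candidates.append((len(m.group()), kind, m.group()))
    if not candidates:
        return None
    # max() keeps the first of several equally long candidates.
    _, kind, lexeme = max(candidates, key=lambda c: c[0])
    return kind, lexeme

# Without WS the language-tag pattern swallows the whole run of letters.
no_ws = first_token("@prefixprefix:<foo>.")
# With WS the keyword and the language tag tie, and the keyword wins.
with_ws = first_token("@prefix prefix:<foo>.")
```

A toolkit-generated tokenizer with this kind of rule set would need
lookahead or parser feedback to split "@prefixprefix" correctly.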

[disclosure - my parsers don't care, but for speed they are handwritten,
which means context-sensitive tokenizing or pushing back onto the input
stream is doable - using tokenizer/parser toolkits may make this
messy]

I note that the Turtle submission bans it.

[4] prefixID  ::=  '@prefix' ws+ prefixName? ':' uriref

>
>>
>> Things would get even worse if the @ was allowed to be dropped,
>> which is a good reason to vote against allowing dropping of @.

Not true!  The presence of the ":" is enough to distinguish bareword
keywords and prefixes.
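
As a sketch of what I mean (Python; the bareword pattern and the token
names are assumptions for the example, not anything from the draft): look
at whether a ":" immediately follows the run of letters.

```python
import re

# Hypothetical bareword pattern: a run of letters, optionally followed by ':'.
BAREWORD = re.compile(r"([A-Za-z]+)(:?)")

def classify(text):
    """Classify the leading bareword as a directive keyword or a prefix label."""
    m = BAREWORD.match(text)
    if m is None:
        return None
    word, colon = m.groups()
    if colon:
        # A ':' directly after the word marks it as a prefix label.
        return ("PREFIX_LABEL", word)
    if word in ("prefix", "base"):
        return ("KEYWORD", word)
    return ("UNKNOWN", word)

directive = classify("prefix ex: <http://example/> .")
prefixed = classify("prefix:foo")
```

This only works when WS separates the keyword from what follows; a run
such as "prefixprefix:" would still be read as one prefix label.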

If @ were not also used for language tags, a single leading char might
make a parser writer's life easier.  But we have language tags starting
with @, and @prefix is a legal language tag (just unregistered).

Pragmatically, require WS after @prefix and @base.  Then we can have a
token type that is "@alpha-alphanumericsanddash".
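
A minimal sketch of that arrangement in Python. The names AT_WORD,
tokenize_at_word and interpret are hypothetical, and the pattern is only
my reading of "@alpha-alphanumericsanddash": the tokenizer emits one token
type, and the parser decides from position whether it is a directive or a
language tag.

```python
import re

# One token type for everything that starts with '@' (assumed shape).
AT_WORD = re.compile(r"@[A-Za-z]+(?:-[A-Za-z0-9]+)*")

def tokenize_at_word(text, pos=0):
    """Return (lexeme, end_offset) for an '@' word at pos, or None."""
    m = AT_WORD.match(text, pos)
    return (m.group(), m.end()) if m else None

def interpret(lexeme, after_string_literal):
    """Let the parser, not the tokenizer, decide what the '@' word means."""
    if after_string_literal:
        # "chat"@en or even "x"@prefix: always a language tag here.
        return ("LANGTAG", lexeme[1:])
    if lexeme in ("@prefix", "@base"):
        return ("DIRECTIVE", lexeme[1:])
    raise ValueError("unexpected " + lexeme + " at start of statement")

lexeme, end = tokenize_at_word("@prefix ex: <foo> .")
kind = interpret(lexeme, after_string_literal=False)
```

With WS required, "@prefixprefix:<foo>." tokenizes as the single word
"@prefixprefix" and is rejected by the parser rather than silently split.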

 Andy

> Yes, while it is possible to create a grammar that allows this, in
> general removing @ is likely to reduce human readability.


>
>>
>> peter
>>
>>
>

Received on Tuesday, 15 May 2012 15:17:51 UTC