Re: whitespace in Turtle

On Tue, May 15, 2012 at 8:17 AM, Andy Seaborne
<andy.seaborne@epimorphics.com> wrote:
>
>
> On 15/05/12 16:01, Gavin Carothers wrote:
>>
>> On Tue, May 15, 2012 at 7:46 AM, Peter F. Patel-Schneider
>> <pfpschneider@gmail.com>  wrote:
>>>
>>> The Turtle editor's draft says that WS is needed to prevent
>>> mis-recognition of tokens, but doesn't explicitly define a token.
>>
>>
>> token, terminal, whatever ;)
>>
>>>
>>> If parsing Turtle depends on tokenizing, then there needs to be
>>> an explicit definition of what a token is, and, further, what
>>> mis-recognizing a token means.
>>
>>
>> Proposed new language:
>>
>> White space (production WS) is used to separate two terminals which
>> would otherwise be (mis-)recognized as one terminal. Rule names
>> below in capitals indicate where white space is significant; these
>> form a possible choice of terminals for constructing a Turtle
>> parser.
>>
>> White space is significant in terminal IRIREF and the production
>> String.
>>
>> ---
>>
>> Also, split that grammar table and identify all terminals with more
>> than just all caps.
>>
>>>
>>> For example, is
>>>
>>> @prefixprefix:<foo>.
>>>
>>> a valid Turtle statement?
>>
>>
>> Yes.
>
>
> Err - somewhat tricky to integrate in with common tokenizer/parser
> approaches given the wonders of language tags. Please ban it ("it" = a
> lack of WS after @prefix).
>
> [disclosure - my parsers don't care but, for speed, they are handwritten
> and that means context-sensitive tokenizing or messing with pushback
> onto the input stream are doable - using tokenizer/parser toolkits may
> make this messy]
>
> I note that the Turtle submission bans it.
>
> [4]     prefixID        ::=     '@prefix' ws+ prefixName? ':' uriref
>
>
>>
>>>
>>> Things would get even worse if the @ was allowed to be dropped,
>>> which is a good reason to vote against allowing dropping of @.
>
>
> Not true!  The presence of the ":" is enough to distinguish bareword
> keywords and prefixes.
>
> If @ were not also used for language tags, a single leading char might
> make a parser writer's life easier.  But we have language tags starting
> with @ and @prefix is a legal language tag (just unregistered).
>
> Pragmatically, require WS after @prefix and @base.  Then there can be a
> token type that is "@alpha-alphanumericsanddash".

What you don't like:

[19]  LANGTAG  ::= (BASE | PREFIX | '@' ([a-zA-Z])+ ('-' ([a-zA-Z0-9])+)*)

? ;)

Yes, I think requiring white space between @prefix and the prefix name
is a very good idea. More human readable.
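
To illustrate the ambiguity, here is a minimal Python sketch of the
"@alpha-alphanumericsanddash" token type suggested above. The names
AT_WORD and classify_at_word are made up for this example and come from
neither the spec nor any existing Turtle parser. It only shows that a
generic lexer rule takes "@prefixprefix" as a single token unless white
space ends the keyword.

```python
import re

# Hypothetical single token type for anything starting with '@':
# letters first, then optional '-' separated alphanumeric parts.
AT_WORD = re.compile(r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*")


def classify_at_word(text, pos=0):
    """Return (kind, lexeme) for the '@' token starting at pos, or None."""
    m = AT_WORD.match(text, pos)
    if m is None:
        return None
    lexeme = m.group(0)
    # Only an exact match counts as a directive keyword; everything else
    # that fits the pattern is treated as a language tag.
    if lexeme in ("@prefix", "@base"):
        return ("KEYWORD", lexeme)
    return ("LANGTAG", lexeme)


print(classify_at_word("@prefix prefix: <foo> ."))
print(classify_at_word("@prefixprefix:<foo>."))
print(classify_at_word('"chat"@en-GB', 6))
```

With white space after the keyword, the first call yields a KEYWORD
token. Without it, the second call yields a LANGTAG token for
"@prefixprefix", so the parser would need pushback or context-sensitive
lexing to recover the directive.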

So much for last call review this week... sigh...

--Gavin

>
>        Andy
>
>
>> Yes. While it is possible to create a grammar that allows this, in
>> general removing @ is likely to reduce human readability.
>
>
>
>>
>>>
>>> peter
>>>
>>>
>>
>

Received on Tuesday, 15 May 2012 15:28:32 UTC