Re: whitespace in Turtle

Hmm, one of the "rules", as I understand them, for having built-up tokens in a 
grammar is that there is a deterministic break between the token rules and the 
higher-level rules, so that the parser never has to call back into the tokenizer.

I don't think that Andy's solution would work, as a tokenizer could still 
produce the token sequence LANGTAG IRIREF from
@base <foo>
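
To make the failure concrete, here is a rough sketch of a context-free 
tokenizer. The token table, the simplified IRIREF pattern and the tokenize() 
helper are my own illustration, not taken from the draft grammar or from any 
existing parser. With no dedicated BASE token, "@base" can only come out as 
LANGTAG:

```python
import re

# Hypothetical token table. LANGTAG follows the shape of rule [19];
# IRIREF is deliberately simplified for this sketch.
TOKEN_SPEC = [
    ("IRIREF", r"<[^<>\s]*>"),
    ("LANGTAG", r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*"),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Split text into (token_name, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("@base <foo>"))
# [('LANGTAG', '@base'), ('IRIREF', '<foo>')]
```

A rule such as base ::= BASE IRIREF then never sees a BASE token, unless the 
parser goes back to the tokenizer, which is exactly what the break between the 
two levels is meant to avoid.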

The usual solution to this sort of problem is to make @base tokenize *only* as 
BASE, never as LANGTAG, which is probably not permissible here.
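
A rough sketch of that usual solution, under my own assumptions: the token 
names, the simplified IRIREF and STRING patterns, the negative lookahead and 
the tokenize() helper are all hypothetical, chosen only for illustration. The 
keyword tokens are listed before LANGTAG so they always win:

```python
import re

# Hypothetical token table: keywords are listed before LANGTAG, and Python's
# regex alternation tries alternatives in order, so '@base' is always BASE.
# The lookahead (an assumption of this sketch) keeps longer tags such as
# '@base-x' as LANGTAG.
TOKEN_SPEC = [
    ("BASE", r"@base(?![a-zA-Z0-9-])"),
    ("PREFIX", r"@prefix(?![a-zA-Z0-9-])"),
    ("LANGTAG", r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*"),
    ("IRIREF", r"<[^<>\s]*>"),
    ("STRING", r'"[^"]*"'),  # simplified: no escape sequences
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Split text into (token_name, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("@base <foo>"))  # [('BASE', '@base'), ('IRIREF', '<foo>')]
print(tokenize('"x"@en-GB'))    # [('STRING', '"x"'), ('LANGTAG', '@en-GB')]
print(tokenize('"x"@base'))     # [('STRING', '"x"'), ('BASE', '@base')]
```

The last line shows the cost: a literal tagged with "@base" no longer yields a 
LANGTAG token, so the grammar would have to rule out "base" and "prefix" as 
language tags.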

peter


On 05/15/2012 11:41 AM, Andy Seaborne wrote:
>>> Pragmatically, WS after @prefix and @base.  Then can have a token type
>>> that is "@alpha-alphanumericsanddash".
>>
>> What you don't like:
>>
>> [19]        LANGTAG        ::=    (BASE | PREFIX | '@' ([a-zA-Z])+ ('-' 
>> ([a-zA-Z0-9])+)
>>
>> ? ;)
>
> (To the casual reader: BASE is '@base' and PREFIX is '@prefix'.)
>
> Which is ambiguous - as it says:
>
> LANGTAG ::= ('@base' | '@prefix' | '@' ([a-zA-Z])+ ('-' ([a-zA-Z0-9])+)
>
> so the string "@base" matches two ways.
>
> But even if sorted out ... it means a tokenizer may well generate the token 
> LANGTAG ... and then:
>
> [5]        base        ::=     BASE IRIREF
>
> does not match as the token is LANGTAG, not BASE.  Oops.
>
>> Yes, I think requiring white space between @prefix and the prefix name
>> is a very good idea. More human readable.
>>
>> So much for last call review this week... sigh...
>>
>> --Gavin
>
> A simple fix that would be acceptable (to me at least) is:
>
> 1/ Remove BASE and PREFIX rules.
> 2/ Write explicit '@base' and '@prefix'
>
> [5]        base        ::=     '@base' IRIREF
> etc
>
> I think this makes it clear what is intended and communicates that the 
> LANGTAG tokenization does not apply (this exploits a common feature in 
> parser generators, which read this as testing for the string '@base', not 
> for LANGTAG - I suspect this is what happens in yacker).
>
>     Andy
>
>

Received on Tuesday, 15 May 2012 16:11:25 UTC