Re: Keeping PrEfIx and BaSe Proposals

On May 30, 2013, at 8:19 AM, Gregg Kellogg <gregg@greggkellogg.net> wrote:

> On May 30, 2013, at 2:53 AM, Andy Seaborne <andy.seaborne@epimorphics.com> wrote:
> 
>> On 30/05/13 03:28, Eric Prud'hommeaux wrote:
>>> * Sandro Hawke <sandro@w3.org> [2013-05-29 20:26-0400]
>>>> On 05/29/2013 12:29 PM, Gavin Carothers wrote:
>> ...
>> 
>>>>> Example grammar change from gkellog:
>>>>> 
>>>>> [4] prefixID ::= '@'? [Pp][Rr][Ee][Ff][Ii][Xx] PNAME_NS IRIREF "."?
>>>>> [5] base ::= '@'? [Bb][Aa][Ss][Ee] IRIREF "."?
>>>>> 
>>>> 
>>>> There's a lot to be said for that, yes.
>>> 
>>> Is the intention that these all be valid:?
>>>  prefix : <> PREfix : <>
>>>  prefix : <> . PREfix : <> .
>>>  @ prefix : <> @ PREfix : <>
>>>  @ prefix : <> . @
>>>  PREFIX : <>
>>>  .
>>> 
>>> Grammar nit: I like that SPARQL separates tokenizing from parsing (as
>>> does Turtle). We could follow suit with:
>>> 
>>>    prefixID ::= '@'? PREFIX PNAME_NS IRIREF "."?
>>>    base ::= '@'? BASE IRIREF "."?
>>>    Terminals:
>>>    PREFIX ::= [Pp][Rr][Ee][Ff][Ii][Xx]
>>>    BASE ::= [Bb][Aa][Ss][Ee]
>>> 
>>> or we use our current approach:
>>> 
>>>    Keywords in single quotes ('@base', '@prefix', 'a', 'true', 'false') are case-sensitive. Keywords in double quotes ("BASE", "PREFIX") are case-insensitive.
>>> 
>>> by striking the '@base', '@prefix'.
>> 
>> 1/ The wider range of valid input (Eric's point) + a worse case.
>> 2/ Problems with token LANGTAG
>> 
>> 1/ ==>
>> Eric - good catch.
>> 
>> It's more than a grammar nit.
>> 
>> Being a grammar rule and not a token rule, this allows whitespace between @ and prefix, rather than @prefix being a single token with no internal whitespace. -1 to that; I'm not sure Gregg intended that.
> 
> Indeed, my implementation creates terminals for BASE and PREFIX to solve this:
> 
> PREFIX ::= "@"? [Pp][Rr][Ee][Ff][Ii][Xx]
> BASE ::= "@"? [Bb][Aa][Ss][Ee]
> 
> By having them as terminals, they are implemented as regular expressions, and thus avoid the LL(1) conflict involved with having two rules start with "@"?. By ordering them before LANGTAG, they are tokenized before the LANGTAG expression is matched.
> 
> Definitely shouldn't allow for any spaces between these letters.
> 
> In general, my interpretation of the EBNF is that terminals are treated as regular expressions, where the whitespace rule doesn't apply, and non-terminals use the whitespace rule.
> 
> Gregg

I went ahead and updated my Turtle processor based on this discussion, allowing case-insensitive "@"?base/prefix with an optional trailing '.'. I also made "a" case-insensitive, so that it matches "A" as well. As a result, I now fail three tests that specifically expect these to be syntax errors:

* turtle-syntax-bad-base-02
* turtle-syntax-bad-base-03
* turtle-syntax-bad-kw-01

Even if the group decides not to add this to the spec, I plan to retain it in my implementation, as it does no harm and makes things more consistent.
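
A minimal sketch of the terminal ordering discussed above, in Python. This is not the actual processor code: the token table, the simplified IRIREF and PNAME_NS patterns, and the tokenize helper are all assumptions made for illustration. It shows PREFIX and BASE being tried before LANGTAG, and whitespace being skipped only between terminals, never inside one.

```python
import re

# Hypothetical token table. Order matters: PREFIX and BASE come before
# LANGTAG, so "@prefix" and "@base" are not swallowed as language tags.
TOKEN_SPEC = [
    ("PREFIX",   r"@?[Pp][Rr][Ee][Ff][Ii][Xx]"),
    ("BASE",     r"@?[Bb][Aa][Ss][Ee]"),
    ("LANGTAG",  r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*"),
    ("IRIREF",   r"<[^<>\s]*>"),                    # simplified, not the full Turtle rule
    ("PNAME_NS", r"(?:[A-Za-z][A-Za-z0-9_-]*)?:"),  # simplified, not the full Turtle rule
    ("DOT",      r"\."),
    ("WS",       r"\s+"),
]

# Python tries alternatives left to right, so the first matching terminal wins.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return (terminal, lexeme) pairs; whitespace is dropped between terminals."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}: {text[pos]!r}")
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for line in ["@prefix : <http://example/> .", "PREfix : <>", "@en"]:
        print(tokenize(line))
```

With this table, tokenize("@ prefix : <>") raises SyntaxError, because a bare "@" followed by a space matches no terminal. Moving the LANGTAG entry to the top of the table would instead make "@prefix" come back as a LANGTAG, which is the conflict Andy describes.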

Gregg

>> I really don't like
>> ---------
>> @
>> prefix : <http://example/> .
>> ---------
>> 
>> Actually, as gregg/gavin originally wrote it as a grammar rule, whitespace can occur between any tokens, and since each letter is a token, it allows
>> 
>> @ p r e f i x : <http://example/>.
>> 
>> and
>> 
>> @ p r e
>> f i x : <http://example/>.
>> 
>> and I'm fairly confident that was not intended.
>> 
>> 2/ ==>
>> 
>> Technical point:
>> 
>> LANGTAG is a token:
>> 
>> [144s] 	LANGTAG 	::= 	'@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
>> 
>> and so tokenization will grab '@prefix'
>> 
>> The LC grammar called out '@prefix' as a specific token which means it is not a problem, neither allowing internal white space, horizontal or vertical, nor having the LANGTAG token accept it.
>> 
>> The grammar is supposed to be simple for easy implementation in handwritten, LL, and LALR styles.
>> 
>> Eric's existing design (token for @prefix and @base) is better.
>> 
>> 	Andy
>> 
>>> 
>>> 
>>>>      -s
>>>>> Cheers,
>>>>> Gavin
>>>> 
>>>> 
>>> 
>> 
> 

Received on Thursday, 30 May 2013 22:04:29 UTC