Re: Keeping PrEfIx and BaSe Proposals

On May 30, 2013, at 2:53 AM, Andy Seaborne <andy.seaborne@epimorphics.com> wrote:

> On 30/05/13 03:28, Eric Prud'hommeaux wrote:
>> * Sandro Hawke <sandro@w3.org> [2013-05-29 20:26-0400]
>>> On 05/29/2013 12:29 PM, Gavin Carothers wrote:
> ...
> 
>>>> Example grammar change from gkellog:
>>>> 
>>>> [4] prefixID ::= '@'? [Pp][Rr][Ee][Ff][Ii][Xx] PNAME_NS IRIREF "."?
>>>> [5] base ::= '@'? [Bb][Aa][Ss][Ee] IRIREF "."?
>>>> 
>>> 
>>> There's a lot to be said for that, yes.
>> 
>> Is the intention that these all be valid?
>>   prefix : <> PREfix : <>
>>   prefix : <> . PREfix : <> .
>>   @ prefix : <> @ PREfix : <>
>>   @ prefix : <> . @
>>   PREFIX : <>
>>   .
>> 
>> Grammar nit: I like that SPARQL separates tokenizing from parsing (as
>> does Turtle). We could follow suit with:
>> 
>>     prefixID ::= '@'? PREFIX PNAME_NS IRIREF "."?
>>     base ::= '@'? BASE IRIREF "."?
>>     Terminals:
>>     PREFIX ::= [Pp][Rr][Ee][Ff][Ii][Xx]
>>     BASE ::= [Bb][Aa][Ss][Ee]
>> 
>> or we use our current approach:
>> 
>>     Keywords in single quotes ('@base', '@prefix', 'a', 'true', 'false') are case-sensitive. Keywords in double quotes ("BASE", "PREFIX") are case-insensitive.
>> 
>> by striking '@base' and '@prefix' from that list of case-sensitive keywords.
> 
> 1/ The wider range of valid input (Eric's point) + a worse case.
> 2/ Problems with token LANGTAG
> 
> 1/ ==>
> Eric - good catch.
> 
> It's more than a grammar nit.
> 
> Because it is a grammar rule rather than a token rule, this allows whitespace between @ and prefix, whereas @prefix as a single token admits no whitespace. -1 to that; I'm not sure Gregg intended that.

Indeed, my implementation creates terminals for BASE and PREFIX to solve this:

PREFIX ::= "@"? [Pp][Rr][Ee][Ff][Ii][Xx]
BASE ::= "@"? [Bb][Aa][Ss][Ee]

Because they are terminals, they are implemented as regular expressions, which avoids the LL(1) conflict that arises when two rules both start with "@"?. Ordering them before LANGTAG means they are tokenized before the LANGTAG expression is tried.

Definitely shouldn't allow for any spaces between these letters.

In general, my interpretation of the EBNF is that terminals are treated as regular expressions, where the whitespace rule doesn't apply, and non-terminals use the whitespace rule.

Gregg
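The terminal-ordering approach above can be sketched as a first-match regular-expression tokenizer. This is a minimal Python sketch, assuming simplified, hypothetical token patterns (not the full Turtle productions). The negative lookahead on the keyword terminals is also an assumption of this sketch, added so that inputs such as "@prefixes" or "prefix:" are not split into a keyword plus leftovers; it does not come from the thread.

```python
import re

# Hypothetical, simplified token patterns approximating the terminals in this
# thread; they are NOT the full Turtle productions.
KEYWORDS = [
    # "@"? plus a case-insensitive keyword. The lookahead (an assumption of
    # this sketch) keeps "@prefixes" or "prefix:" from being split.
    ("PREFIX", r"@?[Pp][Rr][Ee][Ff][Ii][Xx](?![A-Za-z0-9_:-])"),
    ("BASE", r"@?[Bb][Aa][Ss][Ee](?![A-Za-z0-9_:-])"),
]
LANGTAG = [("LANGTAG", r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*")]
OTHERS = [
    ("PNAME_NS", r"[A-Za-z][A-Za-z0-9_-]*:|:"),
    ("IRIREF", r'<[^<>"{}|^`\\\s]*>'),
    ("STRING", r'"[^"\\\n]*"'),
    ("DOT", r"\."),
    ("WS", r"\s+"),
]

KEYWORDS_FIRST = KEYWORDS + LANGTAG + OTHERS  # keyword terminals tried first
LANGTAG_FIRST = LANGTAG + KEYWORDS + OTHERS   # LANGTAG tried first


def tokenize(text, rules):
    """First-match tokenizer: at each offset, try the rules in list order."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens, pos = [], 0
    while pos < len(text):
        for name, rx in compiled:
            m = rx.match(text, pos)
            if m:
                if name != "WS":  # whitespace separates tokens but is dropped
                    tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            raise ValueError(f"no token matches at offset {pos}: {text[pos:pos + 10]!r}")
    return tokens


print(tokenize("@prefix ex: <http://example/> .", KEYWORDS_FIRST))
print(tokenize("@prefix ex: <http://example/> .", LANGTAG_FIRST))
print(tokenize('"chat"@fr .', KEYWORDS_FIRST))
```

With the keyword terminals ordered first, "@prefix" and "PrEfIx" become PREFIX tokens, "@fr" after a string stays a LANGTAG, and "@ prefix" fails to tokenize at all because "@" followed by a space matches no terminal. Trying LANGTAG first instead turns "@prefix" into a LANGTAG token. One trade-off of keyword-first ordering in this sketch: a literal tagged exactly "@base" or "@prefix" would also be read as the keyword.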

> I really don't like
> ---------
> @
> prefix : <http://example/> .
> ---------
> 
> Actually, as Gregg/Gavin originally wrote it, as a grammar rule, whitespace can occur between any two tokens, and each letter is a token, so it allows
> 
> @ p r e f i x : <http://example/>.
> 
> and
> 
> @ p r e
> f i x : <http://example/>.
> 
> and I'm fairly confident that was not intended.
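Andy's point can be illustrated with a small Python sketch contrasting the keyword matched as a grammar rule, where the parser skips whitespace between every token (here, every letter), with the same keyword matched as a single regular-expression terminal. The function names and the whitespace-skipping strategy are hypothetical, chosen for illustration; they are not taken from any Turtle implementation.

```python
import re

# Hypothetical sketch of the keyword written as a grammar rule over
# single-character tokens, as in the proposed
#   prefixID ::= '@'? [Pp][Rr][Ee][Ff][Ii][Xx] PNAME_NS IRIREF "."?
# The parser skips whitespace between tokens, so whitespace (including
# newlines) is also skipped between the letters.
LETTERS = [re.compile(c) for c in ("[Pp]", "[Rr]", "[Ee]", "[Ff]", "[Ii]", "[Xx]")]
WS = re.compile(r"\s*")


def matches_as_rule(text):
    """True if text is '@'? PREFIX, allowing whitespace between every token."""
    pos = 0
    if text.startswith("@"):
        pos = WS.match(text, 1).end()
    for letter in LETTERS:
        m = letter.match(text, pos)
        if not m:
            return False
        pos = WS.match(text, m.end()).end()
    return pos == len(text)


# The same keyword as a single terminal: one regular expression, so no
# whitespace can appear inside it.
PREFIX_TOKEN = re.compile(r"@?[Pp][Rr][Ee][Ff][Ii][Xx]")


def matches_as_token(text):
    """True if text is exactly one PREFIX terminal."""
    return PREFIX_TOKEN.fullmatch(text) is not None


for s in ["@prefix", "PrEfIx", "@ prefix", "@ p r e\nf i x"]:
    print(repr(s), matches_as_rule(s), matches_as_token(s))
```

Both versions accept "@prefix" and "PrEfIx", but only the rule-level version also accepts "@ prefix" and the letter-by-letter "@ p r e" / "f i x" spread across two lines; the terminal rejects both.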
> 
> 2/ ==>
> 
> Technical point:
> 
> LANGTAG is a token:
> 
> [144s] LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
> 
> and so tokenization will grab '@prefix' as a LANGTAG.
> 
> The LC grammar called out '@prefix' as a specific token, which avoids both problems: it neither allows internal whitespace, horizontal or vertical, nor lets the LANGTAG token accept it.
> 
> The grammar is supposed to be simple for easy implementation in handwritten, LL, and LALR styles.
> 
> Eric's existing design (token for @prefix and @base) is better.
> 
> 	Andy
> 
>> 
>> 
>>>       -s
>>>> Cheers,
>>>> Gavin
>>> 
>>> 
>> 
> 

Received on Thursday, 30 May 2013 15:19:40 UTC