Re: Keeping PrEfIx and BaSe Proposals

On 30/05/13 16:19, Gregg Kellogg wrote:
> On May 30, 2013, at 2:53 AM, Andy Seaborne <andy.seaborne@epimorphics.com> wrote:
>
>> On 30/05/13 03:28, Eric Prud'hommeaux wrote:
>>> * Sandro Hawke <sandro@w3.org> [2013-05-29 20:26-0400]
>>>> On 05/29/2013 12:29 PM, Gavin Carothers wrote:
>> ...
>>
>>>>> Example grammar change from gkellog:
>>>>>
>>>>> [4] prefixID ::= '@'? [Pp][Rr][Ee][Ff][Ii][Xx] PNAME_NS IRIREF "."?
>>>>> [5] base ::= '@'? [Bb][Aa][Ss][Ee] IRIREF "."?
>>>>>
>>>>
>>>> There's a lot to be said for that, yes.
>>>
>>> Is the intention that these all be valid?
>>>    prefix : <> PREfix : <>
>>>    prefix : <> . PREfix : <> .
>>>    @ prefix : <> @ PREfix : <>
>>>    @ prefix : <> . @
>>>    PREFIX : <>
>>>    .
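The rule below is a minimal Python sketch of how permissive the proposed rule is. It assumes that `@`, `.` and a case-insensitive PREFIX keyword are separate tokens, with whitespace allowed between any two of them. PNAME_NS and IRIREF are drastically simplified, and `tokenize` and `parse_prefix_directives` are hypothetical names. Under those assumptions it accepts every example above.

```python
import re

# Simplified, hypothetical token set: '@' and '.' are separate tokens, PREFIX
# is the case-insensitive keyword, and PNAME_NS / IRIREF are much narrower
# than the real Turtle terminals. PNAME_NS is tried first so that "prefix:"
# would be read as a prefix name rather than as the keyword.
TOKEN_RE = re.compile(r"""
    (?P<WS>\s+)
  | (?P<PNAME_NS>[A-Za-z]*:)
  | (?P<PREFIX>[Pp][Rr][Ee][Ff][Ii][Xx])
  | (?P<IRIREF><[^<>\s]*>)
  | (?P<AT>@)
  | (?P<DOT>\.)
""", re.VERBOSE)


def tokenize(text):
    """Split text into (kind, lexeme) pairs, dropping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse_prefix_directives(text):
    """Parse a sequence of  '@'? PREFIX PNAME_NS IRIREF '.'?  directives.

    Returns a list of (prefix, iri) pairs; raises ValueError otherwise.
    """
    toks = tokenize(text)
    i, result = 0, []

    def expect(kind):
        nonlocal i
        if i >= len(toks) or toks[i][0] != kind:
            raise ValueError(f"expected {kind} at token {i}")
        i += 1
        return toks[i - 1][1]

    while i < len(toks):
        if toks[i][0] == "AT":                       # optional '@'
            i += 1
        expect("PREFIX")
        ns = expect("PNAME_NS")
        iri = expect("IRIREF")
        if i < len(toks) and toks[i][0] == "DOT":    # optional '.'
            i += 1
        result.append((ns, iri))
    return result
```

For example, `parse_prefix_directives("prefix : <> PREfix : <>")` yields two directives even though no `.` separates them.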
>>>
>>> Grammar nit: I like that SPARQL separates tokenizing from parsing (as
>>> does Turtle). We could follow suit with:
>>>
>>>      prefixID ::= '@'? PREFIX PNAME_NS IRIREF "."?
>>>      base ::= '@'? BASE IRIREF "."?
>>>      Terminals:
>>>      PREFIX ::= [Pp][Rr][Ee][Ff][Ii][Xx]
>>>      BASE ::= [Bb][Aa][Ss][Ee]
>>>
>>> or we use our current approach:
>>>
>>>      Keywords in single quotes ('@base', '@prefix', 'a', 'true', 'false') are case-sensitive. Keywords in double quotes ("BASE", "PREFIX") are case-insensitive.
>>>
>>> by striking the '@base', '@prefix'.
>>
>> 1/ The wider range of valid input (Eric's point) + a worse case.
>> 2/ Problems with token LANGTAG
>>
>> 1/ ==>
>> Eric - good catch.
>>
>> It's more than a grammar nit.
>>
>> Because this is a grammar rule and not a token rule, it allows whitespace between @ and prefix, whereas making @prefix a single token allows no whitespace. -1 to that; I'm not sure Gregg intended that.
>
> Indeed, my implementation creates terminals for BASE and PREFIX to solve this:
>
> PREFIX ::= "@"? [Pp][Rr][Ee][Ff][Ii][Xx]
> BASE ::= "@"? [Bb][Aa][Ss][Ee]
>
> By having them as terminals, they are implemented as regular expressions, and thus avoid the LL(1) conflict involved with having two rules start with "@"?. By ordering them before LANGTAG, they are tokenized before the LANGTAG expression is matched.
>
> Definitely shouldn't allow for any spaces between these letters.
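Gregg's ordering can be sketched in Python as an ordered alternation. This is a minimal sketch, not his implementation. `@?` sits inside the PREFIX and BASE terminals, so no whitespace can appear between `@` and the keyword, and those terminals are tried before LANGTAG. The `first_token` helper and the trailing lookahead are hypothetical additions: without the lookahead, first-match alternation would split a tag such as `@prefixed` into PREFIX followed by `ed`.

```python
import re

# Python's re tries alternatives left to right and takes the first that
# matches, so PREFIX and BASE are listed before LANGTAG.
# The (?![A-Za-z0-9-]) lookahead is an assumption added for this sketch so
# that a longer tag such as "@prefixed" still falls through to LANGTAG.
TOKEN_RE = re.compile(
    r"(?P<PREFIX>@?[Pp][Rr][Ee][Ff][Ii][Xx])(?![A-Za-z0-9-])"
    r"|(?P<BASE>@?[Bb][Aa][Ss][Ee])(?![A-Za-z0-9-])"
    r"|(?P<LANGTAG>@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*)"
)


def first_token(text):
    """Return the kind of the token at the start of text, or None."""
    m = TOKEN_RE.match(text)
    return m.lastgroup if m else None
```

With this ordering, `@prefix`, `PREFIX` and `@Base` come out as keywords, `@en-GB` stays a LANGTAG, and `@ prefix` does not tokenize at all.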

I suspected it wasn't intended :-)

>
> In general, my interpretation of the EBNF is that terminals are treated as regular expressions, where the whitespace rule doesn't apply, and non-terminals use the whitespace rule.

That's my understanding.

>
> Gregg
>
>> I really don't like
>> ---------
>> @
>> prefix : <http://example/> .
>> ---------
>>
>> Actually, as Gregg/Gavin originally wrote it (as a grammar rule), whitespace can occur between any tokens and each letter is a token, so it allows
>>
>> @ p r e f i x : <http://example/>.
>>
>> and
>>
>> @ p r e
>> f i x : <http://example/>.
>>
>> and I'm fairly confident that was not intended.
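Andy's point can be reproduced with a minimal Python sketch. The helper names are hypothetical, and the sketch only recognises the keyword, not the whole directive. If each character class is treated as a separate grammar token, with whitespace skipped before every token, both spaced-out inputs above are accepted. The same pattern compiled as one regular-expression terminal rejects them.

```python
import re

KEYWORD = "prefix"


def _skip_ws(text, pos):
    """Advance past any whitespace, as a parser does between tokens."""
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


def match_as_grammar_rule(text):
    """'@'? [Pp] [Rr] [Ee] [Ff] [Ii] [Xx] treated as a grammar rule: each
    character class is its own token, so whitespace may precede each one."""
    pos = _skip_ws(text, 0)
    if pos < len(text) and text[pos] == "@":
        pos += 1
    for ch in KEYWORD:
        pos = _skip_ws(text, pos)
        # Mirrors the [Pp]-style classes: either case of each letter.
        if pos >= len(text) or text[pos] not in (ch, ch.upper()):
            return False
        pos += 1
    return True


# The same pattern as a single terminal (regular expression): no whitespace
# can occur between '@' and the letters, or between the letters.
KEYWORD_TOKEN = re.compile(r"@?[Pp][Rr][Ee][Ff][Ii][Xx]")


def match_as_token(text):
    """Leading whitespace is skipped, as before any token."""
    return KEYWORD_TOKEN.match(text.lstrip()) is not None
```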
>>
>> 2/ ==>
>>
>> Technical point:
>>
>> LANGTAG is a token:
>>
>> [144s] 	LANGTAG 	::= 	'@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
>>
>> and so tokenization will grab '@prefix' as a LANGTAG.
>>
>> The LC grammar called out '@prefix' as a specific token, which means neither problem arises: internal whitespace (horizontal or vertical) is not allowed, and the LANGTAG token does not accept it.
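The LANGTAG clash can be seen in a minimal Python sketch. It assumes a lex-style tokenizer in which the longest match wins and ties go to the rule listed first. `RULES` and `longest_match` are hypothetical names. The [144s] LANGTAG pattern on its own accepts `@prefix`. Only the dedicated, case-sensitive '@prefix' token, listed ahead of LANGTAG, claims it.

```python
import re

# Token rules in priority order. LANGTAG follows [144s]; '@prefix' and
# '@base' are the LC grammar's case-sensitive tokens.
RULES = [
    ("PREFIX_KW", re.compile(r"@prefix")),
    ("BASE_KW", re.compile(r"@base")),
    ("LANGTAG", re.compile(r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*")),
]


def longest_match(text, pos=0):
    """lex-style maximal munch: the longest match wins; on a tie the rule
    listed first wins. Returns (kind, lexeme) or None."""
    best = None
    for kind, rx in RULES:
        m = rx.match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (kind, m.group())
    return best
```

Because '@prefix' is case-sensitive in this sketch, `@PREFIX` falls through to LANGTAG, and a longer tag such as `@prefixed` wins on length.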
>>
>> The grammar is supposed to be simple for easy implementation in handwritten, LL, and LALR styles.
>>
>> Eric's existing design (token for @prefix and @base) is better.
>>
>> 	Andy
>>
>>>
>>>
>>>>        -s
>>>>> Cheers,
>>>>> Gavin
>>>>
>>>>
>>>
>>
>
>

Received on Thursday, 30 May 2013 18:54:24 UTC