- From: Gregg Kellogg <gregg@greggkellogg.net>
- Date: Thu, 30 May 2013 15:03:58 -0700
- To: "public-rdf-wg@w3.org WG" <public-rdf-wg@w3.org>
On May 30, 2013, at 8:19 AM, Gregg Kellogg <gregg@greggkellogg.net> wrote:
> On May 30, 2013, at 2:53 AM, Andy Seaborne <andy.seaborne@epimorphics.com> wrote:
>
>> On 30/05/13 03:28, Eric Prud'hommeaux wrote:
>>> * Sandro Hawke <sandro@w3.org> [2013-05-29 20:26-0400]
>>>> On 05/29/2013 12:29 PM, Gavin Carothers wrote:
>> ...
>>
>>>>> Example grammar change from gkellog:
>>>>>
>>>>> [4] prefixID ::= '@'? [Pp][Rr][Ee][Ff][Ii][Xx] PNAME_NS IRIREF "."?
>>>>> [5] base ::= '@'? [Bb][Aa][Ss][Ee] IRIREF "."?
>>>>>
>>>>
>>>> There's a lot to be said for that, yes.
>>>
>>> Is the intention that all of these be valid?
>>> prefix : <> PREfix : <>
>>> prefix : <> . PREfix : <> .
>>> @ prefix : <> @ PREfix : <>
>>> @ prefix : <> . @
>>> PREFIX : <>
>>> .
>>>
>>> Grammar nit: I like that SPARQL separates tokenizing from parsing (as
>>> does Turtle). We could follow suit with:
>>>
>>> prefixID ::= '@'? PREFIX PNAME_NS IRIREF "."?
>>> base ::= '@'? BASE IRIREF "."?
>>> Terminals:
>>> PREFIX ::= [Pp][Rr][Ee][Ff][Ii][Xx]
>>> BASE ::= [Bb][Aa][Ss][Ee]
>>>
>>> or we use our current approach:
>>>
>>> Keywords in single quotes ('@base', '@prefix', 'a', 'true', 'false') are case-sensitive. Keywords in double quotes ("BASE", "PREFIX") are case-insensitive.
>>>
>>> by striking the '@base', '@prefix'.
>>
>> 1/ The wider range of valid input (Eric's point) + a worse case.
>> 2/ Problems with token LANGTAG
>>
>> 1/ ==>
>> Eric - good catch.
>>
>> It's more than a grammar nit.
>>
>> Being a grammar rule and not a token rule, this allows whitespace between @ and prefix, rather than @prefix being a single token with no whitespace. -1 to that; I'm not sure Gregg intended that.
>
> Indeed, my implementation creates terminals for BASE and PREFIX to solve this:
>
> PREFIX ::= "@"? [Pp][Rr][Ee][Ff][Ii][Xx]
> BASE ::= "@"? [Bb][Aa][Ss][Ee]
>
> By having them as terminals, they are implemented as regular expressions, and thus avoid the LL(1) conflict involved with having two rules start with "@"?. By ordering them before LANGTAG, they are tokenized before the LANGTAG expression is matched.
>
> Definitely shouldn't allow for any spaces between these letters.
>
> In general, my interpretation of the EBNF is that terminals are treated as regular expressions, where the whitespace rule doesn't apply, and non-terminals use the whitespace rule.
>
> Gregg
I went ahead and updated my Turtle processor based on this discussion, allowing case-insensitive "@"?base/prefix with an optional trailing '.'. I also made "a" case-insensitive so that "A" matches as well. As a result, I now fail three tests which specifically look for these to be syntax errors:
* turtle-syntax-bad-base-02
* turtle-syntax-bad-base-03
* turtle-syntax-bad-kw-01
Even if the group does not decide to add this to the spec I plan to retain it in my implementation, as it does no harm and makes things more consistent.
Gregg
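A minimal Python sketch of the terminal-based approach described above, assuming a maximal-munch tokenizer that breaks ties by rule order, so PREFIX and BASE (listed before LANGTAG) win when both match the same text. The terminal list is hypothetical and heavily simplified (PNAME_NS and IRIREF are rough approximations, not the spec's productions); it is not Gregg's actual implementation.

```python
import re

# Hypothetical, simplified terminal set. Order matters only for ties:
# when two patterns match the same length, the earlier one wins.
TERMINALS = [
    ("PREFIX", re.compile(r"@?[Pp][Rr][Ee][Ff][Ii][Xx]")),
    ("BASE", re.compile(r"@?[Bb][Aa][Ss][Ee]")),
    ("LANGTAG", re.compile(r"@[a-zA-Z]+(-[a-zA-Z0-9]+)*")),
    ("IRIREF", re.compile(r"<[^<>\"{}|^`\\\s]*>")),
    ("PNAME_NS", re.compile(r"[A-Za-z]*:")),  # simplified PN_PREFIX
    ("DOT", re.compile(r"\.")),
]
WHITESPACE = re.compile(r"\s+")


def tokenize(text):
    """Return a list of (terminal, lexeme) pairs using maximal munch."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Whitespace is skipped *between* terminals, never inside one.
        ws = WHITESPACE.match(text, pos)
        if ws:
            pos = ws.end()
            continue
        best = None
        for name, pattern in TERMINALS:
            m = pattern.match(text, pos)
            # Strictly longer wins; on a tie the earlier terminal is kept.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise SyntaxError(f"no terminal matches at offset {pos}")
        name, m = best
        tokens.append((name, m.group()))
        pos = m.end()
    return tokens
```

Under these assumptions `@prefix`, `PREFIX` and `PrEfIx` all tokenize as PREFIX, `@en` and `@prefixes` stay LANGTAG (the latter because its LANGTAG match is strictly longer), and `@ prefix` is rejected because a terminal cannot contain whitespace.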
>> I really don't like
>> ---------
>> @
>> prefix : <http://example/> .
>> ---------
>>
>> Actually, as Gregg/Gavin originally wrote it, as a grammar rule, whitespace can occur between any tokens and each letter is a token, so it allows
>>
>> @ p r e f i x : <http://example/>.
>>
>> and
>>
>> @ p r e
>> f i x : <http://example/>.
>>
>> and I'm fairly confident that was not intended.
>>
>> 2/ ==>
>>
>> Technical point:
>>
>> LANGTAG is a token:
>>
>> [144s] LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
>>
>> and so tokenization will grab '@prefix' as a LANGTAG.
>>
>> The LC grammar called out '@prefix' as a specific token, which means there is no problem: it neither allows internal whitespace, horizontal or vertical, nor lets the LANGTAG token accept it.
>>
>> The grammar is supposed to be simple for easy implementation in handwritten, LL, and LALR styles.
>>
>> Eric's existing design (token for @prefix and @base) is better.
>>
>> Andy
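To illustrate point 1/ above, here is a hypothetical Python sketch contrasting the two readings of the keyword: as a grammar rule, where each letter class is a separate token and whitespace may be skipped before every one of them, versus as a single terminal. The function names are illustrative only and do not come from any implementation discussed in the thread.

```python
import re

WS = re.compile(r"\s*")
# The keyword as one terminal: whitespace is allowed before it, not inside it.
KEYWORD_TOKEN = re.compile(r"\s*@?[Pp][Rr][Ee][Ff][Ii][Xx]")


def matches_letter_rule(text, keyword="prefix"):
    """Grammar-rule reading: '@'? then one token per letter, with
    whitespace (including newlines) skipped before each token.
    Only checks that the text starts with the keyword."""
    pos = WS.match(text, 0).end()
    if text.startswith("@", pos):
        pos = WS.match(text, pos + 1).end()
    for letter in keyword:
        if pos < len(text) and text[pos].lower() == letter:
            pos = WS.match(text, pos + 1).end()
        else:
            return False
    return True


def matches_token_rule(text):
    """Terminal reading: the whole keyword must be contiguous."""
    return KEYWORD_TOKEN.match(text) is not None
```

Both readings accept `@prefix` and `PREFIX`, but only the grammar-rule reading accepts inputs such as `@ p r e` followed by a newline and `f i x`, which is the behavior objected to above.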
>>
>>>
>>>
>>>> -s
>>>>> Cheers,
>>>>> Gavin
>>>>
>>>>
>>>
>>
>
Received on Thursday, 30 May 2013 22:04:29 UTC