- From: Gregg Kellogg <gregg@greggkellogg.net>
- Date: Thu, 30 May 2013 15:03:58 -0700
- To: "public-rdf-wg@w3.org WG" <public-rdf-wg@w3.org>
On May 30, 2013, at 8:19 AM, Gregg Kellogg <gregg@greggkellogg.net> wrote:

> On May 30, 2013, at 2:53 AM, Andy Seaborne <andy.seaborne@epimorphics.com> wrote:
>
>> On 30/05/13 03:28, Eric Prud'hommeaux wrote:
>>> * Sandro Hawke <sandro@w3.org> [2013-05-29 20:26-0400]
>>>> On 05/29/2013 12:29 PM, Gavin Carothers wrote:
>> ...
>>
>>>>> Example grammar change from gkellog:
>>>>>
>>>>> [4] prefixID ::= '@'? [Pp][Rr][Ee][Ff][Ii][Xx] PNAME_NS IRIREF "."?
>>>>> [5] base ::= '@'? [Bb][Aa][Ss][Ee] IRIREF "."?
>>>>>
>>>>
>>>> There's a lot to be said for that, yes.
>>>
>>> Is the intention that these all be valid?
>>> prefix : <> PREfix : <>
>>> prefix : <> . PREfix : <> .
>>> @ prefix : <> @ PREfix : <>
>>> @ prefix : <> . @
>>> PREFIX : <>
>>> .
>>>
>>> Grammar nit: I like that SPARQL separates tokenizing from parsing (as
>>> does Turtle). We could follow suit with:
>>>
>>> prefixID ::= '@'? PREFIX PNAME_NS IRIREF "."?
>>> base ::= '@'? BASE IRIREF "."?
>>> Terminals:
>>> PREFIX ::= [Pp][Rr][Ee][Ff][Ii][Xx]
>>> BASE ::= [Bb][Aa][Ss][Ee]
>>>
>>> or we use our current approach:
>>>
>>> Keywords in single quotes ('@base', '@prefix', 'a', 'true', 'false') are case-sensitive. Keywords in double quotes ("BASE", "PREFIX") are case-insensitive.
>>>
>>> by striking the '@base', '@prefix'.
>>
>> 1/ The wider range of valid input (Eric's point) + a worse case.
>> 2/ Problems with token LANGTAG
>>
>> 1/ ==>
>> Eric - good catch.
>>
>> It's more than a grammar nit.
>>
>> Being a grammar rule and not a token rule, this allows whitespace between @ and prefix, rather than @prefix being a token, no whitespace. -1 to that; I'm not sure Gregg intended that.
>
> Indeed, my implementation creates terminals for BASE and PREFIX to solve this:
>
> PREFIX ::= "@"? [Pp][Rr][Ee][Ff][Ii][Xx]
> BASE ::= "@"? [Bb][Aa][Ss][Ee]
>
> By having them as terminals, they are implemented as regular expressions, and thus avoid the LL(1) conflict involved with having two rules start with "@"?.
> By ordering them before LANGTAG, they are tokenized before the LANGTAG expression is matched.
>
> Definitely shouldn't allow for any spaces between these letters.
>
> In general, my interpretation of the EBNF is that terminals are treated as regular expressions, where the whitespace rule doesn't apply, and non-terminals use the whitespace rule.
>
> Gregg

I went ahead and updated my Turtle processor based on this discussion, allowing case-insensitive "@"?base/prefix with an optional trailing '.'. I also made "a" case-insensitive, so that "A" matches as well. As a result, I now fail three tests that specifically expect these to be syntax errors:

* turtle-syntax-bad-base-02
* turtle-syntax-bad-base-03
* turtle-syntax-bad-kw-01

Even if the group does not decide to add this to the spec, I plan to retain it in my implementation, as it does no harm and makes things more consistent.

Gregg

>> I really don't like
>> ---------
>> @
>> prefix : <http://example/> .
>> ---------
>>
>> Actually, as gregg/gavin originally wrote it as a grammar rule, whitespace can occur between any tokens, and each letter is a token, so it allows
>>
>> @ p r e f i x : <http://example/>.
>>
>> and
>>
>> @ p r e
>> f i x : <http://example/>.
>>
>> and I'm fairly confident that was not intended.
>>
>> 2/ ==>
>>
>> Technical point:
>>
>> LANGTAG is a token:
>>
>> [144s] LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
>>
>> and so tokenization will grab '@prefix'.
>>
>> The LC grammar called out '@prefix' as a specific token, which means it is not a problem: it neither allows internal whitespace, horizontal or vertical, nor lets the LANGTAG token accept it.
>>
>> The grammar is supposed to be simple for easy implementation in handwritten, LL, and LALR styles.
>>
>> Eric's existing design (token for @prefix and @base) is better.
>>
>> Andy
>>
>>>> -s
>>>>> Cheers,
>>>>> Gavin
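The tokenizer ordering Gregg describes, and Andy's point that LANGTAG on its own would swallow '@prefix', can be illustrated with a short Python sketch. This is not Gregg's implementation: the terminal table, the simplified PNAME_NS and IRIREF patterns, the `tokenize` helper, and the negative lookahead on PREFIX/BASE are all assumptions made for illustration. Because "@"? sits inside a single regular-expression terminal, '@prefix', 'PREFIX', and '@Base' each become one keyword token, while '@ prefix' matches no terminal at all.

```python
import re

# Hypothetical, simplified terminal table (not the normative Turtle grammar).
# Terminals are tried top to bottom and the first match wins, so the
# case-insensitive PREFIX/BASE keywords are placed before LANGTAG.
LANGTAG_RE = re.compile(r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*")

TERMINALS = [
    ("WS", re.compile(r"[ \t\r\n]+")),
    # Simplified PNAME_NS: ASCII letters/digits only (assumption).
    ("PNAME_NS", re.compile(r"[A-Za-z][A-Za-z0-9]*:|:")),
    # "@"? is part of the terminal, so no whitespace can follow "@".
    # The negative lookahead is an added assumption so that a longer tag
    # such as "@prefixes" still falls through to LANGTAG.
    ("PREFIX", re.compile(r"@?[Pp][Rr][Ee][Ff][Ii][Xx](?![A-Za-z0-9-])")),
    ("BASE", re.compile(r"@?[Bb][Aa][Ss][Ee](?![A-Za-z0-9-])")),
    ("LANGTAG", LANGTAG_RE),
    # Simplified IRIREF: no escape sequences (assumption).
    ("IRIREF", re.compile(r'<[^<>"{}|^`\\\x00-\x20]*>')),
    ("DOT", re.compile(r"\.")),
]


def tokenize(text):
    """Return (terminal, lexeme) pairs for text, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern in TERMINALS:
            m = pattern.match(text, pos)
            if m:
                if name != "WS":
                    tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            # No terminal matched at this offset.
            raise SyntaxError(f"unexpected input at offset {pos}: {text[pos:pos + 10]!r}")
    return tokens


# On its own, the LANGTAG pattern would consume "@prefix" whole.
print(LANGTAG_RE.fullmatch("@prefix") is not None)  # True

print(tokenize("@prefix ex: <http://example/> ."))
# [('PREFIX', '@prefix'), ('PNAME_NS', 'ex:'), ('IRIREF', '<http://example/>'), ('DOT', '.')]

print(tokenize("@en-US"))
# [('LANGTAG', '@en-US')]

try:
    tokenize("@ prefix : <http://example/> .")
except SyntaxError as err:
    print("rejected:", err)
```

A first-match table is only one way to resolve the overlap. A longest-match (maximal munch) lexer sees PREFIX and LANGTAG matching the same seven characters of '@prefix', so it would still need a priority rule to prefer the keyword.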
Received on Thursday, 30 May 2013 22:04:29 UTC