- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Sun, 26 Feb 2012 15:05:04 +0000
- To: public-rdf-comments@w3.org
On 24/02/12 14:45, Alex Hall wrote:
> On Fri, Feb 24, 2012 at 7:34 AM, Henry Story <henry.story@bblfish.net> wrote:
>
>     In the current editors draft and spec we find
>
>     http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-turtle/turtle.bnf
>
>     LANGTAG ::= BASE
>               | PREFIX
>               | "@" [a-zA-Z]+ ( "-" [a-zA-Z0-9]+ )*
>
>     BASE ::= "@base"
>
>     PREFIX ::= "@prefix"
>
>     RDFLiteral ::= String ( LANGTAG | ( "^^" IRIref ) )?
>
>
> Interesting... Note that a language tag in Turtle (as in SPARQL) is
> defined simply as '@' followed by one or more letters, with optional
> hyphenated alphanumeric segments. Under this definition, '@base' and
> '@prefix' are both valid language tags regardless of whether they are
> explicitly included in the LANGTAG production using their BASE and
> PREFIX rules.
>
> Now, I agree that it is confusing to have them included this way in the
> LANGTAG definition. They aren't there in SPARQL, and they probably
> shouldn't be in Turtle. My guess would be that this was transcribed
> directly from the input grammar for some parser generator, and BASE and
> PREFIX were added to LANGTAG to quiet some warnings about ambiguous tokens.

Yes - there would be a fight over @base as directive and as a language tag.

The other way round would work - define directives as langtags (!!!) and only allow two particular ones. OK for machines, less so for people reading the grammar, but it would still be BNF.

Parser generators do often allow a literal "@base" to be used, and it means that string at that point, but it's not BNF.

They aren't in SPARQL because @base and @prefix aren't keywords elsewhere.

You could write a single token for a literal with LANGTAG and/or datatype, but it would be horrible (both prefix name and URI for the datatype would need to be spelt out). Putting the pieces in the tokens and assembling the whole literal in the grammar is easier for machine and person.
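
As a quick check of the point quoted above (that '@base' and '@prefix' already match the generic language-tag pattern), here is a minimal Python sketch. It is not from the thread: the regular expression is a hand transcription of the third LANGTAG alternative, and the name is_langtag is made up for illustration.

```python
import re

# Hand transcription of the generic LANGTAG alternative from the quoted grammar:
#   "@" [a-zA-Z]+ ( "-" [a-zA-Z0-9]+ )*
# The explicit BASE and PREFIX alternatives are deliberately left out.
LANGTAG = re.compile(r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*")


def is_langtag(token: str) -> bool:
    """Return True if the whole token matches the generic LANGTAG pattern."""
    return LANGTAG.fullmatch(token) is not None


if __name__ == "__main__":
    # "@base" and "@prefix" match just like ordinary tags such as "@en-GB".
    for token in ["@en", "@en-GB", "@base", "@prefix", "@123"]:
        print(token, is_langtag(token))
```

So a tokenizer working from this pattern alone cannot tell the directive from a language tag; the distinction has to come from token priority or from where the token appears in the grammar.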
A trick would be to make the end of the lexical form tokens "@ or "^^ (+ internal whitespace) ... but then "a" is a problem.

BCP47 is a tricky pile of rules because the lengths of subitems affect their meaning and the parsing rules. But the language part:

    language = 2*3ALPHA ["-" extlang]
             / 4ALPHA
             / 5*8ALPHA

does allow "base" and "prefix" (reserved and registered language subtag respectively).

	Andy
Received on Sunday, 26 February 2012 15:05:31 UTC