- From: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>
- Date: Wed, 18 Jan 2012 12:03:24 +0100
- To: Gavin Carothers <gavin@carothers.name>
- CC: RDF-WG WG <public-rdf-wg@w3.org>
Hi,

I personally agree with Gavin's suggestion and would vote for the WG to
endorse it. However, I find one of Gavin's arguments a bit too strong.
See comment below.

On 01/17/2012 07:50 PM, Gavin Carothers wrote:
> I'm not sure if this should go to the RDFa WG as a personal comment,
> or as part of the RDF WG's feedback.
>
>
> Alignment with SPARQL and Turtle
>
> Both Turtle and SPARQL provide a mechanism for writing shortened IRIs.
> The stated design goals of CURIEs are:
>
> CURIEs are designed from the ground up to be used in attribute
> values. QNames are designed for unambiguously naming elements and
> attributes.
> CURIEs expand to IRIs, and any IRI can be represented by such an
> expansion. QNames are treated as value pairs, but even if those pairs
> are combined into a string, only a subset of IRIs can be represented.
> CURIEs can be used in non-XML grammars, and can even be used in
> XML languages that do not support XML Namespaces. QNames are limited
> to XML Namespace-aware XML Applications.
>
> These exact same goals are met in Turtle and SPARQL using the concept
> of Prefixed Names. Exactly how Prefixed Names and CURIEs are different
> to end users is not very clear. Both use simple concatenation, both
> work outside of XML, both are not value pairs. The given example
> isbn:0321154991 is a perfectly valid prefixed name in both Turtle and
> SPARQL. In fact all the example CURIEs in RDFa Core 1.1 and the RDFa
> Primer are valid Prefixed Names (some would require language-specific
> escaping in SPARQL or Turtle). However there are some differences.
>
>
> CURIE grammar
>
> The grammar for CURIEs provided in RDFa Core 1.1
>
> prefix ::= NCName
>
> reference ::= irelative-ref (as defined in [RFC3987])
>
> curie ::= [ [ prefix ] ':' ] reference
>
> safe_curie ::= '[' [ [ prefix ] ':' ] reference ']'
>
> The grammars for prefixed names are well tested and have many
> implementations. The CURIE grammar does not seem to have ANY
> implementations.
> In fact even implementing the seemingly simple
> grammar from RDFa Core 1.1 is very complicated. The grammar references
> two other grammars. First the XML Namespaces grammar for NCName, which
> allows a wider range of tokens than prefixed names do. Exactly what
> use cases those additional tokens are needed for is not clear. Some
> examples:
>
> _1: is a valid CURIE prefix but NOT a valid Prefixed Name prefix.
> ______: is a valid CURIE prefix but NOT a valid Prefixed Name prefix.
>
> In fact, the only clear set of tokens allowed by CURIE and
> not by Prefixed Names is prefixes containing _ as the first character.
> Given that _ in the first character is used in RDFa, Turtle, and
> SPARQL to reference blank nodes it seems unlikely that anyone uses _
> as the first character in their prefix names.

Well, for the record, rdflib (Python RDF library) *does* use _1: , _2: ,
_3: ... prefixes (prefices?) for generated namespaces in Turtle. Although
it is obviously a bug, as it is illegal Turtle and causes interoperability
problems with other parsers (like Jena), I can see why the developers
came to this. In fact, before Jena refused to parse my rdflib-generated
Turtle, I never realized this was invalid Turtle, and it never struck me
as incoherent (although I realize it makes the parser's life a little
more complicated).

So I would not qualify this as "unlikely"...

  pa

>
> Moving on to the right hand side of the CURIE the grammar gets much
> more amusing. First off the referenced grammar is not in the same form
> as CURIE or XML which use W3C EBNF. The IRI RFC uses ABNF. This leads
> to complicated reading for humans, and no clear way to use any
> automated tool to build a CURIE grammar. While the name
> "irelative-ref" sounds like a relative IRI reference, the rule in
> question is NOT limited to relative references. Host parts, IPv4 and
> IPv6 segments are allowed as part of irelative-refs. This is not
> exactly expected.
> Again what use case is served in allowing CURIEs
> like:
>
> {'prefix' : 'http://purl.org/example/'}
>
> prefix://user:password[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:8080/
>
> These are very easy to confuse with normal IRIs. In general it seems
> that the intent of CURIEs was to limit the right hand side to relative
> references but that is not accomplished by using the "irelative-ref"
> production from the IRI RFC.
>
> Recommendation:
>
> Align RDFa with SPARQL and Turtle prefix names. It should be possible
> to create a simpler grammar for "CURIEs"/Prefixed Names based on the
> SPARQL and Turtle productions below:
>
> [s157] PN_CHARS ::= PN_CHARS_U | '-' | [0-9] | #x00B7 |
>        [#x0300-#x036F] | [#x203F-#x2040]
> [s158] PN_PREFIX ::= PN_CHARS_BASE ((PN_CHARS|'.')* PN_CHARS)?
> [s159] PN_LOCAL ::= (PN_CHARS_U | [0-9] | PLX ) ( ( PN_CHARS |
>        '.' | PLX )* ( PN_CHARS | PLX ) ) ?
>
> [s160] PLX ::= PERCENT | PN_LOCAL_ESC
> [s161] PERCENT ::= '%' HEX HEX
> [s162] HEX ::= [0-9] | [A-F] | [a-f]
> [s163] PN_LOCAL_ESC ::= '\' ( '_' | '~' | '.' | '-' | '!' | '$'
>        | '&' | "'" | '(' | ')' | '*' | '+' | ',' | ';' | '=' | ':' | '/' |
>        '?' | '#' | '@' | '%' )
>
> Main differences are likely to be that most of the PN_LOCAL_ESC
> characters should be allowed in RDFa grammar directly, as the host
> languages (XML and HTML) provide for escaping mechanisms already.
> While this would be a backwards incompatible change the real effect on
> deployed data and software is likely to be low. I am unaware of any
> RDFa implementation that uses the CURIE grammar as specified, and have
> not ever encountered RDFa data in the wild that uses the odder values
> the current grammar productions allow.
>
> --Gavin
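
For readers who want to try the prefix examples discussed above (_1: and ______:), here is a minimal Python sketch. It is an illustration only, under loud assumptions: both patterns are ASCII-only simplifications of the NCName and PN_PREFIX productions (the real productions also admit many non-ASCII characters), and the names NCNAME_ASCII, PN_PREFIX_ASCII and classify_prefix are hypothetical, not taken from any specification or library.

```python
import re

# ASSUMPTION: ASCII-only approximations; the real NCName and PN_PREFIX
# productions also accept large ranges of non-ASCII characters.

# NCName (XML Namespaces): starts with a letter or '_', then letters,
# digits, '_', '-' or '.'.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

# PN_PREFIX (SPARQL/Turtle): starts with a letter (PN_CHARS_BASE), may
# contain '.' in the middle, and must end with a PN_CHARS character.
PN_PREFIX_ASCII = re.compile(r"[A-Za-z](?:[A-Za-z0-9_.\-]*[A-Za-z0-9_\-])?")


def classify_prefix(prefix):
    """Return (is_ncname, is_pn_prefix) for a prefix given without its ':'."""
    return (
        NCNAME_ASCII.fullmatch(prefix) is not None,
        PN_PREFIX_ASCII.fullmatch(prefix) is not None,
    )


if __name__ == "__main__":
    for candidate in ["isbn", "_1", "______"]:
        print(candidate, classify_prefix(candidate))
```

Under these simplified patterns, isbn is accepted by both grammars, while _1 and ______ are accepted only as CURIE prefixes, which matches the two examples in the quoted message.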
Received on Wednesday, 18 January 2012 11:04:14 UTC