- From: Gavin Carothers <gavin@carothers.name>
- Date: Wed, 18 Jan 2012 08:27:27 -0800
- To: public-rdfa-wg@w3.org
Alignment with SPARQL and Turtle

Both Turtle and SPARQL provide a mechanism for writing shortened IRIs. The stated design goals of CURIEs are:

  * CURIEs are designed from the ground up to be used in attribute values. QNames are designed for unambiguously naming elements and attributes.
  * CURIEs expand to IRIs, and any IRI can be represented by such an expansion. QNames are treated as value pairs, but even if those pairs are combined into a string, only a subset of IRIs can be represented.
  * CURIEs can be used in non-XML grammars, and can even be used in XML languages that do not support XML Namespaces. QNames are limited to XML Namespace-aware XML Applications.

These exact same goals are met in Turtle and SPARQL using the concept of Prefixed Names. Exactly how Prefixed Names and CURIEs differ for end users is not very clear. Both use simple concatenation, both work outside of XML, and neither is a value pair. The example given there, isbn:0321154991, is a perfectly valid prefixed name in both Turtle and SPARQL. In fact, all the example CURIEs in RDFa Core 1.1 and the RDFa Primer are valid Prefixed Names (some would require language-specific escaping in SPARQL or Turtle). However, there are some differences.

CURIE grammar

The grammar for CURIEs provided in RDFa Core 1.1:

  prefix     ::= NCName
  reference  ::= irelative-ref (as defined in [RFC3987])
  curie      ::= [ [ prefix ] ':' ] reference
  safe_curie ::= '[' [ [ prefix ] ':' ] reference ']'

The grammars for prefixed names are well tested and have many implementations. The CURIE grammar does not seem to have ANY implementations. In fact, even implementing the seemingly simple grammar from RDFa Core 1.1 is very complicated, because it references two other grammars.

First, the XML Namespaces grammar for NCName, which allows a wider range of tokens than prefixed names do. Exactly what use cases those additional tokens are needed for is not clear. Some examples:

  _1: is a valid CURIE prefix but NOT a valid Prefixed Name prefix.
  ______: is a valid CURIE prefix but NOT a valid Prefixed Name prefix.

In fact, the only clear class of tokens allowed by CURIEs but not by Prefixed Names is prefixes that begin with _. Given that _ as the first character is used in RDFa, Turtle, and SPARQL to reference blank nodes, it seems unlikely that anyone uses _ as the first character of their prefix names.

Moving on to the right-hand side of the CURIE, the grammar gets much more amusing. First off, the referenced grammar is not in the same form as the CURIE and XML grammars, which use W3C EBNF; the IRI RFC uses ABNF. This makes for complicated reading by humans, and leaves no clear way to use an automated tool to build a CURIE grammar.

While the name "irelative-ref" sounds like a relative IRI path, the rule in question is NOT limited to relative paths. Host parts, user information, and IPv4 and IPv6 literals are allowed as part of an irelative-ref. This is not exactly expected. Again, what use case is served by allowing CURIEs like the following?

  {'prefix' : 'http://purl.org/example/'}
  prefix://user:password@[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:8080/

These are very easy to confuse with normal IRIs. In general, it seems that the intent of CURIEs was to limit the right-hand side to relative paths, but that is not accomplished by using the "irelative-ref" production from the IRI RFC.

Recommendation

Align RDFa with SPARQL and Turtle prefixed names. It should be possible to create a simpler grammar for "CURIEs"/Prefixed Names based on the SPARQL and Turtle productions below:

  [s157] PN_CHARS     ::= PN_CHARS_U | '-' | [0-9] | #x00B7 | [#x0300-#x036F] | [#x203F-#x2040]
  [s158] PN_PREFIX    ::= PN_CHARS_BASE ((PN_CHARS|'.')* PN_CHARS)?
  [s159] PN_LOCAL     ::= (PN_CHARS_U | [0-9] | PLX) ((PN_CHARS | '.' | PLX)* (PN_CHARS | PLX))?
  [s160] PLX          ::= PERCENT | PN_LOCAL_ESC
  [s161] PERCENT      ::= '%' HEX HEX
  [s162] HEX          ::= [0-9] | [A-F] | [a-f]
  [s163] PN_LOCAL_ESC ::= '\' ( '_' | '~' | '.' | '-' | '!' | '$' | '&' | "'" | '(' | ')' | '*' | '+' | ',' | ';' | '=' | ':' | '/' | '?' | '#' | '@' | '%' )

The main difference is likely to be that most of the PN_LOCAL_ESC characters should be allowed directly in the RDFa grammar, as the host languages (XML and HTML) already provide escaping mechanisms.

While this would be a backwards-incompatible change, the real effect on deployed data and software is likely to be low. I am unaware of any RDFa implementation that uses the CURIE grammar as specified, and I have never encountered RDFa data in the wild that uses the odder values the current grammar productions allow.

--Gavin
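The prefix and local-name differences described above can be checked mechanically. The Python sketch below is an illustration only, not part of RDFa, Turtle, or SPARQL: it encodes just the ASCII subset of NCName and of the PN_* productions (the non-ASCII ranges, including most of PN_CHARS_BASE, are deliberately omitted as a simplifying assumption), and the function names and the isbn to urn:ISBN: mapping are hypothetical.

```python
import re

# Assumption: ASCII-only approximations. The non-ASCII ranges of
# NameStartChar/NameChar and PN_CHARS_BASE are omitted for brevity.

# NCName (XML Namespaces), ASCII subset: letter or '_' first, then
# letters, digits, '_', '-' or '.'.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

# PN_PREFIX [s158], ASCII subset: must start with a letter
# (PN_CHARS_BASE has no '_') and may not end with '.'.
PN_PREFIX = re.compile(r"[A-Za-z](?:[A-Za-z0-9_.\-]*[A-Za-z0-9_\-])?")

# PLX [s160]: a percent escape [s161] or a backslash escape [s163].
PLX = r"(?:%[0-9A-Fa-f]{2}|\\[_~.\-!$&'()*+,;=:/?#@%])"

# PN_LOCAL [s159], ASCII subset.
PN_LOCAL = re.compile(
    rf"(?:[A-Za-z0-9_]|{PLX})(?:(?:[A-Za-z0-9_.\-]|{PLX})*(?:[A-Za-z0-9_\-]|{PLX}))?"
)


def is_curie_prefix(p: str) -> bool:
    """True if p is an NCName (ASCII subset), i.e. a CURIE prefix."""
    return NCNAME.fullmatch(p) is not None


def is_pn_prefix(p: str) -> bool:
    """True if p matches PN_PREFIX (ASCII subset)."""
    return PN_PREFIX.fullmatch(p) is not None


def is_prefixed_name(s: str) -> bool:
    """True if s matches PN_PREFIX? ':' PN_LOCAL? (ASCII subset)."""
    prefix, sep, local = s.partition(":")  # split at the first ':'
    if not sep:
        return False
    prefix_ok = prefix == "" or is_pn_prefix(prefix)
    local_ok = local == "" or PN_LOCAL.fullmatch(local) is not None
    return prefix_ok and local_ok


def expand(name: str, mappings: dict[str, str]) -> str:
    """Expand a CURIE or prefixed name by simple concatenation."""
    prefix, _, reference = name.partition(":")
    return mappings[prefix] + reference
```

With these helpers, "_1" and "______" pass is_curie_prefix but fail is_pn_prefix, and expand("isbn:0321154991", {"isbn": "urn:ISBN:"}) returns "urn:ISBN:0321154991". The prefix://user:password@[...]:8080/ example is rejected by is_prefixed_name, because its local part begins with an unescaped "/".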
Received on Wednesday, 18 January 2012 16:27:56 UTC