- From: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>
- Date: Wed, 18 Jan 2012 12:03:24 +0100
- To: Gavin Carothers <gavin@carothers.name>
- CC: RDF-WG WG <public-rdf-wg@w3.org>
Hi,
I personally agree with Gavin's suggestion and would vote for the WG to
endorse it.
However, I find one of Gavin's arguments a bit too strong. See comment
below.
On 01/17/2012 07:50 PM, Gavin Carothers wrote:
> I'm not sure if this should go to the RDFa WG as a personal comment,
> or as part of the RDF WG's feedback.
>
>
> Alignment with SPARQL and Turtle
>
> Both Turtle and SPARQL provide a mechanism for writing shortened IRIs.
> The stated design goals of CURIEs are:
>
> CURIEs are designed from the ground up to be used in attribute
> values. QNames are designed for unambiguously naming elements and
> attributes.
> CURIEs expand to IRIs, and any IRI can be represented by such an
> expansion. QNames are treated as value pairs, but even if those pairs
> are combined into a string, only a subset of IRIs can be represented.
> CURIEs can be used in non-XML grammars, and can even be used in
> XML languages that do not support XML Namespaces. QNames are limited
> to XML Namespace-aware XML Applications.
>
> These exact same goals are met in Turtle and SPARQL using the concept
> of Prefixed Names. Exactly how Prefixed Names and CURIEs are different
> to end users is not very clear. Both use simple concatenation, both
> work outside of XML, both are not value pairs. The given example
> isbn:0321154991 is a perfectly valid prefixed name in both Turtle and
> SPARQL. In fact all the example CURIEs in RDFa Core 1.1 and the RDFa
> Primer are valid Prefixed Names (Some would require language specific
> escaping in SPARQL or Turtle). However there are some differences.
>
>
> CURIE grammar
>
> The grammar for CURIEs provided in RDFa Core 1.1
>
> prefix ::= NCName
>
> reference ::= irelative-ref (as defined in [RFC3987])
>
> curie ::= [ [ prefix ] ':' ] reference
>
> safe_curie ::= '[' [ [ prefix ] ':' ] reference ']'
>
> The grammars for prefixed names are well tested and have many
> implementations. The CURIE grammar does not seem to have ANY
> implementations. In fact even implementing the seemingly simple
> grammar from RDFa Core 1.1 is very complicated. The grammar references
> two other grammars. First the XML Namespaces grammar for NCName, which
> allows a wider range of tokens than prefixed names do. Exactly what
> use cases those additional tokens are needed for is not clear. Some
> examples:
>
> _1: is a valid CURIE prefix but NOT a valid Prefixed Name prefix.
> ______: is a valid CURIE prefix but NOT a valid Prefixed Name prefix.
>
> In fact, the only clearly allowed set of tokens allowed by CURIE and
> not Prefixed Names is prefixes containing _ as the first character.
> Given that _ in the first character is used in RDFa, Turtle, and
> SPARQL to reference blank nodes it seems unlikely that anyone uses _
> as the first character in their prefix names.
Well, for the record, rdflib (the Python RDF library) *does* use _1: , _2: ,
_3: ... prefixes (prefices?) for generated namespaces in Turtle. Although
this is obviously a bug, as it is illegal Turtle and causes
interoperability problems with other parsers (like Jena), I can see why
the developers came to this.
In fact, before Jena refused to parse my rdflib-generated Turtle, I
never realized this was invalid Turtle, and it never struck me as
incoherent (although I realize it makes the parser's life a little more
complicated).
So I would not qualify this as "unlikely"...
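To make the difference concrete, here is a minimal sketch in Python. It
assumes ASCII-only approximations of the two productions (the real NCName
and PN_PREFIX rules also admit many non-ASCII ranges), and the names
NCNAME_ASCII, PN_PREFIX_ASCII, is_ncname and is_pn_prefix are mine, not
taken from any spec or library.

```python
import re

# ASCII-only approximations (an assumption for illustration): the real
# productions also cover many non-ASCII character ranges.
# NCName: a letter or '_' first, then letters, digits, '_', '.', '-'.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")
# PN_PREFIX: a letter first (no '_'), and no trailing '.'.
PN_PREFIX_ASCII = re.compile(r"[A-Za-z](?:[A-Za-z0-9_.\-]*[A-Za-z0-9_\-])?")


def is_ncname(prefix):
    """True if prefix matches the ASCII approximation of NCName."""
    return NCNAME_ASCII.fullmatch(prefix) is not None


def is_pn_prefix(prefix):
    """True if prefix matches the ASCII approximation of PN_PREFIX."""
    return PN_PREFIX_ASCII.fullmatch(prefix) is not None


if __name__ == "__main__":
    for candidate in ("isbn", "_1", "______"):
        print(candidate, is_ncname(candidate), is_pn_prefix(candidate))
```

Under these assumptions, "isbn" passes both checks, while "_1" and
"______" pass only the NCName check, which is the kind of prefix rdflib
generates.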
pa
>
> Moving on to the right hand side of the CURIE the grammar gets much
> more amusing. First off the referenced grammar is not in the same form
> as CURIE or XML which use W3C EBNF. The IRI RFC uses ABNF. This leads
> to complicated reading for humans, and no clear way to use any
> automated tool to build a CURIE grammar. While the name
> "irelative-ref" sounds like a relative IRI reference, the rule in
> question is NOT limited to relative references. Host parts, IPv4 and
> IPv6 segments are allowed as part of irelative-refs. This is not
> exactly expected. Again what use case is served in allowing CURIEs
> like:
>
> {'prefix' : 'http://purl.org/example/'}
>
> prefix://user:password[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:8080/
>
> These are very easy to confuse with normal IRIs. In general it seems
> that the intent of CURIEs was to limit the right hand side to relative
> references but that is not accomplished by using the "irelative-ref"
> production from the IRI RFC.
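As a rough illustration of the confusion, here is a sketch of expansion
by simple concatenation, using the prefix mapping and the CURIE quoted
above verbatim. The function name expand_curie is hypothetical, and the
sketch deliberately ignores the empty-prefix and no-prefix forms that the
grammar also allows; it performs no validation of the reference.

```python
def expand_curie(curie, prefixes):
    """Expand 'prefix:reference' by simple concatenation.

    Hypothetical helper for illustration only: it splits on the first
    ':' and does not validate the reference in any way.
    """
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError("no known prefix in %r" % curie)
    return prefixes[prefix] + reference


# Mapping and CURIE copied from the example in the quoted message.
PREFIXES = {"prefix": "http://purl.org/example/"}
CURIE = "prefix://user:password[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:8080/"

if __name__ == "__main__":
    print(expand_curie(CURIE, PREFIXES))
```

The unexpanded form looks like an absolute IRI with scheme "prefix", yet
concatenation simply appends everything after the first colon to the
namespace IRI.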
>
> Recommendation:
>
> Align RDFa with SPARQL and Turtle prefix names. It should be possible
> to create a simpler grammar for "CURIEs"/Prefixed Names based on the
> SPARQL and Turtle productions below:
>
> [s157] PN_CHARS ::= PN_CHARS_U | '-' | [0-9] | #x00B7 |
> [#x0300-#x036F] | [#x203F-#x2040]
> [s158] PN_PREFIX ::= PN_CHARS_BASE ((PN_CHARS|'.')* PN_CHARS)?
> [s159] PN_LOCAL ::= (PN_CHARS_U | [0-9] | PLX ) ( ( PN_CHARS |
> '.' | PLX )* ( PN_CHARS | PLX ) ) ?
> [s160] PLX ::= PERCENT | PN_LOCAL_ESC
> [s161] PERCENT ::= '%' HEX HEX
> [s162] HEX ::= [0-9] | [A-F] | [a-f]
> [s163] PN_LOCAL_ESC ::= '\' ( '_' | '~' | '.' | '-' | '!' | '$'
> | '&' | "'" | '(' | ')' | '*' | '+' | ',' | ';' | '=' | ':' | '/' |
> '?' | '#' | '@' | '%' )
>
> Main differences are likely to be that most of the PN_LOCAL_ESC
> characters should be allowed in RDFa grammar directly, as the host
> languages (XML and HTML) provide for escaping mechanisms already.
> While this would be a backwards incompatible change the real effect on
> deployed data and software is likely to be low. I am unaware of any
> RDFa implementation that uses the CURIE grammar as specified, and have
> not ever encountered RDFa data in the wild that uses the odder values
> the current grammar productions allow.
>
> --Gavin
>
Received on Wednesday, 18 January 2012 11:04:14 UTC