Re: Comments on RDFa Core 1.1 (CURIE only) from Pierre-Antoine Champin on 2012-01-18 (public-rdf-wg@w3.org from January 2012)

From: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>
Date: Wed, 18 Jan 2012 12:03:24 +0100
To: Gavin Carothers <gavin@carothers.name>
CC: RDF-WG WG <public-rdf-wg@w3.org>
Message-ID: <4F16A6FC.3040001@liris.cnrs.fr>
Hi,

I personally agree with Gavin's suggestion and would vote for the WG to
endorse it.

However, I find one of Gavin's arguments a bit too strong. See comment
below.

On 01/17/2012 07:50 PM, Gavin Carothers wrote:
> I'm not sure if this should go to the RDFa WG as a personal comment,
> or as part of the RDF WG's feedback.
> 
> 
> Alignment with SPARQL and Turtle
> 
> Both Turtle and SPARQL provide a mechanism for writing shortened IRIs.
> The stated design goals of CURIEs are:
> 
>     CURIEs are designed from the ground up to be used in attribute
> values. QNames are designed for unambiguously naming elements and
> attributes.
>     CURIEs expand to IRIs, and any IRI can be represented by such an
> expansion. QNames are treated as value pairs, but even if those pairs
> are combined into a string, only a subset of IRIs can be represented.
>     CURIEs can be used in non-XML grammars, and can even be used in
> XML languages that do not support XML Namespaces. QNames are limited
> to XML Namespace-aware XML Applications.
> 
> These exact same goals are met in Turtle and SPARQL using the concept
> of Prefixed Names. Exactly how Prefixed Names and CURIEs are different
> to end users is not very clear. Both use simple concatenation, both
> work outside of XML, both are not value pairs. The given example
> isbn:0321154991 is a perfectly valid prefixed name in both Turtle and
> SPARQL. In fact all the example CURIEs in RDFa Core 1.1 and the RDFa
> Primer are valid Prefixed Names (Some would require language specific
> escaping in SPARQL or Turtle). However there are some differences.
> 
> 
> CURIE grammar
> 
> The grammar for CURIEs provided in RDFa Core 1.1
> 
> prefix      ::=   NCName
> 
> reference   ::=   irelative-ref (as defined in [RFC3987])
> 
> curie       ::=   [ [ prefix ] ':' ] reference
> 
> safe_curie  ::=   '[' [ [ prefix ] ':' ] reference ']'
> 
> The grammars for prefixed names are well tested and have many
> implementations. The CURIE grammar does not seem to have ANY
> implementations. In fact even implementing the seemingly simple
> grammar from RDFa Core 1.1 is very complicated. The grammar references
> two other grammars. First the XML Namespaces grammar for NCName, which
> allows a wider range of tokens than prefixed names do. Exactly what
> use cases those additional tokens are needed for is not clear. Some
> examples:
> 
> _1: is a valid CURIE prefix but NOT a valid Prefixed Name prefix.
> ______: is a valid CURIE prefix but NOT a valid Prefixed Name prefix.
> 
> In fact, the only clearly allowed set of tokens allowed by CURIE and
> not Prefixed Names is prefixes containing _ as the first character.
> Given that _ in the first character is used in RDFa, Turtle, and
> SPARQL to reference blank nodes it seems unlikely that anyone uses _
> as the first character in their prefix names.

Well, for the record, rdflib (the Python RDF library) *does* use _1: , _2: ,
_3: ... prefixes (prefices?) for generated namespaces in Turtle. Although
this is obviously a bug, as it is illegal Turtle and causes
interoperability problems with other parsers (like Jena), I can see why
the developers came to this.

In fact, before Jena refused to parse my rdflib-generated Turtle, I
never realized this was invalid Turtle, and it never struck me as
incoherent (although I realize it makes the parser's life a little more
complicated).

So I would not qualify this as "unlikely"...
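
To make the difference concrete, here is a minimal sketch that checks a
prefix against both grammars. It is only an approximation: the two regular
expressions cover the ASCII subset of NCName and PN_PREFIX (the real
productions admit many non-ASCII characters), and the names NCNAME,
PN_PREFIX and classify are made up for this example.

```python
import re

# ASCII-only approximations (assumption: non-ASCII ranges are omitted).
# NCName (XML Namespaces): starts with a letter or '_'.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")
# PN_PREFIX (SPARQL/Turtle): starts with a letter (PN_CHARS_BASE),
# may contain '.', but must not end with '.'.
PN_PREFIX = re.compile(r"[A-Za-z](?:[A-Za-z0-9_.\-]*[A-Za-z0-9_\-])?")


def classify(prefix):
    """Return (valid as CURIE prefix, valid as Prefixed Name prefix)."""
    return (
        NCNAME.fullmatch(prefix) is not None,
        PN_PREFIX.fullmatch(prefix) is not None,
    )


for p in ["isbn", "_1", "______"]:
    print(p, classify(p))
```

Under these assumptions, "isbn" passes both checks, while "_1" and "______"
pass only the NCName check, which is exactly the case discussed above.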

  pa


> 
> Moving on to the right hand side of the CURIE the grammar gets much
> more amusing. First off the referenced grammar is not in the same form
> as CURIE or XML which use W3C EBNF. The IRI RFC uses ABNF. This leads
> to complicated reading for humans, and no clear way to use any
> automated tool to build a CURIE grammar. While the name
> "irelative-ref" sounds like a relative IRI reference, the rule in
> question is NOT limited to relative references. Host parts, IPv4 and
> IPv6 segments are allowed as part of irelative-refs. This is not
> exactly expected. Again what use case is served in allowing CURIEs
> like:
> 
> {'prefix' : 'http://purl.org/example/'}
> 
> prefix://user:password[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:8080/
> 
> These are very easy to confuse with normal IRIs. In general it seems
> that the intent of CURIEs was to limit the right hand side to relative
> references but that is not accomplished by using the "irelative-ref"
> production from the IRI RFC.
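
As an illustration of the example above, here is a rough sketch of CURIE
expansion by simple concatenation. It is not the RDFa processing algorithm:
safe CURIEs, the empty prefix and the "_" prefix are ignored, and the
function name expand_curie is made up for this example. An undefined prefix
simply returns the input unchanged, which is an assumption of the sketch.

```python
def expand_curie(curie, prefixes):
    """Expand prefix:reference by concatenation (simplified sketch)."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        # Assumption: leave anything without a known prefix untouched.
        return curie
    return prefixes[prefix] + reference


# Prefix mapping and CURIE taken from the example quoted above.
prefixes = {"prefix": "http://purl.org/example/"}
curie = "prefix://user:password[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:8080/"

print(expand_curie(curie, prefixes))
```

The value being expanded looks like an ordinary IRI with a scheme, yet it is
treated as prefix plus reference, and everything after the first colon is
appended to http://purl.org/example/.
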
> 
> Recommendation:
> 
> Align RDFa with SPARQL and Turtle prefix names. It should be possible
> to create a simpler grammar for "CURIEs"/Prefixed Names based on the
> SPARQL and Turtle productions below:
> 
> [s157]   PN_CHARS   ::=   PN_CHARS_U | '-' | [0-9] | #x00B7 |
> [#x0300-#x036F] | [#x203F-#x2040]
> [s158]   PN_PREFIX   ::=   PN_CHARS_BASE ((PN_CHARS|'.')* PN_CHARS)?
> [s159]   PN_LOCAL   ::=   (PN_CHARS_U | [0-9] | PLX ) ( ( PN_CHARS |
> '.' | PLX )* ( PN_CHARS | PLX ) ) ?
> [s160]   PLX   ::=   PERCENT | PN_LOCAL_ESC
> [s161]   PERCENT   ::=   '%' HEX HEX
> [s162]   HEX   ::=   [0-9] | [A-F] | [a-f]
> [s163]   PN_LOCAL_ESC   ::=   '\' ( '_' | '~' | '.' | '-' | '!' | '$'
> | '&' | "'" | '(' | ')' | '*' | '+' | ',' | ';' | '=' | ':' | '/' |
> '?' | '#' | '@' | '%' )
> 
> Main differences are likely to be that most of the PN_LOCAL_ESC
> characters should be allowed in RDFa grammar directly, as the host
> languages (XML and HTML) provide for escaping mechanisms already.
> While this would be a backwards incompatible change the real effect on
> deployed data and software is likely to be low. I am unaware of any
> RDFa implementation that uses the CURIE grammar as specified, and have
> not ever encountered RDFa data in the wild that uses the odder values
> the current grammar productions allow.
> 
> --Gavin
>
Received on Wednesday, 18 January 2012 11:04:14 UTC