Re: Comment from RDF WG on CURIE Alignment with SPARQL and Turtle from Niklas Lindström on 2012-01-19 (public-rdfa-wg@w3.org from January 2012)

From: Niklas Lindström <lindstream@gmail.com>
Date: Thu, 19 Jan 2012 02:01:49 +0100
To: Gavin Carothers <gavin@carothers.name>
Cc: public-rdfa-wg@w3.org
Message-ID: <CADjV5jcPeEm7X9Fh4tCs5F6KOGnqQcEQmwMAaP_3XknqXO-2jg@mail.gmail.com>
Gavin, all,

I am basically in favor of this. At least, I am in favor of reducing
or removing the risk of confusing CURIEs with normal IRIs. It should
be noted that the concern for that has come up before as ISSUE-90 [1].

I explicitly did not object to the closing of that issue, but at times
I've felt an inkling of discomfort about it (at least on a theoretical
design level). See e.g. my final reply to the official response [2]. I
suggested that articulating the reasonably minor risk of
protocol/CURIE collision would be good, as well as coordinating with
the other WGs. It seems that you to some extent have the same concern.

(For context, my basic worry has been that *if*, at some point in the
future, a new protocol becomes popular on the web, we can only keep
our fingers crossed that this doesn't happen to be the same as any RDF
prefix in use (or worse, happen to be in the default context). Otherwise
any publishing system using that prefix that starts using the new
protocol (in @about or @resource) will suddenly express the wrong
IRIs. But as I've admitted, this is probably a minute risk. And it can
be worked around, with some effort, were the worst scenario to
actually happen.)
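
To make the collision concrete, here is a minimal sketch in Python. It is a simplification and not the RDFa Core processing rules: it assumes a value is expanded as a CURIE whenever the text before the first colon is a declared prefix, and is otherwise kept as an IRI. The prefix `xyz`, its namespace, and the `xyz:` scheme are all hypothetical.

```python
def resolve(value, prefixes):
    """Expand value as a CURIE if its prefix is declared; otherwise keep it as an IRI.

    Simplified assumption: only the text before the first ':' is inspected.
    """
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + reference
    return value


# A page that uses a (hypothetical) 'xyz:' protocol in @about, with no such prefix declared.
prefixes = {"dc": "http://purl.org/dc/terms/"}
before = resolve("xyz:some/resource", prefixes)

# The same value once 'xyz' is also a declared prefix (hypothetical namespace).
prefixes["xyz"] = "http://example.org/vocab/xyz#"
after = resolve("xyz:some/resource", prefixes)
```

Under these assumptions `before` stays `xyz:some/resource`, while `after` becomes `http://example.org/vocab/xyz#some/resource`: the markup is unchanged, but it now names a different resource.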

At that time I wasn't aware of the PName definition, but I've since
come across it. I certainly see the value of technical coordination
and possible simplification to be gained by using it.

For all involved in this I really suggest reading (well, skimming) the
related emails/threads linked to from [1]. There are various use
cases, concerns and perspectives articulated there which we'd have to
take into account if we were to change the use of or syntax for
CURIEs. We may need to consider a compromise between what CURIEs are
today and PNames.

Best regards,
Niklas

[1]: http://www.w3.org/2010/02/rdfa/track/issues/90
[2]: http://lists.w3.org/Archives/Public/public-rdfa-wg/2011Jun/0014.html


On Wed, Jan 18, 2012 at 5:27 PM, Gavin Carothers <gavin@carothers.name> wrote:
> Alignment with SPARQL and Turtle
>
> Both Turtle and SPARQL provide a mechanism for writing shortened IRIs.
> The stated design goals of CURIEs are:
>
>   CURIEs are designed from the ground up to be used in attribute
> values. QNames are designed for unambiguously naming elements and
> attributes.
>   CURIEs expand to IRIs, and any IRI can be represented by such an
> expansion. QNames are treated as value pairs, but even if those pairs
> are combined into a string, only a subset of IRIs can be represented.
>   CURIEs can be used in non-XML grammars, and can even be used in
> XML languages that do not support XML Namespaces. QNames are limited
> to XML Namespace-aware XML Applications.
>
> These exact same goals are met in Turtle and SPARQL using the concept
> of Prefixed Names. Exactly how Prefixed Names and CURIEs are different
> to end users is not very clear. Both use simple concatenation, both
> work outside of XML, and neither is a value pair. The given example
> isbn:0321154991 is a perfectly valid prefixed name in both Turtle and
> SPARQL. In fact all the example CURIEs in RDFa Core 1.1 and the RDFa
> Primer are valid Prefixed Names (Some would require language specific
> escaping in SPARQL or Turtle). However there are some differences.
>
>
> CURIE grammar
>
> The grammar for CURIEs provided in RDFa Core 1.1
>
> prefix      ::=   NCName
>
> reference   ::=   irelative-ref (as defined in [RFC3987])
>
> curie       ::=   [ [ prefix ] ':' ] reference
>
> safe_curie  ::=   '[' [ [ prefix ] ':' ] reference ']'
>
> The grammars for prefixed names are well tested and have many
> implementations. The CURIE grammar does not seem to have ANY
> implementations. In fact even implementing the seemingly simple
> grammar from RDFa Core 1.1 is very complicated. The grammar references
> two other grammars. First the XML Namespaces grammar for NCName, which
> allows a wider range of tokens than prefixed names do. Exactly what
> use cases those additional tokens are needed for is not clear. Some
> examples:
>
> _1: is a valid CURIE prefix but NOT a valid Prefixed Name prefix.
> ______: is a valid CURIE prefix but NOT a valid Prefixed Name prefix.
>
> In fact, the only set of tokens clearly allowed by CURIEs and
> not by Prefixed Names is prefixes with _ as the first character.
> Given that _ in the first character is used in RDFa, Turtle, and
> SPARQL to reference blank nodes it seems unlikely that anyone uses _
> as the first character in their prefix names.
>
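
The two examples above can be checked with a rough sketch in Python. The regular expressions are ASCII-only approximations of NCName and PN_PREFIX, which is an assumption made for brevity; both real productions also admit many non-ASCII characters. The function names are hypothetical.

```python
import re

# ASCII-only approximation of the XML Namespaces NCName production (assumption).
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

# ASCII-only approximation of PN_PREFIX: PN_CHARS_BASE ((PN_CHARS | '.')* PN_CHARS)?
PN_PREFIX_ASCII = re.compile(r"[A-Za-z](?:[A-Za-z0-9._-]*[A-Za-z0-9_-])?")


def is_curie_prefix(prefix):
    """True if prefix matches the approximate NCName pattern."""
    return NCNAME_ASCII.fullmatch(prefix) is not None


def is_pn_prefix(prefix):
    """True if prefix matches the approximate PN_PREFIX pattern."""
    return PN_PREFIX_ASCII.fullmatch(prefix) is not None


def compare(prefix):
    """Return (valid as CURIE prefix, valid as Prefixed Name prefix)."""
    return (is_curie_prefix(prefix), is_pn_prefix(prefix))
```

With these approximations, `compare("_1")` and `compare("______")` both give `(True, False)`, while `compare("isbn")` gives `(True, True)`.
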
> Moving on to the right hand side of the CURIE the grammar gets much
> more amusing. First off, the referenced grammar is not in the same form
> as CURIE or XML, which use W3C EBNF; the IRI RFC uses ABNF. This leads
> to complicated reading for humans, and no clear way to use any
> automated tool to build a CURIE grammar. While the name
> "irelative-ref" sounds like a relative IRI reference, the rule in
> question is NOT limited to relative references. Host parts, IPv4 and
> IPv6 segments are allowed as part of irelative-refs. This is not
> exactly expected. Again what use case is served in allowing CURIEs
> like:
>
> {'prefix' : 'http://purl.org/example/'}
>
> prefix://user:password[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:8080/
>
> These are very easy to confuse with normal IRIs. In general it seems
> that the intent of CURIEs was to limit the right hand side to relative
> references but that is not accomplished by using the "irelative-ref"
> production from the IRI RFC.
>
> Recommendation:
>
> Align RDFa with SPARQL and Turtle prefix names. It should be possible
> to create a simpler grammar for "CURIEs"/Prefixed Names based on the
> SPARQL and Turtle productions below:
>
> [s157]  PN_CHARS      ::=  PN_CHARS_U | '-' | [0-9] | #x00B7
>                            | [#x0300-#x036F] | [#x203F-#x2040]
> [s158]  PN_PREFIX     ::=  PN_CHARS_BASE ((PN_CHARS | '.')* PN_CHARS)?
> [s159]  PN_LOCAL      ::=  (PN_CHARS_U | [0-9] | PLX)
>                            ((PN_CHARS | '.' | PLX)* (PN_CHARS | PLX))?
> [s160]  PLX           ::=  PERCENT | PN_LOCAL_ESC
> [s161]  PERCENT       ::=  '%' HEX HEX
> [s162]  HEX           ::=  [0-9] | [A-F] | [a-f]
> [s163]  PN_LOCAL_ESC  ::=  '\' ( '_' | '~' | '.' | '-' | '!' | '$' | '&'
>                            | "'" | '(' | ')' | '*' | '+' | ',' | ';' | '='
>                            | ':' | '/' | '?' | '#' | '@' | '%' )
>
> Main differences are likely to be that most of the PN_LOCAL_ESC
> characters should be allowed in RDFa grammar directly, as the host
> languages (XML and HTML) provide for escaping mechanisms already.
> While this would be a backwards incompatible change the real effect on
> deployed data and software is likely to be low. I am unaware of any
> RDFa implementation that uses the CURIE grammar as specified, and have
> not ever encountered RDFa data in the wild that uses the odder values
> the current grammar productions allow.
>
> --Gavin
>
Received on Thursday, 19 January 2012 01:02:49 UTC