- From: Niklas Lindström <lindstream@gmail.com>
- Date: Tue, 24 Jan 2012 04:01:42 +0100
- To: public-rdfa-wg <public-rdfa-wg@w3.org>
Hello,

I've been investigating some of the minute details and issues surrounding CURIEs, based on the discussion that recently cropped up with ISSUE-125 [1]. It seems to me that the definition we currently have is flawed in one more way, and quite crucially so.

## The Problem ##

As we already know, a bunch of Facebook OpenGraph properties are expressed with CURIEs where the parts after the prefix themselves contain colons. For instance, "video:actor:role", and "my-og-app:podcast:url" as seen in the examples at [2]. (There are also 13 such properties defined in <http://ogp.me/ns#>, e.g. "og:image:width" and "og:video:height".)

We currently define CURIEs as:

    curie ::= [ [ prefix ] ':' ] reference
    reference ::= irelative-ref ; (as defined in [RFC3987])

Now, I may be too tired to see clearly, but if I read the definition of irelative-ref in section 2.2 of RFC 3987 [3] correctly, it actually prohibits such CURIEs! Let me explain.

I find these to be the relevant definitions in RFC 3987:

    irelative-ref = irelative-part [ "?" iquery ] [ "#" ifragment ]

    irelative-part = "//" iauthority ipath-abempty
                   / ipath-absolute
                   / ipath-noscheme
                   / ipath-empty

    ipath-absolute = "/" [ isegment-nz *( "/" isegment ) ]
    ipath-noscheme = isegment-nz-nc *( "/" isegment )
    ipath-empty = 0<ipchar>

    isegment-nz-nc = 1*( iunreserved / pct-encoded / sub-delims / "@" )
                   ; non-zero-length segment without any colon ":"

If I interpret the ABNF [4] properly, given "og:image:width", I get the following:

* "og:" matches the prefix and ":", so we match "image:width" against irelative-ref;
* there is no "?" or "#" in that, so only irelative-part is considered;
* it does not start with "//", so we skip the following (iauthority ipath-abempty) of the first alternative;
* it does not start with "/", so it is not an ipath-absolute;
* it contains a colon ":", so it is not an ipath-noscheme (does not match isegment-nz-nc *( "/" isegment ));
* it is not empty, so it is not an ipath-empty.
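The walk-through above can be approximated in code. The following is only a rough sketch, not a conforming RFC 3987 parser: it is ASCII-only (ucschar is left out), it treats iauthority as "anything up to the next /, ? or #", and it does not validate the query or fragment. The names ALTERNATIVES and matching_alternatives are made up for illustration.

```python
import re

# ASCII-only approximations of the RFC 3987 character classes (assumption:
# ucschar is ignored, so non-ASCII input is rejected by this sketch).
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCT = r"%[0-9A-Fa-f]{2}"
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|{PCT})"     # ipchar
PCHAR_NC = rf"(?:[{UNRESERVED}{SUB_DELIMS}@]|{PCT})"   # ipchar without ":"

SEGMENT = rf"{PCHAR}*"           # isegment
SEGMENT_NZ = rf"{PCHAR}+"        # isegment-nz
SEGMENT_NZ_NC = rf"{PCHAR_NC}+"  # isegment-nz-nc

# The four alternatives of irelative-part, in the order the ABNF lists them.
# iauthority is simplified to "no /, ? or #" (an assumption of this sketch).
ALTERNATIVES = {
    '"//" iauthority ipath-abempty': rf"//[^/?#]*(?:/{SEGMENT})*",
    "ipath-absolute": rf"/(?:{SEGMENT_NZ}(?:/{SEGMENT})*)?",
    "ipath-noscheme": rf"{SEGMENT_NZ_NC}(?:/{SEGMENT})*",
    "ipath-empty": r"",
}


def matching_alternatives(reference):
    """Return the names of the irelative-part alternatives that the path
    portion of `reference` (everything before the first ? or #) matches."""
    path = re.split(r"[?#]", reference, maxsplit=1)[0]
    return [name for name, pattern in ALTERNATIVES.items()
            if re.fullmatch(pattern, path)]


# The reference part of "og:image:width" matches none of the alternatives,
# while a colon-free reference such as "image" is an ipath-noscheme.
print(matching_alternatives("image:width"))
print(matching_alternatives("image"))
```

Under these simplifications, a reference whose first segment contains a colon matches no alternative at all, which is the point of the bullet list above.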

With no more alternatives in irelative-part, I conclude that "og:image:width" is not a valid CURIE!

Please correct me if I'm wrong here! If not, it is quite evident that we have to fix this (unless we are willing to break a widely deployed de facto usage).

Ironically, we *do* allow for CURIEs to begin with "//". This makes it possible to use CURIEs *indistinguishable* from "normal" IRIs (using authority and paths), as explained in ISSUE-125 (and in my old (dead horse) ISSUE-90 [5]).

## The Proposal ##

We have the opportunity here to fix a lot of things. I propose to define CURIEs along the lines of:

    curie = [ prefix ] ':' local

    prefix = PN_PREFIX ; as defined in SPARQL 1.1 [6]

    local = (ipath-rootless / ipath-empty) [ "?" iquery ] [ "#" ifragment ]

    ipath-rootless = isegment-nz *( "/" isegment )
    isegment = *ipchar
    isegment-nz = 1*ipchar
    ipchar = iunreserved / pct-encoded / sub-delims / ":" / "@"
    ..

For comparison, this is the definition of the full IRI:

    IRI = scheme ":" ihier-part [ "?" iquery ] [ "#" ifragment ]

    ihier-part = "//" iauthority ipath-abempty
               / ipath-absolute
               / ipath-rootless
               / ipath-empty

## The Consequences ##

This (if I'm awake enough) still allows for *all* the use cases that have hitherto been put forward as needed. E.g.:

    schema:Person/Doctor
    og:video:height
    db:resource/Albert_Einstein
    ex:some?very=special#thing

(While it is true that it would prevent the "hack" once presented as a means of using full IRIs where RDFa 1.0 only allows CURIEs (by using @xmlns:http="http:"), isn't that moot? Any processor affected by this change in RDFa 1.1 should reasonably use RDFa 1.1 rules, where we now allow such IRIs anywhere CURIEs are allowed. (And for that matter, I don't recall any reports of actual usage of that.))

Most importantly, this completely eliminates the risk of confusing CURIEs with normal IRIs. That is, IRIs with a scheme followed by "//", an authority, and a path of segments (separated with "/"), followed by optional "?" query and "#" fragment parts.
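
As a rough check of the proposed grammar against the examples listed under The Consequences, here is a minimal sketch. It is ASCII-only: PN_PREFIX is reduced to its ASCII subset, and ucschar and iprivate are left out, so it is stricter than the real definitions. The names CURIE and parse_curie are hypothetical and exist only for this illustration.

```python
import re

# ASCII-only approximation of ipchar (assumption: ucschar is ignored).
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
# Characters allowed in iquery / ifragment: ipchar plus "/" and "?"
# (assumption: iprivate is ignored).
QF_CHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})"
# ASCII subset of SPARQL's PN_PREFIX: a letter, then letters, digits,
# "_", "-" or ".", not ending in ".".
PREFIX = r"[A-Za-z](?:[A-Za-z0-9_.\-]*[A-Za-z0-9_\-])?"

# local = (ipath-rootless / ipath-empty) [ "?" iquery ] [ "#" ifragment ]
PATH = rf"(?:{PCHAR}+(?:/{PCHAR}*)*)?"
LOCAL = rf"{PATH}(?:\?{QF_CHAR}*)?(?:#{QF_CHAR}*)?"

# curie = [ prefix ] ':' local
CURIE = re.compile(rf"(?P<prefix>{PREFIX})?:(?P<local>{LOCAL})")


def parse_curie(text):
    """Return (prefix, local) if `text` matches the proposed CURIE grammar
    as approximated here, otherwise None."""
    match = CURIE.fullmatch(text)
    if match is None:
        return None
    return match.group("prefix"), match.group("local")


for candidate in ("schema:Person/Doctor", "og:video:height",
                  "db:resource/Albert_Einstein", "ex:some?very=special#thing",
                  "http://example.org/"):
    print(candidate, "->", parse_curie(candidate))
```

In this sketch the four example CURIEs are accepted, while "http://example.org/" is rejected because the local part may not begin with "/". An opaque IRI such as "urn:isbn:0451450523" is still accepted syntactically, so telling it apart from a CURIE depends on which prefixes are in scope.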
These are the kinds of IRIs that can be expressed in various relative forms and resolved against a base IRI.

Looking at the list of official and common URI schemes at [7], I find that of the 137 schemes, 71 (52%) are in the authority+path form. As we know, the prevalent two on the web, http and https, are of this kind (arguably the only relevant ones). I'd wager that we can expect this form to stay prevalent on the web *even* if "http" were to be eventually superseded. (I say so because relative paths are immensely usable, and there is an abundance of code dealing with hierarchical URL/URI resolution. Combined with the DNS-based authority model it's reasonably here to stay.)

Note also the fact that "http" used as prefix has already turned up in the wild, due to the HTTP Vocabulary Working Draft [8]. This has even been used in the RDFa 1.1 Core spec itself (as I recently reported in my review). To my knowledge, we have asked the ERT WG to change this, but this has not yet happened. With this change, such a prefix would no longer be a (technical) problem.

The other form is of the "opaque" IRIs (without an authority part and possibly no "/" separated segments (i.e. "non-relativizable")). Seemingly we've hitherto *unintentionally* prevented some of them (e.g. urn: and tag: URIs), but at the price of the OpenGraph CURIEs.

There are some fairly well-known schemes in this group (official or not), e.g.:

    mailto, tag, urn, doi, geo, tel, callto, news, xmpp, sip, sms, bitcoin, gtalk, skype, spotify.

Of these, "tag" and "geo" can be found in prefix.cc. (I've previously mentioned that "geo" may be of some concern for certain RDFa users [9].)

But as we've already concluded when resolving ISSUE-90, we argue that these will probably not be used as prefixes, and will be quite uncommon as schemes of subject or object IRIs in RDFa.
Also, given that many IRIs using these schemes already are reminiscent of CURIEs, and are of a rather specialized nature, I'd imagine that it's easier for anyone coming across such oddities to recognize the collision risk, should it ever happen.

We should still be very clear in the section about CURIEs, though, that prefixes overshadow schemes in IRIs of these forms, and that we advise users to monitor the in-scope prefixes for any such collision (along with the workaround accomplishable by using e.g. @prefix="geo: geo:").

## Summary ##

I sincerely hope that I have interpreted the ABNF correctly and haven't raised the issue of OpenGraph CURIEs in error. I also hope that I have made a clear and satisfactory draft proposal for fixing both this and the problems raised in ISSUE-125 (primarily the risk of confusing CURIEs with normal IRIs).

Best regards,
Niklas

[1]: http://www.w3.org/2010/02/rdfa/track/issues/125
[2]: http://developers.facebook.com/docs/opengraph/objects/builtin/
[3]: http://tools.ietf.org/html/rfc3987#section-2.2
[4]: http://en.wikipedia.org/wiki/Augmented_Backus%E2%80%93Naur_Form
[5]: http://www.w3.org/2010/02/rdfa/track/issues/90
[6]: http://www.w3.org/TR/2012/WD-sparql11-query-20120105/#rPNAME_LN
[7]: http://en.wikipedia.org/wiki/URI_scheme
[8]: http://www.w3.org/TR/HTTP-in-RDF10/
[9]: http://lists.w3.org/Archives/Public/public-rdfa-wg/2011Aug/0039.html
Received on Tuesday, 24 January 2012 03:02:51 UTC