- From: Ivan Herman <ivan@w3.org>
- Date: Tue, 24 Jan 2012 12:47:03 +0100
- To: Niklas Lindström <lindstream@gmail.com>
- Cc: public-rdfa-wg <public-rdfa-wg@w3.org>
- Message-Id: <D99314BC-3B7D-4077-9F3A-8507E53C17C4@w3.org>
Niklas,

I think your analysis on the Open Graph protocol issue is correct. My issue, however, is: if we go along the lines you propose, we move even further away from compatibility with Turtle/SPARQL, an issue that has already been raised by the RDF WG.

I am not sure what the best forum is for that. Manu: will you be at the Coordination Group tomorrow? Maybe worth raising the issue there?

ivan

On Jan 24, 2012, at 04:01 , Niklas Lindström wrote:

> Hello,
>
> I've been investigating some of the minute details and issues
> surrounding CURIEs, based on the discussion that recently cropped up
> with ISSUE-125 [1].
>
> It seems to me that the definition we currently have is flawed in one
> more way, and quite crucially so.
>
>
> ## The Problem ##
>
> As we already know, a bunch of Facebook OpenGraph properties are
> expressed with CURIEs where the parts after the prefix themselves
> contain colons. For instance, "video:actor:role", and
> "my-og-app:podcast:url" as seen in the examples at [2]. (There are
> also 13 such properties defined in <http://ogp.me/ns#>, e.g.
> "og:image:width" and "og:video:height".)
>
> We currently define CURIEs as:
>
>     curie ::= [ [ prefix ] ':' ] reference
>     reference ::= irelative-ref ; (as defined in [RFC3987])
>
> Now, I may be too tired to see clearly, but if I read the definition
> of irelative-ref in section 2.2 of RFC 3987 [3] correctly, it actually
> prohibits such CURIEs!
>
> Let me explain. I find these to be the relevant definitions in RFC 3987:
>
>     irelative-ref = irelative-part [ "?" iquery ] [ "#" ifragment ]
>
>     irelative-part = "//" iauthority ipath-abempty
>                    / ipath-absolute
>                    / ipath-noscheme
>                    / ipath-empty
>
>     ipath-absolute = "/" [ isegment-nz *( "/" isegment ) ]
>     ipath-noscheme = isegment-nz-nc *( "/" isegment )
>     ipath-empty = 0<ipchar>
>
>     isegment-nz-nc = 1*( iunreserved / pct-encoded / sub-delims
>                        / "@" )
>                    ; non-zero-length segment without any colon ":"
>
> If I interpret the ABNF [4] properly, given "og:image:width", I get
> the following:
>
> * "og:" matches the prefix and ":", so we match "image:width" against
>   irelative-ref;
> * there is no "?" or "#" in that, so only irelative-part is considered;
> * it does not start with "//", so we skip the following (iauthority
>   ipath-abempty) of the first alternative;
> * it does not start with "/", so it is not an ipath-absolute;
> * its first segment contains a colon ":", so it is not an
>   ipath-noscheme (does not match isegment-nz-nc *( "/" isegment ));
> * it is not empty, so it is not an ipath-empty.
>
> With no more alternatives in irelative-part, I conclude that
> "og:image:width" is not a valid CURIE!
>
> Please correct me if I'm wrong here! If not, it is quite evident that
> we have to fix this (unless we accept breaking a widely deployed
> de-facto usage).
>
> Ironically, we *do* allow for CURIEs to begin with "//". This makes it
> possible to use CURIEs *indistinguishable* from "normal" IRIs (using
> authority and paths), as explained in ISSUE-125 (and in my old (dead
> horse) ISSUE-90 [5]).
>
>
> ## The Proposal ##
>
> We have the opportunity here to fix a lot of things. I propose to
> define CURIEs along the lines of:
>
>     curie = [ prefix ] ':' local
>     prefix = PN_PREFIX ; as defined in SPARQL 1.1 [6]
>     local = (ipath-rootless / ipath-empty)
>             [ "?" iquery ] [ "#" ifragment ]
>
>     ipath-rootless = isegment-nz *( "/" isegment )
>     isegment = *ipchar
>     isegment-nz = 1*ipchar
>     ipchar = iunreserved / pct-encoded / sub-delims / ":"
>            / "@"
>
> .. For comparison, this is the definition of the full IRI:
>
>     IRI = scheme ":" ihier-part [ "?" iquery ]
>           [ "#" ifragment ]
>
>     ihier-part = "//" iauthority ipath-abempty
>                / ipath-absolute
>                / ipath-rootless
>                / ipath-empty
>
>
> ## The Consequences ##
>
> This (if I'm awake enough) still allows for *all* the use cases that
> have hitherto been put forward as needed. E.g.:
>
>     schema:Person/Doctor
>     og:video:height
>     db:resource/Albert_Einstein
>     ex:some?very=special#thing
>
> (While it is true that it would prevent the "hack" once presented as a
> means of using full IRIs where RDFa 1.0 only allows CURIEs (by using
> @xmlns:http="http:"), isn't that moot? Any processor affected by this
> change in RDFa 1.1 should reasonably use RDFa 1.1 rules, where we now
> allow such IRIs anywhere CURIEs are allowed. (And for that matter, I
> don't recall any reports of actual usage of that.))
>
> Most importantly, this completely eliminates the risk of confusing
> CURIEs with normal IRIs. That is, IRIs with a scheme followed by "//",
> an authority, and a path of segments (separated with "/"), followed by
> optional "?" query and "#" fragment parts. These are the kinds of IRIs
> that can be expressed in various relative forms and resolved against a
> base IRI.
>
> Looking at the list of official and common URI schemes at [7], I find
> that of the 137 schemes, 71 (52%) are in the authority+path form. As
> we know, the prevalent two on the web, http and https, are of this
> kind (arguably the only relevant ones). I'd wager that we can expect
> this form to stay prevalent on the web *even* if "http" were to be
> eventually superseded. (I say so because relative paths are immensely
> usable, and there is an abundance of code dealing with hierarchical
> URL/URI resolution. Combined with the DNS-based authority model it's
> reasonably here to stay.)
>
> Note also the fact that "http" used as prefix has already turned up in
> the wild, due to the HTTP Vocabulary Working Draft [8]. This has even
> been used in the RDFa 1.1 Core spec itself (as I recently reported in
> my review). To my knowledge, we have asked the ERT WG to change this,
> but this has not yet happened. With this change, such a prefix would
> no longer be a (technical) problem.
>
> The other form is of the "opaque" IRIs (without an authority part and
> possibly no "/" separated segments (i.e. "non-relativizable")).
> Seemingly we've hitherto *unintentionally* prevented some of them
> (e.g. urn: and tag: URIs); but at the price of the OpenGraph CURIEs.
> There are some fairly well-known schemes in this group (official or
> not), e.g.: mailto, tag, urn, doi, geo, tel, callto, news, xmpp, sip,
> sms, bitcoin, gtalk, skype, spotify. Of these, "tag" and "geo" can be
> found in prefix.cc. (I've previously mentioned that "geo" may be of
> some concern for certain RDFa users [9].) But as we've already
> concluded when resolving ISSUE-90, we argue that these will probably
> not be used as prefixes, and will be quite uncommon as schemes of
> subject or object IRIs in RDFa. Also, given that many IRIs using these
> schemes already are reminiscent of CURIEs, and are of a rather
> specialized nature, I'd imagine that it's easier for anyone coming
> across such oddities to recognize the collision risk, should it ever
> happen. We should still be very clear in the section about CURIEs
> though, that prefixes overshadow schemes in IRIs of these forms, and
> that we advise users to monitor the in-scope prefixes for any such
> collision (along with the workaround accomplishable by using e.g.
> @prefix="geo: geo:").
>
>
> ## Summary ##
>
> I sincerely hope that I have interpreted the ABNF correctly and
> haven't raised the issue of OpenGraph CURIEs in error. And that I have
> made a clear and satisfactory draft proposal for fixing both this and
> the problems raised in ISSUE-125 (primarily the risk of confusing
> CURIEs with normal IRIs).
>
> Best regards,
> Niklas
>
> [1]: http://www.w3.org/2010/02/rdfa/track/issues/125
> [2]: http://developers.facebook.com/docs/opengraph/objects/builtin/
> [3]: http://tools.ietf.org/html/rfc3987#section-2.2
> [4]: http://en.wikipedia.org/wiki/Augmented_Backus%E2%80%93Naur_Form
> [5]: http://www.w3.org/2010/02/rdfa/track/issues/90
> [6]: http://www.w3.org/TR/2012/WD-sparql11-query-20120105/#rPNAME_LN
> [7]: http://en.wikipedia.org/wiki/URI_scheme
> [8]: http://www.w3.org/TR/HTTP-in-RDF10/
> [9]: http://lists.w3.org/Archives/Public/public-rdfa-wg/2011Aug/0039.html
>

----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf
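
The ABNF walk-through in the quoted message can be approximated with a short Python sketch that compares the current CURIE reference rule (irelative-ref) with the proposed "local" rule. This is an illustration under stated assumptions, not a conforming RFC 3987 parser: the function names `valid_current` and `valid_proposed` are hypothetical, iauthority is reduced to "anything up to the next /, ? or #", ucschar and iprivate are approximated as any code point from U+00A0 upward, the prefix is assumed to be present, and it is not checked against PN_PREFIX.

```python
import re

# Approximate RFC 3987 character classes. ASSUMPTION: ucschar/iprivate are
# simplified to "any code point from U+00A0 upward".
UNRESERVED = r"A-Za-z0-9\-._~\u00A0-\U0010FFFF"
SUB_DELIMS = r"!$&'()*+,;="
PCT = r"%[0-9A-Fa-f]{2}"

SEG_NC = rf"(?:[{UNRESERVED}{SUB_DELIMS}@]|{PCT})"   # segment char, colon excluded
IPCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|{PCT})"  # ipchar, colon allowed
TAIL = rf"(?:\?(?:{IPCHAR}|[/?])*)?(?:#(?:{IPCHAR}|[/?])*)?"  # [?iquery][#ifragment]

# Current rule: reference ::= irelative-ref
CURRENT_REF = re.compile(
    rf"(?://[^/?#]*(?:/{IPCHAR}*)*"      # "//" iauthority ipath-abempty (authority simplified)
    rf"|/(?:{IPCHAR}+(?:/{IPCHAR}*)*)?"  # ipath-absolute
    rf"|{SEG_NC}+(?:/{IPCHAR}*)*"        # ipath-noscheme: first segment has no colon
    rf"|)"                               # ipath-empty
    rf"{TAIL}"
)

# Proposed rule: local = (ipath-rootless / ipath-empty) [?iquery][#ifragment]
PROPOSED_LOCAL = re.compile(rf"(?:{IPCHAR}+(?:/{IPCHAR}*)*)?{TAIL}")


def valid_current(curie: str) -> bool:
    """Check the part after the first ':' against irelative-ref (approximation)."""
    _prefix, _colon, reference = curie.partition(":")
    return CURRENT_REF.fullmatch(reference) is not None


def valid_proposed(curie: str) -> bool:
    """Check the part after the first ':' against the proposed 'local' rule."""
    _prefix, _colon, local = curie.partition(":")
    return PROPOSED_LOCAL.fullmatch(local) is not None


if __name__ == "__main__":
    samples = [
        "og:image:width",
        "og:video:height",
        "schema:Person/Doctor",
        "db:resource/Albert_Einstein",
        "ex:some?very=special#thing",
        "http://example.org/",
    ]
    for s in samples:
        print(f"{s!r}: current={valid_current(s)} proposed={valid_proposed(s)}")
```

Under these simplifications, "og:image:width" fails the current rule because the first path segment may not contain a colon, while the proposed rule accepts it. Conversely, a string such as "http://example.org/" passes the current rule through the "//" alternative but is rejected by the proposed one, which is the CURIE/IRI confusion the proposal aims to remove.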
Received on Tuesday, 24 January 2012 11:45:36 UTC