Re: CURIEorURI Value Space Collisions from Niklas Lindström on 2011-04-12 (public-rdfa-wg@w3.org from April 2011)

From: Niklas Lindström <lindstream@gmail.com>
Date: Tue, 12 Apr 2011 14:24:48 +0200
To: Ivan Herman <ivan@w3.org>
Cc: public-rdfa-wg <public-rdfa-wg@w3.org>
Message-ID: <BANLkTimUUj70epotn2XcD0pArrcaThBiXA@mail.gmail.com>
Hi Ivan!

2011/4/12 Ivan Herman <ivan@w3.org>:
> Niklas,
>
> (These are my private musings, not an official answer from the WG.)
>
> We cannot deny that there is a danger of mixing here, and the WG had to put different issues into the balance. One of the concerns that clearly came up (witness also the discussion in other groups) was that some part of the user base felt uneasy about using CURIEs altogether, especially when a specific property appears only once or twice in a larger context (in RDFa 1.0, even if a property appeared only once in a @rel, one had to define a prefix for it). So the issue is really to choose between not-100%-satisfactory options...

Yes, I've read some of that, and I understand that there seems to be a
desire for using full URIs for properties in certain scenarios. (And
I'm absolutely sure that many will want to continue to use CURIEs for
brevity. It keeps RDFa readable.)

But I can't see how providing the possibility of mixing CURIEs and
URIs justifies the danger of prefixes overriding what could clearly
be intended as the scheme part of a URI (no matter how careless the
prefix management is or how obscure the scheme).


> However... see also some comments below
>
>
> On Apr 12, 2011, at 24:02 , Niklas Lindström wrote:
>
>> Hi all!
>>
>> Is it correct that the RDFa WG is currently recommending letting
>> CURIEs share the same value space as regular URIs, and so that any
>> prefix defined with the same value as a scheme, like "http", "https",
>> "news" etc. will change the URI for any absolute URI using those
>> schemes?
>
> I am not sure I follow exactly how you formulated this, so just to make sure: any prefix definition in RDFa 1.1 will take precedence over an absolute URI.

Yes, that is how I understood it. And if that is the case, and any
kind of CURIE is allowed -- even ones with the exact same syntactic
form as an absolute URI -- then this is bad! I presume that this
consequence is never wanted?
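
To make the collision concrete, here is a minimal sketch in Python.
The function resolve_curie_or_uri and the prefix map are hypothetical
illustrations of "a declared prefix takes precedence", not the
processing rules as worded in the RDFa 1.1 draft:

```python
def resolve_curie_or_uri(value, prefixes):
    """Expand value as a CURIE if its leading part is a declared prefix;
    otherwise return it unchanged as a URI (assumed precedence rule)."""
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes:
        # The declared prefix wins, even if value was meant as an absolute URI.
        return prefixes[prefix] + reference
    return value


# "http" declared as a prefix, as the HTTP Vocabulary Draft [1] suggests.
prefixes = {"http": "http://www.w3.org/2006/http#"}

# An author writes what is clearly intended as an absolute URI ...
print(resolve_curie_or_uri("http://example.org/vocab#name", prefixes))
# ... and gets: http://www.w3.org/2006/http#//example.org/vocab#name
```

Without the "http" prefix declaration, the same value comes back
untouched, which is why the breakage only shows up once such a prefix
enters the mix.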

I suspect it is rare for someone to use a prefix to shorten a
full URI "just a bit" -- I have never seen it or heard about it other
than in theory (when CURIEs were designed). If some people are wary
of CURIEs altogether, CURIEs that *look like URIs* seem a worst-case
scenario. (I mean, URIs could suddenly become CURIEs if a poorly
chosen prefix is declared.) I suspect that the regular use of CURIEs
is just prefix + name-without-slashes-or-colons, i.e. very much like
the venerable old QName construct.


>>
>> I remember worrying about this last year, but I haven't followed the
>> decision process in detail since then. It just worries me that letting
>> these things collide will blow up for anyone who happens to use at
>> least "http" or "https" as prefixes (perhaps rendering prefixes using
>> a tool, or getting them from a profile out of their control). Or
>> perhaps worse, people believing it safe to use anything but "http(s)"
>> as prefixes, which will work until something other than those two
>> comes along in the next 10 years or so. It might happen; and if it
>> does, it may quite probably be beyond the controls of RDFa specs and
>> tools.
>>
>> (An example: some vocabulary "Wide Exceptional Graphs" becomes
>> popular, using "wxg" as a prefix. Then Google comes along with a new
>> wxg scheme ("Web Extended by Google"), and soon lots of resources are
>> linked with that instead of old "http". Or for that matter, that some
>> other scheme [3] becomes popular again for whatever reason.)
>>
>
>
> And yes, it is not only http(s) but any existing URI scheme. So it has to be made very clear that using prefixes like urn, ftp, etc, is also bad practice.

This sounds very brittle, if not outright broken. (It's like an open
can of worms which is rather empty right now, but where the can owner
is not in control of the evolution or growth of the worms.. ;P)


> However, if I have an RDFa source where I use, today, the 'wxg' prefix for something, and I define it as part of my profile or in a @prefix, and then Google has a new URI scheme wxg:, then my RDFa content remains valid and unchanged. Indeed, the CURIE resolution takes precedence, ie, the old usage remains. Of course, if I wanted to _use_ the new URI scheme in my old code, then I would have to make changes, that is correct. But that is not very frequent.

That is only true if your page source is *static*. But I expect most
publishing systems will generate RDFa, where the web master manages
prefixes and any number of users contribute all kinds of URIs over
time. We do not know how this will evolve, and for that reason we have
to make sure that RDFa is as fool-proof as possible. Avoiding naming
collisions is what RDF is all about, and allowing RDFa to compromise
that stance with this prefix/scheme collision risk (one we are aware
of at that) sounds very bad to me.

And we do not know how frequently in the future new schemes might be
used in RDFa documents with lots of prefixes. The history of the web
shows that http and https are rather stable, but that all kinds of
other schemes come and go.


>> I vaguely recall the WG saying something about defining "http" as a
>> prefix is bad practise. But this turns up here and there, not least
>> since the HTTP Vocabulary Draft [1] (<http://www.w3.org/2006/http#>)
>> recommends it as a prefix. And I just ran across "http" as a prefix in
>> the Tabulator source as well [2].
>
> The tabulator source is unaffected by this, of course. As for the HTTP vocabulary draft: they will change that, and they will use a different prefix. (To be clear, this is not due to RDFa, they have received many comments on that from elsewhere, hence their decision to change that in future.)

It is true that the tabulator source has no bearing on this as
it stands. But these cases show that there *already is* documentation
and code on the web using a dangerous prefix. Such actual usage can
very well seep into someone's collection of prefixes. I object to the
notion that care has to be taken by *authors* to monitor the
scheme/prefix collision risk, however limited it may be currently. The
very notion of a "dangerous" prefix makes me cringe.

To continue, the mechanism of allowing a CURIEorURI is likely to
propagate to other domains. Currently, it is used in the RDFa API,
where it may potentially cause the same or even worse consequences.
(And I have used it in a similar API, which is when this discomfort
became more acute.)

Again, I'm talking about risks which may be small today. But the fact
that this is built in by design is unsettling. I strongly recommend that
the consequence is mitigated somehow, say by limiting which forms of
CURIEs are allowed in the same value space as URIs. I'd suggest a
QName construct, or isegment-nz-nc from RFC 3987, or at least not
allowing "/", ":", which would make this case *much* safer.
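
As a rough sketch of what such a restriction could look like (Python
again; resolve_restricted and the SAFE_REFERENCE pattern are my own
hypothetical names, and the pattern is a simplified stand-in, not the
actual QName or isegment-nz-nc grammar):

```python
import re

# Simplified stand-in for a QName-like local part: no "/" and no ":".
SAFE_REFERENCE = re.compile(r"[A-Za-z0-9_.-]+")


def resolve_restricted(value, prefixes):
    """Expand value as a CURIE only if the prefix is declared AND the part
    after the colon matches the restricted form; otherwise treat it as a URI."""
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes and SAFE_REFERENCE.fullmatch(reference):
        return prefixes[prefix] + reference
    return value


prefixes = {"http": "http://www.w3.org/2006/http#"}

# Looks like an absolute URI (contains "/"), so it is left alone.
print(resolve_restricted("http://example.org/vocab#name", prefixes))

# Looks like prefix + simple name, so it is expanded as a CURIE.
print(resolve_restricted("http:Request", prefixes))
```

With a rule like this, even a carelessly declared "http" prefix would
no longer rewrite ordinary http URIs.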

In fact, *if* people actually want to use CURIEs in a CURIEorURI space
that do not follow the QName restrictions, they should use the
SafeCURIE form which is already defined. The value space could then be
QNameOrSafeCURIEorURI. I'm not sure that that would be needed, but it
is a nice and *safe* form of choice for authors.

Am I really the only one concerned about this?

Best regards,
Niklas



> Sincerely
>
> Ivan
>
>>
>> While I understand that it is confusing to use it as a prefix, I am
>> not convinced that it is safe to combine the CURIE and URI value space
>> like this. At least not without a limit on the CURIEs allowed in the
>> joint CURIEorURI space. For instance, not allowing CURIEs in that
>> space to use anything after the prefix+':' other than say an
>> isegment-nz-nc from RFC 3987, or something to that effect (like a
>> "[A-Za-z0-9_.-]+" regexp).
>>
>> If there were such a restriction on the format of CURIEs allowed in
>> the CURIEorURI mix (and anything not matching it would be
>> considered a full URI), I would definitely sleep better. :)
>>
>> Am I missing something crucial, or overly worried about the risk of collisions?
>>
>> Best regards,
>> Niklas
>>
>> [1]: http://www.w3.org/TR/HTTP-in-RDF10/
>> [2]: http://dig.csail.mit.edu/hg/tabulator/file/9a135feff10f/chrome/content/js/rdf/rdflib.js#l5644
>> [3]: http://en.wikipedia.org/wiki/URI_scheme
>>
>
>
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> PGP Key: http://www.ivan-herman.net/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf
>
Received on Tuesday, 12 April 2011 12:25:37 UTC