CURIEorURI Value Space Collisions

From: Niklas Lindström <lindstream@gmail.com>
Date: Tue, 12 Apr 2011 00:02:39 +0200
Hi all!

Is it correct that the RDFa WG currently recommends letting CURIEs
share the same value space as regular URIs, so that any prefix
defined with the same name as a URI scheme, like "http", "https",
"news" etc., will change the URI for any absolute URI using those
schemes?

I remember worrying about this last year, but I haven't followed the
decision process in detail since then. It just worries me that letting
these things collide will blow up for anyone who happens to use at
least "http" or "https" as prefixes (perhaps rendering prefixes using
a tool, or getting them from a profile out of their control). Or
perhaps worse, people believing it safe to use anything but "http(s)"
as prefixes, which will work until something other than those two
comes along in the next 10 years or so. It might happen; and if it
does, it may quite probably be beyond the control of RDFa specs and

(An example: some vocabulary "Wide Exceptional Graphs" becomes
popular, using "wxg" as a prefix. Then Google comes along with a new
wxg scheme ("Web Extended by Google"), and soon lots of resources are
linked with that instead of old "http". Or for that matter, that some
other scheme [3] becomes popular again for whatever reason.)
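
The collision described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: it uses a naive
rule ("if the text before the first ':' is a defined prefix, treat the
value as a CURIE"), which is not necessarily the exact algorithm in the
RDFa drafts. The function name resolve_naive and the "wxg" namespace
URI are hypothetical; the "http" mapping is the HTTP Vocabulary
namespace mentioned below.

```python
# Prefix mappings in scope. The "wxg" namespace is made up for illustration;
# the "http" namespace is the one from the HTTP Vocabulary draft.
PREFIXES = {
    "http": "http://www.w3.org/2006/http#",
    "wxg": "http://example.org/wxg#",
}


def resolve_naive(value, prefixes):
    """Resolve a CURIEorURI value with a naive rule (an assumption):
    anything whose leading 'xxx:' matches a defined prefix is a CURIE."""
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + reference
    return value  # no matching prefix: keep it as a full URI


# An ordinary absolute URI is silently rewritten because "http" is a prefix.
print(resolve_naive("http://example.org/doc", PREFIXES))
# http://www.w3.org/2006/http#//example.org/doc

# The same would happen to a future "wxg" scheme.
print(resolve_naive("wxg://example.org/thing", PREFIXES))
# http://example.org/wxg#//example.org/thing

# A scheme that is not defined as a prefix is left alone.
print(resolve_naive("https://example.org/doc", PREFIXES))
# https://example.org/doc
```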

I vaguely recall the WG saying something about defining "http" as a
prefix being bad practice. But this turns up here and there, not least
since the HTTP Vocabulary Draft [1] (<http://www.w3.org/2006/http#>)
recommends it as a prefix. And I just ran across "http" as a prefix in
the Tabulator source as well [2].

While I understand that it is confusing to use it as a prefix, I am
not convinced that it is safe to combine the CURIE and URI value space
like this. At least not without a limit on the CURIEs allowed in the
joint CURIEorURI space. For instance, not allowing CURIEs in that
space to use anything after the prefix+':' other than say an
isegment-nz-nc from RFC 3987, or something to that effect (like a
"[A-Za-z0-9_.-]+" regexp).

If there were such a restriction on the format of CURIEs allowed in
the CURIEorURI mix (so that anything not matching it would be
considered a full URI), I would definitely sleep better. :)
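
A minimal Python sketch of the restriction suggested above, under the
same assumptions as any toy resolver: a value is only treated as a
CURIE when its prefix is defined and the part after the ':' matches a
simple token pattern; anything else is kept as a full URI. The function
name resolve_restricted is hypothetical, and the pattern is the loose
regexp variant, not the isegment-nz-nc production from RFC 3987.

```python
import re

# Loose stand-in for the suggested restriction on the part after "prefix:".
SAFE_REFERENCE = re.compile(r"[A-Za-z0-9_.-]+")


def resolve_restricted(value, prefixes):
    """Expand value as a CURIE only if the prefix is defined AND the
    reference is a simple token; otherwise treat it as a full URI."""
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes and SAFE_REFERENCE.fullmatch(reference):
        return prefixes[prefix] + reference
    return value


prefixes = {"http": "http://www.w3.org/2006/http#"}

# A simple reference is still expanded as a CURIE.
print(resolve_restricted("http:StatusCode", prefixes))
# http://www.w3.org/2006/http#StatusCode

# An absolute URI contains "/" after the colon, so it is left untouched.
print(resolve_restricted("http://example.org/doc", prefixes))
# http://example.org/doc
```

Note that a value such as "news:comp.lang.python" would still match
this pattern if "news" were defined as a prefix, so a restriction of
this kind narrows the collision space rather than removing it.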

Am I missing something crucial, or overly worried about the risk of collisions?

Best regards,

[1]: http://www.w3.org/TR/HTTP-in-RDF10/
[2]: http://dig.csail.mit.edu/hg/tabulator/file/9a135feff10f/chrome/content/js/rdf/rdflib.js#l5644
[3]: http://en.wikipedia.org/wiki/URI_scheme
