Re: CURIEorURI Value Space Collisions from Mark Birbeck on 2011-04-13 (public-rdfa-wg@w3.org from April 2011)

From: Mark Birbeck <mark.birbeck@webbackplane.com>
Date: Wed, 13 Apr 2011 11:03:27 +0100
To: Niklas Lindström <lindstream@gmail.com>
Cc: public-rdfa-wg <public-rdfa-wg@w3.org>
Message-ID: <BANLkTim33-Cc37EZwkULm7XrBxRLkPzbmQ@mail.gmail.com>

Hi Niklas,

Everything you say is true. :)

However, the big change in the working group's thinking came when we
decided that it was impossible to guarantee correct interpretation of
strings of text based solely on their format, and so instead we should
rely on the strings' contexts.

By using context to aid in the interpretation of a string we get a lot
more flexibility, and we can unambiguously work out what things like
this mean:

  foaf:Agent

Without context it *looks* like all of the following:

  * a string of text with no particular meaning;
  * a QName;
  * a CURIE;
  * an absolute URI using the 'foaf' scheme.

However, we decided in the working group that if no prefix mapping for
'foaf' was defined in the context for this string, then the string was
*by definition* not a CURIE.

Whether it therefore becomes a string of text or a URI is a separate
processing step, and nothing to do with CURIE processing, but by
taking the approach we did in the CURIE processing layer we at least
made it possible for 'foaf:Agent' to be interpreted as a URI.

The converse also holds; if a mapping for 'foaf' is defined, then the
string above is *by definition* a CURIE. Whether some host language
then chooses to interpret the string as a CURIE in preference to a URI
is up to that host language, but RDFa does so.
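
To make the rule above concrete, here is a minimal sketch in Python. It
is not the RDFa processing algorithm itself; the function name
`resolve_curie_or_uri` and the shape of the `prefixes` mapping are
assumptions made purely for illustration, and the 'foaf' mapping is
just an example context.

```python
from typing import Mapping


def resolve_curie_or_uri(value: str, prefixes: Mapping[str, str]) -> str:
    """Interpret `value` using its context (hypothetical helper).

    If the text before the first ':' has a prefix mapping in the
    context, the string is treated as a CURIE and expanded. Otherwise
    it is not a CURIE, and it is handed back unchanged for some later
    step to treat as a URI or as plain text.
    """
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes:
        # A mapping exists in the context: by definition a CURIE.
        return prefixes[prefix] + reference
    # No mapping in the context: by definition not a CURIE.
    return value


# Example context (an assumption for this sketch).
context = {"foaf": "http://xmlns.com/foaf/0.1/"}

print(resolve_curie_or_uri("foaf:Agent", context))          # expanded as a CURIE
print(resolve_curie_or_uri("http://www.w3.org/", context))  # left alone
print(resolve_curie_or_uri("foaf:Agent", {}))                # left alone
```

Note that, in this sketch, a context that also maps 'http' would cause
'http://www.w3.org/' to be expanded as a CURIE, which is the collision
you describe.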

Personally I was very pleased when we took the step to take context
into account when interpreting strings. Until that point we were
trying to achieve the impossible -- imagining that a string on its own
could tell you everything about what it was. Now it's very easy to
interpret both of these strings correctly:

  foaf:Agent
  http://www.w3.org/

simply by using the context.

Best regards,

Mark


2011/4/11 Niklas Lindström <lindstream@gmail.com>:
> Hi all!
>
> Is it correct that the RDFa WG is currently recommending letting
> CURIEs share the same value space as regular URIs, and so that any
> prefix defined with the same value as a scheme, like "http", "https",
> "news" etc. will change the URI for any absolute URI using those
> schemes?
>
> I remember worrying about this last year, but I haven't followed the
> decision process in detail since then. It just worries me that letting
> these things collide will blow up for anyone who happens to use at
> least "http" or "https" as prefixes (perhaps rendering prefixes using
> a tool, or getting them from a profile out of their control). Or
> perhaps worse, people believing it safe to use anything but "http(s)"
> as prefixes, which will work until something other than those two
> comes along in the next 10 years or so. It might happen; and if it
> does, it may quite probably be beyond the controls of RDFa specs and
> tools.
>
> (An example: some vocabulary "Wide Exceptional Graphs" becomes
> popular, using "wxg" as a prefix. Then Google comes along with a new
> wxg scheme ("Web Extended by Google"), and soon lots of resources are
> linked with that instead of old "http". Or, for that matter, some
> other scheme [3] becomes popular again for whatever reason.)
>
> I vaguely recall the WG saying something about defining "http" as a
> prefix being bad practise. But this turns up here and there, not least
> since the HTTP Vocabulary Draft [1] (<http://www.w3.org/2006/http#>)
> recommends it as a prefix. And I just ran across "http" as a prefix in
> the Tabulator source as well [2].
>
> While I understand that it is confusing to use it as a prefix, I am
> not convinced that it is safe to combine the CURIE and URI value space
> like this. At least not without a limit on the CURIEs allowed in the
> joint CURIEorURI space. For instance, not allowing CURIEs in that
> space to use anything after the prefix+':' other than say an
> isegment-nz-nc from RFC 3987, or something to that effect (like a
> "[A-Za-z0-9_.-]+" regexp).
>
> If there was such a restriction on the format of CURIEs allowed in
> the CURIEorURI mix (and anything not matching it would be
> considered a full URI), I would definitely sleep better. :)
>
> Am I missing something crucial, or overly worried about the risk of collisions?
>
> Best regards,
> Niklas
>
> [1]: http://www.w3.org/TR/HTTP-in-RDF10/
> [2]: http://dig.csail.mit.edu/hg/tabulator/file/9a135feff10f/chrome/content/js/rdf/rdflib.js#l5644
> [3]: http://en.wikipedia.org/wiki/URI_scheme
>
>
Received on Wednesday, 13 April 2011 10:04:41 UTC