Re: CURIEorURI Value Space Collisions from Niklas Lindström on 2011-04-15 (public-rdfa-wg@w3.org from April 2011)

From: Niklas Lindström <lindstream@gmail.com>
Date: Fri, 15 Apr 2011 11:51:14 +0200
To: Mark Birbeck <mark.birbeck@webbackplane.com>
Cc: public-rdfa-wg <public-rdfa-wg@w3.org>
Message-ID: <BANLkTimtJGgy9-20=8P4JRaYOzCyc4+KRg@mail.gmail.com>
Hi Mark!

That the interpretation of the lexical representation within an RDFa
attribute is dependent on context -- i.e. the base URI and prefixes
within scope -- is perfectly fine by me. But that the value space
mixes CURIEs and URIs so that a prefix declaration will irrevocably
override what could very well be intended as the scheme of a URI is
quite problematic.

Consider this example, with the premise that the value of @about is
provided by someone (e.g. the news group owner) who *does not* control
the template behind the RDFa markup:

   <html prefix="news: http://example.org/def/news# rdfs: http://www.w3.org/2000/01/rdf-schema#">
    <body rel="news:hasGroup">
      <div about="news://news.server.example/example.group.this/"
           property="rdfs:label">The Example News Group</div>
    </body>
   </html>
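
To spell out what goes wrong here: with a "news" prefix in scope, the
@about value is read as a CURIE and not as a news: URI. A minimal
Python sketch of that resolution rule, assuming the "a prefix mapping in
scope wins" behaviour discussed in this thread (resolve_curie_or_uri is
a hypothetical helper, not part of any RDFa library):

```python
# Hypothetical helper, not taken from any RDFa library: a simplified
# "prefix mapping in scope wins" resolution of a CURIEorURI value.
def resolve_curie_or_uri(value, prefixes):
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes:
        # A mapping for the prefix is in scope, so the value is a CURIE.
        return prefixes[prefix] + reference
    # No mapping in scope: leave the value alone and treat it as a URI.
    return value


# The prefix mappings declared on the <html> element in the example.
prefixes = {
    "news": "http://example.org/def/news#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

about = "news://news.server.example/example.group.this/"
subject = resolve_curie_or_uri(about, prefixes)
print(subject)
```

The subject comes out as
http://example.org/def/news#//news.server.example/example.group.this/
and not as the news: URI that the group owner supplied.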

This can be fixed, while retaining the feature of not always having to
use SafeCURIEs, by not allowing @about and @resource to contain unsafe
CURIEs, i.e. CURIEs containing characters that make them confusable
with absolute URIs.

I therefore suggest that the use of SafeCURIEorCURIEorURI is changed
to RestrictedCURIEOrSafeCURIEorURI, where RestrictedCURIE means QName,
or "isegment-nz-nc", or "reference ::= ipath-absolute / ipath-noscheme
/ ipath-empty" (which Nathan suggested).
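
A rough Python sketch of how such a rule could behave for @about and
@resource. It is only an approximation under stated assumptions: the
ASCII regexp stands in for the RestrictedCURIE reference (it is narrower
than isegment-nz-nc from RFC 3987), resolve_about is a hypothetical
name, and returning None for a SafeCURIE with an unknown prefix is a
choice made for this sketch only:

```python
import re

# Assumption: an ASCII-only stand-in for the RestrictedCURIE reference part,
# not the full RFC 3987 isegment-nz-nc production.
RESTRICTED_REFERENCE = re.compile(r"[A-Za-z0-9_.-]+")


def resolve_about(value, prefixes):
    """Resolve a value as RestrictedCURIE, SafeCURIE or URI (sketch only)."""
    if value.startswith("[") and value.endswith("]"):
        # SafeCURIE: the brackets state the intent, so any reference is allowed.
        prefix, sep, reference = value[1:-1].partition(":")
        if sep and prefix in prefixes:
            return prefixes[prefix] + reference
        return None  # unknown prefix: ignored in this sketch
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes and RESTRICTED_REFERENCE.fullmatch(reference):
        # Unbracketed CURIE whose reference cannot be confused with a URI.
        return prefixes[prefix] + reference
    # Anything else, e.g. a value containing "//", is kept as a URI.
    return value


prefixes = {"news": "http://example.org/def/news#"}

print(resolve_about("news://news.server.example/example.group.this/", prefixes))
print(resolve_about("news:hasGroup", prefixes))
print(resolve_about("[news:a/b]", prefixes))
```

With this rule the news: URI from the example above survives untouched,
"news:hasGroup" still expands as a CURIE, and a reference containing
"/" has to be written as a SafeCURIE.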

Best regards,
Niklas



2011/4/13 Mark Birbeck <mark.birbeck@webbackplane.com>:
> Hi Niklas,
>
> Everything you say is true. :)
>
> However, the big change in the working group's thinking came when we
> decided that it was impossible to guarantee correct interpretation of
> strings of text based solely on their format, and so instead we should
> rely on the strings' contexts.
>
> By using context to aid in the interpretation of a string we get a lot
> more flexibility, and we can unambiguously work out what things like
> this mean:
>
>  foaf:Agent
>
> Without context it *looks* like all of the following:
>
>  * a string of text with no particular meaning;
>  * a QName;
>  * a CURIE;
>  * a URI using the 'foaf' scheme.
>
> However, we decided in the working group that if no prefix mapping for
> 'foaf' was defined in the context for this string, then the string was
> *by definition* not a CURIE.
>
> Whether it therefore becomes a string of text or a URI is a separate
> processing step, and nothing to do with CURIE processing, but by
> taking the approach we did in the CURIE processing layer we at least
> made it possible for 'foaf:Agent' to be interpreted as a URI.
>
> The converse also holds; if a mapping for 'foaf' is defined, then the
> string above is *by definition* a CURIE. Now whether some host
> language decides to interpret the string as a CURIE above a URI is up
> to that host language, but RDFa does so.
>
> Personally I was very pleased when we took the step to take context
> into account when interpreting strings. Until that point we were
> trying to achieve the impossible -- imagining that a string on its own
> could tell you everything about what it was. Now it's very easy to
> interpret both of these strings correctly:
>
>  foaf:Agent
>  http://www.w3.org/
>
> simply by using the context.
>
> Best regards,
>
> Mark
>
>
> 2011/4/11 Niklas Lindström <lindstream@gmail.com>:
>> Hi all!
>>
>> Is it correct that the RDFa WG is currently recommending letting
>> CURIEs share the same value space as regular URIs, so that any
>> prefix with the same name as a scheme, like "http", "https",
>> "news" etc., will change the URI for any absolute URI using that
>> scheme?
>>
>> I remember worrying about this last year, but I haven't followed the
>> decision process in detail since then. It just worries me that letting
>> these things collide will blow up for anyone who happens to use at
>> least "http" or "https" as prefixes (perhaps rendering prefixes using
>> a tool, or getting them from a profile out of their control). Or
>> perhaps worse, people believing it safe to use anything but "http(s)"
>> as prefixes, which will work until something other than those two
>> comes along in the next 10 years or so. It might happen; and if it
>> does, it will quite probably be beyond the control of RDFa specs and
>> tools.
>>
>> (An example: some vocabulary "Wide Exceptional Graphs" becomes
>> popular, using "wxg" as a prefix. Then Google comes along with a new
>> wxg scheme ("Web Extended by Google"), and soon lots of resources are
>> linked with that instead of old "http". Or for that matter, that some
>> other scheme [3] becomes popular again for whatever reason.)
>>
>> I vaguely recall the WG saying something about defining "http" as a
>> prefix being bad practice. But this turns up here and there, not least
>> since the HTTP Vocabulary Draft [1] (<http://www.w3.org/2006/http#>)
>> recommends it as a prefix. And I just ran across "http" as a prefix in
>> the Tabulator source as well [2].
>>
>> While I understand that it is confusing to use it as a prefix, I am
>> not convinced that it is safe to combine the CURIE and URI value space
>> like this. At least not without a limit on the CURIEs allowed in the
>> joint CURIEorURI space. For instance, not allowing CURIEs in that
>> space to use anything after the prefix+':' other than say an
>> isegment-nz-nc from RFC 3987, or something to that effect (like a
>> "[A-Za-z0-9_.-]+" regexp).
>>
>> If there was such a restriction on the format of CURIEs allowed in
>> the CURIEorURI mix (and that anything not matching it would be
>> considered a full URI), I would definitely sleep better. :)
>>
>> Am I missing something crucial, or overly worried about the risk of collisions?
>>
>> Best regards,
>> Niklas
>>
>> [1]: http://www.w3.org/TR/HTTP-in-RDF10/
>> [2]: http://dig.csail.mit.edu/hg/tabulator/file/9a135feff10f/chrome/content/js/rdf/rdflib.js#l5644
>> [3]: http://en.wikipedia.org/wiki/URI_scheme
>>
>>
>
Received on Friday, 15 April 2011 09:52:03 UTC