Re: CURIEorURI Value Space Collisions from Niklas Lindström on 2011-04-15 (public-rdfa-wg@w3.org from April 2011)

From: Niklas Lindström <lindstream@gmail.com>
Date: Fri, 15 Apr 2011 11:53:58 +0200
To: Ivan Herman <ivan@w3.org>
Cc: nathan@webr3.org, Mark Birbeck <mark.birbeck@webbackplane.com>, public-rdfa-wg <public-rdfa-wg@w3.org>
Message-ID: <BANLkTi=AW_YWFPvFDPHitU1GQzkUaXD2XQ@mail.gmail.com>
Hi Ivan!

2011/4/15 Ivan Herman <ivan@w3.org>:
> True. But we would also lose possibly very useful features.
>
> I recently realized, to take an example, that the DBpedia concepts' ontology has some sort of a hierarchy. They use
>
> http://dbpedia.org/ontology/
> http://dbpedia.org/ontology/Artist/
> http://dbpedia.org/ontology/Film/
> etc.
>
> which would then be used, for example, on types. At the moment, one can define a prefix for ../ontology/ and then use something like dbp-ont:Artist/XXX instead of being forced to define a separate prefix for each sub-hierarchy.

Yes, I've thought about that too. But that would still be possible if
the CURIEorURI is changed to the RestrictedCURIEOrSafeCURIEorURI
(which I just suggested in reply to Mark -- i.e. where RestrictedCURIE
is defined as one of QName, or "isegment-nz-nc", or Nathan's
"ipath-absolute / ipath-noscheme / ipath-empty").


> B.t.w., on a separate note: in my implementation I actually generate a warning if a URI is used with an unusual (i.e., non-registered) scheme. In most cases this is the result of a misspelling in the prefix. I am not sure it is worth adding that to RDFa Core as a requirement, or just having this as a good practice for RDFa processors...

Warnings are useful, but I definitely don't think that an RDFa parser
should have to worry about the scheme registry. Neither should
authors. It will evolve independently of the implementation and use of
RDFa, and of the (very much decentralized) definition of prefixes for
vocabularies.
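For illustration, an advisory (non-normative) version of the scheme warning Ivan describes could look like the Python sketch below. It assumes a small, hypothetical `KNOWN_SCHEMES` set rather than the actual IANA registry, which, as argued above, evolves independently of RDFa; `check_scheme` is likewise an illustrative name, not part of any RDFa processor's API.

```python
import re
import warnings

# Hypothetical, deliberately incomplete list; NOT a copy of the IANA registry.
KNOWN_SCHEMES = {"http", "https", "ftp", "mailto", "urn", "tag", "data", "file"}

# RFC 3986 scheme syntax: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) followed by ":"
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")


def check_scheme(uri: str) -> bool:
    """Return True if the URI's scheme is in KNOWN_SCHEMES; otherwise warn and return False.

    Advisory only: an unknown scheme is often a misspelt prefix, but it may
    also be a legitimate scheme this list simply does not know about.
    The value itself is never altered.
    """
    match = SCHEME_RE.match(uri)
    if match is None:
        warnings.warn(f"no scheme found in {uri!r}")
        return False
    scheme = match.group(1).lower()  # schemes are case-insensitive
    if scheme not in KNOWN_SCHEMES:
        warnings.warn(f"unusual URI scheme {scheme!r} in {uri!r}; misspelt prefix?")
        return False
    return True
```

With this, `check_scheme("foaff:Agent")` emits a warning, while a correctly spelled `http://` URI passes silently. Keeping it as a warning rather than an error leaves room for schemes the list does not know about.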


Best regards,
Niklas



> Ivan
>
>
>
> On Apr 13, 2011, at 19:09 , Nathan wrote:
>
>> That said, it would be a lot less ambiguous if CURIE didn't use irelative-ref and instead used:
>>
>>  reference ::= ipath-absolute / ipath-noscheme / ipath-empty
>>
>> then at least, http://example.org/ would never be a CURIE, and a prefix mapping for http: would never apply / confuse.
>>
>> Best,
>>
>> Nathan
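A rough Python sketch of Nathan's proposed `reference` production follows. It assumes ASCII-only input (the `ucschar` part of RFC 3987 is ignored) and, as in his production, no query or fragment part. `could_be_curie` is a hypothetical helper that only checks the shape of the text after the first colon; it does not consult any prefix mappings.

```python
import re

# Character classes from RFC 3987, restricted to ASCII (ucschar is ignored).
_PCT = r"%[0-9A-Fa-f]{2}"
# ipchar = iunreserved / pct-encoded / sub-delims / ":" / "@"
_IPCHAR = rf"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|{_PCT})"
# isegment-nz-nc: like ipchar, but without ":"
_NC = rf"(?:[A-Za-z0-9\-._~!$&'()*+,;=@]|{_PCT})"

# ipath-absolute = "/" [ isegment-nz *( "/" isegment ) ]
_PATH_ABSOLUTE = rf"/(?:{_IPCHAR}+(?:/{_IPCHAR}*)*)?"
# ipath-noscheme = isegment-nz-nc *( "/" isegment )
_PATH_NOSCHEME = rf"{_NC}+(?:/{_IPCHAR}*)*"
# reference ::= ipath-absolute / ipath-noscheme / ipath-empty
REFERENCE = re.compile(rf"(?:{_PATH_ABSOLUTE}|{_PATH_NOSCHEME})?")


def could_be_curie(value: str) -> bool:
    """True if the text after the first ':' matches the proposed reference grammar."""
    _prefix, sep, reference = value.partition(":")
    return bool(sep) and REFERENCE.fullmatch(reference) is not None


print(could_be_curie("dbp-ont:Artist/XXX"))          # True
print(could_be_curie("http://example.org/"))         # False
print(could_be_curie("mailto:someone@example.org"))  # True
```

Under this grammar `dbp-ont:Artist/XXX` (Ivan's hierarchical case) still qualifies, while `http://example.org/` cannot, because its reference part `//example.org/` starts with an empty segment. Scheme-specific forms without `//`, such as `mailto:someone@example.org`, would still qualify, so the protection covers authority-based URIs only.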
>>
>> Mark Birbeck wrote:
>>> Hi Niklas,
>>> Everything you say is true. :)
>>> However, the big change in the working group's thinking came when we
>>> decided that it was impossible to guarantee correct interpretation of
>>> strings of text based solely on their format, and so instead we should
>>> rely on the strings' contexts.
>>> By using context to aid in the interpretation of a string we get a lot
>>> more flexibility, and we can unambiguously work out what things like
>>> this mean:
>>>  foaf:Agent
>>> Without context it *looks* like all of the following:
>>>  * a string of text with no particular meaning;
>>>  * a QName;
>>>  * a CURIE;
>>>  * a relative URI using the 'foaf' scheme.
>>> However, we decided in the working group that if no prefix mapping for
>>> 'foaf' was defined in the context for this string, then the string was
>>> *by definition* not a CURIE.
>>> Whether it therefore becomes a string of text or a URI is a separate
>>> processing step, and nothing to do with CURIE processing, but by
>>> taking the approach we did in the CURIE processing layer we at least
>>> made it possible for 'foaf:Agent' to be interpreted as a URI.
>>> The converse also holds; if a mapping for 'foaf' is defined, then the
>>> string above is *by definition* a CURIE. Now whether some host
>>> language decides to interpret the string as a CURIE rather than a URI is up
>>> to that host language, but RDFa does so.
>>> Personally I was very pleased when we took the step to take context
>>> into account when interpreting strings. Until that point we were
>>> trying to achieve the impossible -- imagining that a string on its own
>>> could tell you everything about what it was. Now it's very easy to
>>> interpret both of these strings correctly:
>>>  foaf:Agent
>>>  http://www.w3.org/
>>> simply by using the context.
>>> Best regards,
>>> Mark
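The context-based rule Mark describes can be sketched in a few lines of Python. This is a simplification, assuming a plain dictionary of prefix mappings and ignoring safe CURIEs, default prefixes, terms and case handling; `resolve_curie_or_uri` is a hypothetical name. The last case shows the collision Niklas raises: once `http` is mapped (here to the HTTP Vocabulary namespace `http://www.w3.org/2006/http#`), an ordinary absolute URI gets rewritten.

```python
from typing import Mapping


def resolve_curie_or_uri(value: str, prefixes: Mapping[str, str]) -> str:
    """Resolve a CURIEorURI value using only its context (the prefix mappings).

    If the part before the first ':' is a mapped prefix, the value is by
    definition a CURIE and is expanded; otherwise it is left alone and
    handed on as a URI. Simplified: no safe CURIEs, default prefix or terms.
    """
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + reference
    return value


FOAF = {"foaf": "http://xmlns.com/foaf/0.1/"}
HTTP_VOCAB = {"http": "http://www.w3.org/2006/http#"}

print(resolve_curie_or_uri("foaf:Agent", FOAF))                # http://xmlns.com/foaf/0.1/Agent
print(resolve_curie_or_uri("foaf:Agent", {}))                  # foaf:Agent (left as a URI)
print(resolve_curie_or_uri("http://www.w3.org/", FOAF))        # http://www.w3.org/
print(resolve_curie_or_uri("http://www.w3.org/", HTTP_VOCAB))  # http://www.w3.org/2006/http#//www.w3.org/
```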
>>> 2011/4/11 Niklas Lindström <lindstream@gmail.com>:
>>>> Hi all!
>>>>
>>>> Is it correct that the RDFa WG is currently recommending letting
>>>> CURIEs share the same value space as regular URIs, and so that any
>>>> prefix defined with the same value as a scheme, like "http", "https",
>>>> "news" etc. will change the URI for any absolute URI using those
>>>> schemes?
>>>>
>>>> I remember worrying about this last year, but I haven't followed the
>>>> decision process in detail since then. It just worries me that letting
>>>> these things collide will blow up for anyone who happens to use at
>>>> least "http" or "https" as prefixes (perhaps rendering prefixes using
>>>> a tool, or getting them from a profile out of their control). Or
>>>> perhaps worse, people believing it safe to use anything but "http(s)"
>>>> as prefixes, which will work until something other than those two
>>>> comes along in the next 10 years or so. It might happen; and if it
>>>> does, it may quite probably be beyond the controls of RDFa specs and
>>>> tools.
>>>>
>>>> (An example: some vocabulary "Wide Exceptional Graphs" becomes
>>>> popular, using "wxg" as a prefix. Then Google comes along with a new
>>>> wxg scheme ("Web Extended by Google"), and soon lots of resources are
>>>> linked with that instead of old "http". Or for that matter, that some
>>>> other scheme [3] becomes popular again for whatever reason.)
>>>>
>>>> I vaguely recall the WG saying something about defining "http" as a
>>>> prefix is bad practise. But this turns up here and there, not least
>>>> since the HTTP Vocabulary Draft [1] (<http://www.w3.org/2006/http#>)
>>>> recommends it as a prefix. And I just ran across "http" as a prefix in
>>>> the Tabulator source as well [2].
>>>>
>>>> While I understand that it is confusing to use it as a prefix, I am
>>>> not convinced that it is safe to combine the CURIE and URI value space
>>>> like this. At least not without a limit on the CURIEs allowed in the
>>>> joint CURIEorURI space. For instance, not allowing CURIEs in that
>>>> space to use anything after the prefix+':' other than say an
>>>> isegment-nz-nc from RFC 3987, or something to that effect (like a
>>>> "[A-Za-z0-9_.-]+" regexp).
>>>>
>>>> If there was such a restriction on the format of CURIEs allowed in
>>>> the CURIEorURI mix (and that anything not matching it would be
>>>> considered a full URI), I would definitely sleep better. :)
>>>>
>>>> Am I missing something crucial, or overly worried about the risk of collisions?
>>>>
>>>> Best regards,
>>>> Niklas
>>>>
>>>> [1]: http://www.w3.org/TR/HTTP-in-RDF10/
>>>> [2]: http://dig.csail.mit.edu/hg/tabulator/file/9a135feff10f/chrome/content/js/rdf/rdflib.js#l5644
>>>> [3]: http://en.wikipedia.org/wiki/URI_scheme
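Niklas's proposed restriction can be sketched the same way. The Python below assumes his simple regular expression as a stand-in for `isegment-nz-nc` (written with the hyphen last, so it is a literal character rather than an invalid `_-.` range), keeps the bracketed safe CURIE form as the escape hatch for anything else, and returns `None` for a safe CURIE whose prefix is unmapped. `resolve_restricted` and that `None` behaviour are illustrative assumptions, not what the RDFa specification defines.

```python
import re
from typing import Mapping, Optional

# Niklas's proposed restriction on the part after "prefix:"
# (hyphen placed last inside the class so it is matched literally).
RESTRICTED_REFERENCE = re.compile(r"[A-Za-z0-9_.-]+")


def resolve_restricted(value: str, prefixes: Mapping[str, str]) -> Optional[str]:
    """Resolve a value under a restricted CURIE-or-safe-CURIE-or-URI rule.

    - "[prefix:ref]" (safe CURIE): always a CURIE; expanded if the prefix is
      mapped, otherwise None (treated here as "ignore the value").
    - "prefix:ref" with a mapped prefix and a ref matching the restriction:
      expanded as a CURIE.
    - anything else: returned unchanged, i.e. treated as a full URI.
    """
    if value.startswith("[") and value.endswith("]"):
        prefix, sep, reference = value[1:-1].partition(":")
        if sep and prefix in prefixes:
            return prefixes[prefix] + reference
        return None
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes and RESTRICTED_REFERENCE.fullmatch(reference):
        return prefixes[prefix] + reference
    return value


PREFIXES = {
    "http": "http://www.w3.org/2006/http#",
    "dbp-ont": "http://dbpedia.org/ontology/",
}

print(resolve_restricted("http://example.org/", PREFIXES))   # http://example.org/
print(resolve_restricted("dbp-ont:Artist", PREFIXES))        # http://dbpedia.org/ontology/Artist
print(resolve_restricted("dbp-ont:Artist/XXX", PREFIXES))    # dbp-ont:Artist/XXX
print(resolve_restricted("[dbp-ont:Artist/XXX]", PREFIXES))  # http://dbpedia.org/ontology/Artist/XXX
```

Even with `http` mapped, `http://example.org/` survives as a URI. The third case is the trade-off Ivan points out: a hierarchical reference such as `Artist/XXX` no longer qualifies as a bare CURIE and has to be written as a safe CURIE instead.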
>>>>
>>>>
>>
>>
>
>
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> PGP Key: http://www.ivan-herman.net/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Friday, 15 April 2011 09:54:46 UTC