Re: CURIEorURI Value Space Collisions from Ivan Herman on 2011-04-15 (public-rdfa-wg@w3.org from April 2011)

From: Ivan Herman <ivan@w3.org>
Date: Fri, 15 Apr 2011 10:14:52 +0200
To: nathan@webr3.org
Cc: Mark Birbeck <mark.birbeck@webbackplane.com>, Niklas Lindström <lindstream@gmail.com>, public-rdfa-wg <public-rdfa-wg@w3.org>
Message-Id: <F4E9E32D-DEDD-4B1E-BEBF-C2B012FDA9AF@w3.org>
True. But we would also lose possibly very useful features.

I recently realized, to take an example, that the DBpedia concepts' ontology has some sort of a hierarchy. They use

http://dbpedia.org/ontology/
http://dbpedia.org/ontology/Artist/
http://dbpedia.org/ontology/Film/
etc.

which would then be used, for example, on types. At the moment, one can define a prefix for ../ontology/ and then use something like dbp-ont:Artist/XXX instead of being forced to define a separate prefix for each sub-hierarchy.
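
A minimal sketch of the expansion described above, in Python. The helper name expand_curie and the prefix table are assumptions made up for the illustration (they do not come from any RDFa library), and XXX stays the placeholder used above. The reference after the colon is simply appended to the prefix URI, so it can carry further path segments:

```python
# Hypothetical prefix table; only the dbp-ont mapping comes from the example above.
PREFIXES = {"dbp-ont": "http://dbpedia.org/ontology/"}


def expand_curie(value, prefixes):
    """Expand value if its prefix is mapped; otherwise return it unchanged."""
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes:
        # A mapped prefix makes the value a CURIE: concatenate prefix URI and reference.
        return prefixes[prefix] + reference
    # No mapping for the prefix: the value is left alone (it may be a plain URI).
    return value


print(expand_curie("dbp-ont:Artist/XXX", PREFIXES))
print(expand_curie("http://www.w3.org/", PREFIXES))
```

With a reference restricted to a single path segment, the first call could no longer be written as a CURIE, and each sub-hierarchy would need a prefix of its own.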

B.t.w., on a separate comment: in my implementation I actually generate a warning if a URI is used with an unusual (i.e., non-registered) scheme. In most cases this is the result of a misspelling in the prefix. I am not sure whether it is worth adding that to RDFa Core as a requirement, or just having it as a good practice for RDFa processors...
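
Roughly, such a check could look like the Python sketch below. This is not the actual code of my implementation; the function name, the message text and the short list of known schemes are assumptions for the illustration, and a real processor would rather consult the full IANA registry:

```python
import re

# Illustrative subset only, not the complete list of registered schemes.
KNOWN_SCHEMES = {"http", "https", "ftp", "mailto", "urn", "news", "tel", "file"}

# Scheme syntax from RFC 3986: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) ":"
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")


def scheme_warning(uri, known_schemes=KNOWN_SCHEMES):
    """Return a warning string for an unknown scheme, or None if nothing looks odd."""
    match = SCHEME_RE.match(uri)
    if match is None:
        return None  # relative reference, there is no scheme to check
    scheme = match.group(1).lower()
    if scheme in known_schemes:
        return None
    return f"unusual URI scheme '{scheme}' in <{uri}>; misspelled prefix?"


print(scheme_warning("http://www.w3.org/"))
print(scheme_warning("faof:Agent"))
```

The second call models a mistyped foaf prefix: no mapping exists for faof, so the value ends up being treated as a URI and the check flags it.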

Ivan


 
On Apr 13, 2011, at 19:09 , Nathan wrote:

> That said, it would be a lot less ambiguous if CURIE didn't use irelative-ref and instead used:
> 
>  reference ::= ipath-absolute / ipath-noscheme / ipath-empty
> 
> then at least, http://example.org/ would never be a CURIE, and a prefix mapping for http: would never apply / confuse.
> 
> Best,
> 
> Nathan
> 
> Mark Birbeck wrote:
>> Hi Niklas,
>> Everything you say is true. :)
>> However, the big change in the working group's thinking came when we
>> decided that it was impossible to guarantee correct interpretation of
>> strings of text based solely on their format, and so instead we should
>> rely on the strings' contexts.
>> By using context to aid in the interpretation of a string we get a lot
>> more flexibility, and we can unambiguously work out what things like
>> this mean:
>>  foaf:Agent
>> Without context it *looks* like all of the following:
>>  * a string of text with no particular meaning;
>>  * a QName;
>>  * a CURIE;
>>  * a relative URI using the 'foaf' scheme.
>> However, we decided in the working group that if no prefix mapping for
>> 'foaf' was defined in the context for this string, then the string was
>> *by definition* not a CURIE.
>> Whether it therefore becomes a string of text or a URI is a separate
>> processing step, and nothing to do with CURIE processing, but by
>> taking the approach we did in the CURIE processing layer we at least
>> made it possible for 'foaf:Agent' to be interpreted as a URI.
>> The converse also holds; if a mapping for 'foaf' is defined, then the
>> string above is *by definition* a CURIE. Now whether some host
>> language decides to interpret the string as a CURIE above a URI is up
>> to that host language, but RDFa does so.
>> Personally I was very pleased when we took the step to take context
>> into account when interpreting strings. Until that point we were
>> trying to achieve the impossible -- imagining that a string on its own
>> could tell you everything about what it was. Now it's very easy to
>> interpret both of these strings correctly:
>>  foaf:Agent
>>  http://www.w3.org/
>> simply by using the context.
>> Best regards,
>> Mark
>> 2011/4/11 Niklas Lindström <lindstream@gmail.com>:
>>> Hi all!
>>> 
>>> Is it correct that the RDFa WG is currently recommending letting
>>> CURIEs share the same value space as regular URIs, and so that any
>>> prefix defined with the same value as a scheme, like "http", "https",
>>> "news" etc. will change the URI for any absolute URI using those
>>> schemes?
>>> 
>>> I remember worrying about this last year, but I haven't followed the
>>> decision process in detail since then. It just worries me that letting
>>> these things collide will blow up for anyone who happens to use at
>>> least "http" or "https" as prefixes (perhaps rendering prefixes using
>>> a tool, or getting them from a profile out of their control). Or
>>> perhaps worse, people believing it safe to use anything but "http(s)"
>>> as prefixes, which will work until something other than those two
>>> comes along in the next 10 years or so. It might happen; and if it
>>> does, it may quite probably be beyond the controls of RDFa specs and
>>> tools.
>>> 
>>> (An example: some vocabulary "Wide Exceptional Graphs" becomes
>>> popular, using "wxg" as a prefix. Then Google comes along with a new
>>> wxg scheme ("Web Extended by Google"), and soon lots of resources are
>>> linked with that instead of old "http". Or for that matter, that some
>>> other scheme [3] becomes popular again for whatever reason.)
>>> 
>>> I vaguely recall the WG saying something about defining "http" as a
>>> prefix is bad practice. But this turns up here and there, not least
>>> since the HTTP Vocabulary Draft [1] (<http://www.w3.org/2006/http#>)
>>> recommends it as a prefix. And I just ran across "http" as a prefix in
>>> the Tabulator source as well [2].
>>> 
>>> While I understand that it is confusing to use it as a prefix, I am
>>> not convinced that it is safe to combine the CURIE and URI value space
>>> like this. At least not without a limit on the CURIEs allowed in the
>>> joint CURIEorURI space. For instance, not allowing CURIEs in that
>>> space to use anything after the prefix+':' other than say an
>>> isegment-nz-nc from RFC 3987, or something to that effect (like a
>>> "[A-Za-z0-9_-.]+" regexp).
>>> 
>>> If there were such a restriction on the format of CURIEs allowed in
>>> the CURIEorURI mix (and that anything not matching it would be
>>> considered a full URI), I would definitely sleep better. :)
>>> 
>>> Am I missing something crucial, or overly worried about the risk of collisions?
>>> 
>>> Best regards,
>>> Niklas
>>> 
>>> [1]: http://www.w3.org/TR/HTTP-in-RDF10/
>>> [2]: http://dig.csail.mit.edu/hg/tabulator/file/9a135feff10f/chrome/content/js/rdf/rdflib.js#l5644
>>> [3]: http://en.wikipedia.org/wiki/URI_scheme
>>> 
>>> 
> 
> 
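
To make the collision from the quoted thread concrete, here is a rough Python sketch. It assumes a processor that treats any value with a mapped prefix as a CURIE, and it approximates Nathan's proposed restriction with one check: the reference must not start with "//". That check is a simplification of the RFC 3987 productions, not a full parser. The function name, the flag and the term Request are made up for the illustration; the prefix URI is the one from Niklas's message.

```python
# "http" used as a prefix, with the namespace URI quoted in the thread.
PREFIXES = {"http": "http://www.w3.org/2006/http#"}


def resolve(value, prefixes, restrict_reference=False):
    """Resolve a CURIEorURI value against a prefix table."""
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes:
        if restrict_reference and reference.startswith("//"):
            # Simplified stand-in for the restricted grammar: a reference that
            # begins with "//" can never be a CURIE, so keep the value as a URI.
            return value
        return prefixes[prefix] + reference
    return value


# Unrestricted: the absolute URI is hijacked by the "http" prefix mapping.
print(resolve("http://example.org/", PREFIXES))
# Restricted: the absolute URI survives, while a real CURIE still expands.
print(resolve("http://example.org/", PREFIXES, restrict_reference=True))
print(resolve("http:Request", PREFIXES, restrict_reference=True))
```

Note that this restriction still allows references with several path segments, such as Artist/XXX.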


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Friday, 15 April 2011 08:14:27 UTC