Re: CURIEorURI Value Space Collisions from Nathan on 2011-04-15 (public-rdfa-wg@w3.org from April 2011)

From: Nathan <nathan@webr3.org>
Date: Fri, 15 Apr 2011 12:05:46 +0100
To: Ivan Herman <ivan@w3.org>
CC: Mark Birbeck <mark.birbeck@webbackplane.com>, Niklas Lindström <lindstream@gmail.com>, public-rdfa-wg <public-rdfa-wg@w3.org>
Message-ID: <4DA8268A.4040201@webr3.org>
Hi Ivan,

That would still all work :) it would only remove the possibility of 
references starting with // - everything else would still be fine (like 
../foo, /foo, /, bar, bar#foo, etc.)

However, I note I made a mistake earlier! We currently use irelative-ref 
which is:

   irelative-ref  = irelative-part [ "?" iquery ] [ "#" ifragment ]

   irelative-part = "//" iauthority ipath-abempty
                       / ipath-absolute
                       / ipath-noscheme
                       / ipath-empty

and we'd need something like

   curie-ref  = curie-part [ "?" iquery ] [ "#" ifragment ]

   curie-part = ipath-absolute / ipath-noscheme / ipath-empty

Which would mean that http://example.org/foo?bar#baz would /not/ be a 
valid CURIE whilst db-example:/foo?bar#baz would.
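
A minimal sketch of that restricted check in Python, in case it makes the effect more concrete. It assumes the value already matches irelative-ref and only tests the one thing curie-part excludes, namely a reference starting with "//". The function name split_curie and the prefix mappings (including the IRI given for db-example) are hypothetical, for illustration only; this is not taken from any RDFa processor.

```python
# Hypothetical prefix mappings, for illustration only.
PREFIXES = {
    "http": "http://www.w3.org/2006/http#",
    "db-example": "http://example.org/db",
}


def split_curie(value, prefixes):
    """Return (prefix, reference) if value counts as a CURIE, else None.

    Simplified: the full RFC 3987 productions are not validated. Only the
    difference between irelative-part and curie-part is checked, i.e. the
    reference must not start with "//".
    """
    prefix, sep, reference = value.partition(":")
    if not sep or prefix not in prefixes:
        return None  # no colon, or no mapping in context: not a CURIE
    if reference.startswith("//"):
        return None  # "//" iauthority form is excluded by curie-part
    return prefix, reference


print(split_curie("http://example.org/foo?bar#baz", PREFIXES))
print(split_curie("db-example:/foo?bar#baz", PREFIXES))
```

With a check like this, a mapping for http: would not capture an ordinary absolute URI of the form http://..., because its reference part starts with "//".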

On a related note.. I thought CURIEs were formed by a simple 
concatenation of the reference to the value the prefix is mapped to. So 
if you had:

   <div prefix="ex: http://example.org/ont/"
        about="ex:../resource/Albert_Einstein">

then the resulting IRI would be (no normalization of course):

   <http://example.org/ont/../resource/Albert_Einstein>

(and not as some may expect, <http://example.org/resource/Albert_Einstein> )
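
To illustrate the difference, here is a small Python sketch contrasting plain concatenation (what I understand CURIE expansion to be) with relative-reference resolution as done by urllib.parse.urljoin. The variable names are mine, and the resolution step is shown only for comparison; it is not something CURIE processing is assumed to perform.

```python
from urllib.parse import urljoin

prefix_iri = "http://example.org/ont/"     # value the prefix "ex" is mapped to
reference = "../resource/Albert_Einstein"  # part after "ex:"

# CURIE expansion as described above: simple string concatenation.
concatenated = prefix_iri + reference

# For comparison only: resolving the reference against the prefix IRI
# as if it were a base, which collapses the ".." segment.
resolved = urljoin(prefix_iri, reference)

print(concatenated)
print(resolved)
```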

Best,

Nathan

Ivan Herman wrote:
> True. But we would also lose possibly very useful features.
> 
> I recently realized, to take an example, that the DBpedia concepts' ontology has some sort of a hierarchy. They use 
> 
> http://dbpedia.org/ontology/
> http://dbpedia.org/ontology/Artist/
> http://dbpedia.org/ontology/Film/
> etc.
> 
> which would then be used, for example, on types. At the moment, one can define a prefix for ../ontology/ and then use something like dbp-ont:Artist/XXX instead of being forced to define a separate prefix for each sub-hierarchy.
> 
> B.t.w., on a separate comment: in my implementation I actually generate a warning if a URI is used with an unusual (i.e., non-registered) scheme. In most cases this is the result of a misspelling in the prefix. I am not sure it is worth adding that to RDFa Core as a requirement, or just having this as a good practice for RDFa processors...
> 
> Ivan
> 
> 
>  
> On Apr 13, 2011, at 19:09 , Nathan wrote:
> 
>> That said, it would be a lot less ambiguous if CURIE didn't use irelative-ref and instead used:
>>
>>  reference ::= ipath-absolute / ipath-noscheme / ipath-empty
>>
>> then at least, http://example.org/ would never be a CURIE, and a prefix mapping for http: would never apply / confuse.
>>
>> Best,
>>
>> Nathan
>>
>> Mark Birbeck wrote:
>>> Hi Niklas,
>>> Everything you say is true. :)
>>> However, the big change in the working group's thinking came when we
>>> decided that it was impossible to guarantee correct interpretation of
>>> strings of text based solely on their format, and so instead we should
>>> rely on the strings' contexts.
>>> By using context to aid in the interpretation of a string we get a lot
>>> more flexibility, and we can unambiguously work out what things like
>>> this mean:
>>>  foaf:Agent
>>> Without context it *looks* like all of the following:
>>>  * a string of text with no particular meaning;
>>>  * a QName;
>>>  * a CURIE;
>>>  * a relative URI using the 'foaf' scheme.
>>> However, we decided in the working group that if no prefix mapping for
>>> 'foaf' was defined in the context for this string, then the string was
>>> *by definition* not a CURIE.
>>> Whether it therefore becomes a string of text or a URI is a separate
>>> processing step, and nothing to do with CURIE processing, but by
>>> taking the approach we did in the CURIE processing layer we at least
>>> made it possible for 'foaf:Agent' to be interpreted as a URI.
>>> The converse also holds; if a mapping for 'foaf' is defined, then the
>>> string above is *by definition* a CURIE. Now whether some host
>>> language decides to interpret the string as a CURIE above a URI is up
>>> to that host language, but RDFa does so.
>>> Personally I was very pleased when we took the step to take context
>>> into account when interpreting strings. Until that point we were
>>> trying to achieve the impossible -- imagining that a string on its own
>>> could tell you everything about what it was. Now it's very easy to
>>> interpret both of these strings correctly:
>>>  foaf:Agent
>>>  http://www.w3.org/
>>> simply by using the context.
>>> Best regards,
>>> Mark
>>> 2011/4/11 Niklas Lindström <lindstream@gmail.com>:
>>>> Hi all!
>>>>
>>>> Is it correct that the RDFa WG is currently recommending letting
>>>> CURIEs share the same value space as regular URIs, and so that any
>>>> prefix defined with the same value as a scheme, like "http", "https",
>>>> "news" etc. will change the URI for any absolute URI using those
>>>> schemes?
>>>>
>>>> I remember worrying about this last year, but I haven't followed the
>>>> decision process in detail since then. It just worries me that letting
>>>> these things collide will blow up for anyone who happens to use at
>>>> least "http" or "https" as prefixes (perhaps rendering prefixes using
>>>> a tool, or getting them from a profile out of their control). Or
>>>> perhaps worse, people believing it safe to use anything but "http(s)"
>>>> as prefixes, which will work until something other than those two
>>>> comes along in the next 10 years or so. It might happen; and if it
>>>> does, it may quite probably be beyond the controls of RDFa specs and
>>>> tools.
>>>>
>>>> (An example: some vocabulary "Wide Exceptional Graphs" becomes
>>>> popular, using "wxg" as a prefix. Then Google comes along with a new
>>>> wxg scheme ("Web Extended by Google"), and soon lots of resources are
>>>> linked with that instead of old "http". Or for that matter, that some
>>>> other scheme [3] becomes popular again for whatever reason.)
>>>>
>>>> I vaguely recall the WG saying something about defining "http" as a
>>>> prefix is bad practise. But this turns up here and there, not least
>>>> since the HTTP Vocabulary Draft [1] (<http://www.w3.org/2006/http#>)
>>>> recommends it as a prefix. And I just ran across "http" as a prefix in
>>>> the Tabulator source as well [2].
>>>>
>>>> While I understand that it is confusing to use it as a prefix, I am
>>>> not convinced that it is safe to combine the CURIE and URI value space
>>>> like this. At least not without a limit on the CURIEs allowed in the
>>>> joint CURIEorURI space. For instance, not allowing CURIEs in that
>>>> space to use anything after the prefix+':' other than say an
>>>> isegment-nz-nc from RFC 3987, or something to that effect (like a
>>>> "[A-Za-z0-9_.-]+" regexp).
>>>>
>>>> If there was such a restriction on the format of CURIEs that are allowed in
>>>> the CURIEorURI mix (and that anything not matching it would be
>>>> considered a full URI), I would definitely sleep better. :)
>>>>
>>>> Am I missing something crucial, or overly worried about the risk of collisions?
>>>>
>>>> Best regards,
>>>> Niklas
>>>>
>>>> [1]: http://www.w3.org/TR/HTTP-in-RDF10/
>>>> [2]: http://dig.csail.mit.edu/hg/tabulator/file/9a135feff10f/chrome/content/js/rdf/rdflib.js#l5644
>>>> [3]: http://en.wikipedia.org/wiki/URI_scheme
>>>>
>>>>
>>
> 
> 
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> PGP Key: http://www.ivan-herman.net/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf
> 
> 
> 
> 
>
Received on Friday, 15 April 2011 11:06:54 UTC