Re: CURIEorURI Value Space Collisions from Shane McCarron on 2011-04-15 (public-rdfa-wg@w3.org from April 2011)

From: Shane McCarron <shane@aptest.com>
Date: Fri, 15 Apr 2011 17:07:39 -0500
To: public-rdfa-wg@w3.org
Message-ID: <4DA8C1AB.8020708@aptest.com>
I guess I fail to appreciate the core problem here.  Are you worried 
that there will be a prefix declared in a framework (e.g., news:) that 
in some distant future becomes a real scheme, and that @about values 
that use that real scheme as a full URI will be introduced as content 
within that framework?  I can see this as a remote possibility, but I 
can't get too worked up about it.  There are lots of things that are 
going to evolve over time on the Internet. We cannot predict all of them.

I would be open to a few things that could tighten this stuff up - 
reducing the possibility of misinterpretation.  In no particular order, 
we could do none or all of the following:

   1. Prohibit schemes that are well known *at publication time* from
      being used as prefix names. E.g., declare that http, https,
      mailto, etc. are all illegal as prefix names, and require that
      conforming processors ignore them (or issue an error or issue a
      warning - don't really mind).
   2. Restrict the 'reference' portion pattern further, such that it
      prohibits leading '//'.  There is no need I can imagine to permit
      '//' at the beginning of a reference.  So if there were a string
      like 'foaf://some/reference' it would not be treated as a CURIE,
      but 'foaf:some/reference' or 'foaf:/some/reference' would still be
      a CURIE.
   3. Encourage content authors to use prefix names that are unlikely to
      ever be a scheme name (not sure this makes sense).  We still have
      the NCName restriction on prefix names, but myprefix0 is an NCName
      and my-prefix is an NCName.  And I can't imagine those ever being
      a real scheme.
   4. Encourage content authors to eschew the use of prefixes and just
      use full URIs (not sure this makes sense either).
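
To make options 1 and 2 concrete, here is a rough Python sketch of how a
processor might apply them. It is only an illustration, not proposed spec
text: the function names, the short WELL_KNOWN_SCHEMES set, and the
ASCII-only NCName pattern are all assumptions made for the example.

```python
import re

# Assumption: a small illustrative set; the message names only
# http, https and mailto explicitly.
WELL_KNOWN_SCHEMES = {"http", "https", "mailto"}

# Assumption: simplified ASCII-only stand-in for the NCName production.
NCNAME = re.compile(r"^[A-Za-z_][A-Za-z0-9._-]*$")


def declare_prefixes(declared):
    """Option 1: ignore prefix declarations named after a well-known scheme."""
    return {
        prefix: iri
        for prefix, iri in declared.items()
        if NCNAME.match(prefix) and prefix.lower() not in WELL_KNOWN_SCHEMES
    }


def resolve_curie_or_uri(value, prefixes):
    """Expand value as a CURIE when permitted; otherwise return it as a URI."""
    prefix, sep, reference = value.partition(":")
    if not sep or prefix not in prefixes:
        return value  # no mapping in context, so not a CURIE
    if reference.startswith("//"):
        return value  # option 2: a leading '//' is never a CURIE reference
    return prefixes[prefix] + reference


if __name__ == "__main__":
    prefixes = declare_prefixes({
        "foaf": "http://xmlns.com/foaf/0.1/",
        "http": "http://www.w3.org/2006/http#",  # dropped by option 1
    })
    for value in ("foaf:Agent", "foaf://some/reference", "http://www.w3.org/"):
        print(value, "->", resolve_curie_or_uri(value, prefixes))
```

With these rules 'foaf:Agent' expands through the prefix mapping, while
'foaf://some/reference' and 'http://www.w3.org/' pass through untouched as
full URIs.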


I don't think any of these steps ELIMINATE the possibility of 
misinterpretation.  But they surely won't hurt, and they are all 
completely consistent with the *intent* of CURIEs.

Thoughts?

On 4/15/2011 4:53 AM, Niklas Lindström wrote:
> Hi Ivan!
>
> 2011/4/15 Ivan Herman<ivan@w3.org>:
>> True. But we would also lose possibly very useful features.
>>
>> I recently realized, to take an example, that the DBPedia concepts' ontology has some sort of a hierarchy. They use
>>
>> http://dbpedia.org/ontology/
>> http://dbpedia.org/ontology/Artist/
>> http://dbpedia.org/ontology/Film/
>> etc.
>>
>> which would then be used, for example, on types. At the moment, one can define a prefix for ../ontology/ and then use something like dbp-ont:Artist/XXX instead of being forced to define a separate prefix for each sub-hierarchy.
> Yes, I've thought about that too. But that would still be possible if
> the CURIEorURI is changed to the RestrictedCURIEOrSafeCURIEorURI
> (which I just suggested in reply to Mark -- i.e. where RestrictedCURIE
> is defined as one of QName, or "isegment-nz-nc", or Nathan's
> "ipath-absolute / ipath-noscheme / ipath-empty").
>
>
>> B.t.w., on a separate comment: in my implementation I actually generate a warning if a URI is used with an unusual (i.e., non-registered) scheme. In most cases this is the result of a misspelling in the prefix. I am not sure it is worth adding that to RDFa Core as a requirement, or just having this as a good practice for RDFa processors...
> Warnings are useful, but I definitely don't think that an RDFa parser
> should have to worry about the scheme registry. Neither should
> authors. It will evolve independently of the implementation and use of
> RDFa, and of the (very much decentralized) definition of prefixes for
> vocabularies.
>
>
> Best regards,
> Niklas
>
>
>
>> Ivan
>>
>>
>>
>> On Apr 13, 2011, at 19:09 , Nathan wrote:
>>
>>> That said, it would be a lot less ambiguous if CURIE didn't use irelative-ref and instead used:
>>>
>>>   reference ::= ipath-absolute / ipath-noscheme / ipath-empty
>>>
>>> then at least, http://example.org/ would never be a CURIE, and a prefix mapping for http: would never apply / confuse.
>>>
>>> Best,
>>>
>>> Nathan
>>>
>>> Mark Birbeck wrote:
>>>> Hi Niklas,
>>>> Everything you say is true. :)
>>>> However, the big change in the working group's thinking came when we
>>>> decided that it was impossible to guarantee correct interpretation of
>>>> strings of text based solely on their format, and so instead we should
>>>> rely on the strings' contexts.
>>>> By using context to aid in the interpretation of a string we get a lot
>>>> more flexibility, and we can unambiguously work out what things like
>>>> this mean:
>>>>   foaf:Agent
>>>> Without context it *looks* like all of the following:
>>>>   * a string of text with no particular meaning;
>>>>   * a QName;
>>>>   * a CURIE;
>>>>   * a relative URI using the 'foaf' scheme.
>>>> However, we decided in the working group that if no prefix mapping for
>>>> 'foaf' was defined in the context for this string, then the string was
>>>> *by definition* not a CURIE.
>>>> Whether it therefore becomes a string of text or a URI is a separate
>>>> processing step, and nothing to do with CURIE processing, but by
>>>> taking the approach we did in the CURIE processing layer we at least
>>>> made it possible for 'foaf:Agent' to be interpreted as a URI.
>>>> The converse also holds; if a mapping for 'foaf' is defined, then the
>>>> string above is *by definition* a CURIE. Now whether some host
>>>> language decides to interpret the string as a CURIE above a URI is up
>>>> to that host language, but RDFa does so.
>>>> Personally I was very pleased when we took the step to take context
>>>> into account when interpreting strings. Until that point we were
>>>> trying to achieve the impossible -- imagining that a string on its own
>>>> could tell you everything about what it was. Now it's very easy to
>>>> interpret both of these strings correctly:
>>>>   foaf:Agent
>>>>   http://www.w3.org/
>>>> simply by using the context.
>>>> Best regards,
>>>> Mark
>>>> 2011/4/11 Niklas Lindström<lindstream@gmail.com>:
>>>>> Hi all!
>>>>>
>>>>> Is it correct that the RDFa WG is currently recommending letting
>>>>> CURIEs share the same value space as regular URIs, and so that any
>>>>> prefix defined with the same value as a scheme, like "http", "https",
>>>>> "news" etc. will change the URI for any absolute URI using those
>>>>> schemes?
>>>>>
>>>>> I remember worrying about this last year, but I haven't followed the
>>>>> decision process in detail since then. It just worries me that letting
>>>>> these things collide will blow up for anyone who happens to use at
>>>>> least "http" or "https" as prefixes (perhaps rendering prefixes using
>>>>> a tool, or getting them from a profile out of their control). Or
>>>>> perhaps worse, people believing it safe to use anything but "http(s)"
>>>>> as prefixes, which will work until something other than those two
>>>>> comes along in the next 10 years or so. It might happen; and if it
>>>>> does, it may quite probably be beyond the controls of RDFa specs and
>>>>> tools.
>>>>>
>>>>> (An example: some vocabulary "Wide Exceptional Graphs" becomes
>>>>> popular, using "wxg" as a prefix. Then Google comes along with a new
>>>>> wxg scheme ("Web Extended by Google"), and soon lots of resources are
>>>>> linked with that instead of old "http". Or for that matter, that some
>>>>> other scheme [3] becomes popular again for whatever reason.)
>>>>>
>>>>> I vaguely recall the WG saying something about defining "http" as a
>>>>> prefix is bad practise. But this turns up here and there, not least
>>>>> since the HTTP Vocabulary Draft [1] (<http://www.w3.org/2006/http#>)
>>>>> recommends it as a prefix. And I just ran across "http" as a prefix in
>>>>> the Tabulator source as well [2].
>>>>>
>>>>> While I understand that it is confusing to use it as a prefix, I am
>>>>> not convinced that it is safe to combine the CURIE and URI value space
>>>>> like this. At least not without a limit on the CURIEs allowed in the
>>>>> joint CURIEorURI space. For instance, not allowing CURIEs in that
>>>>> space to use anything after the prefix+':' other than say an
>>>>> isegment-nz-nc from RFC 3987, or something to that effect (like a
>>>>> "[A-Za-z0-9_.-]+" regexp).
>>>>>
>>>>> If there was such a restriction on the format of CURIEs that are allowed in
>>>>> the CURIEorURI mix (and that anything not matching it would be
>>>>> considered a full URI), I would definitely sleep better. :)
>>>>>
>>>>> Am I missing something crucial, or overly worried about the risk of collisions?
>>>>>
>>>>> Best regards,
>>>>> Niklas
>>>>>
>>>>> [1]: http://www.w3.org/TR/HTTP-in-RDF10/
>>>>> [2]: http://dig.csail.mit.edu/hg/tabulator/file/9a135feff10f/chrome/content/js/rdf/rdflib.js#l5644
>>>>> [3]: http://en.wikipedia.org/wiki/URI_scheme
>>>>>
>>>>>
>>>
>>
>> ----
>> Ivan Herman, W3C Semantic Web Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>> FOAF: http://www.ivan-herman.net/foaf.rdf
>>
>>
>>
>>
>>
>>

-- 
Shane P. McCarron                          Phone: +1 763 786-8160 x120
Managing Director                            Fax: +1 763 786-8180
ApTest Minnesota                            Inet: shane@aptest.com
Received on Friday, 15 April 2011 22:08:12 UTC