- From: Shane McCarron <shane@aptest.com>
- Date: Fri, 15 Apr 2011 17:07:39 -0500
- To: public-rdfa-wg@w3.org
I guess I fail to appreciate the core problem here. Are you worried that there will be a prefix declared in a framework (e.g., news:) that in some distant future becomes a real scheme, and that @about values that use that real scheme as a full URI will be introduced as content within that framework? I can see this as a remote possibility, but I can't get too worked up about it. There are lots of things that are going to evolve over time on the Internet. We cannot predict all of them.

I would be open to a few things that could tighten this stuff up - reducing the possibility of misinterpretation. In no particular order, we could do none or all of the following:

1. Prohibit schemes that are well known *at publication time* from being used in prefix declarations. E.g., declare that http, https, mailto, etc. are all illegal as prefix names, and require that conforming processors ignore them (or issue an error or issue a warning - don't really mind).

2. Restrict the 'reference' portion pattern further, such that it prohibits leading '//'. There is no need I can imagine to permit '//' at the beginning of a reference. So if there were a string like 'foaf://some/reference' it would not be treated as a CURIE, but 'foaf:some/reference' or 'foaf:/some/reference' would still be a CURIE.

3. Encourage content authors to use prefix names that are unlikely to ever be a scheme name (not sure this makes sense). We still have the NCNAME restriction on prefix names, but myprefix0 is an NCNAME and my-prefix is an NCNAME. And I can't imagine those ever being a real scheme.

4. Encourage content authors to eschew the use of prefixes and just use full URIs (not sure this makes sense either).

I don't think any of these steps ELIMINATE the possibility of misinterpretation. But they surely won't hurt, and they are all completely consistent with the *intent* of CURIEs.

Thoughts?

On 4/15/2011 4:53 AM, Niklas Lindström wrote:
> Hi Ivan!
>
> 2011/4/15 Ivan Herman<ivan@w3.org>:
>> True.
>> But we would also lose possibly very useful features.
>>
>> I recently realized, to take an example, that the DBPedia concepts' ontology has some sort of a hierarchy. They use
>>
>> http://dbpedia.org/ontology/
>> http://dbpedia.org/ontology/Artist/
>> http://dbpedia.org/ontology/Film/
>> etc.
>>
>> which would then be used, for example, on types. At the moment, one can define a prefix for ../ontology/ and then use something like dbp-ont:Artist/XXX instead of being forced to define a separate prefix for each sub-hierarchy.
> Yes, I've thought about that too. But that would still be possible if
> the CURIEorURI is changed to the RestrictedCURIEOrSafeCURIEorURI
> (which I just suggested in reply to Mark -- i.e. where RestrictedCURIE
> is defined as one of QName, or "isegment-nz-nc", or Nathan's
> "ipath-absolute / ipath-noscheme / ipath-empty").
>
>
>> B.t.w., on a separate comment: in my implementation I actually generate a warning if a URI is used with an unusual (i.e., non-registered) scheme. In most cases this is the result of a misspelling in the prefix. I am not sure it is worth adding that to RDFa Core as a requirement, or just have this as a good practice for RDFa processors...
> Warnings are useful, but I definitely don't think that an RDFa parser
> should have to worry about the scheme registry. Neither should
> authors. It will evolve independently of the implementation and use of
> RDFa, and of the (very much decentralized) definition of prefixes for
> vocabularies.
>
>
> Best regards,
> Niklas
>
>
>
>> Ivan
>>
>>
>>
>> On Apr 13, 2011, at 19:09 , Nathan wrote:
>>
>>> That said, it would be a lot less ambiguous if CURIE didn't use irelative-ref and instead used:
>>>
>>> reference ::= ipath-absolute / ipath-noscheme / ipath-empty
>>>
>>> then at least, http://example.org/ would never be a CURIE, and a prefix mapping for http: would never apply / confuse.
>>>
>>> Best,
>>>
>>> Nathan
>>>
>>> Mark Birbeck wrote:
>>>> Hi Niklas,
>>>> Everything you say is true.
>>>> :)
>>>> However, the big change in the working group's thinking came when we
>>>> decided that it was impossible to guarantee correct interpretation of
>>>> strings of text based solely on their format, and so instead we should
>>>> rely on the strings' contexts.
>>>> By using context to aid in the interpretation of a string we get a lot
>>>> more flexibility, and we can unambiguously work out what things like
>>>> this mean:
>>>> foaf:Agent
>>>> Without context it *looks* like all of the following:
>>>> * a string of text with no particular meaning;
>>>> * a QName;
>>>> * a CURIE;
>>>> * a relative URI using the 'foaf' scheme.
>>>> However, we decided in the working group that if no prefix mapping for
>>>> 'foaf' was defined in the context for this string, then the string was
>>>> *by definition* not a CURIE.
>>>> Whether it therefore becomes a string of text or a URI is a separate
>>>> processing step, and nothing to do with CURIE processing, but by
>>>> taking the approach we did in the CURIE processing layer we at least
>>>> made it possible for 'foaf:Agent' to be interpreted as a URI.
>>>> The converse also holds; if a mapping for 'foaf' is defined, then the
>>>> string above is *by definition* a CURIE. Now whether some host
>>>> language decides to interpret the string as a CURIE above a URI is up
>>>> to that host language, but RDFa does so.
>>>> Personally I was very pleased when we took the step to take context
>>>> into account when interpreting strings. Until that point we were
>>>> trying to achieve the impossible -- imagining that a string on its own
>>>> could tell you everything about what it was. Now it's very easy to
>>>> interpret both of these strings correctly:
>>>> foaf:Agent
>>>> http://www.w3.org/
>>>> simply by using the context.
>>>> Best regards,
>>>> Mark
>>>> 2011/4/11 Niklas Lindström<lindstream@gmail.com>:
>>>>> Hi all!
>>>>>
>>>>> Is it correct that the RDFa WG is currently recommending letting
>>>>> CURIEs share the same value space as regular URIs, and so that any
>>>>> prefix defined with the same value as a scheme, like "http", "https",
>>>>> "news" etc. will change the URI for any absolute URI using those
>>>>> schemes?
>>>>>
>>>>> I remember worrying about this last year, but I haven't followed the
>>>>> decision process in detail since then. It just worries me that letting
>>>>> these things collide will blow up for anyone who happens to use at
>>>>> least "http" or "https" as prefixes (perhaps rendering prefixes using
>>>>> a tool, or getting them from a profile out of their control). Or
>>>>> perhaps worse, people believing it safe to use anything but "http(s)"
>>>>> as prefixes, which will work until something other than those two
>>>>> comes along in the next 10 years or so. It might happen; and if it
>>>>> does, it may quite probably be beyond the control of RDFa specs and
>>>>> tools.
>>>>>
>>>>> (An example: some vocabulary "Wide Exceptional Graphs" becomes
>>>>> popular, using "wxg" as a prefix. Then Google comes along with a new
>>>>> wxg scheme ("Web Extended by Google"), and soon lots of resources are
>>>>> linked with that instead of old "http". Or for that matter, that some
>>>>> other scheme [3] becomes popular again for whatever reason.)
>>>>>
>>>>> I vaguely recall the WG saying something about defining "http" as a
>>>>> prefix is bad practise. But this turns up here and there, not least
>>>>> since the HTTP Vocabulary Draft [1] (<http://www.w3.org/2006/http#>)
>>>>> recommends it as a prefix. And I just ran across "http" as a prefix in
>>>>> the Tabulator source as well [2].
>>>>>
>>>>> While I understand that it is confusing to use it as a prefix, I am
>>>>> not convinced that it is safe to combine the CURIE and URI value space
>>>>> like this. At least not without a limit on the CURIEs allowed in the
>>>>> joint CURIEorURI space.
>>>>> For instance, not allowing CURIEs in that
>>>>> space to use anything after the prefix+':' other than say an
>>>>> isegment-nz-nc from RFC 3987, or something to that effect (like a
>>>>> "[A-Za-z0-9_.-]+" regexp).
>>>>>
>>>>> If there was such a restriction on the format of CURIEs that are
>>>>> allowed in the CURIEorURI mix (and that anything not matching it would
>>>>> be considered a full URI), I would definitely sleep better. :)
>>>>>
>>>>> Am I missing something crucial, or overly worried about the risk of collisions?
>>>>>
>>>>> Best regards,
>>>>> Niklas
>>>>>
>>>>> [1]: http://www.w3.org/TR/HTTP-in-RDF10/
>>>>> [2]: http://dig.csail.mit.edu/hg/tabulator/file/9a135feff10f/chrome/content/js/rdf/rdflib.js#l5644
>>>>> [3]: http://en.wikipedia.org/wiki/URI_scheme
>>>>>
>>>
>>
>> ----
>> Ivan Herman, W3C Semantic Web Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>> FOAF: http://www.ivan-herman.net/foaf.rdf
>>
>>

--
Shane P. McCarron    Phone: +1 763 786-8160 x120
Managing Director    Fax: +1 763 786-8180
ApTest Minnesota     Inet: shane@aptest.com
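
For illustration only: options 1 and 2 from the list in this message, combined with the context rule Mark describes (no prefix mapping in scope means the string is not a CURIE), could be sketched in Python roughly as follows. This is not taken from any RDFa processor or from the CURIE spec. The names WELL_KNOWN_SCHEMES, usable_prefixes and resolve_curie_or_uri are hypothetical, and the scheme list is an assumed sample of what "well known at publication time" might cover.

```python
# Hypothetical sketch: none of these names come from the RDFa or CURIE specs.

# Assumed sample of schemes "well known at publication time" (option 1).
WELL_KNOWN_SCHEMES = frozenset({"http", "https", "mailto", "ftp", "news"})


def usable_prefixes(declared):
    """Ignore prefix declarations whose name is a well-known scheme (option 1)."""
    return {
        prefix: base
        for prefix, base in declared.items()
        if prefix.lower() not in WELL_KNOWN_SCHEMES
    }


def resolve_curie_or_uri(value, declared):
    """Expand value as a CURIE when the context allows it, else return it unchanged."""
    prefix, sep, reference = value.partition(":")
    if not sep:
        return value  # no colon at all, so it cannot be a CURIE
    prefixes = usable_prefixes(declared)
    if prefix not in prefixes:
        return value  # no usable mapping in context: by definition not a CURIE
    if reference.startswith("//"):
        return value  # option 2: a reference with a leading '//' is never a CURIE
    return prefixes[prefix] + reference


# Example context: a foaf mapping plus the problematic "http" prefix
# that the HTTP Vocabulary Draft recommends.
context = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "http": "http://www.w3.org/2006/http#",
}
print(resolve_curie_or_uri("foaf:Agent", context))
print(resolve_curie_or_uri("http://www.w3.org/", context))
print(resolve_curie_or_uri("foaf://some/reference", context))
```

With this context, 'foaf:Agent' expands through the foaf mapping, while 'http://www.w3.org/' and 'foaf://some/reference' are left untouched and so fall through to ordinary URI handling. As noted above, this narrows the room for misinterpretation but does not eliminate it: a scheme that becomes popular later and is missing from the assumed list would still collide with a prefix of the same name.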
Received on Friday, 15 April 2011 22:08:12 UTC