- From: Michael Kifer <kifer@cs.sunysb.edu>
- Date: Mon, 16 Apr 2007 14:18:19 -0400
- To: Sandro Hawke <sandro@w3.org>
- Cc: Jeremy Carroll <jjc@hpl.hp.com>, Dave Reynolds <der@hplb.hpl.hp.com>, Christian de Sainte Marie <csma@ilog.fr>, RIF WG <public-rif-wg@w3.org>
The following might be a naive question due to my inadequate familiarity with RFCs. Are symbols like ~ allowed in IRIs? My understanding is that only a-z, A-Z, 0-9, ., -, *, and _ are allowed as is and the rest are encoded. So, since ~ is supposed to be encoded, something like http://www.cs.sunysb.edu/~kifer/ is not a URI; it has to be encoded as http://www.cs.sunysb.edu/%7Ekifer/ If the above is correct (i.e., if http://www.cs.sunysb.edu/~kifer/~kifer is an IRI but not a URI), then I see a problem. If I write http://www.cs.sunysb.edu/%7Ekifer/ then as a URI it really represents http://www.cs.sunysb.edu/~kifer/, but as an IRI it is different from http://www.cs.sunysb.edu/~kifer/. Or, am I wrong and the %-encodings mean the same as in IRIs as they do in URIs? --michael > > Yes: IRIs are a superset of URIs. > ... > > The set of letters used for URIs is a subset of that used for IRIs (and > > a small subset!) > > Agreed. RFC 3987 states simply, "Every URI is by definition an IRI". > > It's a little confusing, though, because some URIs are the result of > mapping a non-URI IRI into a URI, and some are not. > > Let me give an example. Here's an IRI: > > (a) http://www.w3.org/International/articles/idn-and-iri/JPǼƦ/°ú¤³ä¤êǼƦ.html > > If our mailers are all working, it should look like a URI which has some > Kanji in it. It's from a test page if you want to check how it appears > [1]. (I tested my mailer, and this should at least look correct in our > web archives.) > > Now here is that IRI mapped into a URI, following the process defined > by RFC 3987, section 3.1 ("Mapping of IRIs to URIs"): > > (b) http://www.w3.org/International/articles/idn-and-iri/JP%E7%B4%8D%E8%B1%86/%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A%E7%B4%8D%E8%B1%86.html > > Both (a) and (b) are IRIs, but only (b) is a URI. Note that if you > apply the mapping algorithm to (b), you get (b) again, and that there is > an inverse mapping algorithm defined to get from (b) to (a). 
>
> Here's a third URI:
>
>    (c) http://www.w3.org
>
> This URI is, of course, also an IRI. But unlike (b), it won't be changed
> by applying the inverse mapping. We can think of (c) as a "natural URI"
> and (b) as a "carrier URI", a URI which exists only to carry an IRI.
> Users should only be presented with natural URIs and IRIs -- they should
> never be presented with carrier URIs. So, while carrier URIs are
> *technically* IRIs already, we talk about converting them into IRIs,
> which means converting them into their "natural" state. A "natural IRI"
> then is any IRI which is not a carrier.
>
> So, in this sense, lots of (carrier) URIs are not (natural) IRIs.
> Right? In common usage we don't think of (b) as an IRI; we specifically
> contrast it with IRIs. Hopefully my natural/carrier terminology makes
> this clear:
>
>    - Technically, every URI is an IRI.
>    - But only some URIs (the natural ones) are natural IRIs.
>
> All that said:
>
> Because RIF is not intended for human consumption, I think we *could*
> limit it to handling only URIs, knowing that translators will convert
> to/from IRIs as necessary. However, since RIF will be an XML format, I
> think it's reasonable to expect and allow for some human consumption.
> Since XML is already safe for IRIs, it's no additional work. I think
> RIF should just use IRIs.
>
> On the naming question -- do we call them IRIs or say "URI" even though
> we really mean IRI? -- I note that the SPARQL Last Call draft calls them
> IRIs [2], but SWEO (the Semantic Web Education and Outreach Interest
> Group) still seems to call them URIs. I've suggested to its chair that
> SWEO talk about it with the relevant WGs (including us). If they're
> willing to switch to IRI in their documents, that should clear the path
> for us.
>
>      -- Sandro
>
> [1] http://www.w3.org/International/tests/sec-iri-3
> [2] http://www.w3.org/TR/2007/WD-rdf-sparql-query-20070326/#QSynIRI
Received on Monday, 16 April 2007 18:30:55 UTC