Re: IRIs from Michael Kifer on 2007-04-16 (public-rif-wg@w3.org from April 2007)

From: Michael Kifer <kifer@cs.sunysb.edu>
Date: Mon, 16 Apr 2007 14:18:19 -0400
To: Sandro Hawke <sandro@w3.org>
Cc: Jeremy Carroll <jjc@hpl.hp.com>, Dave Reynolds <der@hplb.hpl.hp.com>, Christian de Sainte Marie <csma@ilog.fr>, RIF WG <public-rif-wg@w3.org>
Message-ID: <8584.1176747499@cs.sunysb.edu>
The following might be a naive question due to my inadequate familiarity
with RFCs.

Are symbols like ~ allowed in IRIs? My understanding is that only
a-z, A-Z, 0-9, ., -, *, and _ are allowed as is and the rest are encoded.
So, since ~ is supposed to be encoded, something like

    http://www.cs.sunysb.edu/~kifer/

is not a URI; it has to be encoded as

    http://www.cs.sunysb.edu/%7Ekifer/

If the above is correct (i.e., if http://www.cs.sunysb.edu/~kifer/ is
an IRI but not a URI), then I see a problem.

If I write http://www.cs.sunysb.edu/%7Ekifer/ then as a URI it really
represents http://www.cs.sunysb.edu/~kifer/, but as an IRI it is different from
http://www.cs.sunysb.edu/~kifer/.

Or am I wrong, and the %-encodings mean the same in IRIs as they do in URIs?
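
One way to make the question concrete is the sketch below. It assumes the
"unreserved" set of RFC 3986 (letters, digits, "-", ".", "_", "~"), under
which a %-escape of an unreserved character is equivalent to the character
itself. The helper name normalize_unreserved and the example.org URL are
made up for illustration, and the sketch covers the URI side only.

```python
import re

# RFC 3986 "unreserved" characters: letters, digits, "-", ".", "_", "~".
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_unreserved(uri):
    """Decode %XX escapes of unreserved characters; upper-case all other escapes."""
    def repl(match):
        ch = chr(int(match.group(1), 16))
        if ch in UNRESERVED:
            return ch                          # e.g. %7E becomes ~
        return "%" + match.group(1).upper()    # e.g. %2f stays escaped, as %2F
    return _PCT.sub(repl, uri)


escaped = "http://www.cs.sunysb.edu/%7Ekifer/"
plain = "http://www.cs.sunysb.edu/~kifer/"
print(normalize_unreserved(escaped) == normalize_unreserved(plain))
```

Under that assumption the two spellings compare equal after normalization,
while an escape of a reserved character such as "/" is left escaped.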


	--michael  

> > Yes: IRIs are a superset of URIs.
> ...
> > The set of letters used for URIs is a subset of that used for IRIs (and 
> > a small subset!)
> 
> Agreed.   RFC 3987 states simply, "Every URI is by definition an IRI".  
> 
> It's a little confusing, though, because some URIs are the result of
> mapping a non-URI IRI into a URI, and some are not.       
> 
> Let me give an example.  Here's an IRI:
> 
> (a)  http://www.w3.org/International/articles/idn-and-iri/JPÇ¼Æ¦/°ú¤³ä¤êÇ¼Æ¦.html
> 
> If our mailers are all working, it should look like a URI which has some
> Kanji in it.  It's from a test page if you want to check how it appears
> [1].  (I tested my mailer, and this should at least look correct in our
> web archives.)
> 
> Now here is that IRI mapped into a URI, following the process defined
> by RFC 3987, section 3.1 ("Mapping of IRIs to URIs"):
> 
> (b)  http://www.w3.org/International/articles/idn-and-iri/JP%E7%B4%8D%E8%B1%86/%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A%E7%B4%8D%E8%B1%86.html
> 
> Both (a) and (b) are IRIs, but only (b) is a URI.  Note that if you
> apply the mapping algorithm to (b), you get (b) again, and that there is
> an inverse mapping algorithm defined to get from (b) to (a).
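
The (a)-to-(b) step quoted above can be sketched roughly as follows. This is
a simplified illustration, not the full RFC 3987 section 3.1 procedure: it
only percent-encodes the UTF-8 octets of non-ASCII characters and skips
Unicode normalization and any special handling of host names. The function
name iri_to_uri is hypothetical, and the Kanji in the sample IRI are the
ones obtained by decoding (b).

```python
def iri_to_uri(iri):
    """Percent-encode each non-ASCII character as its UTF-8 octets; keep ASCII as is."""
    out = []
    for ch in iri:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.extend("%{:02X}".format(octet) for octet in ch.encode("utf-8"))
    return "".join(out)


iri_a = ("http://www.w3.org/International/articles/idn-and-iri/"
         "JP\u7d0d\u8c46/\u5f15\u304d\u5272\u308a\u7d0d\u8c46.html")
uri_b = iri_to_uri(iri_a)
print(uri_b)
# Applying the mapping a second time changes nothing, because uri_b is pure ASCII.
print(iri_to_uri(uri_b) == uri_b)
```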
> 
> Here's a third URI:
> 
> (c)  http://www.w3.org
> 
> This URI is, of course, also an IRI.  But unlike (b), it won't be changed
> by applying the inverse mapping.  We can think of (c) as a "natural URI"
> and (b) as a "carrier URI", a URI which exists only to carry an IRI.
> Users should only be presented with natural URIs and IRIs -- they should
> never be presented with carrier URIs.  So, while carrier URIs are
> *technically* IRIs already, we talk about converting them into IRIs,
> which means converting them into their "natural" state.  A "natural IRI"
> then is any IRI which is not a carrier.  
> 
> So, in this sense, lots of (carrier) URIs are not (natural) IRIs.
> Right?  In common usage we don't think of (b) as an IRI; we specifically
> contrast it with IRIs.  Hopefully my natural/carrier terminology makes
> this clear:
> 
>   - Technically, every URI is an IRI.
>   - But only some URIs (the natural ones) are natural IRIs.
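
The natural/carrier distinction quoted above can be sketched as a test: a URI
is a "carrier" if the inverse mapping changes it. The code below is a rough
approximation of that inverse mapping, not the full RFC 3987 procedure. It
decodes only runs of %-escapes whose octets are all 0x80 or above and form
valid UTF-8, leaves escapes of ASCII characters alone, and does not filter
characters that are disallowed in IRIs. The names uri_to_iri and is_carrier
are hypothetical.

```python
import re

# Runs of %-escapes whose octets are all >= 0x80 (candidates for non-ASCII UTF-8).
_HIGH_RUN = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def uri_to_iri(uri):
    """Decode runs of high-octet %-escapes that form valid UTF-8; leave the rest."""
    def repl(match):
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            return match.group(0)   # not valid UTF-8: keep the escapes unchanged
    return _HIGH_RUN.sub(repl, uri)


def is_carrier(uri):
    """True if the inverse mapping changes the URI, i.e. it only carries an IRI."""
    return uri_to_iri(uri) != uri


uri_b = ("http://www.w3.org/International/articles/idn-and-iri/"
         "JP%E7%B4%8D%E8%B1%86/%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A%E7%B4%8D%E8%B1%86.html")
uri_c = "http://www.w3.org"
print(is_carrier(uri_b), is_carrier(uri_c))
```

In this sketch (b) is reported as a carrier and (c) as natural. A URI such
as http://www.cs.sunysb.edu/%7Ekifer/ also counts as natural here, because
escapes of ASCII characters are never touched.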
> 
> All that said:
> 
> Because RIF is not intended for human consumption, I think we *could*
> limit it to handling only URIs, knowing that translators will convert
> to/from IRIs as necessary.  However, since RIF will be an XML format, I
> think it's reasonable to expect and allow for some human consumption.
> Since XML is already safe for IRIs, it's no additional work.  I think
> RIF should just use IRIs.
> 
> On the naming question -- do we call them IRIs or say "URI" even though
> we really mean IRI? -- I note that the SPARQL Last Call draft calls them
> IRIs [2], but SWEO (the Semantic Web Education and Outreach Interest
> Group) still seems to call them URIs.  I've suggested to its chair that
> SWEO talk about it with the relevant WGs (including us).  If they're
> willing to switch to IRI in their documents, that should clear the path
> for us.
> 
>     -- Sandro
> 
> [1] http://www.w3.org/International/tests/sec-iri-3
> [2] http://www.w3.org/TR/2007/WD-rdf-sparql-query-20070326/#QSynIRI
> 
>
Received on Monday, 16 April 2007 18:30:55 UTC