IRIs from Sandro Hawke on 2007-04-16 (public-rif-wg@w3.org from April 2007)

From: Sandro Hawke <sandro@w3.org>
Date: Mon, 16 Apr 2007 13:12:57 -0400
To: Jeremy Carroll <jjc@hpl.hp.com>
Cc: Michael Kifer <kifer@cs.sunysb.edu>, Dave Reynolds <der@hplb.hpl.hp.com>, Christian de Sainte Marie <csma@ilog.fr>, RIF WG <public-rif-wg@w3.org>
Message-Id: <20070416171324.A6AE94EEC7@homer.w3.org>

> Yes: IRIs are a superset of URIs.
...
> The set of letters used for URIs is a subset of that used for IRIs (and 
> a small subset!)

Agreed.   RFC 3987 states simply, "Every URI is by definition an IRI".  

It's a little confusing, though, because some URIs are the result of
mapping a non-URI IRI into a URI, and some are not.       

Let me give an example.  Here's an IRI:

(a)  http://www.w3.org/International/articles/idn-and-iri/JP納豆/引き割り納豆.html

If our mailers are all working, it should look like a URI which has some
Kanji in it.  It's from a test page if you want to check how it appears
[1].  (I tested my mailer, and this should at least look correct in our
web archives.)

Now here is that IRI mapped into a URI, following the process defined
by RFC 3987, section 3.1 ("Mapping of IRIs to URIs"):

(b)  http://www.w3.org/International/articles/idn-and-iri/JP%E7%B4%8D%E8%B1%86/%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A%E7%B4%8D%E8%B1%86.html

Both (a) and (b) are IRIs, but only (b) is a URI.  Note that if you
apply the mapping algorithm to (b), you get (b) again, and that there is
an inverse mapping algorithm defined to get from (b) to (a).
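
For anyone who wants to try this, here is a minimal sketch of the core of
that mapping in Python.  It is not the full RFC 3987 section 3.1 algorithm:
it assumes the IRI is already a normalized Unicode string, it omits the
optional IDN handling of host names, and it simply percent-encodes the UTF-8
octets of every non-ASCII character.  The names iri_to_uri, IRI_A and URI_B
are my own labels for illustration, not anything defined by the RFC.

```python
# Sketch only: percent-encode the UTF-8 octets of each non-ASCII character.
# Assumes the input is already a normalized Unicode string; IDN handling of
# the host part is deliberately left out.

IRI_A = "http://www.w3.org/International/articles/idn-and-iri/JP納豆/引き割り納豆.html"
URI_B = (
    "http://www.w3.org/International/articles/idn-and-iri/"
    "JP%E7%B4%8D%E8%B1%86/"
    "%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A%E7%B4%8D%E8%B1%86.html"
)


def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI by escaping every non-ASCII character."""
    pieces = []
    for ch in iri:
        if ord(ch) < 128:
            # ASCII characters (including existing %XX escapes) pass through.
            pieces.append(ch)
        else:
            # Each UTF-8 octet of the character becomes one %XX triplet.
            pieces.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(pieces)


if __name__ == "__main__":
    print(iri_to_uri(IRI_A) == URI_B)   # (a) maps to (b)
    print(iri_to_uri(URI_B) == URI_B)   # mapping (b) again leaves it alone
```

Because ASCII characters pass through untouched, applying the function to
something that is already a URI cannot change it, which is the idempotence
noted above.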

Here's a third example, this time a plain URI:

(c)  http://www.w3.org

This URI is, of course, also an IRI.  But unlike (b), it won't be changed
by applying the inverse mapping.  We can think of (c) as a "natural URI"
and (b) as a "carrier URI", a URI which exists only to carry an IRI.
Users should only be presented with natural URIs and IRIs -- they should
never be presented with carrier URIs.  So, while carrier URIs are
*technically* IRIs already, we talk about converting them into IRIs,
which means converting them into their "natural" state.  A "natural IRI"
then is any IRI which is not a carrier.  
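
To make the natural/carrier distinction concrete, here is a rough Python
sketch.  It rests on a simplified stand-in for the inverse mapping (RFC 3987
section 3.2): it decodes only runs of percent-encoded octets >= 0x80 that
form valid UTF-8, and leaves everything else alone.  The real procedure has
further restrictions on which characters may be decoded, which I ignore
here.  The function names uri_to_iri and is_carrier_uri are hypothetical,
chosen just for this illustration.

```python
import re

# Runs of percent-encoded octets with the high bit set (0x80-0xFF).
# Escapes of ASCII characters such as %20 or %2F are never touched.
_HIGH_OCTET_RUN = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def uri_to_iri(uri: str) -> str:
    """Simplified inverse mapping: decode escaped runs that are valid UTF-8."""

    def decode_run(match: "re.Match[str]") -> str:
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            # Not valid UTF-8: leave the escapes exactly as they were.
            return match.group(0)

    return _HIGH_OCTET_RUN.sub(decode_run, uri)


def is_carrier_uri(uri: str) -> bool:
    """A URI is a 'carrier' if the inverse mapping would change it."""
    return uri_to_iri(uri) != uri


if __name__ == "__main__":
    carrier = "http://www.w3.org/International/articles/idn-and-iri/JP%E7%B4%8D%E8%B1%86/"
    print(is_carrier_uri(carrier))              # carrier URI, like (b)
    print(is_carrier_uri("http://www.w3.org"))  # natural URI, like (c)
    print(uri_to_iri(carrier))
```

Under this sketch, a "natural URI" is simply one for which is_carrier_uri
returns False.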

So, in this sense, lots of (carrier) URIs are not (natural) IRIs.
Right?  In common usage we don't think of (b) as an IRI; we specifically
contrast it with IRIs.  Hopefully my natural/carrier terminology makes
this clear:

  - Technically, every URI is an IRI.
  - But only some URIs (the natural ones) are natural IRIs.

All that said:

Because RIF is not intended for human consumption, I think we *could*
limit it to handling only URIs, knowing that translators will convert
to/from IRIs as necessary.  However, since RIF will be an XML format, I
think it's reasonable to expect and allow for some human consumption.
Since XML is already safe for IRIs, it's no additional work.  I think
RIF should just use IRIs.

On the naming question -- do we call them IRIs or say "URI" even though
we really mean IRI? -- I note that the SPARQL Last Call draft calls them
IRIs [2], but SWEO (the Semantic Web Education and Outreach Interest
Group) still seems to call them URIs.  I've suggested to its chair that
SWEO talk about it with the relevant WGs (including us).  If they're
willing to switch to IRI in their documents, that should clear the path
for us.

    -- Sandro

[1] http://www.w3.org/International/tests/sec-iri-3
[2] http://www.w3.org/TR/2007/WD-rdf-sparql-query-20070326/#QSynIRI

Received on Monday, 16 April 2007 17:13:26 UTC