Re: IRI guidance from Eric Prud'hommeaux on 2011-04-28 (public-rdf-wg@w3.org from April 2011)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Thu, 28 Apr 2011 17:59:00 -0400
To: Alex Hall <alexhall@revelytix.com>
Cc: nathan@webr3.org, RDF WG <public-rdf-wg@w3.org>
Message-ID: <20110428215900.GG8795@w3.org>

* Alex Hall <alexhall@revelytix.com> [2011-04-28 16:16-0400]
> On Wed, Apr 27, 2011 at 3:52 PM, Nathan <nathan@webr3.org> wrote:
> 
> > just noticed a nice bit of text in the activity streams spec:
> >
> > [[
> > This specification allows the use of IRIs [RFC3987]. Every URI [RFC3986] is
> > also an IRI, so a URI MAY be used wherever an IRI is named. When an IRI that
> > is not also a URI is given for dereferencing, it MUST be mapped to a URI
> > using the steps in Section 3.1 of [RFC3987]. When an IRI is serving as an
> > identifier, it MUST NOT be so mapped.
> > ]]
> >
> >
> This corresponds nicely with how I think IRIs should work in RDF.  When used
> as an identifier, an IRI is simply a sequence of Unicode characters.  That
> character sequence conforms with the grammar defined in RFC3987, but many
> applications don't care about that; they're only interested in knowing that
> an RDF term is an IRI, and whether two RDF terms which are IRIs are the
> same.  A simple string comparison of the Unicode characters should be
> sufficient to determine equivalence of resources identified by IRIs.
> 
> If an IRI happens to be dereferenceable, and an application chooses to
> dereference it, then it maps the IRI to a URI.  If, as part of this mapping, the
> application encodes an IDN and finds that the encoded URI is the same as
> another resource's IRI, then it might conclude that those identify the same
> resource.  But it should do so with the understanding that this is an
> extension of the semantics of RDF, assuming we define IRI equivalence as a
> simple string comparison.
> 
> Unfortunately this can lead to unexpected consequences, such as an
> application dereferencing the IRI http://xn--rsum-bpad.example.org (not sure
> how GMail will escape that -- that's the punycode version) and getting a
> document with a description of some resource with IRI
> http://résumé.example.org (Unicode
> version).  To help prevent this, we could discourage the use of IRIs with
> encoded IDNs in RDF, similar to how the existing spec discourages the use of
> URI Refs with percent-escaped characters.
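Alex's distinction between identifying and dereferencing can be sketched in Python under two assumptions: identifier equality is plain code-point string comparison, and dereferencing converts the host with IDNA ToASCII via Python's built-in `idna` codec (which implements IDNA 2003). `same_rdf_iri` and `ascii_host_form` are hypothetical names used only for illustration.

```python
from urllib.parse import urlsplit


def same_rdf_iri(a: str, b: str) -> bool:
    """Identifier equality as proposed: simple code-point string comparison."""
    return a == b


def ascii_host_form(iri: str) -> str:
    """Hypothetical helper: rewrite only the host with IDNA ToASCII.

    Port and userinfo are dropped, and non-ASCII path characters are
    left unchanged; this is a sketch, not a full IRI-to-URI mapping.
    """
    parts = urlsplit(iri)
    host = (parts.hostname or "").encode("idna").decode("ascii")
    return parts._replace(netloc=host).geturl()


unicode_iri = "http://résumé.example.org"
punycode_iri = "http://xn--rsum-bpad.example.org"

# Distinct RDF terms under simple string comparison...
print(same_rdf_iri(unicode_iri, punycode_iri))
# ...but the same host once the dereferencing map is applied.
print(ascii_host_form(unicode_iri) == ascii_host_form(punycode_iri))
```

Under string comparison the two IRIs are different terms; they only coincide after the dereferencing map, which is the "extension of the semantics of RDF" the message warns about.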

I think this leads down the path of not using IRIs. When dereferencing
an HTTP IRI, one has to punyify the domain name and percentulate the
path, mapping http://伝言.example/?user=أكرم to
http://xn--9oqp94l.example/?user=%D8%A3%D9%83%D8%B1%D9%85 . Any IRI
with characters outside the set of legal URI characters will map to a
differently spelled URI, necessitating some way of typing these
respective strings as IRI or URI. If we're taking away the sharp knives,
we'll have to take away non-ASCII characters and díäcrìtïcâl markç.
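The mapping in the example can be reproduced with a minimal Python sketch of the RFC 3987 section 3.1 conversion, assuming Python's built-in `idna` codec (IDNA 2003 ToASCII) for the host and `urllib.parse.quote` for UTF-8 percent-encoding of the remaining components. `iri_to_uri` and `URI_SAFE` are hypothetical names; userinfo and IPv6 literals are not handled.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# RFC 3986 reserved characters plus "%" so existing escapes survive;
# quote() already leaves ASCII letters, digits and "_.-~" unescaped.
URI_SAFE = "!#$&'()*+,/:;=?@[]%"


def iri_to_uri(iri: str) -> str:
    """Sketch of the IRI-to-URI mapping used for dereferencing."""
    scheme, netloc, path, query, fragment = urlsplit(iri)
    # Split off an optional port (userinfo and IPv6 literals not handled).
    host, sep, port = netloc.partition(":")
    # "Punyify" the domain name: IDNA ToASCII on each label.
    ascii_host = host.encode("idna").decode("ascii")

    def escape(component: str) -> str:
        # Percent-encode each non-ASCII character as its UTF-8 octets.
        return quote(component, safe=URI_SAFE)

    return urlunsplit(
        (scheme, ascii_host + sep + port, escape(path), escape(query), escape(fragment))
    )


print(iri_to_uri("http://伝言.example/?user=أكرم"))
# http://xn--9oqp94l.example/?user=%D8%A3%D9%83%D8%B1%D9%85
```

The output is a different string from the input IRI, so any store that keeps both spellings has to know which one it is holding.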


> -Alex

-- 
-ericP

Received on Thursday, 28 April 2011 21:59:35 UTC