Re: IRI guidance

From: Alex Hall <alexhall@revelytix.com>
Date: Thu, 28 Apr 2011 16:16:09 -0400
Message-ID: <BANLkTin-HCA8xN5bFTqZTUvCC5GrUJ++2A@mail.gmail.com>
To: nathan@webr3.org
Cc: RDF WG <public-rdf-wg@w3.org>
On Wed, Apr 27, 2011 at 3:52 PM, Nathan <nathan@webr3.org> wrote:

> just noticed a nice bit of text in the activity streams spec:
> [[
> This specification allows the use of IRIs [RFC3987]. Every URI [RFC3986] is
> also an IRI, so a URI MAY be used wherever an IRI is named. When an IRI that
> is not also a URI is given for dereferencing, it MUST be mapped to a URI
> using the steps in Section 3.1 of [RFC3987]. When an IRI is serving as an
> identifier, it MUST NOT be so mapped.
> ]]
This corresponds nicely with how I think IRIs should work in RDF. When used
as an identifier, an IRI is simply a sequence of Unicode characters. That
character sequence conforms to the grammar defined in RFC 3987, but many
applications don't care about that; they only need to know that an RDF term
is an IRI, and whether two RDF terms which are IRIs are the same. A simple
character-by-character comparison of the Unicode strings should be
sufficient to determine equivalence of resources identified by IRIs.
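
To make the comparison rule concrete, here is a minimal sketch in Python. The
IRI class and the same_term function are hypothetical names invented for this
illustration, not part of any RDF library; the only assumption is that an IRI
term is held as a plain Unicode string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    """Hypothetical RDF term: an IRI is nothing more than its characters."""
    value: str


def same_term(a: IRI, b: IRI) -> bool:
    # Plain code-point comparison: no case folding, no percent-decoding,
    # no Unicode normalization, and no IDN conversion.
    return a.value == b.value


a = IRI("http://example.org/a")
b = IRI("http://EXAMPLE.org/a")                  # differs only in host case
c = IRI("http://r\u00e9sum\u00e9.example.org")   # Unicode host (résumé)
d = IRI("http://xn--rsum-bpad.example.org")      # punycode form of the same host

print(same_term(a, a))  # True
print(same_term(a, b))  # False: different character sequences
print(same_term(c, d))  # False: identifiers are never mapped before comparing
```

Under this rule an application never needs to parse the IRI at all in order
to decide whether two terms are the same.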

If an IRI happens to be dereferenceable, and an application chooses to
dereference it, then the application maps it to a URI. If, as part of this
mapping, the application encodes an IDN and finds that the resulting URI is
the same as another resource's IRI, then it might conclude that the two
identify the same resource. But it should do so with the understanding that
this is an extension of the semantics of RDF, assuming we define IRI
equivalence as a simple string comparison.
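
A rough sketch of that mapping step, loosely following Section 3.1 of RFC 3987
and using only the Python standard library. The function name iri_to_uri is
hypothetical, and the sketch assumes an IRI with no userinfo and no IPv6
literal in the authority. It relies on Python's built-in "idna" codec for the
host, which is one possible way to encode an IDN, not the only one.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched when percent-encoding; "%" is kept so that
# escapes already present in the IRI are not encoded a second time.
_SAFE = "/%:@!$&'()*+,;="


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: map an IRI to a URI for dereferencing only."""
    parts = urlsplit(iri)
    # Host: convert each internationalized label to its ASCII ("xn--") form.
    host = (parts.hostname or "").encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Elsewhere: UTF-8 encode non-ASCII characters and percent-escape them.
    path = quote(parts.path, safe=_SAFE)
    query = quote(parts.query, safe=_SAFE + "?")
    fragment = quote(parts.fragment, safe=_SAFE + "?")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


unicode_iri = "http://r\u00e9sum\u00e9.example.org/"   # http://résumé.example.org/
punycode_iri = "http://xn--rsum-bpad.example.org/"

same_identifier = unicode_iri == punycode_iri                  # False
same_after_mapping = iri_to_uri(unicode_iri) == punycode_iri   # True
```

The two results differ on purpose: the first is the RDF-level answer, and the
second is extra knowledge the application gained by mapping, which it may or
may not choose to act on.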

Unfortunately this can lead to unexpected consequences, such as an
application dereferencing the IRI http://xn--rsum-bpad.example.org (that's
the punycode version; not sure how GMail will escape it) and getting a
document with a description of some resource with IRI
http://résumé.example.org (Unicode version). Under simple string comparison
those are two different IRIs, so the application would not recognize the
description as being about the resource it asked for. To help prevent this,
we could discourage the use of IRIs with encoded IDNs in RDF, similar to how
the existing spec discourages the use of URI Refs with percent-escaped
characters.
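
As an illustration of how a tool might flag such IRIs, here is a small Python
sketch. The function name has_encoded_idn is hypothetical, and the check is an
assumption about what "encoded IDN" would mean in practice: any host label
that starts with the "xn--" prefix.

```python
from urllib.parse import urlsplit


def has_encoded_idn(iri: str) -> bool:
    """Hypothetical check: does the host contain a punycode ("xn--") label?"""
    # urlsplit().hostname is lowercased, so "XN--" is caught as well.
    host = urlsplit(iri).hostname or ""
    return any(label.startswith("xn--") for label in host.split("."))


print(has_encoded_idn("http://xn--rsum-bpad.example.org/"))      # True
print(has_encoded_idn("http://r\u00e9sum\u00e9.example.org/"))   # False
print(has_encoded_idn("http://example.org/"))                    # False
```

A parser or validator could use a check like this to emit a warning rather
than reject the IRI, in keeping with "discourage" rather than "forbid".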

Received on Thursday, 28 April 2011 20:16:36 UTC
