Re: IRI guidance

On Wed, Apr 27, 2011 at 3:52 PM, Nathan <nathan@webr3.org> wrote:

> just noticed a nice bit of text in the activity streams spec:
>
> [[
> This specification allows the use of IRIs [RFC3987]. Every URI [RFC3986] is
> also an IRI, so a URI MAY be used wherever an IRI is named. When an IRI that
> is not also a URI is given for dereferencing, it MUST be mapped to a URI
> using the steps in Section 3.1 of [RFC3987]. When an IRI is serving as an
> identifier, it MUST NOT be so mapped.
> ]]
>
>
This corresponds nicely with how I think IRIs should work in RDF.  When used
as an identifier, an IRI is simply a sequence of Unicode characters.  That
character sequence conforms to the grammar defined in RFC 3987, but many
applications don't care about that; they're only interested in knowing that
an RDF term is an IRI, and whether two RDF terms that are IRIs are the
same.  A simple string comparison of the Unicode characters should be
sufficient to determine whether two IRIs are equivalent.
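
To make that comparison concrete, here is a minimal Python sketch, assuming an IRI term is held as a plain `str`; `iri_terms_equal` is a hypothetical name, not part of any RDF library.  Python's `==` on strings compares code point by code point, so nothing is mapped, case-folded or normalized:

```python
def iri_terms_equal(a: str, b: str) -> bool:
    """Hypothetical helper: two IRI terms are the same iff their Unicode
    character sequences are identical (plain code-point comparison)."""
    # No IDNA encoding, percent-escaping, case folding or normalization.
    return a == b


# "http://résumé.example.org", written with escapes for U+00E9.
unicode_form = "http://r\u00e9sum\u00e9.example.org"
punycode_form = "http://xn--rsum-bpad.example.org"

# Same characters, built differently: equal.
print(iri_terms_equal(unicode_form, "http://" + "r\u00e9sum\u00e9" + ".example.org"))  # True
# Unicode host vs. its Punycode spelling: two different IRIs.
print(iri_terms_equal(unicode_form, punycode_form))  # False
```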

If an IRI happens to be dereferenceable, and an application chooses to
dereference it, then the application maps it to a URI.  If, as part of this
mapping, the application encodes an IDN (internationalized domain name) and
finds that the resulting URI is the same as another resource's IRI, then it
might conclude that the two identify the same resource.  But it should do so
with the understanding that this is an extension of the semantics of RDF,
assuming we define IRI equivalence as a simple string comparison.
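
A rough Python sketch of that dereference-time mapping, loosely following Section 3.1 of RFC 3987: the host is converted with the standard library's `idna` codec (which implements IDNA 2003, RFC 3490; this is the optional step for schemes that use DNS names) and every other non-ASCII character is UTF-8 percent-encoded with `urllib.parse.quote`.  The name `iri_to_uri` and the simplifications (no userinfo, no IPv6 literals, no bidi or other edge cases) are assumptions for illustration only:

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Punctuation left untouched: URI reserved/unreserved characters plus '%',
# so existing escapes survive.  quote() always keeps letters, digits, "_.-~".
_SAFE = "!#$&'()*+,/:;=?@[]~%"


def iri_to_uri(iri: str) -> str:
    """Hypothetical sketch of an RFC 3987 Section 3.1 style mapping.

    Use it only for dereferencing; the IRI itself stays the identifier.
    Assumes netloc is "host" or "host:port" (no userinfo, no IPv6 literal).
    """
    scheme, netloc, path, query, fragment = urlsplit(iri)
    host, sep, port = netloc.partition(":")
    # IDNA ToASCII on each host label, e.g. "résumé" -> "xn--rsum-bpad".
    netloc = host.encode("idna").decode("ascii") + sep + port
    # Remaining non-ASCII characters become UTF-8 %HH escapes.
    return urlunsplit((
        scheme,
        netloc,
        quote(path, safe=_SAFE),
        quote(query, safe=_SAFE),
        quote(fragment, safe=_SAFE),
    ))


a = "http://r\u00e9sum\u00e9.example.org"  # the "résumé" host
b = "http://xn--rsum-bpad.example.org"
print(iri_to_uri(a))       # http://xn--rsum-bpad.example.org
print(a == b)              # False: different IRIs under string comparison
print(iri_to_uri(a) == b)  # True: equal only after mapping
```

An application that concludes "same resource" from that last comparison is making exactly the extension of RDF semantics described above, not applying plain IRI equality.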

Unfortunately, this can lead to unexpected consequences.  For example, an
application might dereference the IRI http://xn--rsum-bpad.example.org (the
Punycode version; not sure how GMail will escape that) and get back a
document describing some resource with the IRI http://résumé.example.org
(the Unicode version).  Under simple string comparison those are two
different IRIs, so the description does not match the IRI that was
dereferenced.  To help prevent this, we could discourage the use of IRIs
with encoded IDNs in RDF, similar to how the existing spec discourages the
use of URI references with percent-escaped characters.
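
One way such a recommendation might be checked is a lint pass that flags IRIs whose host contains a Punycode label (the "xn--" ACE prefix) and reports the Unicode spelling instead.  This is a hypothetical Python sketch; `uses_encoded_idn` and `unicode_host` are illustrative names, not from any existing RDF tool:

```python
from urllib.parse import urlsplit


def uses_encoded_idn(iri: str) -> bool:
    """Hypothetical lint check: True if any host label carries the IDNA
    ACE prefix "xn--", i.e. the IRI spells its domain in Punycode."""
    host = urlsplit(iri).hostname or ""  # .hostname is already lowercased
    return any(label.startswith("xn--") for label in host.split("."))


def unicode_host(iri: str) -> str:
    """Hypothetical helper: the host with any Punycode labels decoded,
    using the standard library's IDNA 2003 codec."""
    host = urlsplit(iri).hostname or ""
    # encode("idna") normalizes every label to ASCII; decode("idna") then
    # turns each "xn--" label back into Unicode.
    return host.encode("idna").decode("idna")


for iri in ("http://xn--rsum-bpad.example.org",
            "http://r\u00e9sum\u00e9.example.org"):
    if uses_encoded_idn(iri):
        print(f"discouraged: {iri} (Unicode host: {unicode_host(iri)})")
```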

-Alex

Received on Thursday, 28 April 2011 20:16:36 UTC