- From: Alex Hall <alexhall@revelytix.com>
- Date: Thu, 28 Apr 2011 16:16:09 -0400
- To: nathan@webr3.org
- Cc: RDF WG <public-rdf-wg@w3.org>
- Message-ID: <BANLkTin-HCA8xN5bFTqZTUvCC5GrUJ++2A@mail.gmail.com>
On Wed, Apr 27, 2011 at 3:52 PM, Nathan <nathan@webr3.org> wrote:
> just noticed a nice bit of text in the activity streams spec:
>
> [[
> This specification allows the use of IRIs [RFC3987]. Every URI [RFC3986] is
> also an IRI, so a URI MAY be used wherever an IRI is named. When an IRI that
> is not also a URI is given for dereferencing, it MUST be mapped to a URI
> using the steps in Section 3.1 of [RFC3987]. When an IRI is serving as an
> identifier, it MUST NOT be so mapped.
> ]]
>

This corresponds nicely with how I think IRIs should work in RDF. When used as an identifier, an IRI is simply a sequence of Unicode characters. That character sequence conforms to the grammar defined in RFC3987, but many applications don't care about that; they're only interested in knowing that an RDF term is an IRI, and whether two RDF terms which are IRIs are the same. A simple string comparison of the Unicode characters should be sufficient to determine equivalence of resources identified by IRIs.

If an IRI happens to be dereferenceable, and an application chooses to dereference it, then the application maps it to a URI. If, as part of this mapping, the application encodes an IDN and finds that the encoded URI is the same as another resource's IRI, then it might conclude that the two identify the same resource. But it should do so with the understanding that this is an extension of the semantics of RDF, assuming we define IRI equivalence as a simple string comparison.

Unfortunately this can lead to unexpected consequences, such as an application dereferencing the IRI http://xn--rsum-bpad.example.org (not sure how GMail will escape that -- that's the punycode version) and getting a document with a description of some resource with IRI http://résumé.example.org (Unicode version).
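
A rough sketch of that mismatch, in Python. The helper names `same_identifier` and `host_to_ascii` are made up for illustration, and `host_to_ascii` only converts the host part using Python's built-in `idna` codec; it is not a full implementation of the RFC3987 Section 3.1 mapping (it ignores ports, userinfo, and percent-encoding of other components).

```python
from urllib.parse import urlsplit, urlunsplit

# The two spellings from the example above (é written as \u00e9).
UNICODE_IRI = "http://r\u00e9sum\u00e9.example.org"
PUNY_IRI = "http://xn--rsum-bpad.example.org"


def same_identifier(a: str, b: str) -> bool:
    """IRI equivalence as plain comparison of the Unicode character sequences."""
    return a == b


def host_to_ascii(iri: str) -> str:
    """Hypothetical helper: convert only the host of an IRI to its ASCII form.

    Simplification: port and userinfo are dropped, and no other component
    is percent-encoded, so this is not the complete IRI-to-URI mapping.
    """
    parts = urlsplit(iri)
    host = parts.hostname or ""
    ascii_host = host.encode("idna").decode("ascii")
    return urlunsplit(
        (parts.scheme, ascii_host, parts.path, parts.query, parts.fragment)
    )


# As identifiers, the two IRIs are different RDF terms.
print(same_identifier(UNICODE_IRI, PUNY_IRI))
# Only after mapping for dereferencing do they coincide.
print(host_to_ascii(UNICODE_IRI) == PUNY_IRI)
```

Under the string-comparison definition, the first check treats the two as distinct terms; only the mapped form used for dereferencing matches the punycode IRI.
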
To help prevent this, we could discourage the use of IRIs with encoded IDNs in RDF, similar to how the existing spec discourages the use of URI Refs with percent-escaped characters.

-Alex
Received on Thursday, 28 April 2011 20:16:36 UTC