- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Thu, 28 Apr 2011 17:59:00 -0400
- To: Alex Hall <alexhall@revelytix.com>
- Cc: nathan@webr3.org, RDF WG <public-rdf-wg@w3.org>
* Alex Hall <alexhall@revelytix.com> [2011-04-28 16:16-0400]
> On Wed, Apr 27, 2011 at 3:52 PM, Nathan <nathan@webr3.org> wrote:
>
> > just noticed a nice bit of text in the activity streams spec:
> >
> > [[
> > This specification allows the use of IRIs [RFC3987]. Every URI [RFC3986] is
> > also an IRI, so a URI MAY be used wherever an IRI is named. When an IRI that
> > is not also a URI is given for dereferencing, it MUST be mapped to a URI
> > using the steps in Section 3.1 of [RFC3987]. When an IRI is serving as an
> > identifier, it MUST NOT be so mapped.
> > ]]
> >
>
> This corresponds nicely with how I think IRIs should work in RDF. When used
> as an identifier, an IRI is simply a sequence of Unicode characters. That
> character sequence conforms with the grammar defined in RFC3987, but many
> applications don't care about that; they're only interested in knowing that
> an RDF term is an IRI, and whether two RDF terms which are IRIs are the
> same. A simple string comparison of the Unicode characters should be
> sufficient to determine equivalence of resources identified by IRIs.
>
> If an IRI happens to be dereferenceable, and an application chooses to
> dereference it, then they map it as a URI. If, as part of this mapping, the
> application encodes an IDN and finds that the encoded URI is the same as
> another resource IRI, then it might conclude that those identify the same
> resource. But it should do so with the understanding that this is an
> extension of the semantics of RDF, assuming we define IRI equivalence as a
> simple string comparison.
>
> Unfortunately this can lead to unexpected consequences, such as an
> application dereferencing the IRI http://xn--rsum-bpad.example.org (not sure
> how GMail will escape that -- that's the punycode version) and getting a
> document with a description of some resource with IRI
> http://résumé.example.org <http://xn--rsum-bpad.example.org> (Unicode
> version). To help prevent this, we could discourage the use of IRIs with
> encoded IDNs in RDF, similar to how the existing spec discourages the use of
> URI Refs with percent-escaped characters.

I think this leads down the path of not using IRIs. When dereferencing
an HTTP IRI, one has to punyify the domain name and percentulate the
path, mapping
  http://伝言.example/?user=أكرم
to
  http://xn--9oqp94l.example/?user=%D8%A3%D9%83%D8%B1%D9%85 .
Any IRI with characters outside of the legal URI characters will map
to a differently spelled URI, necessitating some typing of these
respective strings. If we're taking away the sharp knives, we'll have
to take away non-ascii characters and díäcrìtïcâl markç.

> -Alex

-- 
-ericP
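
For readers who want to try the mapping discussed in this message, here is a minimal Python sketch. It is an illustration, not the full algorithm from Section 3.1 of RFC 3987: the function name `iri_to_uri` is hypothetical, it handles only simple HTTP IRIs (no userinfo), and it relies on Python's built-in `idna` codec, which implements IDNA 2003. It also shows that a plain string comparison treats the Unicode and punycode spellings as different identifiers.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Map a simple HTTP IRI to a URI for dereferencing (hypothetical helper)."""
    parts = urlsplit(iri)

    # Punycode-encode the host name; each non-ASCII label gains an "xn--" prefix.
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"

    # Percent-encode the UTF-8 bytes of any non-ASCII characters, leaving
    # existing escapes and URI delimiters untouched.
    path = quote(parts.path, safe="/%:@!$&'()*+,;=~")
    query = quote(parts.query, safe="/?%:@!$&'()*+,;=~")
    fragment = quote(parts.fragment, safe="/?%:@!$&'()*+,;=~")

    return urlunsplit((parts.scheme, host, path, query, fragment))


iri = "http://résumé.example.org"
uri = iri_to_uri(iri)

# As identifiers, the two spellings are different strings ...
print(iri == uri)
# ... even though dereferencing the IRI goes through the punycode spelling.
print(uri)
```

The identifier itself is never rewritten here; the mapped form is computed only at the point of dereferencing, which is the split the quoted activity streams text describes.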
Received on Thursday, 28 April 2011 21:59:35 UTC