- From: Nathan <nathan@webr3.org>
- Date: Thu, 28 Apr 2011 23:09:08 +0100
- To: Eric Prud'hommeaux <eric@w3.org>
- CC: Alex Hall <alexhall@revelytix.com>, RDF WG <public-rdf-wg@w3.org>
Eric Prud'hommeaux wrote:
> * Alex Hall <alexhall@revelytix.com> [2011-04-28 16:16-0400]
>> On Wed, Apr 27, 2011 at 3:52 PM, Nathan <nathan@webr3.org> wrote:
>>
>>> just noticed a nice bit of text in the activity streams spec:
>>>
>>> [[
>>> This specification allows the use of IRIs [RFC3987]. Every URI [RFC3986] is
>>> also an IRI, so a URI MAY be used wherever an IRI is named. When an IRI that
>>> is not also a URI is given for dereferencing, it MUST be mapped to a URI
>>> using the steps in Section 3.1 of [RFC3987]. When an IRI is serving as an
>>> identifier, it MUST NOT be so mapped.
>>> ]]
>>>
>>>
>> This corresponds nicely with how I think IRIs should work in RDF. When used
>> as an identifier, an IRI is simply a sequence of Unicode characters. That
>> character sequence conforms with the grammar defined in RFC3987, but many
>> applications don't care about that; they're only interested in knowing that
>> an RDF term is an IRI, and whether two RDF terms which are IRIs are the
>> same. A simple string comparison of the Unicode characters should be
>> sufficient to determine equivalence of resources identified by IRIs.
>>
>> If an IRI happens to be dereferenceable, and an application chooses to
>> dereference it, then they map it as a URI. If, as part of this mapping, the
>> application encodes an IDN and finds that the encoded URI is the same as
>> another resource IRI, then it might conclude that those identify the same
>> resource. But it should do so with the understanding that this is an
>> extension of the semantics of RDF, assuming we define IRI equivalence as a
>> simple string comparison.
>>
>> Unfortunately this can lead to unexpected consequences, such as an
>> application dereferencing the IRI http://xn--rsum-bpad.example.org (not sure
>> how GMail will escape that -- that's the punycode version) and getting a
>> document with a description of some resource with IRI
>> http://résumé.example.org <http://xn--rsum-bpad.example.org> (Unicode
>> version). To help prevent this, we could discourage the use of IRIs with
>> encoded IDNs in RDF, similar to how the existing spec discourages the use of
>> URI Refs with percent-escaped characters.
>
> I think this leads down the path of not using IRIs. When dereferencing
> an HTTP IRI, one has to punyify the domain name and percentulate the
> path, mapping http://伝言.example/?user=أكرم to
> http://xn--9oqp94l.example/?user=%D8%A3%D9%83%D8%B1%D9%85 . Any IRI
> with characters outside of the legal URI characters will map to a
> differently spelled URI, necessitating some typing of these respective
> strings. If we're taking away the sharp knives, we'll have to take
> away non-ascii characters and díäcrìtïcâl markç.

I wonder if it's safe to think of dereferencing as a black box and as
none of our concern?

Personally I have to confess that I'd much rather encounter
http://伝言.example/?user=أكرم in some RDF than
http://xn--9oqp94l.example/?user=%D8%A3%D9%83%D8%B1%D9%85

best, nathan
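To make the identifier/dereferencing split in this thread concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the RFC 3987 Section 3.1 algorithm in full: the names `iri_to_uri` and `same_identifier` are hypothetical, the host is converted with Python's built-in `idna` codec (which follows the older IDNA 2003 rules), any userinfo in the authority is ignored, and existing percent-escapes are left untouched.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left as-is when percent-encoding: URI delimiters plus "%",
# so escapes already present in the IRI are not encoded a second time.
_SAFE = "/?:@!$&'()*+,;=%"


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: map an IRI to a URI for dereferencing only."""
    parts = urlsplit(iri)
    netloc = ""
    if parts.hostname:
        # "Punyify" the domain name (assumption: IDNA 2003 via the stdlib codec).
        netloc = parts.hostname.encode("idna").decode("ascii")
        if parts.port is not None:
            netloc += f":{parts.port}"
    # Percent-encode the UTF-8 bytes of non-ASCII characters elsewhere.
    return urlunsplit((
        parts.scheme,
        netloc,
        quote(parts.path, safe=_SAFE),
        quote(parts.query, safe=_SAFE),
        quote(parts.fragment, safe=_SAFE),
    ))


def same_identifier(a: str, b: str) -> bool:
    """Hypothetical helper: IRI equivalence as plain string comparison."""
    return a == b


if __name__ == "__main__":
    unicode_iri = "http://résumé.example.org"
    puny_iri = "http://xn--rsum-bpad.example.org"
    # Different identifiers under simple string comparison ...
    print(same_identifier(unicode_iri, puny_iri))
    # ... even though both map to the same URI when dereferenced.
    print(iri_to_uri(unicode_iri) == iri_to_uri(puny_iri))
    print(iri_to_uri("http://伝言.example/?user=أكرم"))
```

Keeping the mapping in a separate function that is called only at dereference time mirrors the "black box" view: the RDF layer compares the Unicode strings and never sees the mapped form.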
Received on Thursday, 28 April 2011 22:10:01 UTC