- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Thu, 28 Apr 2011 20:22:57 -0400
- To: Nathan <nathan@webr3.org>
- Cc: Alex Hall <alexhall@revelytix.com>, RDF WG <public-rdf-wg@w3.org>
* Nathan <nathan@webr3.org> [2011-04-28 23:09+0100]
> Eric Prud'hommeaux wrote:
> >* Alex Hall <alexhall@revelytix.com> [2011-04-28 16:16-0400]
> >>On Wed, Apr 27, 2011 at 3:52 PM, Nathan <nathan@webr3.org> wrote:
> >>
> >>>just noticed a nice bit of text in the activity streams spec:
> >>>
> >>>[[
> >>>This specification allows the use of IRIs [RFC3987]. Every URI [RFC3986] is
> >>>also an IRI, so a URI MAY be used wherever an IRI is named. When an IRI that
> >>>is not also a URI is given for dereferencing, it MUST be mapped to a URI
> >>>using the steps in Section 3.1 of [RFC3987]. When an IRI is serving as an
> >>>identifier, it MUST NOT be so mapped.
> >>>]]
> >>>
> >>>
> >>This corresponds nicely with how I think IRIs should work in RDF. When used
> >>as an identifier, an IRI is simply a sequence of Unicode characters. That
> >>character sequence conforms with the grammar defined in RFC3987, but many
> >>applications don't care about that; they're only interested in knowing that
> >>an RDF term is an IRI, and whether two RDF terms which are IRIs are the
> >>same. A simple string comparison of the Unicode characters should be
> >>sufficient to determine equivalence of resources identified by IRIs.
> >>
> >>If an IRI happens to be dereferenceable, and an application chooses to
> >>dereference it, then they map it as a URI. If, as part of this mapping, the
> >>application encodes an IDN and finds that the encoded URI is the same as
> >>another resource IRI, then it might conclude that those identify the same
> >>resource. But it should do so with the understanding that this is an
> >>extension of the semantics of RDF, assuming we define IRI equivalence as a
> >>simple string comparison.
> >>
> >>Unfortunately this can lead to unexpected consequences, such as an
> >>application dereferencing the IRI http://xn--rsum-bpad.example.org (not sure
> >>how GMail will escape that -- that's the punycode version) and getting a
> >>document with a description of some resource with IRI
> >>http://résumé.example.org <http://xn--rsum-bpad.example.org> (Unicode
> >>version). To help prevent this, we could discourage the use of IRIs with
> >>encoded IDNs in RDF, similar to how the existing spec discourages the use of
> >>URI Refs with percent-escaped characters.
> >
> >I think this leads down the path of not using IRIs. When dereferencing
> >an HTTP IRI, one has to punyify the domain name and percentulate the
> >path, mapping http://伝言.example/?user=أكرم to
> >http://xn--9oqp94l.example/?user=%D8%A3%D9%83%D8%B1%D9%85 . Any IRI
> >with characters outside of the legal URI characters will map to a
> >differently spelled URI, necessitating some typing of these respective
> >strings. If we're taking away the sharp knives, we'll have to take
> >away non-ascii characters and díäcrìtïcâl markç.
>
> I wonder if it's safe to think of dereferencing as a black box and
> as none of our concern?

I'd say that's safe unless we proselytize Linked Data dereferencing
ideals in the core spec. Currently, RDF Concepts¹ says it uses URI
references, which the DAWG decided really meant IRIs², HTTP tells us
how to GET URIs³, and the IRI spec tells us how to map from IRIs to
URIs⁴. It's currently up to the user to conceive of pasting an RDF
node identifier into a browser. Apart from that, it's all spelled out.

¹ http://www.w3.org/TR/rdf-concepts/#section-Graph-URIref
² http://www.w3.org/TR/rdf-sparql-query/#QSynIRI
³ http://tools.ietf.org/html/rfc2616#section-5.1.2
⁴ http://tools.ietf.org/html/rfc3987#section-3.1

> Personally I have to confess that I'd much rather encounter
> http://伝言.example/?user=أكرم in some RDF than
> http://xn--9oqp94l.example/?user=%D8%A3%D9%83%D8%B1%D9%85
>
> best, nathan

-- 
-ericP
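The IRI-to-URI mapping discussed in this thread (percent-encoding non-ASCII characters as UTF-8 per RFC 3987 section 3.1, and converting an internationalized host name to its punycode "xn--" form) can be sketched in Python. This is a minimal illustration under stated assumptions, not the RFC's full procedure: `iri_to_uri` and `percent_encode_non_ascii` are hypothetical helper names, the authority is assumed to be a bare host (no userinfo or port), and the host is converted with Python's built-in `idna` codec, which implements IDNA 2003 rather than IDNA 2008.

```python
from urllib.parse import urlsplit, urlunsplit


def percent_encode_non_ascii(text: str) -> str:
    """Hypothetical helper, roughly RFC 3987 sec. 3.1 step 2.

    Each non-ASCII character is encoded as UTF-8 and every resulting byte
    is written as %HH; ASCII characters (including any existing '%') are
    left untouched.
    """
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


def iri_to_uri(iri: str) -> str:
    """Hypothetical sketch: map an HTTP IRI to a URI for dereferencing.

    Assumes the authority is a bare host name (no userinfo, no port).
    The host goes through Python's 'idna' codec (IDNA 2003), which
    punycode-encodes each non-ASCII label as "xn--...".
    """
    parts = urlsplit(iri)
    host = parts.netloc.encode("idna").decode("ascii")
    return urlunsplit((
        parts.scheme,
        host,
        percent_encode_non_ascii(parts.path),
        percent_encode_non_ascii(parts.query),
        percent_encode_non_ascii(parts.fragment),
    ))


if __name__ == "__main__":
    iri = "http://résumé.example.org"
    uri = iri_to_uri(iri)
    print(uri)         # http://xn--rsum-bpad.example.org
    print(iri == uri)  # False: compared as strings, two different terms
    print(iri_to_uri("http://伝言.example/?user=أكرم"))
```

Under the simple string comparison Alex proposes, `http://résumé.example.org` and its mapped form `http://xn--rsum-bpad.example.org` remain two distinct RDF terms; only an application that performs this mapping, as an extension of RDF's semantics, would treat them as identifying the same resource.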
Received on Friday, 29 April 2011 00:23:26 UTC