Re: IRI guidance

* Nathan <nathan@webr3.org> [2011-04-28 23:09+0100]
> Eric Prud'hommeaux wrote:
> >* Alex Hall <alexhall@revelytix.com> [2011-04-28 16:16-0400]
> >>On Wed, Apr 27, 2011 at 3:52 PM, Nathan <nathan@webr3.org> wrote:
> >>
> >>>just noticed a nice bit of text in the activity streams spec:
> >>>
> >>>[[
> >>>This specification allows the use of IRIs [RFC3987]. Every URI [RFC3986] is
> >>>also an IRI, so a URI MAY be used wherever an IRI is named. When an IRI that
> >>>is not also a URI is given for dereferencing, it MUST be mapped to a URI
> >>>using the steps in Section 3.1 of [RFC3987]. When an IRI is serving as an
> >>>identifier, it MUST NOT be so mapped.
> >>>]]
> >>>
> >>>
> >>This corresponds nicely with how I think IRIs should work in RDF.  When used
> >>as an identifier, an IRI is simply a sequence of Unicode characters.  That
> >>character sequence conforms with the grammar defined in RFC3987, but many
> >>applications don't care about that; they're only interested in knowing that
> >>an RDF term is an IRI, and whether two RDF terms which are IRIs are the
> >>same.  A simple string comparison of the Unicode characters should be
> >>sufficient to determine equivalence of resources identified by IRIs.
> >>
> >>If an IRI happens to be dereferenceable, and an application chooses to
> >>dereference it, then it maps it to a URI.  If, as part of this mapping, the
> >>application encodes an IDN and finds that the encoded URI is the same as
> >>another resource IRI, then it might conclude that those identify the same
> >>resource.  But it should do so with the understanding that this is an
> >>extension of the semantics of RDF, assuming we define IRI equivalence as a
> >>simple string comparison.
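The following is a minimal sketch in Python of the distinction Alex describes. It assumes RDF terms are held as plain `str` values. `same_identifier` performs the plain code-point comparison. `same_mapped_host` is a hypothetical helper that compares only the hosts, after IDNA ToASCII through Python's built-in `idna` codec (which implements IDNA 2003). Concluding "same resource" from the second function is the kind of extension of RDF semantics Alex warns about.

```python
from urllib.parse import urlsplit


def same_identifier(a: str, b: str) -> bool:
    """Identifier equivalence: plain comparison of Unicode code points, no mapping."""
    return a == b


def _ascii_host(iri: str) -> str:
    """Hypothetical helper: the host after IDNA ToASCII (Python's IDNA 2003 codec)."""
    host = urlsplit(iri).hostname or ""
    return host.encode("idna").decode("ascii")


def same_mapped_host(a: str, b: str) -> bool:
    """Compares only the hosts, and only after mapping. Treating a match as
    'same resource' goes beyond plain string equivalence."""
    return _ascii_host(a) == _ascii_host(b)


unicode_form = "http://résumé.example.org"
punycode_form = "http://xn--rsum-bpad.example.org"
print(same_identifier(unicode_form, punycode_form))   # False
print(same_mapped_host(unicode_form, punycode_form))  # True
```

Under plain string comparison the two spellings remain distinct RDF terms. Only an application that performs the mapping sees them converge.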
> >>
> >>Unfortunately this can lead to unexpected consequences, such as an
> >>application dereferencing the IRI http://xn--rsum-bpad.example.org (not sure
> >>how GMail will escape that -- that's the punycode version) and getting a
> >>document with a description of some resource with IRI
> >>http://résumé.example.org (Unicode
> >>version).  To help prevent this, we could discourage the use of IRIs with
> >>encoded IDNs in RDF, similar to how the existing spec discourages the use of
> >>URI Refs with percent-escaped characters.
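If RDF did discourage these forms, a validator could flag them with a check like the one below. `discouraged_forms` is a hypothetical name. The check rests on two simplifying assumptions:

* a host label starting with `xn--` is treated as an encoded IDN;
* only percent-escapes of bytes 0x80 and above (the UTF-8 octets of non-ASCII characters, which an IRI could carry directly) are flagged.

```python
import re
from typing import List
from urllib.parse import urlsplit

# Assumption: %80..%FF escapes are UTF-8 octets of non-ASCII characters that an
# IRI could have spelled literally; ASCII escapes such as %20 are left alone.
_NON_ASCII_ESCAPE = re.compile(r"%[89A-Fa-f][0-9A-Fa-f]")


def discouraged_forms(iri: str) -> List[str]:
    """Return warnings for encoded spellings that have a more readable IRI form."""
    warnings = []
    host = urlsplit(iri).hostname or ""  # hostname is lowercased, so XN-- is caught too
    if any(label.startswith("xn--") for label in host.split(".")):
        warnings.append("host contains a punycode (xn--) label")
    if _NON_ASCII_ESCAPE.search(iri):
        warnings.append("percent-escapes a non-ASCII character")
    return warnings


print(discouraged_forms("http://xn--rsum-bpad.example.org"))
print(discouraged_forms("http://résumé.example.org"))
```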
> >
> >I think this leads down the path of not using IRIs. When dereferencing
> >an HTTP IRI, one has to punyify the domain name and percentulate the
> >path, mapping http://伝言.example/?user=أكرم to
> >http://xn--9oqp94l.example/?user=%D8%A3%D9%83%D8%B1%D9%85 . Any IRI
> >with characters outside of the legal URI characters will map to a
> >differently spelled URI, necessitating some typing of these respective
> >strings. If we're taking away the sharp knives, we'll have to take
> >away non-ascii characters and díäcrìtïcâl markç.
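Below is a rough Python sketch of the mapping Eric describes: punycode the domain name, then percent-encode everything else. It is not a complete implementation of RFC 3987 section 3.1. It assumes the input is already a valid IRI, ignores IPv6 literals, and relies on Python's built-in `idna` codec (IDNA 2003, RFC 3490) for the host. `iri_to_uri` is a hypothetical name.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# ASCII characters left as-is: RFC 3986 reserved and unreserved characters, plus "%"
# so existing escapes survive. quote() always keeps letters and digits.
_KEEP = "!#$%&'()*+,/:;=?@[]~-._"


def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI for dereferencing: punycode the host, percent-encode the rest."""
    parts = urlsplit(iri)
    # Simplification: no IPv6 literals; userinfo is percent-encoded like the path.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    host, colon, port = hostport.partition(":")
    ascii_host = host.encode("idna").decode("ascii") if host else ""
    netloc = quote(userinfo, safe=_KEEP) + at + ascii_host + colon + port
    # quote() encodes each non-ASCII character as its UTF-8 octets, e.g. أ -> %D8%A3.
    path = quote(parts.path, safe=_KEEP)
    query = quote(parts.query, safe=_KEEP)
    fragment = quote(parts.fragment, safe=_KEEP)
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(iri_to_uri("http://伝言.example/?user=أكرم"))
# http://xn--9oqp94l.example/?user=%D8%A3%D9%83%D8%B1%D9%85
```

A URI passes through unchanged, which matches "every URI is also an IRI". Any IRI containing non-ASCII characters comes out spelled differently, which is Eric's point.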
> 
> I wonder if it's safe to think of dereferencing as a black box and
> as none of our concern?

I'd say that's safe unless we proselytize Linked Data dereferencing
ideals in the core spec. Currently, RDF Concepts¹ says it uses URI
references, which the DAWG decided really meant IRIs², HTTP tells us
how to GET URIs³, and the IRI spec tells us how to map from IRIs to
URIs⁴. It's currently up to the user to conceive of pasting an RDF
node identifier into a browser. Apart from that, it's all spelled out.

¹ http://www.w3.org/TR/rdf-concepts/#section-Graph-URIref
² http://www.w3.org/TR/rdf-sparql-query/#QSynIRI
³ http://tools.ietf.org/html/rfc2616#section-5.1.2
⁴ http://tools.ietf.org/html/rfc3987#section-3.1
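The division of labour above can be sketched in Python. The RDF term stays an IRI, and only the dereferencing step hands a mapped URI to HTTP. `request_head` is a hypothetical helper. It only builds the request text and sends nothing. It uses the abs_path form of the Request-URI from RFC 2616 section 5.1.2 (an empty path becomes "/") plus a Host header. The input is the mapped URI from the example earlier in the thread.

```python
from urllib.parse import urlsplit


def request_head(uri: str) -> str:
    """Build an HTTP/1.1 GET request head for an already-mapped URI."""
    if not uri.isascii():
        raise ValueError("not ASCII, so not a URI: map the IRI first (RFC 3987 section 3.1)")
    parts = urlsplit(uri)
    target = parts.path or "/"  # RFC 2616 5.1.2: an empty abs_path is given as "/"
    if parts.query:
        target += "?" + parts.query
    host = parts.hostname or ""
    if parts.port is not None:
        host += f":{parts.port}"
    return f"GET {target} HTTP/1.1\r\nHost: {host}\r\n\r\n"


print(request_head("http://xn--9oqp94l.example/?user=%D8%A3%D9%83%D8%B1%D9%85"))
```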


> Personally I have to confess that I'd much rather encounter
> http://伝言.example/?user=أكرم in some RDF than
> http://xn--9oqp94l.example/?user=%D8%A3%D9%83%D8%B1%D9%85
> 
> best, nathan

-- 
-ericP

Received on Friday, 29 April 2011 00:23:26 UTC