Re: IRI guidance from Ivan Herman on 2011-04-29 (public-rdf-wg@w3.org from April 2011)

From: Ivan Herman <ivan@w3.org>
Date: Fri, 29 Apr 2011 08:24:57 +0200
To: Eric Prud'hommeaux <eric@w3.org>
Cc: Alex Hall <alexhall@revelytix.com>, nathan@webr3.org, RDF WG <public-rdf-wg@w3.org>
Message-Id: <D14AAE0F-C1ED-432E-BE58-48D213C46164@w3.org>

On Apr 28, 2011, at 23:59 , Eric Prud'hommeaux wrote:
<snip/>
>> 
>> Unfortunately this can lead to unexpected consequences, such as an
>> application dereferencing the IRI http://xn--rsum-bpad.example.org (not sure
>> how GMail will escape that -- that's the punycode version) and getting a
>> document with a description of some resource with IRI
>> http://résumé.example.org <http://xn--rsum-bpad.example.org> (Unicode
>> version).  To help prevent this, we could discourage the use of IRIs with
>> encoded IDNs in RDF, similar to how the existing spec discourages the use of
>> URI Refs with percent-escaped characters.
> 
> I think this leads down the path of not using IRIs. When dereferencing
> an HTTP IRI, one has to punyify the domain name and percentulate the
> path, mapping http://伝言.example/?user=أكرم to
> http://xn--9oqp94l.example/?user=%D8%A3%D9%83%D8%B1%D9%85 . Any IRI
> with characters outside of the legal URI characters will map to a
> differently spelled URI, necessitating some typing of these respective
> strings. If we're taking away the sharp knives, we'll have to take
> away non-ascii characters and díäcrìtïcâl markç.

Eric, I am not sure I understand that. The proposal is to say that, in RDF, there should be a preference for the Unicode form of an IRI, i.e., I should, if possible, opt for http://伝言.example/?user=أكرم rather than the other version. What happens underneath when I dereference that IRI and hand it to tools for an HTTP GET or anything similar is a separate issue. Indeed, on an English keyboard typing something even as simple as http://iván.herman.net is a pain for a user, but that is a practical problem which is again outside the realm of RDF.

I.e., we keep to the current version of RDF, where equality of IRIs is based on a character-by-character comparison (as now), but advise that, if possible, the IRI be used without the punycode. That seems to be a reasonable way of handling this... What else would you propose instead?
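
To make the distinction concrete, here is a minimal Python sketch, not a full RFC 3987 implementation. The helper name iri_to_uri is hypothetical, it ignores ports and userinfo, and it relies on Python's built-in "idna" codec (IDNA 2003) for the host. It only illustrates that the mapping belongs to the dereferencing step, while RDF keeps comparing the strings character by character:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: map an HTTP IRI to the URI sent on the wire.

    Assumes a plain host (no port, no userinfo).
    """
    parts = urlsplit(iri)
    # Host: each label goes through Punycode via the stdlib "idna" codec.
    host = parts.hostname.encode("idna").decode("ascii")
    # Path, query, fragment: non-ASCII characters become percent-encoded UTF-8.
    # "%" is kept safe so already-escaped input is not escaped twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, host, path, query, fragment))


# http://résumé.example.org/?user=أكرم, written with escapes to avoid
# any ambiguity about Unicode normalization in this message.
unicode_iri = "http://r\u00e9sum\u00e9.example.org/?user=\u0623\u0643\u0631\u0645"
wire_uri = iri_to_uri(unicode_iri)

# RDF-style equality is plain string comparison, so these two spellings
# name two different nodes even though they dereference to the same place.
same_node = unicode_iri == wire_uri  # False
```

With this sketch, wire_uri comes out as http://xn--rsum-bpad.example.org/?user=%D8%A3%D9%83%D8%B1%D9%85, which is why the advice is about which spelling to write in the graph, not about changing how equality works.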

Cheers

Ivan

> 
> 
>> -Alex
> 
> -- 
> -ericP
> 

----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Friday, 29 April 2011 06:23:52 UTC