Fwd: URIEquivalence-15: influence on IRIs (was: Re: [Minutes] 27 Jan 2003 TAG teleconf (..., IRIEverywhere-27, ...)) from Martin Duerst on 2003-02-05 (www-international@w3.org from January to March 2003)

From: Martin Duerst <duerst@w3.org>
Date: Wed, 05 Feb 2003 13:28:06 -0500
To: www-international@w3.org
Message-Id: <4.2.0.58.J.20030205132659.04786140@localhost>

This message should have been cc'ed to www-international.
If you want to follow up, please add www-tag to the cc list.

>Date: Mon, 03 Feb 2003 14:33:20 -0500
>To: "Ian B. Jacobs" <ij@w3.org>, www-tag@w3.org
>From: Martin Duerst <duerst@w3.org>
>Subject: URIEquivalence-15: influence on IRIs (was: Re: [Minutes] 27 Jan
>2003 TAG teleconf (..., IRIEverywhere-27, ...))


>At 20:20 03/01/27 -0500, Ian B. Jacobs wrote:
>
>>Minutes of the 27 Jan 2003 TAG teleconf available as
>>HTML [1] and as text below.
>
>>   2.3 IRIEverywhere-27
>
>>      [25] http://www.w3.org/2001/tag/ilist#IRIEverywhere-27
>
>>    [Ian]
>>           CL: There is a bigger effect on IRI spec and suggestions for
>>           RFC2396.
>
>>    [Chris]
>>           this has more effect on IRI comparison (which is done by
>>           transformation to URI and then comparing)
>
>[the current IRI draft does not mandate this way of comparing IRIs]
>
>
>>    [Chris]
>>           it means that the *actual kanji* and the sequence of hexified
>>           octets compare as the same
>>           which helps in roundtripping a very great deal
>
>URIEquivalence-15 and IRIs are indeed very strongly related.
>
>In the I18N WG, we have discussed which solution would be better
>for internationalization:
>
>1) "%7e" and "%7E" and "~" are not necessarily equivalent for all
>    kinds of processing.
>
>2) "%7e" and "%7E" and "~" are equivalent in all cases.
>
>As Chris points out above, solution 2) is better for round-tripping,
>and may therefore be better for gradual acceptance and overall
>interoperability.
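To make option 2) concrete, here is a minimal Python sketch (not from the original thread) of comparing URIs after normalizing their escapes. It assumes the RFC 2396 unreserved set (alphanumerics plus -_.!~*'()), and the helper names `normalize_escapes` and `equivalent` are hypothetical. Escapes of unreserved characters are decoded; every other escape is kept but rewritten with uppercase hex, so a reserved character such as '/' stays distinct from '%2F'.

```python
import re

# Assumption: RFC 2396 (section 2.3) "unreserved" = alphanum | mark.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "-_.!~*'()"
)

ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_escapes(uri: str) -> str:
    """Option 2 sketch: map equivalent spellings of a URI to one form."""
    def fix(match: re.Match) -> str:
        ch = chr(int(match.group(1), 16))
        if ch in UNRESERVED:
            return ch  # "%7e" / "%7E" -> "~"
        return "%" + match.group(1).upper()  # keep escaped, canonical case
    return ESCAPE.sub(fix, uri)


def equivalent(a: str, b: str) -> bool:
    """Hypothetical comparison: equal after escape normalization."""
    return normalize_escapes(a) == normalize_escapes(b)


print(equivalent("/%7euser/", "/%7Euser/"))  # True
print(equivalent("/%7Euser/", "/~user/"))    # True
print(equivalent("/a%2Fb", "/a/b"))          # False: "/" is reserved
```

Decoding unreserved escapes rather than escaping everything keeps the normalized form readable; the reserved-character exception is what prevents '%2F' from silently becoming a path separator.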
>
>However, there is also a strong feeling that being able to escape
>in all cases without any loss will lead to a lot of downgrading,
>and to hopelessly confusing long sequences of %-escapes rather than
>'the real thing' (i.e. the actual IRI characters). Also, while
>escaping is always possible and relatively easy, un-escaping is
>more difficult and must be done carefully: to avoid converting
>octet sequences that are not valid UTF-8, to avoid converting to
>characters that are not allowed in IRIs (yes, there are a few
>of these), and to avoid potential security issues.
>(please see
>http://www.w3.org/International/iri-edit/draft-duerst-iri.html#URItoIRI
>for details).
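The careful un-escaping described above might look roughly like the following Python sketch. It is an illustration under stated assumptions, not the procedure in the draft: it converts only non-ASCII escapes (ASCII escapes, including reserved characters and '%', are left alone), decodes them strictly as UTF-8 so that Latin-1 escapes such as '%E9' stay escaped, and uses a hypothetical `is_allowed_in_iri` predicate that rejects Unicode control and format characters as a stand-in for the draft's actual exclusion list.

```python
import re
import unicodedata

# A maximal run of consecutive %XX escapes.
ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def is_allowed_in_iri(ch: str) -> bool:
    # Hypothetical stand-in: rejects control (Cc) and format (Cf)
    # characters, e.g. bidi marks. The draft's real list differs.
    return unicodedata.category(ch) not in ("Cc", "Cf")


def utf8_length(lead: int) -> int:
    """Length of a UTF-8 sequence starting with this byte; 0 if not a lead."""
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # ASCII, continuation byte, or never-valid byte


def unescape_run(run: str) -> str:
    escapes = [run[i:i + 3] for i in range(0, len(run), 3)]  # "%C3", ...
    octets = bytes(int(e[1:], 16) for e in escapes)
    out, i = [], 0
    while i < len(octets):
        n = utf8_length(octets[i])
        if n:
            try:
                # Strict decoding rejects truncated, overlong and
                # surrogate sequences.
                ch = octets[i:i + n].decode("utf-8")
            except UnicodeDecodeError:
                ch = None
            if ch is not None and is_allowed_in_iri(ch):
                out.append(ch)
                i += n
                continue
        # ASCII, invalid UTF-8, or disallowed character: keep the escape.
        out.append(escapes[i])
        i += 1
    return "".join(out)


def uri_to_iri(uri: str) -> str:
    return ESCAPE_RUN.sub(lambda m: unescape_run(m.group(0)), uri)


print(uri_to_iri("/caf%C3%A9"))  # /café
print(uri_to_iri("/caf%E9"))     # /caf%E9 (not UTF-8, left escaped)
print(uri_to_iri("/a%2Fb"))      # /a%2Fb (reserved, left escaped)
```

Falling back to the original escape text octet by octet means a bad sequence never corrupts its neighbours, and the output always maps back to the input URI.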
>
>So overall, from an IRI and internationalization viewpoint, it is not
>clear that always comparing AFTER hex-escaping is the right way to go.
>
>What is very clear is that the solution chosen should be consistent
>across URIs/IRIs. I.e.
>
>if '%7e' is always equal to '%7E' and '~', then '%4C', '%4c',
>and 'L' should always be equal, as should e.g. &eacute; (in HTML),
>'%c3%a9', '%c3%A9', '%C3%a9', and '%C3%A9'; and vice versa (i.e.,
>as an alternative, all of these are different)
>[the only exception being for reserved characters as listed in
>  RFC 2396]
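A consistent version of option 2) can be sketched by first converting an IRI to a URI (each non-ASCII character escaped as its UTF-8 octets, uppercase hex) and then normalizing escapes. In this Python sketch the function names are hypothetical, the unreserved set is assumed to be the one from RFC 2396, and `html.unescape` stands in for the HTML parser resolving &eacute; before the URI is ever seen. Every spelling of é collapses to one comparison key, '%4c' (0x4C is 'L') compares equal to 'L', and the reserved '/' stays distinct from '%2F'.

```python
import html
import re

# Assumption: RFC 2396 unreserved characters (alphanum | mark).
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "-_.!~*'()"
)
ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def iri_to_uri(iri: str) -> str:
    """Escape every non-ASCII character as its UTF-8 octets."""
    return "".join(
        ch if ord(ch) < 0x80
        else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in iri
    )


def comparison_key(iri: str) -> str:
    """Hypothetical key: equal keys mean equivalent under option 2."""
    def fix(m: re.Match) -> str:
        ch = chr(int(m.group(1), 16))
        return ch if ch in UNRESERVED else "%" + m.group(1).upper()
    return ESCAPE.sub(fix, iri_to_uri(iri))


spellings = [
    html.unescape("&eacute;"),  # entity resolved by the HTML parser
    "\u00e9",
    "%c3%a9", "%c3%A9", "%C3%a9", "%C3%A9",
]
keys = {comparison_key("/caf" + s) for s in spellings}
print(keys)                                                 # {'/caf%C3%A9'}
print(comparison_key("%4c") == comparison_key("L"))         # True
print(comparison_key("/a%2Fb") == comparison_key("/a/b"))   # False
```

The single key function is what enforces consistency: whatever rule applies to '~' applies in exactly the same way to 'L' and to é, with only the reserved characters treated differently.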
>
>
>Regards,    Martin.

Received on Wednesday, 5 February 2003 14:21:39 UTC