- From: Martin Duerst <duerst@w3.org>
- Date: Wed, 05 Feb 2003 13:28:06 -0500
- To: www-international@w3.org
This message should have been cc'ed to www-international. If you want to follow up, please add www-tag to the cc list.

>Date: Mon, 03 Feb 2003 14:33:20 -0500
>To: "Ian B. Jacobs" <ij@w3.org>, www-tag@w3.org
>From: Martin Duerst <duerst@w3.org>
>Subject: URIEquivalence-15: influence on IRIs (was: Re: [Minutes] 27 Jan
>2003 TAG teleconf (..., IRIEverywhere-27, ...))
>
>At 20:20 03/01/27 -0500, Ian B. Jacobs wrote:
>
>>Minutes of the 27 Jan 2003 TAG teleconf available as
>>HTML [1] and as text below.
>
>> 2.3 IRIEverywhere-27
>
>> [25] http://www.w3.org/2001/tag/ilist#IRIEverywhere-27
>
>> [Ian]
>> CL: There is a bigger effect on IRI spec and suggestions for
>> RFC2396.
>
>> [Chris]
>> this has more effect on IRI comparison (which is done by
>> transformation to URI and then comparing)
>
>[the current IRI draft does not mandate this kind of comparing IRIs]
>
>
>> [Chris]
>> it means that the *actual kanji* and the sequence of hexifyied
>> octets compare to the same
>> which helps in roundtripping a very great deal
>
>URIEquivalence-15 and IRIs are indeed very strongly related.
>
>In the I18N WG, we have discussed which solution would be better
>for internationalization:
>
>1) "%7e" and "%7E" and "~" are not necessarily equivalent for all
> kinds of processing.
>
>2) "%7e" and "%7E" and "~" are equivalent in all cases.
>
>As Chris points out above, solution 2) is better for round-tripping,
>and may therefore be better for gradual acceptance and overall
>interoperability.
>
>However, there is also a strong feeling that being able to escape
>in all cases without any losses will lead to a lot of downgrading,
>and hopelessly confusing long sequences of %-escaping rather than
>'the real thing' (i.e. the actual IRI characters).
>Also, while escaping is always possible and relatively easy, un-escaping is
>a bit more difficult and needs to be done carefully to avoid
>converting non-UTF-8 octet sequences, to avoid converting to
>characters that are not allowed in IRIs (yes, there are a few
>of these), and to avoid potential security issues
>(please see
>http://www.w3.org/International/iri-edit/draft-duerst-iri.html#URItoIRI
>for details).
>
>So overall, from an IRI and internationalization viewpoint, it is not
>clear that always comparing AFTER hex-escaping is the right way to go.
>
>What is very clear is that the solution chosen should be consistent
>across URIs/IRIs, i.e.:
>
>if '%7e' is always equal to '%7E' and '~', then '%4C' and '%4c'
>and 'L' should always be equal, as well as e.g. é (in HTML),
>'%c3%a9', '%c3%A9', '%C3%a9', '%C3%A9', and vice versa (i.e.
>as an alternative, all these are different)
>[the only exception being for reserved characters as listed in
> RFC 2396]
>
>
>Regards, Martin.
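A minimal Python sketch of solution 2), under stated assumptions: an IRI is compared by first mapping it to a URI (each non-ASCII character becomes its UTF-8 octets, %-escaped), then decoding escapes that stand for RFC 2396 unreserved characters and uppercasing the hex digits of all other escapes. Escapes of reserved characters such as '%2F' stay escaped, following the RFC 2396 exception. The names `iri_to_uri`, `normalize_escapes` and `equivalent` are hypothetical, and the ASCII part of the input is assumed to already be valid URI syntax.

```python
import re

# RFC 2396 "unreserved" characters: alphanumerics plus the mark characters.
UNRESERVED = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "-_.!~*'()"
)

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: %-escape each non-ASCII character as UTF-8 octets.

    ASCII characters are copied unchanged (assumed to be valid URI syntax).
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend("%%%02X" % b for b in ch.encode("utf-8"))
    return "".join(out)


def normalize_escapes(uri: str) -> str:
    """Decode escapes of unreserved characters; uppercase the hex of all others."""
    def fix(m: re.Match) -> str:
        ch = chr(int(m.group(1), 16))
        if ch in UNRESERVED:
            return ch  # e.g. '%7e' -> '~', '%4c' -> 'L'
        return "%" + m.group(1).upper()  # e.g. '%c3' -> '%C3', '%2f' -> '%2F'
    return _ESCAPE.sub(fix, uri)


def equivalent(a: str, b: str) -> bool:
    """Solution 2): compare after mapping to URIs and normalizing escapes."""
    return normalize_escapes(iri_to_uri(a)) == normalize_escapes(iri_to_uri(b))
```

With this sketch, '/%7e', '/%7E' and '/~' compare equal, as do '/é' and '/%c3%A9'; '/a%2Fb' and '/a/b' do not, because '%2F' escapes a reserved character.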
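The careful un-escaping described in the message (decode only valid UTF-8 octet sequences, never produce characters that are not allowed in IRIs, and avoid escapes with security implications) might be sketched as follows. The `DISALLOWED` set is a hypothetical stand-in containing only bidi formatting characters; the authoritative rules are in the IRI draft linked in the message. As a further assumption, chosen to sidestep the security issues, this sketch leaves every ASCII escape (including reserved characters and '%25') untouched and decodes only non-ASCII octet sequences.

```python
import re

_ESC = re.compile(r"%([0-9A-Fa-f]{2})")

# Hypothetical stand-in for "characters not allowed in IRIs": a few bidi
# formatting characters (U+200E, U+200F, U+202A..U+202E).
DISALLOWED = {"\u200e", "\u200f"} | {chr(c) for c in range(0x202A, 0x202F)}


def _seq_len(lead: int) -> int:
    """Expected UTF-8 sequence length for a lead octet, or 0 if not a valid lead."""
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # ASCII, continuation octet, or never valid in UTF-8


def uri_to_iri(uri: str) -> str:
    out = []
    i = 0
    while i < len(uri):
        m = _ESC.match(uri, i)
        if not m:
            out.append(uri[i])  # not an escape: copy the character as-is
            i += 1
            continue
        lead = int(m.group(1), 16)
        n = _seq_len(lead)
        if n == 0:
            out.append(m.group(0))  # keep ASCII escapes and stray octets escaped
            i = m.end()
            continue
        # Collect the continuation escapes that should follow the lead octet.
        octets = [lead]
        j = m.end()
        while len(octets) < n:
            c = _ESC.match(uri, j)
            if not c:
                break
            octets.append(int(c.group(1), 16))
            j = c.end()
        try:
            ch = bytes(octets).decode("utf-8")  # rejects overlongs, surrogates
        except UnicodeDecodeError:
            ch = None
        if ch is not None and len(octets) == n and ch not in DISALLOWED:
            out.append(ch)
            i = j
        else:
            # Not valid UTF-8 (or not allowed): keep the lead escape, retry after it.
            out.append(m.group(0))
            i = m.end()
    return "".join(out)
```

For example, '/%C3%A9' becomes '/é', while the Latin-1 escape '/%E9' and the reserved-character escape in '/a%2Fb' are returned unchanged.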
Received on Wednesday, 5 February 2003 14:21:39 UTC