- From: Martin Duerst <duerst@w3.org>
- Date: Mon, 03 Feb 2003 14:33:20 -0500
- To: "Ian B. Jacobs" <ij@w3.org>, www-tag@w3.org
At 20:20 03/01/27 -0500, Ian B. Jacobs wrote:

>Minutes of the 27 Jan 2003 TAG teleconf available as
>HTML [1] and as text below.

> 2.3 IRIEverywhere-27
>
> [25] http://www.w3.org/2001/tag/ilist#IRIEverywhere-27
>
> [Ian]
> CL: There is a bigger effect on IRI spec and suggestions for
> RFC2396.
>
> [Chris]
> this has more effect on IRI comparison (which is done by
> transformation to URI and then comparing)

[the current IRI draft does not mandate this kind of comparing IRIs]

> [Chris]
> it means that the *actual kanji* and the sequence of hexifyied
> octets compare to the same
> which helps in roundtripping a very great deal

URIEquivalent-15 and IRIs are indeed very strongly related.
In the I18N WG, we have discussed which solution would be
better for internationalization:

1) "%7e" and "%7E" and "~" are not necessarily equivalent for
   all kinds of processing.
2) "%7e" and "%7E" and "~" are equivalent in all cases.

As Chris points out above, solution 2) is better for round-tripping,
and may therefore be better for gradual acceptance and overall
interoperability.

However, there is also a strong feeling that being able to escape
in all cases without any losses will lead to a lot of downgrading,
and to hopelessly confusing long sequences of %-escapes rather than
'the real thing' (i.e. the actual IRI characters).

Also, while escaping is always possible and relatively easy,
un-escaping is a bit more difficult and needs to be done carefully:
to avoid converting non-UTF-8 octet sequences, to avoid converting
to characters that are not allowed in IRIs (yes, there are a few of
these), and to avoid potential security issues (please see
http://www.w3.org/International/iri-edit/draft-duerst-iri.html#URItoIRI
for details).

So overall, from an IRI and internationalization viewpoint, it is
not clear that always comparing AFTER hex-escaping is the right way
to go.

What is very clear is that the solution chosen should be consistent
across URIs/IRIs. I.e. if '%7e' is always equal to '%7E' and '~',
then '%4C' and '%4c' and 'L' should always be equal, as well as
e.g. é (in HTML), '%c3%a9', '%c3%A9', '%C3%a9', '%C3%A9', and vice
versa (i.e. as an alternative, all these are different)
[the only exception being for reserved characters as listed in
RFC 2396].

Regards, Martin.
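
To make solution 2) above concrete, here is a minimal Python sketch of a consistent comparison, offered as an illustration only and not as anything the IRI draft mandates. The function names (to_uri, normalize, equivalent) are hypothetical. It assumes UTF-8 for non-ASCII characters, treats the RFC 2396 "unreserved" set as the characters whose escaped and literal forms are equal, and leaves escapes of reserved characters distinct from the literal characters. Under solution 1), by contrast, comparison would simply be plain string equality.

```python
import re

# Unreserved characters per RFC 2396: alphanumerics plus the "mark" set.
UNRESERVED = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "-_.!~*'()"
)

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def to_uri(iri: str) -> str:
    """Percent-escape each non-ASCII character as its UTF-8 octets."""
    out = []
    for ch in iri:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


def normalize(iri: str) -> str:
    """Hypothetical normal form for solution 2).

    Escapes of unreserved characters become the literal character;
    every other escape is kept, with its hex digits upper-cased.
    """
    def fix(match):
        ch = chr(int(match.group(1), 16))
        if ch in UNRESERVED:
            return ch
        return match.group(0).upper()

    return _ESCAPE.sub(fix, to_uri(iri))


def equivalent(a: str, b: str) -> bool:
    """Compare two URIs/IRIs under the assumed solution-2 policy."""
    return normalize(a) == normalize(b)
```

With this sketch, equivalent("%7e", "~"), equivalent("%4c", "L") and equivalent("é", "%c3%A9") all hold, while equivalent("a%2Fb", "a/b") does not, because "/" is reserved. Note that it only ever escapes and never un-escapes non-ASCII octets, which sidesteps the un-escaping pitfalls mentioned above.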
Received on Monday, 3 February 2003 14:34:34 UTC