- From: Francois Yergeau <FYergeau@alis.com>
- Date: Wed, 05 Dec 2001 09:28:17 -0500
- To: Martin Duerst <duerst@w3.org>, ietf-charsets@iana.org
Hello Martin, Questionning the applicability of something that was argued only for Unicode certainly makes sense, but I'm afraid I don't quite follow your reasoning. In my mind, the argument never hinged on conversion (or not) to Unicode, it was simply a recognition that having a new label for compatible upgrades of charsets actually hurt more that it helped, because a non-updated receiver that gets the new label has to fail completely, whereas if the old label is kept, the same non-updated receiver only has to fail if it actually gets some of the new characters it doesn't know about. This seems to apply whether the charset in question is UTF-8 of ISO-8859-8, unless one assumes that in the case of UTF-8, an application can accept characters that it doesn't know about. Is that what you were implying? -- Francois > -----Message d'origine----- > De?: Martin Duerst [mailto:duerst@w3.org] > Envoye?: 5 decembre, 2001 00:27 > A?: Francois Yergeau; ietf-charsets@iana.org; Keld J ?n > Simonsen; Jonathan Rosenne > Objet?: RE: ISO 8859 -8:1999 > > > Hello Francois, > > I'm not exactly sure your argument with UTF-8 works. > > The problem becomes obvious if we assume that the UCS > is the reference. In this case, additions to the UCS > can always be made to work as good as the current > implementation (because we don't expect conversion > to legacy encodings to work, except for something > like NCRs in HTML/XML). However, additions to legacy > encodings will break interoperability if new data > is sent to old implementations with the old label, > in particular if the implementation converts to > UCS internally (which more and more implementations > do nowadays). > > Regards, Martin. > > At 14:15 01/12/02 -0500, Francois Yergeau wrote: > >Keld J?n Simonsen wrote: > > > Jonathan Rosenne wrote: > > >> Justification: ISO_8859-8:1999 is a superset of > ISO_8859-8:1988. Valid > > >> ISO_8859-8:1988 data will still be valid under > ISO_8859-8:1999. The new > > >> characters were reserved in ISO_8859-8:1988. Registering > ISO_8859-8:1999 > > >> as a separate character set would cause too much > unnecessary confusion. > > > > > >As the two charsets are not exactly equivalent, they should not be > > >aliases. > > > >Not so fast, please. This question was debated at length, > on this very > >list, when RFC 2279 was under scrutiny. The result was that > the label > >"UTF-8" was determined to be appropriate for all version of > Unicode after > >1.1, provided no incompatible change occurs, which is the > same as saying > >that subsequent versions have a superset relationship to > previous ones. The > >arguments are in the RFC itself, section 5: > > > > It is noteworthy that the label "UTF-8" does not contain > a version > > identification, referring generically to ISO/IEC 10646. This is > > intentional, the rationale being as follows: > > > > A MIME charset label is designed to give just the > information needed > > to interpret a sequence of bytes received on the wire > into a sequence > > of characters, nothing more (see RFC 2045, section 2.2, > in [MIME]). > > As long as a character set standard does not change incompatibly, > > version numbers serve no purpose, because one gains nothing by > > learning from the tag that newly assigned characters may > be received > > that one doesn't know about. The tag itself doesn't > teach anything > > about the new characters, which are going to be received anyway. > > > > Hence, as long as the standards evolve compatibly, the apparent > > advantage of having labels that identify the versions is > only that, > > apparent. But there is a disadvantage to such version-dependent > > labels: when an older application receives data accompanied by a > > newer, unknown label, it may fail to recognize the label and be > > completely unable to deal with the data, whereas a generic, known > > label would have triggered mostly correct processing of the data, > > which may well not contain any new characters. > > > > ["Korean mess" paragraph elided] > > > > In practice, then, a version-independent label is > warranted, provided > > the label is understood to refer to all versions after > Amendment 5, > > and provided no incompatible change actually occurs. Should > > incompatible changes occur in a later version of ISO/IEC > 10646, the > > MIME charset label defined here will stay aligned with > the previous > > version until and unless the IETF specifically decides otherwise. > > > > > >I think the same argument can apply to any other charset that evolves > >compatibly, i.e. later versions are strict supersets of > earlier ones and > >nothing else creates incompatibility. Jonathan tells us > that this is the > >case with ISO_8859-8:1988 and :1999. > > > >This does not prevent the IANA registry from containing superset > >relationship information. > > > >Regards, > > > >-- > >Fran輟is Yergeau >
Received on Wednesday, 5 December 2001 09:31:01 UTC