W3C home > Mailing lists > Public > ietf-charsets@w3.org > October to December 2001

RE: ISO 8859 -8:1999

From: Francois Yergeau <FYergeau@alis.com>
Date: Wed, 05 Dec 2001 09:28:17 -0500
To: Martin Duerst <duerst@w3.org>, ietf-charsets@iana.org
Message-id: <F7D4BDA0E5A1D14B99D32C022AEB7366091D11@alis-2k.alis.domain>
Hello Martin,

Questionning the applicability of something that was argued only for Unicode
certainly makes sense, but I'm afraid I don't quite follow your reasoning.

In my mind, the argument never hinged on conversion (or not) to Unicode, it
was simply a recognition that having a new label for compatible upgrades of
charsets actually hurt more that it helped, because a non-updated receiver
that gets the new label has to fail completely, whereas if the old label is
kept, the same non-updated receiver only has to fail if it actually gets
some of the new characters it doesn't know about.

This seems to apply whether the charset in question is UTF-8 of ISO-8859-8,
unless one assumes that in the case of UTF-8, an application can accept
characters that it doesn't know about.  Is that what you were implying?

-- 
Francois

> -----Message d'origine-----
> De?: Martin Duerst [mailto:duerst@w3.org]
> Envoye?: 5 decembre, 2001 00:27
> A?: Francois Yergeau; ietf-charsets@iana.org; Keld J ?n 
> Simonsen; Jonathan Rosenne
> Objet?: RE: ISO 8859 -8:1999
> 
> 
> Hello Francois,
> 
> I'm not exactly sure your argument with UTF-8 works.
> 
> The problem becomes obvious if we assume that the UCS
> is the reference. In this case, additions to the UCS
> can always be made to work as good as the current
> implementation (because we don't expect conversion
> to legacy encodings to work, except for something
> like NCRs in HTML/XML). However, additions to legacy
> encodings will break interoperability if new data
> is sent to old implementations with the old label,
> in particular if the implementation converts to
> UCS internally (which more and more implementations
> do nowadays).
> 
> Regards,   Martin.
> 
> At 14:15 01/12/02 -0500, Francois Yergeau wrote:
> >Keld J?n Simonsen wrote:
> > > Jonathan Rosenne wrote:
> > >> Justification: ISO_8859-8:1999 is a superset of 
> ISO_8859-8:1988. Valid
> > >> ISO_8859-8:1988 data will still be valid under 
> ISO_8859-8:1999. The new
> > >> characters were reserved in ISO_8859-8:1988. Registering 
> ISO_8859-8:1999
> > >> as a separate character set would cause too much 
> unnecessary confusion.
> > >
> > >As the two charsets are not exactly equivalent, they should not be
> > >aliases.
> >
> >Not so fast, please.  This question was debated at length, 
> on this very
> >list, when RFC 2279 was under scrutiny.  The result was that 
> the label
> >"UTF-8" was determined to be appropriate for all version of 
> Unicode after
> >1.1, provided no incompatible change occurs, which is the 
> same as saying
> >that subsequent versions have a superset relationship to 
> previous ones.  The
> >arguments are in the RFC itself, section 5:
> >
> >    It is noteworthy that the label "UTF-8" does not contain 
> a version
> >    identification, referring generically to ISO/IEC 10646.  This is
> >    intentional, the rationale being as follows:
> >
> >    A MIME charset label is designed to give just the 
> information needed
> >    to interpret a sequence of bytes received on the wire 
> into a sequence
> >    of characters, nothing more (see RFC 2045, section 2.2, 
> in [MIME]).
> >    As long as a character set standard does not change incompatibly,
> >    version numbers serve no purpose, because one gains nothing by
> >    learning from the tag that newly assigned characters may 
> be received
> >    that one doesn't know about.  The tag itself doesn't 
> teach anything
> >    about the new characters, which are going to be received anyway.
> >
> >    Hence, as long as the standards evolve compatibly, the apparent
> >    advantage of having labels that identify the versions is 
> only that,
> >    apparent.  But there is a disadvantage to such version-dependent
> >    labels: when an older application receives data accompanied by a
> >    newer, unknown label, it may fail to recognize the label and be
> >    completely unable to deal with the data, whereas a generic, known
> >    label would have triggered mostly correct processing of the data,
> >    which may well not contain any new characters.
> >
> >    ["Korean mess" paragraph elided]
> >
> >    In practice, then, a version-independent label is 
> warranted, provided
> >    the label is understood to refer to all versions after 
> Amendment 5,
> >    and provided no incompatible change actually occurs.  Should
> >    incompatible changes occur in a later version of ISO/IEC 
> 10646, the
> >    MIME charset label defined here will stay aligned with 
> the previous
> >    version until and unless the IETF specifically decides otherwise.
> >
> >
> >I think the same argument can apply to any other charset that evolves
> >compatibly, i.e. later versions are strict supersets of 
> earlier ones and
> >nothing else creates incompatibility.  Jonathan tells us 
> that this is the
> >case with ISO_8859-8:1988 and :1999.
> >
> >This does not prevent the IANA registry from containing superset
> >relationship information.
> >
> >Regards,
> >
> >--
> >Fran輟is Yergeau
> 
Received on Wednesday, 5 December 2001 09:31:01 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Monday, 5 June 2006 15:10:52 GMT