W3C home > Mailing lists > Public > ietf-charsets@w3.org > October to December 2001

RE: ISO 8859 -8:1999

From: Francois Yergeau <FYergeau@alis.com>
Date: Sun, 02 Dec 2001 14:15:01 -0500
To: ietf-charsets@iana.org, Keld Jørn Simonsen <keld@dkuug.dk>, Jonathan Rosenne <rosenne@qsm.co.il>
Message-id: <F7D4BDA0E5A1D14B99D32C022AEB73660EB2BD@alis-2k.alis.domain>
Keld Jørn Simonsen wrote:
> Jonathan Rosenne wrote:
>> Justification: ISO_8859-8:1999 is a superset of ISO_8859-8:1988. Valid
>> ISO_8859-8:1988 data will still be valid under ISO_8859-8:1999. The new
>> characters were reserved in ISO_8859-8:1988. Registering ISO_8859-8:1999
>> as a separate character set would cause too much unnecessary confusion.
>
>As the two charsets are not exactly equivalent, they should not be
>aliases.

Not so fast, please.  This question was debated at length, on this very
list, when RFC 2279 was under scrutiny.  The result was that the label
"UTF-8" was determined to be appropriate for all version of Unicode after
1.1, provided no incompatible change occurs, which is the same as saying
that subsequent versions have a superset relationship to previous ones.  The
arguments are in the RFC itself, section 5:

   It is noteworthy that the label "UTF-8" does not contain a version
   identification, referring generically to ISO/IEC 10646.  This is
   intentional, the rationale being as follows:

   A MIME charset label is designed to give just the information needed
   to interpret a sequence of bytes received on the wire into a sequence
   of characters, nothing more (see RFC 2045, section 2.2, in [MIME]).
   As long as a character set standard does not change incompatibly,
   version numbers serve no purpose, because one gains nothing by
   learning from the tag that newly assigned characters may be received
   that one doesn't know about.  The tag itself doesn't teach anything
   about the new characters, which are going to be received anyway.

   Hence, as long as the standards evolve compatibly, the apparent
   advantage of having labels that identify the versions is only that,
   apparent.  But there is a disadvantage to such version-dependent
   labels: when an older application receives data accompanied by a
   newer, unknown label, it may fail to recognize the label and be
   completely unable to deal with the data, whereas a generic, known
   label would have triggered mostly correct processing of the data,
   which may well not contain any new characters.

   ["Korean mess" paragraph elided]

   In practice, then, a version-independent label is warranted, provided
   the label is understood to refer to all versions after Amendment 5,
   and provided no incompatible change actually occurs.  Should
   incompatible changes occur in a later version of ISO/IEC 10646, the
   MIME charset label defined here will stay aligned with the previous
   version until and unless the IETF specifically decides otherwise.


I think the same argument can apply to any other charset that evolves
compatibly, i.e. later versions are strict supersets of earlier ones and
nothing else creates incompatibility.  Jonathan tells us that this is the
case with ISO_8859-8:1988 and :1999.

This does not prevent the IANA registry from containing superset
relationship information.

Regards,

-- 
François Yergeau 
Received on Tuesday, 4 December 2001 09:33:35 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Monday, 5 June 2006 15:10:52 GMT