Re: Comments on 31 March spec

On Wed, 9 Apr 1997 01:51:56 -0400 Murata Makoto said:
>Gavin Nicol writes:
>>
>>Unecessary. XML uses ISO 10646.
>
>We should clarify which ISO 10646 and which Unicode.  Before or after
>DAM9 (Draft Ammendment 9)?  This DAM *changes* codes for Hangul
>characters and introduce many new characters.

Tim Bray and I have recently heard heard this wish from others as well.
In general, I think specifying clearly what is intended is a good
idea; in this particular case, it's not clear to me which version of
10646 and which version of Unicode should be specified.  In principle,
it seems to me that it would be best if:

  - we could specify the most recent version of each standard
  - we could refer both to ISO 10646 and to Unicode
  - the versions of 10646 and Unicode to which we refer were
identical in technical content

Unfortunately, these three principles seem not to be combinable,
unless perhaps the draft amendments to 10646 bring the two standards
into agreement again.

Several years ago, when it appeared that Unicode and 10646 would be
incompatible, the user community rose up and demanded that those
responsible for the two standards settle their differences and keep them
in synch with each other.  Now, it sometimes appears that the
responsible authorities are failing to keep their promises, and allowing
the two standards to diverge; I hope this appearance is an illusion.

If there is any version of the two standards more recent than
ISO 10646:1993 / Unicode 1.1 in which the two are technically
identical, my first instinct would be to specify those versions.

If not, our choices seem to be:

  - to specify just some version of ISO 10646 and ignore Unicode
    (in which case, which version?)
  - to specify just some version of Unicode and ignore ISO 10646
    (in which case, which version?)
  - to specify one version of each, and give a rule about what to
    do when the two standards disagree (e.g. "in cases where
    ISO 10646:1993 and Unicode 2.0 disagree over the meaning of
    a bit string or the bit string for a character, the behavior
    of XML processors is undefined" or "is undefined and must be
    settled by agreements outside the scope of this specification")
  - to say nothing about the cases where the two standards are
    different.

We need to try to seek some consensus on the WG before deciding what
to do; I fear however that if those interested in character sets do
not reach agreement quickly, a protracted discussion of the fine
points and politics of Unicode and 10646 will create a groundswell
of opinion in favor of going back to ISO 8859-1 or even ISO 646.

What ought we to decide?

-C. M. Sperberg-McQueen

Received on Wednesday, 9 April 1997 10:05:49 UTC