Re: Your comments on the Character Model [C150, C151] from Martin Duerst on 2004-01-29 (www-i18n-comments@w3.org from January 2004)

From: Martin Duerst <duerst@w3.org>
Date: Thu, 29 Jan 2004 18:18:41 -0500
To: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>
Cc: www-i18n-comments@w3.org, w3c-i18n-ig@w3.org
Message-Id: <4.2.0.58.J.20040126110459.07800cf8@localhost>
Hello Michael,

This is to notify you, for the record, that we have
reconsidered your arguments regarding your comments
http://www.w3.org/International/Group/2002/charmod-lc/#C150 and
http://www.w3.org/International/Group/2002/charmod-lc/#C151.

We have realized that in the mail leading to
http://lists.w3.org/Archives/Public/www-i18n-comments/2003Feb/0000.html,
we did not give a detailed justification for our decisions.

We would like to explain our decisions and the reasons we made them,
including the reasons that came up when we reconsidered them.
We hope that this will help you to accept our decisions, and ask
you to tell us, within a reasonable time (e.g. two weeks), whether
or not you are satisfied with our decisions and explanations.

Regarding
http://www.w3.org/International/Group/2002/charmod-lc/#C150,
you write:

 > Sec. 1.1 says, inter alia, 'In this document, Unicode is used as a
 > synonym for the Universal Character Set.' I believe the term 'UCS'
 > would be better, because it is clearer and less subject to misconstruction.

We have used the term Unicode because it is much more widely used
than UCS. Many more people have heard of Unicode than of UCS.
The tendency to use acronyms in programming and related subjects is
usually seen more as a problem than as a feature, and we think that
where reasonably useful terms or words are available, these should be used.
Overall, the use of Unicode makes our specification more readable
without making it less precise.


 > It is clearer because the term 'Unicode' may reasonably be used to
 > denote (a) the consortium of that name, (b) the Univeral Character
 > Set defined by ISO/IEC 10646 and by the Unicode Standard, (c) the
 > UCS taken together with the additional rules defined by the Unicode
 > Standard, which Unicode does NOT share with ISO/IEC 10646, and (d)
 > the Unicode Standard itself. Despite the explicit statement that
 > in the character model spec the term 'Unicode' is used in sense (b),
 > I suspect the common use, elsewhere, of the term in senses (a), (d),
 > and especially (c), will necessarily color readers' perceptions
 > of the meaning of the text.

We have taken special care to use the term "Unicode" without
qualification only in meaning (b). If you can find an instance
where this is not the case, we would be glad to fix it.


 > The term 'UCS' is also less likely to convey to casual readers that
 > it is really the Unicode Standard, not ISO/IEC 10646, which counts.

Well, in certain cases, it actually IS the Unicode Standard, rather
than ISO/IEC 10646, that counts.


 > It is true, as you have pointed out from time to time, that the
 > Unicode Consortium and the responsible ISO/IEC technical committee
 > have worked well for some time now in keeping the two standards aligned.
 > I applaud that fact and the role some of you have individually played
 > in making it happen. But I remember too the years in which the two
 > organizations threatened to burden the world with two different and
 > incompatible universal character sets, and the roles some of you played
 > then, and I am unwilling that any W3C specification should risk conveying
 > the idea that if the two standards should diverge, the Web or the W3C
 > would naturally side with one or the other party.

We have watched the convergence between the Unicode Standard and
ISO/IEC 10646 over the years. It is not just that they merged
over 10 years ago; since then, they have continuously moved closer
to each other. There are numerous examples of this, the most recent
of which is the transfer of the ISO collation standard to SC2/WG2
to allow better coordination between the definition of new characters
and their (default) collation, as well as better coordination between
the UTC and ISO.

Sudden divergence between the Unicode Standard and ISO 10646 is
possible in theory, but extremely unlikely in practice. Should
there be even a hint of such a divergence, W3C would not side
with either party, but would naturally put all its weight on both
parties to convince them to work together. W3C would also do
everything necessary and possible to mobilize other forces to
exert their influence on the UTC and ISO to avoid divergence.
But as we know, the members of the UTC and SC2 are very well aware
of the need for continued convergence, and we think it very
unlikely that they need to be reminded.


 > It would not be appropriate to use the term 'ISO/IEC 10646' (or just
 > '10646' for short) to refer to the UCS. It is also not appropriate
 > to use the term 'Unicode'.

The terms 'ISO/IEC 10646' and '10646' indeed lend themselves much less
to use as words than 'Unicode' does.

In http://lists.w3.org/Archives/Public/www-i18n-comments/2003Feb/0000.html,
you also say:

 > I regret to report that I find your rationale unconvincing.  If, as
 > you say in the response to C151, references to international standards
 > should invariably be preferred to references to corresponding national
 > standards, then how much more strongly ought specifications defined by
 > private industry consortia to be deprecated in favor of international
 > standards which define exactly the same technical content.

We did not say that ISO standards are to be preferred over consortia
standards. Indeed, it would not necessarily be appropriate for us,
as a consortium, to say that.


 > (And since
 > you say that the character model spec uses the term "Unicode" only to
 > refer to the UCS, the technical content must necessarily be exactly
 > the same.  I am a little surprised, since I thought that any
 > discussion of Unicode normalization must necessarily go beyond the
 > definition of the UCS, precisely into material specified by Unicode
 > but not by ISO 10646,

Yes, it does. But 'Unicode' and 'Unicode normalization' are not
the same. They are of course related, both in wording and in
meaning, but they are not the same.


 > and that use of the term 'Unicode' is
 > appropriate in that context.  But if you say Unicode normalization is
 > part of the definition of the UCS, I am not in a position to prove
 > otherwise.)




Regarding
http://www.w3.org/International/Group/2002/charmod-lc/#C151,
you write:

  > ANSI X3.4 is missing
  > The spec refers several times to ASCII. In the context of a
  > specification defining a character model, I assume that this term
  > is used in its proper and narrow sense to denote the coded
  > character set defined by American national standard ANSI X3.4.
  > That American national standard should be included among the
  > non-normative references.

We answered with:

  >    * Decision: Partially accepted.
  >    * Decision: Cite ISO 646 (International Reference Version), rather
  >      than ANSI X3.4, and link to it from the text.
  >    * Rationale for "Partially accepted": Where a national and an
  >      international standard define the same matter, use of the latter
  >      is preferable.

In http://lists.w3.org/Archives/Public/www-i18n-comments/2003Feb/0000.html,
you then wrote:

 > This would be an acceptable response if you had also removed the
 > references to "ASCII characters" and the like from the text of the
 > specification and replaced them with references to "ISO 646 IRV"
 > characters.  But the point of my comment was not solely that
 > references to specifications in the body of the text ought to be
 > accompanied by corresponding bibliographic information in the back
 > matter.  The term 'ASCII' is an acronym for the name of a specific
 > standard, the American Standard Code for Information Interchange,
 > published under that and several other titles from time to time
 > beginning in 1962.  If it is necessary to use the term 'ASCII', then
 > it seems to me that it would be a courtesy to your readers to explain
 > the acronym (this is a rule some authorities strongly recommend for
 > all acronyms), and a courtesy to those who developed the standard (as
 > well as to your readers) to provide a reference to the standard
 > itself.

The character model explains things as they currently are, with a
view to the future. There are other sources for the history of
character encoding. Sometimes, knowing the history of something
can aid understanding, and the history of character encoding
is certainly interesting. But we do not believe that adding
historical references (and the one you suggest would be only a
start) would help the reader understand the content.

In addition, the latest version of the standard that we know of
is ANSI X3.4-1986, which is entitled
... American NATIONAL standard code for information interchange.
We generally cite the latest version of a standard, and recommend
that other specifications do so too, but in this case, doing so
would not really help explain the acronym.

Giving the expansions of all acronyms may be a worthwhile
style guideline in some circumstances, but we do so only
when the expansion actually contributes to the understanding
of our document. In addition, ASCII is not really a term
that should require explanation to the average reader of
our document, whom we assume to have some basic familiarity
with computers.


 > As noted above, I believe that if you take your rationale for C151
 > seriously, it ought to compel a different decision on C150 (and,
 > indeed, the suppression of any mention of the Unicode Consortium).
 > This observation leads me to suspect that you do not, in fact, take it
 > very seriously.

We have said that as a general rule we prefer international standards
over national standards. This does not imply that we prefer ISO standards
over consortia standards. There are several reasons for preferring
international standards. The main reason is that the usual way to order
a standard is to contact your national standards body. Getting
a national standard from another country's standards body can take
significantly longer than getting an ISO standard, which is
usually available from every national standards body. In addition,
pricing may differ: national standards bodies may find it easier
to adapt the prices of ISO standards to the cost of living in
their country than to do the same for national standards from
another country.
You should note that we also mention ECMA-6 (technically identical
to ISO/IEC 646), because this allows readers to obtain the standard
for free. Consortia are, after all, a good thing :-).


 > The only way I can interpret your response, and the note attached to
 > the bibliographic reference to ISO 646, is that the mention of the
 > American National Standards Institute has for reasons I do not
 > understand become taboo.

No, not at all. We just don't see such a mention as relevant
to the content of our document, and we don't think we should
construct such a mention just to show that it is indeed not
taboo.


 > One gets the impression that you believe it
 > somehow indelicate to refer, in a specification concerned with
 > internationalization, to a national standard, and embarrassing that
 > common usage should refer to a particular national standard, when it
 > ought, really, to refer to the corresponding international one.  I
 > think you should overcome your fear of indelicacy and break the taboo.

We have no such beliefs, and there is no such taboo. We hope
that the explanations above are sufficient to convince you of this.


 > I don't object to your mentioning that ASCII is, formally (not, as far
 > as I can tell, historically) simply a national version of ISO 646.

It is not simply a national version. It was one for a long time,
but it is now the International Reference Version. We mention
that explicitly.


 > But I do think it a discourtesy to your readers to use the term
 > 'ASCII' without giving a coherent and historically accurate account of
 > the meaning and origin of the acronym.

We think that we would do our readers a disservice by burdening
them with a full account of the history of ASCII when this is
not relevant to the contents of our document. If you could
explain how this additional material would help the readers
of our document gain a better understanding of its contents,
rather than just satisfy an occasional reader's curiosity,
we might be able to reconsider our decision.


 > [I notice now that it would similarly be useful to provide references
 > to the two DIN standards which specify sorting of German for names and
 > for other applications, and for standards which specify the various
 > other behaviors described in the examples in 3.1.5, where such
 > standards exist. I should have mentioned that before, sorry.]

To the extent described in that section, these behaviors are not
so much defined by standards as they are long-established practices
that at some point were codified in standards.


Regards,   Martin.
Received on Thursday, 29 January 2004 18:26:07 UTC