- From: Martin Duerst <duerst@w3.org>
- Date: Thu, 29 Jan 2004 18:18:41 -0500
- To: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>
- Cc: www-i18n-comments@w3.org, w3c-i18n-ig@w3.org
Hello Michael,

This is to notify you, in a recorded form, that we have reconsidered your arguments re. your comments http://www.w3.org/International/Group/2002/charmod-lc/#C150 and http://www.w3.org/International/Group/2002/charmod-lc/#C151.

We have realized that in the mail leading to http://lists.w3.org/Archives/Public/www-i18n-comments/2003Feb/0000.html, we have not given a detailed justification for our decisions. We would like to explain our decisions and the reasons we made them, including the reasons that came up when we reconsidered this. We hope that this will help you to accept our decisions, and ask you to tell us whether you are satisfied with our decisions and explanations or not within a reasonable time (e.g. two weeks).

Regarding http://www.w3.org/International/Group/2002/charmod-lc/#C150, you write:

> Sec. 1.1 says, inter alia, 'In this document, Unicode is used as a
> synonym for the Universal Character Set.' I believe the term 'UCS'
> would be better, because it is clearer and less subject to misconstruction.

We have used the term Unicode because it is much more widely used than UCS. A lot more people have heard about Unicode than about UCS. The tendency to use acronyms in programming and related subjects is usually seen as a problem more than as a feature, and we think that where reasonably useful terms/words are available, these should be used. Overall, the use of Unicode makes our specification more readable without making it less precise.

> It is clearer because the term 'Unicode' may reasonably be used to
> denote (a) the consortium of that name, (b) the Universal Character
> Set defined by ISO/IEC 10646 and by the Unicode Standard, (c) the
> UCS taken together with the additional rules defined by the Unicode
> Standard, which Unicode does NOT share with ISO/IEC 10646, and (d)
> the Unicode Standard itself.
> Despite the explicit statement that
> in the character model spec the term 'Unicode' is used in sense (b),
> I suspect the common use, elsewhere, of the term in senses (a), (d),
> and especially (c), will necessarily color readers' perceptions
> of the meaning of the text.

We have taken special care to only use the term "Unicode" without qualifications in the meaning of (b). If you can find an instance where this is not the case, we would be glad to fix this.

> The term 'UCS' is also less likely to convey to casual readers that
> it is really the Unicode Standard, not ISO/IEC 10646, which counts.

Well, in certain cases, it actually IS the Unicode Standard, rather than ISO/IEC 10646, that counts.

> It is true, as you have pointed out from time to time, that the
> Unicode Consortium and the responsible ISO/IEC technical committee
> have worked well for some time now in keeping the two standards aligned.
> I applaud that fact and the role some of you have individually played
> in making it happen. But I remember too the years in which the two
> organizations threatened to burden the world with two different and
> incompatible universal character sets, and the roles some of you played
> then, and I am unwilling that any W3C specification should risk conveying
> the idea that if the two standards should diverge, the Web or the W3C
> would naturally side with one or the other party.

We have watched the convergence between the Unicode Standard and ISO/IEC 10646 over the years. It is not just that they merged over 10 years ago; since then, they have continuously moved closer to each other. There are numerous examples of this, the latest of which is the transfer of the ISO collation standard to SC2/WG2 to allow better coordination between definition of new characters and their (default) collation, as well as better coordination between the UTC and ISO. Sudden divergence between the Unicode Standard and ISO 10646 is in theory possible, but in practice highly unlikely.
If there were even as much as a hint of such a divergence, W3C would side with neither one nor the other, but would naturally put all its weight on both parties to convince them to work together. W3C would also do everything necessary and possible to mobilize other forces to exert their influence on the UTC and ISO to avoid divergence. But as we know, the members of the UTC and SC2 are very well aware of the need for continued convergence, and we think that it is very unlikely that they need to be reminded.

> It would not be appropriate to use the term 'ISO/IEC 10646' (or just
> '10646' for short) to refer to the UCS. It is also not appropriate
> to use the term 'Unicode'.

The terms 'ISO/IEC 10646' or '10646' indeed lend themselves much less to use as words than 'Unicode'.

In http://lists.w3.org/Archives/Public/www-i18n-comments/2003Feb/0000.html, you also say:

> I regret to report that I find your rationale unconvincing. If, as
> you say in the response to C151, references to international standards
> should invariably be preferred to references to corresponding national
> standards, then how much more strongly ought specifications defined by
> private industry consortia to be deprecated in favor of international
> standards which define exactly the same technical content.

We did not say that there is a preference for ISO standards over consortia standards. Indeed, it would not necessarily be appropriate for us as a consortium to say that.

> (And since
> you say that the character model spec uses the term "Unicode" only to
> refer to the UCS, the technical content must necessarily be exactly
> the same. I am a little surprised, since I thought that any
> discussion of Unicode normalization must necessarily go beyond the
> definition of the UCS, precisely into material specified by Unicode
> but not by ISO 10646,

Yes, it does. But 'Unicode' and 'Unicode normalization' are not the same.
They are of course related, both in wording and in meaning, but they are not the same.

> and that use of the term 'Unicode' is
> appropriate in that context. But if you say Unicode normalization is
> part of the definition of the UCS, I am not in a position to prove
> otherwise.)

Regarding http://www.w3.org/International/Group/2002/charmod-lc/#C151, you write:

> ANSI X3.4 is missing
> The spec refers several times to ASCII. In the context of a
> specification defining a character model, I assume that this term
> is used in its proper and narrow sense to denote the coded
> character set defined by American national standard ANSI X3.4.
> That American national standard should be included among the
> non-normative references.

We answered with:

> * Decision: Partially accepted.
> * Decision: Cite ISO 646 (International Reference Version), rather
> than ANSI X3.4, and link to it from the text.
> * Rationale for "Partially accepted": Where a national and an
> international standard define the same matter, use of the latter
> is preferable.

In http://lists.w3.org/Archives/Public/www-i18n-comments/2003Feb/0000.html, you then wrote:

> This would be an acceptable response if you had also removed the
> references to "ASCII characters" and the like from the text of the
> specification and replaced them with references to "ISO 646 IRV"
> characters. But the point of my comment was not solely that
> references to specifications in the body of the text ought to be
> accompanied by corresponding bibliographic information in the back
> matter. The term 'ASCII' is an acronym for the name of a specific
> standard, the American Standard Code for Information Interchange,
> published under that and several other titles from time to time
> beginning in 1962.
> If it is necessary to use the term 'ASCII', then
> it seems to me that it would be a courtesy to your readers to explain
> the acronym (this is a rule some authorities strongly recommend for
> all acronyms), and a courtesy to those who developed the standard (as
> well as to your readers) to provide a reference to the standard
> itself.

The character model explains things as they are currently, with a view to the future. There are other sources for the history of character encoding. Sometimes, knowing the history of something can help understanding. Also, the history of character encoding is certainly interesting. But we do not believe that adding historic references (and the one you suggest would be only a start) would help the reader understand the content.

In addition, the latest version of the standard that we know of would be ANSI X3.4:1986, which is entitled ... American NATIONAL standard code for information interchange. We in general cite the latest version of a standard, and recommend that other specifications do this, too, but in this case, this would not really help explain the acronym.

Giving the expansions of all acronyms may be a worthwhile style guideline in some circumstances, but we only do that when the expansion actually contributes to the understanding of our document. In addition, ASCII is not really a term that should require explanation to the average reader of our document, who we assume has some basic familiarity with computers.

> As noted above, I believe that if you take your rationale for C151
> seriously, it ought to compel a different decision on C150 (and,
> indeed, the suppression of any mention of the Unicode Consortium).
> This observation leads me to suspect that you do not, in fact, take it
> very seriously.

We have said that as a general rule we prefer international standards over national standards. This does not imply that we prefer ISO standards over consortia standards.
There are several reasons for preferring international standards. The main reason is that the way to order a standard is to contact your national standards body. Getting a national standard from another standards body can take significantly longer than getting an ISO standard, which is usually available at each national standards body. In addition, pricing may be different: national standards bodies may have an easier job adapting the prices of ISO standards to the cost of living in a country than doing the same for national standards from another country.

You should note that we also mention ECMA-6 (technically identical to ISO/IEC 646), because this allows one to obtain the standard for free. Consortia are, after all, a good thing :-).

> The only way I can interpret your response, and the note attached to
> the bibliographic reference to ISO 646, is that the mention of the
> American National Standards Institute has for reasons I do not
> understand become taboo.

No, not at all. We just don't see such a mention as relevant to the content of our document, and we don't think we should construct such a mention just to show that it is indeed not a taboo.

> One gets the impression that you believe it
> somehow indelicate to refer, in a specification concerned with
> internationalization, to a national standard, and embarrassing that
> common usage should refer to a particular national standard, when it
> ought, really, to refer to the corresponding international one. I
> think you should overcome your fear of indelicacy and break the taboo.

We have no such beliefs, and there is no such taboo, and we hope that we have given sufficient explanations above for you to believe us.

> I don't object to your mentioning that ASCII is, formally (not, as far
> as I can tell, historically) simply a national version of ISO 646.

It is not simply a national version. It was so for a long time, but it is now the International Reference Version. We explicitly mention that.

> But I do think it a discourtesy to your readers to use the term
> 'ASCII' without giving a coherent and historically accurate account of
> the meaning and origin of the acronym.

We think that we would do a disservice to our readers to bother them with a full account of the history of ASCII when this is not relevant to the contents of our document. If you could explain how this additional material would help the readers of our document to gain a better understanding of the contents in this document, rather than just to satisfy an occasional reader's curiosity, we might be able to reconsider our decision.

> [I notice now that it would similarly be useful to provide references
> to the two DIN standards which specify sorting of German for names and
> for other applications, and for standards which specify the various
> other behaviors described in the examples in 3.1.5, where such
> standards exist. I should have mentioned that before, sorry.]

To the extent described in that section, these behaviors are not so much defined by standards but are long established practice that at some point in time got codified by standards.

Regards, Martin.
Received on Thursday, 29 January 2004 18:26:07 UTC