- From: Karl Dubost <karl@w3.org>
- Date: Tue, 30 Mar 2004 18:55:30 -0500
- To: www-i18n-comments@w3.org
- Message-Id: <B994E5EA-82A5-11D8-9C2B-000A95718F82@w3.org>
Dear I18N WG,

This is a review of Character Model for the World Wide Web 1.0: Fundamentals, W3C Working Draft 25 February 2004
http://www.w3.org/TR/2004/WD-charmod-20040225

First of all, and I want to stress this:

****************************************
This document is very enjoyable and very instructive. Thank you very much. It is a must-read, even if you don't use it for implementation purposes.
****************************************

Some of my previous review comments are still valid, but this time I will be more explicit so that you can file an issue for each of them. Each comment is labeled KD-XXX, where XXX is a number.

* KD-001

C001 [S] [I] [C] Specifications, software and content MUST NOT assume that there is a one-to-one correspondence between characters and the sounds of a language.

===> How do you test this for each class of product [S][I][C]? What are the three tests you will create to demonstrate the implementability of this requirement during the CR period, when you seek implementations? If you can't design a test for it, the assertion is not testable, and therefore not implementable.

I think one of the problems comes from the word "assume". Imagine a language in which there is "a one-to-one correspondence between characters and the sounds of a language". If a piece of software supports only this language, because it is intended for that language alone, it is not conformant to C001, even though it does the correct thing.

* KD-002

C002 [S] [I] [C] Specifications, software and content MUST NOT assume a one-to-one mapping between characters and units of displayed text.

===> Same comment as KD-001. How do you test this for each class of product [S][I][C]? What are the three tests you will create to demonstrate the implementability of this requirement during the CR period, when you seek implementations? If you can't design a test for it, the assertion is not testable, and therefore not implementable.
I think one of the problems comes from the word "assume". Imagine a language in which there is "a one-to-one mapping between characters and units of displayed text". If a piece of software supports only this language, because it is intended for that language alone, it is not conformant to C002, even though it does the correct thing.

* KD-003

C005 [S] [I] Specifications and software MUST NOT assume that a single keystroke results in a single character, nor that a single character can be input with a single keystroke (even with modifiers), nor that keyboards are the same all over the world.

===> Same comment as KD-001. How do you test this for each class of product [S][I]? What are the two tests you will create to demonstrate the implementability of this requirement during the CR period, when you seek implementations? If you can't design a test for it, the assertion is not testable, and therefore not implementable.

I think one of the problems comes from the word "assume". Imagine a language in which "a single keystroke results in a single character". If a piece of software supports only this language, because it is intended for that language alone, it is not conformant to C005, even though it does the correct thing.

Could the following solve your problem?

"C005 Specifications and software MUST allow complex input methods where a single keystroke doesn't result in a single character... ..."

* KD-004

C008 [S] [I] Specifications and implementations of sorting and searching algorithms SHOULD accommodate all characters in Unicode.

===> What happens if you implement all Western languages but not Asian ones, because the context of the application does not make it necessary? Do I still have to implement everything? If not, how can I be conformant?

* KD-005

C009 [S] [I] [C] Specifications, software and content MUST NOT assume a one-to-one relationship between characters and units of physical storage.
===> Same comment as KD-001. Make it testable.

* KD-006

C067 [S] Specifications SHOULD avoid the use of the term 'character' if a more specific term is available.

===> Not testable. "Avoid" is like "assume": it carries a notion of intention, of vague choice. You could say: "Specifications SHOULD use a specific term, when one is available, instead of the general term 'character'."

* KD-007

C018 [S] When a unique character encoding is mandated, the character encoding MUST be UTF-8, UTF-16 or UTF-32.

C019 [S] If a unique character encoding is mandated and compatibility with US-ASCII is desired, UTF-8 (see [RFC 3629]) is RECOMMENDED. In other situations, such as for APIs, UTF-16 or UTF-32 may be more appropriate. Possible reasons for choosing one of these include efficiency of internal processing and interoperability with other processes.

===> Please separate the part about APIs; basically, add a line break ;) For now, the only clue separating it from the requirement is visual, which means it is neither visible nor accessible without colors. It can lead to misunderstanding.

* KD-008

C027 [S] Specifications MAY define either UTF-8 or UTF-16 as a default encoding form (or both if they define suitable means of distinguishing them), but they MUST NOT use any other character encoding as a default.

===> Double assertions make it difficult to understand and analyse the exact conformance clause. Try to merge them into one, or separate them.

* KD-009

C032 [I] Receiving software MAY recognize as many character encodings and as many charset names and aliases for them as appropriate.

===> Add a line break. AND it's not testable. It's a good recommendation, but you can't really test it. It encourages people to support as much as possible, but it's not a requirement; otherwise, you have to define "appropriate" clearly and unambiguously.

* KD-010

C033 [I] Software MUST completely implement the mechanisms for character encoding identification and SHOULD implement them in such a way that they are easy to use (for instance in HTTP servers).
===> Same comment as KD-008: double assertions.

* KD-011

C069 [C] Content SHOULD NOT misuse character technology for pictures or graphics.

===> I perfectly understand the rationale behind this requirement, but it might lead to a strictness that could, for example, block someone using character technology for an artistic project. Not that this is fundamental anywhere, but I'm not sure the requirement achieves anything. Could you give more examples for this requirement: why it's bad, how it leads to problems, etc.? For example, does it mean you forbid all forms of ASCII art... or even smileys :))))

For example, in your own specification you are using [S], that is, the characters "[" and "]". Is this a valid usage of these characters in American English, or is it a graphical abuse, meant to make it look like a button? Elsewhere you are using them to mark a reference to a document. Do you in fact mean:

<span class="requirement-type"> <img src="specificationbutton" alt="Specification"> </span>

or

<abbr class="requirement-type" title="Specification">S.</abbr>

* KD-012

C012 [S] The 'character string' definition of a string is generally the most useful and SHOULD be used by most specifications, following the examples of Production [2] of XML 1.0 [XML 1.0], the SGML declaration of HTML 4.0 [HTML 4.01], and the character model of RFC 2070 [RFC 2070].

===> You may want to rephrase that sentence as: "The 'character string' definition of a string SHOULD be used by most specifications..."

* KD-013

C062 [S] Since specifications in general need both a definition for their characters and the semantics associated with these characters, specifications SHOULD include a reference to the Unicode Standard, whether or not they include a reference to ISO/IEC 10646. By providing a reference to the Unicode Standard implementers can benefit from the wealth of information provided in the standard and on the Unicode Consortium Web site.
===> Add a line break.

* KD-014

C064 [S] All generic references to the Unicode Standard [Unicode] MUST refer to the latest version of the Unicode Standard available at the date of publication of the containing specification.

===> Will this block some republications? Imagine you republish a specification to fix errata and typos, but it refers to an old version of Unicode. Do you have to modify the specification to make it conformant to Charmod? That could lead to a complete reworking of a specification in which some parts strongly depend on that reference. (Just trying to flush the rabbits out of the bush here.)

* KD-015

"MUST NOT assume" is bad terminology. You often use this phrase to explain to software developers and specification writers that if they are creating a *generic international* application, they have to be careful. The problem is that it makes the requirement NOT testable at all. You have to find a way to rephrase your requirements so that they become testable. A piece of software can sometimes be a library that perfectly implements support for ONE language without respecting what you say in this document.

You may also want to state at the beginning of your document that this specification is intended for people implementing and developing things for multilingual contexts and uses. That would avoid having to repeat it at the start of each requirement: "When you implement an international [S][I][C], blabla MUST..." I mention this because it seems obvious, but it is in fact not clear in your introduction.

OR

If I'm a developer of an application or a library that deals with only one language:
- Should I care about this spec?
- If yes, can I be conformant?

(Example: do I have to care about Chinese input methods if I'm creating a spell-checker library for an English Scrabble game? How do I answer C005?)

You did it, for example, in C006 by adding "for the relevant language and/or application."
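To illustrate KD-015: a requirement phrased as observable behaviour, rather than as something the implementer must not "assume", can be turned into a test. Below is a minimal sketch of one possible test for C009 (characters vs. units of physical storage), in Python. It assumes that "character" means a Unicode code point and that the software under test receives encoded bytes plus the name of their encoding; the names `count_characters`, `naive_count` and `passes_c009` are hypothetical and do not come from the Charmod draft.

```python
from typing import Callable

# Hypothetical signature of the software under test: it receives encoded
# bytes plus the name of their encoding and reports a character count.
CharCounter = Callable[[bytes, str], int]

# U+0061, U+00E9, U+20AC and U+1D11E take 1, 2, 3 and 4 bytes in UTF-8,
# 2, 2, 2 and 4 bytes in UTF-16 (the last one is a surrogate pair),
# and 4 bytes each in UTF-32.
SAMPLE = "a\u00e9\u20ac\U0001D11E"
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def count_characters(data: bytes, encoding: str) -> int:
    """Hypothetical conforming implementation: decode, then count code points."""
    return len(data.decode(encoding))


def naive_count(data: bytes, encoding: str) -> int:
    """Hypothetical non-conforming implementation: one byte = one character."""
    return len(data)


def passes_c009(count: CharCounter) -> bool:
    """Pass only if the reported count equals the number of code points in
    SAMPLE for every encoding form, whatever the storage size."""
    expected = len(SAMPLE)  # a Python str's length is its number of code points
    return all(count(SAMPLE.encode(enc), enc) == expected for enc in ENCODINGS)


if __name__ == "__main__":
    for enc in ENCODINGS:
        print(enc, len(SAMPLE.encode(enc)), "bytes")
    print("conforming:", passes_c009(count_characters))
    print("naive:", passes_c009(naive_count))
```

The test checks only one observable behaviour (counts independent of storage size); a similar test for C002 would need a sample such as "e" followed by U+0301 COMBINING ACUTE ACCENT, which is two characters but one unit of displayed text.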
* KD-016

There's a need for a glossary that defines the terms. Maybe you could expand the terminology section and use the specific markup for it. One benefit is that the W3C glossary would be enriched, making it easier to have a controlled vocabulary of terms used at W3C.

--
Karl Dubost - http://www.w3.org/People/karl/
W3C Conformance Manager
*** Be Strict To Be Cool ***
Received on Tuesday, 30 March 2004 19:54:54 UTC