[QA Review] CharMod for the Web 1.0: Fundamentals WD 25 Feb 2004 from Karl Dubost on 2004-03-30 (www-i18n-comments@w3.org from March 2004)

From: Karl Dubost <karl@w3.org>
Date: Tue, 30 Mar 2004 18:55:30 -0500
To: www-i18n-comments@w3.org
Message-Id: <B994E5EA-82A5-11D8-9C2B-000A95718F82@w3.org>
Dear I18N WG,

This is a review of

Character Model for the World Wide Web 1.0: Fundamentals
W3C Working Draft 25 February 2004
http://www.w3.org/TR/2004/WD-charmod-20040225

First of all, and I want to stress this:

	****************************************
	This document is very enjoyable and very
	instructive. Thank you very much. A
	must-read, even if you don't use it for
	implementation purposes.
	****************************************


Basically, some of my previous review comments are still valid, but I
will be more explicit here so that you can file an issue for each of
them. Each one is labelled KD-XXX, where XXX is a number.

* KD-001
C001   [S]   [I]   [C]   Specifications,  software and content MUST NOT 
assume that there is a one-to-one  correspondence between characters 
and the sounds of a  language.

===> How do you test that for each of [S], [I] and [C]? What are the
three tests that you will be able to create to demonstrate the
implementability of this during the CR period, when you will be
seeking implementations? If you can't design a test for it, your
assertion is not testable, and therefore not implementable. I think one
of the problems comes from the word "assume".
	Imagine a language that does have "a one-to-one correspondence between
characters and the sounds of a language". Suppose a piece of software
implements only this language, because it is built specifically for
this language. Then it is not conformant to C001, even though the
software does the correct thing.


* KD-002
C002   [S]   [I]   [C]   Specifications,  software and content MUST NOT 
assume a one-to-one mapping between  characters and units of displayed 
text.

===> Same comment as KD-001.
How do you test that for each of [S], [I] and [C]? What are the three
tests that you will be able to create to demonstrate the
implementability of this during the CR period, when you will be
seeking implementations? If you can't design a test for it, your
assertion is not testable, and therefore not implementable. I think one
of the problems comes from the word "assume".
	Imagine a language that does have "a one-to-one mapping between
characters and units of displayed text". Suppose a piece of software
implements only this language, because it is built specifically for
this language. Then it is not conformant to C002, even though the
software does the correct thing.
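
As an illustration only (this is not from the draft), here is a minimal
Python sketch of the kind of concrete check a test for C002 might build
on. The helper name code_point_count is hypothetical. It shows one unit
of displayed text, "é", represented by either two characters or one.

```python
import unicodedata


def code_point_count(text: str) -> int:
    """Return the number of Unicode code points in text."""
    return len(text)


# "é" written as a base letter followed by a combining acute accent.
decomposed = "e\u0301"
# The same displayed unit as a single precomposed character (NFC form).
composed = unicodedata.normalize("NFC", decomposed)

print(code_point_count(decomposed))  # 2
print(code_point_count(composed))    # 1
```

Both strings render as a single unit of text, so a test suite could
feed both forms to an implementation and compare how it handles them.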

*KD-003
C005   [S]   [I]   Specifications  and software MUST NOT assume that a 
single keystroke results  in a single character, nor that a single 
character can be input with a single keystroke (even with modifiers), 
nor that keyboards are the same all over the  world.

===> Same comment as KD-001.
How do you test that for each of [S] and [I]? What are the two tests
that you will be able to create to demonstrate the implementability of
this during the CR period, when you will be seeking implementations? If
you can't design a test for it, your assertion is not testable, and
therefore not implementable. I think one of the problems comes from the
word "assume".
	Imagine a language where "a single keystroke results in a single
character". Suppose a piece of software implements only this language,
because it is built specifically for this language. Then it is not
conformant to C005, even though the software does the correct thing.

Could the following solve your problem?
	"C005 Specifications and software MUST allow complex input methods
where a single keystroke doesn't result in a single
character...   ... "

*KD-004
C008   [S]   [I]   Specifications and implementations of sorting and 
searching algorithms SHOULD accommodate all characters in Unicode.

===> What happens if you implement all Western languages but not Asian
ones, because the context of the application does not make it
necessary? Do I still have to implement everything? If not, how can I
be conformant?


*KD-005
C009   [S]   [I]   [C]   Specifications,  software and content MUST NOT 
assume a one-to-one relationship  between characters and units of 
physical storage.

===> Same comment as KD-001. Make it testable.
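
As a rough sketch only (the helper name storage_units and the choice of
sample characters are assumptions, not taken from the draft), a testable
form of C009 could start from observations like the following, written
in Python: the same character occupies a different number of bytes
depending on the character and on the encoding.

```python
def storage_units(text: str, encoding: str) -> int:
    """Return how many bytes text occupies in the given encoding."""
    return len(text.encode(encoding))


# a, e with acute accent, euro sign, musical symbol G clef
samples = ["a", "\u00e9", "\u20ac", "\U0001D11E"]

for ch in samples:
    print(
        repr(ch),
        storage_units(ch, "utf-8"),
        storage_units(ch, "utf-16-le"),
        storage_units(ch, "utf-32-le"),
    )
```

Each sample is a single character, yet the byte counts differ, which is
exactly the situation the requirement warns about.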


*KD-006
C067   [S]   Specifications SHOULD  avoid the use of the term 
'character' if a more specific term is  available.

===> Not testable. "Avoid" is like "assume": there's a notion of
intention, of vague choice. You could say:
	"Specifications SHOULD use a more specific term, when one is
available, instead of the general term 'character'."

*KD-007
C018   [S]   When a unique character encoding is mandated, the
character encoding MUST be UTF-8, UTF-16 or UTF-32.
C019   [S]   If a unique character encoding is mandated and
compatibility with US-ASCII is desired, UTF-8 (see [RFC 3629]) is
RECOMMENDED. In other situations, such as for APIs, UTF-16 or UTF-32
may be more appropriate. Possible reasons for choosing one of these
include efficiency of internal processing and interoperability with
other processes.

===> Please separate the part about APIs. Basically, insert a line
break ;) For now the only clue is visual, which means that the
distinction is no longer visible or accessible without colors. It can
lead to misunderstanding.
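
For readers wondering why C019 singles out UTF-8 when compatibility
with US-ASCII is desired, here is a small Python sketch. It is an
illustration under my own assumptions: the helper name matches_us_ascii
and the sample string are hypothetical and do not come from the draft.

```python
def matches_us_ascii(text: str, encoding: str) -> bool:
    """True if encoding text yields the same bytes as US-ASCII would."""
    return text.encode(encoding) == text.encode("ascii")


# A pure US-ASCII sample, chosen arbitrarily for the illustration.
sample = "Content-Type: text/html"

for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
    print(encoding, matches_us_ascii(sample, encoding))
```

Only UTF-8 produces the same byte sequence as US-ASCII for this text;
UTF-16 and UTF-32 use wider code units, which is why they tend to fit
APIs and internal processing rather than ASCII-compatible interchange.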


*KD-008
C027   [S]   Specifications MAY  define either UTF-8 or UTF-16 as a 
default encoding form (or both if they  define suitable means of 
distinguishing them), but they MUST  NOT use any other character 
encoding as a default.

===> Double assertions make it difficult to understand and analyse what
the exact conformance clause is. Try to wrap them up into one assertion
or separate them.


*KD-009
C032   [I]   Receiving software  MAY recognize as many character 
encodings and as many charset names and aliases for them as  
appropriate.

===> Insert a line break. AND it's not testable. That's a good
recommendation, but you can't really test it. It encourages people to
support as much as possible, but it's not a requirement; otherwise you
have to define "appropriate" clearly and without ambiguity.


*KD-010
C033   [I]   Software  MUST completely implement the mechanisms for 
character  encoding identification and SHOULD implement them in such a  
way that they are easy to use (for instance in HTTP  servers).

===> Same comment as KD-008. Double assertions.


*KD-011
C069 [C] Content  SHOULD NOT misuse character technology for pictures 
or graphics.

===> I perfectly understand the rationale behind this requirement, but
it might lead to a strictness that could, for example, block someone
who wants to use character technology for an artistic project. Not that
it's fundamental anyway, but I'm not sure it achieves anything. Could
you give more examples with this requirement: why it's bad, how it
leads to problems, etc.?
For example, does that mean you forbid all possibilities of ASCII
art... or even smileys :))))
	For example, in your own specification you are using [S], and so the
characters "[" and "]". Is that a valid usage of these characters in
American English, or is it a graphical abuse, to make it look like a
button? Elsewhere you are using them to mark a reference to a
document. Do you in fact mean:

<span class="requirement-type">
	<img src="specificationbutton" alt="Specification">
</span>  

or

<abbr class="requirement-type" title="Specification">S.</abbr>  


*KD-012
C012   [S]   The 'character  string' definition of a string is 
generally the most useful and  SHOULD be used by most specifications, 
following the examples of Production [2] of XML 1.0 [XML 1.0], the SGML 
  declaration of HTML 4.0 [HTML 4.01], and the character model of RFC  
2070 [RFC 2070].

===> you may want to rephrase that sentence as:
	"The 'character string' definition of a string SHOULD be used by most 
specifications..."


*KD-013
C062   [S]   Since specifications in general  need both a definition 
for their characters and the semantics associated with  these 
characters, specifications SHOULD include a reference  to the Unicode 
Standard, whether or not they include a  reference to ISO/IEC 10646.  
By providing a reference to the Unicode Standard implementers can 
benefit from the wealth of information  provided in the standard and on 
the Unicode Consortium Web site.

===> Insert a line break.


*KD-014
C064   [S]   All generic  references to the Unicode Standard [Unicode]  
MUST refer to the latest version of the Unicode Standard available at 
the date of publication of the containing specification.

===> Will this block some republications? Imagine you republish a
specification to incorporate errata and fix typos, but it refers to an
old version of Unicode. Do you have to modify the specification to make
it conformant to Charmod? That could lead to a complete remodeling of a
spec in which some things may depend strongly on that reference. (Just
trying to flush the rabbits out of the bushes here.)


*KD-015
"MUST NOT assume" is bad terminology. You often use this term to
explain to software developers and specification writers that if they
are creating a *generic international* application, they have to be
careful. The problem is that it makes the requirement NOT testable at
all. You have to find a way to rephrase your requirements so that they
become testable. Software can sometimes be a piece of code, such as a
library, that perfectly implements support for ONE language without
respecting what you are saying in this document.

You may also want to state at the beginning of your document that this
specification is meant for people implementing and developing things
for a multilingual context and use. That avoids having to state it at
the start of each sentence: "When you implement an international
[S][I][C], blabla MUST..." I am spelling out something that seems
obvious but is in fact not clear in your introduction. Otherwise, if
I'm a developer of an application or a library which deals with only
one language:
	- Should I care about this spec?
	- If yes, can I be conformant?
	(Example: do I have to care about Chinese input methods if I'm
creating a spell checker library for an English Scrabble game? How do I
answer to C005?)

	You did it, for example, in C006 by adding "for the relevant language
and/or application."



* KD-016
There's a need for a glossary where you define the terms. Maybe you
could expand the terminology section and use the specific markup for
it. A benefit is that the W3C glossary will be enriched, making it
easier to have a controlled vocabulary of terms used at W3C.

-- 
Karl Dubost - http://www.w3.org/People/karl/
W3C Conformance Manager
*** Be Strict To Be Cool ***
Received on Tuesday, 30 March 2004 19:54:54 UTC