- From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
- Date: Sat, 13 Jul 2002 02:50 +0900
- To: www-i18n-comments@w3.org
- Cc: cmsmcq@acm.org (C. M. Sperberg-McQueen)
This is a last call comment from C. M. Sperberg-McQueen (cmsmcq@acm.org) on the Character Model for the World Wide Web 1.0 (http://www.w3.org/TR/2002/WD-charmod-20020430/).

Semi-structured version of the comment:

Submitted by: C. M. Sperberg-McQueen (cmsmcq@acm.org)
Submitted on behalf of (maybe empty): XML Schema WG
Comment type: substantive
Chapter/section the comment applies to: 3.6 Choice and Identification of Character Encodings
The comment will be visible to: public
Comment title: Reliability of character encoding identification
Comment:

Section 3.6 specifies that "[S] Specifications MUST either specify a unique encoding, or provide character encoding identification mechanisms such that the encoding of text can always be reliably identified."

The XML Schema WG believes that this requirement, as formulated, is not met by any existing specifications and is unlikely ever to be met by any. Document producers, software implementors, and server administrators, working alone or in concert, have innumerable opportunities to render character-set labels false out of malice, ignorance, or indifference; if character-set labels are false, the encoding of the text can only rarely be reliably identified. The word "always" seems to suggest that encoding identification mechanisms must function even in the case of hostile users or misconfigured servers; that's not possible. Either the i18n WG should lower its expectations or it should express its expectations more clearly.

We believe a more correct standard would be to require that specifications provide mechanisms to ensure that it is POSSIBLE to get things right, or to ensure that with correct operation / under normal circumstances character encodings are reliably and correctly identified.

N.B. This comment is substantially similar to comment C157 (http://www.w3.org/International/Group/2002/charmod-lc/Overview.html#C157) and to comment 3.13 of our comments on the previous last-call draft (http://www.w3.org/XML/Group/2002/03/charmodel.annotated.html#ab1b3b3c17c14).

Structured version of the comment:

<lc-comment visibility="public" status="pending" decision="pending" impact="substantive">
<originator email="cmsmcq@acm.org" represents="XML Schema WG" >C. M. Sperberg-McQueen</originator>
<charmod-section href='http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-Encodings' >3.6</charmod-section>
<title>Reliability of character encoding identification</title>
<description>
<comment>
<dated-link date="2002-07-12" >Reliability of character encoding identification</dated-link>
<para>Section 3.6 specifies that "[S] Specifications MUST either specify a unique encoding, or provide character encoding identification mechanisms such that the encoding of text can always be reliably identified." The XML Schema WG believes that this requirement, as formulated, is not met by any existing specifications and is unlikely ever to be met by any. Document producers, software implementors, and server administrators, working alone or in concert, have innumerable opportunities to render character-set labels false out of malice, ignorance, or indifference; if character-set labels are false, the encoding of the text can only rarely be reliably identified. The word "always" seems to suggest that encoding identification mechanisms must function even in the case of hostile users or misconfigured servers; that's not possible. Either the i18n WG should lower its expectations or it should express its expectations more clearly. We believe a more correct standard would be to require that specifications provide mechanisms to ensure that it is POSSIBLE to get things right, or to ensure that with correct operation / under normal circumstances character encodings are reliably and correctly identified. N.B. This comment is substantially similar to comment C157 (http://www.w3.org/International/Group/2002/charmod-lc/Overview.html#C157) and to comment 3.13 of our comments on the previous last-call draft (http://www.w3.org/XML/Group/2002/03/charmodel.annotated.html#ab1b3b3c17c14).</para>
</comment>
</description>
</lc-comment>
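
Editorial illustration (not part of the submitted comment): the point about false character-set labels can be sketched in a few lines of Python. The sample string and the helper name `decode_as_labeled` are hypothetical, chosen only for this sketch. It shows UTF-8 bytes decoding without any error under a wrong ISO-8859-1 label, so the receiver gets no signal that the label is false.

```python
# Hypothetical sketch: a false charset label often goes undetected,
# because the mislabeled bytes still decode without raising an error.

def decode_as_labeled(payload: bytes, label: str) -> str:
    """Decode payload using whatever encoding label accompanied it."""
    return payload.decode(label)

text = "caf\u00e9"                 # "café"
payload = text.encode("utf-8")      # b'caf\xc3\xa9'

# Correct label: the original text is recovered.
correct = decode_as_labeled(payload, "utf-8")

# False label: every byte is a valid ISO-8859-1 character, so decoding
# succeeds, but the result is different (garbled) text.
mislabeled = decode_as_labeled(payload, "iso-8859-1")

print(correct == text)      # True
print(mislabeled == text)   # False
```

Because the wrong label fails silently here, a receiver that trusts the label cannot distinguish this case from correct operation; that is the sense in which identification cannot "always" be reliable.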
Received on Friday, 12 July 2002 13:50:53 UTC