Always reliable identification is a chimaera from C. M. Sperberg-McQueen on 2002-07-12 (www-i18n-comments@w3.org from July 2002)

From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
Date: Fri, 12 Jul 2002 11:06 +0900
To: www-i18n-comments@w3.org
Cc: cmsmcq@acm.org (C. M. Sperberg-McQueen)
Message-Id: <20020712020626.85C19286@toro.w3.mag.keio.ac.jp>

This is a last call comment from C. M. Sperberg-McQueen (cmsmcq@acm.org) on
the Character Model for the World Wide Web 1.0
(http://www.w3.org/TR/2002/WD-charmod-20020430/).

Semi-structured version of the comment:

Submitted by: C. M. Sperberg-McQueen (cmsmcq@acm.org)
Submitted on behalf of (maybe empty):
Comment type: substantive
Chapter/section the comment applies to: 3.6 Choice and Identification of Character Encodings
The comment will be visible to: public
Comment title: Always reliable identification is a chimaera
Comment:
The requirement "[S] Specifications MUST either specify a unique encoding,
or provide character encoding identification mechanisms such that the
encoding of text can always be reliably identified" is, I think, too
strong. I do not believe that any identification mechanism can
ALWAYS guarantee the correct identification of an encoding; if I am
right, this requirement guarantees that no specification ever written
has ever conformed, and no specification will ever conform, to the
character model specification. Malicious users, incompetent users,
ignorance or indifference on the part of those responsible for servers,
and transcoders which understandably do not touch the internal labels
on the data they transcode, can combine to defeat any labeling or
encoding-identification scheme ever devised. Even the W3C server
has been known, from time to time, to serve documents with the wrong
character-encoding identification.
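
To make the transcoder case concrete, here is a minimal sketch in Python. It is
an illustration only, not taken from any specification: the sample document and
the helper names naive_transcode and declared_encoding are hypothetical, and the
label reader assumes an ASCII-compatible encoding. It shows a document whose
bytes are converted to UTF-8 while its internal declaration still says
ISO-8859-1, so a receiver that trusts the label misreads the text.

```python
import re

# Hypothetical sample: labelled ISO-8859-1 internally and really encoded that way.
original = '<?xml version="1.0" encoding="ISO-8859-1"?>\n<p>caf\u00e9</p>'.encode("iso-8859-1")


def naive_transcode(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode the bytes but leave the internal declaration untouched."""
    return data.decode(src).encode(dst)


def declared_encoding(data: bytes) -> str:
    """Return the name in the XML encoding declaration, or UTF-8 if absent.

    Assumption: the document is in an ASCII-compatible encoding, so the
    declaration can be matched directly on the bytes.
    """
    m = re.match(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']', data)
    return m.group(1).decode("ascii") if m else "UTF-8"


transcoded = naive_transcode(original, "iso-8859-1", "utf-8")

label = declared_encoding(transcoded)   # the stale label survives transcoding
misread = transcoded.decode(label)      # receiver trusts the label
correct = transcoded.decode("utf-8")    # what the bytes actually are

print(label)
print("caf\u00e9" in misread, "caf\u00e9" in correct)
```

The label is read correctly by the mechanism, yet it no longer describes the
bytes, which is the sense in which no labeling scheme can be reliable "always".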

Please weaken this requirement so that it is achievable, or else XML
1.1 and every other spec now under development by the W3C will be
blocked by this unrealistic counsel of perfection. The identification
mechanisms of XML 1.0 are pretty good, if I say so myself. But they
do not come close to meeting the requirement stated here. I think
you've set the bar too high.

Structured version of the comment:

<lc-comment
visibility="public" status="pending"
decision="pending" impact="substantive">
<originator email="cmsmcq@acm.org" represents="-"
>C. M. Sperberg-McQueen</originator>
<charmod-section href='http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-Encodings'
>3.6</charmod-section>
<title>Always reliable identification is a chimaera</title>
<description>
<comment>
<dated-link date="2002-07-12"
>Always reliable identification is a chimaera</dated-link>
<para>The requirement "[S] Specifications MUST either specify a unique encoding,
or provide character encoding identification mechanisms such that the
encoding of text can always be reliably identified" is, I think, too
strong. I do not believe that any identification mechanism can
ALWAYS guarantee the correct identification of an encoding; if I am
right, this requirement guarantees that no specification ever written
has ever conformed, and no specification will ever conform, to the
character model specification. Malicious users, incompetent users,
ignorance or indifference on the part of those responsible for servers,
and transcoders which understandably do not touch the internal labels
on the data they transcode, can combine to defeat any labeling or
encoding-identification scheme ever devised. Even the W3C server
has been known, from time to time, to serve documents with the wrong
character-encoding identification.

Received on Thursday, 11 July 2002 22:06:28 UTC