Always reliable identification is a chimaera

This is a last call comment from C. M. Sperberg-McQueen (cmsmcq@acm.org) on
the Character Model for the World Wide Web 1.0
(http://www.w3.org/TR/2002/WD-charmod-20020430/).

Semi-structured version of the comment:

Submitted by: C. M. Sperberg-McQueen (cmsmcq@acm.org)
Submitted on behalf of (maybe empty): 
Comment type: substantive
Chapter/section the comment applies to: 3.6 Choice and Identification of Character Encodings
The comment will be visible to: public
Comment title: Always reliable identification is a chimaera
Comment:
The requirement "[S] Specifications MUST either specify a unique encoding, 
or provide character encoding identification mechanisms such that the 
encoding of text can always be reliably identified" is, I think, too
strong.  I do not believe that any identification mechanism can 
ALWAYS guarantee the correct identification of an encoding; if I am 
right, this requirement guarantees that no specification ever written
has ever conformed, and no specification will ever conform, to the
character model specification.  Malicious users, incompetent users,
ignorance or indifference on the part of those responsible for servers,
and transcoders which understandably do not touch the internal labels
on the data they transcode, can combine to defeat any labeling or
encoding-identification scheme ever devised.  Even the W3C server
has been known, from time to time, to serve documents with the wrong
character-encoding identification.  

Please weaken this requirement so that it is achievable, or else XML 
1.1 and every other spec now under development by the W3C will be 
blocked by this unrealistic counsel of perfection.  The identification 
mechanisms of XML 1.0 are pretty good, if I say so myself.  But they 
do not come close to meeting the requirement stated here.  I think 
you've set the bar too high.
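(Illustrative sketch, not part of the original comment.) The transcoder scenario above can be made concrete with a small Python example. Everything in it is an assumption for illustration: the sample document, the hypothetical helpers `transcode_bytes` and `declared_encoding`, and the choice of a UTF-8 to ISO-8859-1 conversion. A byte-level transcoder re-encodes the document correctly but leaves the in-band `encoding="UTF-8"` declaration untouched, so a consumer that trusts the declaration can no longer decode the data.

```python
import re

def transcode_bytes(data: bytes, src: str, dst: str) -> bytes:
    """Hypothetical transcoder: re-encode raw bytes without inspecting
    or updating any internal encoding label."""
    return data.decode(src).encode(dst)

def declared_encoding(data: bytes) -> str:
    """Hypothetical helper: read the encoding pseudo-attribute from the XML
    declaration. Simplified: ignores byte order marks and the other
    autodetection rules of XML 1.0 Appendix F, and falls back to UTF-8."""
    m = re.match(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', data)
    return m.group(1).decode("ascii") if m else "UTF-8"

original = '<?xml version="1.0" encoding="UTF-8"?><p>café</p>'.encode("utf-8")
transcoded = transcode_bytes(original, "utf-8", "iso-8859-1")

# The bytes are now ISO-8859-1, but the declaration still claims UTF-8.
label = declared_encoding(transcoded)
try:
    text = transcoded.decode(label)
except UnicodeDecodeError:
    text = None  # the internal label no longer matches the bytes

print(label, text)  # prints "UTF-8 None"
```

Neither step is buggy in isolation: the transcoder does its job and the consumer follows the label. The mislabeling comes from combining them, which is the point the comment makes.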



Structured version of the comment:

<lc-comment
  visibility="public" status="pending"
  decision="pending" impact="substantive">
  <originator email="cmsmcq@acm.org" represents="-"
      >C. M. Sperberg-McQueen</originator>
  <charmod-section href='http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-Encodings'
    >3.6</charmod-section>
  <title>Always reliable identification is a chimaera</title>
  <description>
    <comment>
      <dated-link date="2002-07-12"
        >Always reliable identification is a chimaera</dated-link>
      <para>The requirement "[S] Specifications MUST either specify a unique encoding, 
or provide character encoding identification mechanisms such that the 
encoding of text can always be reliably identified" is, I think, too
strong.  I do not believe that any identification mechanism can 
ALWAYS guarantee the correct identification of an encoding; if I am 
right, this requirement guarantees that no specification ever written
has ever conformed, and no specification will ever conform, to the
character model specification.  Malicious users, incompetent users,
ignorance or indifference on the part of those responsible for servers,
and transcoders which understandably do not touch the internal labels
on the data they transcode, can combine to defeat any labeling or
encoding-identification scheme ever devised.  Even the W3C server
has been known, from time to time, to serve documents with the wrong
character-encoding identification.  

Please weaken this requirement so that it is achievable, or else XML 
1.1 and every other spec now under development by the W3C will be 
blocked by this unrealistic counsel of perfection.  The identification 
mechanisms of XML 1.0 are pretty good, if I say so myself.  But they 
do not come close to meeting the requirement stated here.  I think 
you've set the bar too high.
</para>
    </comment>
  </description>
</lc-comment>

Received on Thursday, 11 July 2002 22:06:28 UTC