Heuristics considered useful

This is a last call comment from C. M. Sperberg-McQueen (cmsmcq@acm.org) on
the Character Model for the World Wide Web 1.0
(http://www.w3.org/TR/2002/WD-charmod-20020430/).

Semi-structured version of the comment:

Submitted by: C. M. Sperberg-McQueen (cmsmcq@acm.org)
Submitted on behalf of (may be empty): XML Schema WG
Comment type: substantive
Chapter/section the comment applies to: 3.6.2 Character encoding identification
The comment will be visible to: public
Comment title: Heuristics considered useful
Comment:
The rule "[S] Specifications MUST NOT propose the use of heuristics to 
determine the encoding of data" appears to mean that the XML 1.0 
Recommendation does not conform to the character model spec, since
in its Appendix F (http://www.w3.org/TR/REC-xml#sec-guessing) it proposes 
the use of heuristics for recognizing the character encoding being
used well enough to bootstrap and read the XML declaration and any
encoding declaration included within it.  If this is the intent, we
believe this rule should be scaled back to something more like what was
in the first last-call spec: "Specifications MUST NOT require or encourage 
the use of unreliable heuristics."  If this is not the intent, we 
believe the rule needs to be rewritten to be clearer.

Either way, it would be useful to define "heuristics"; without a
definition, it's hard to know exactly what constraint is intended to
be expressed by this rule.

N.B. This comment is similar, but not identical, to C158
(http://www.w3.org/International/Group/2002/charmod-lc/Overview.html#C158),
which should be linked to 3.6.2, not 3.6.
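
For readers unfamiliar with the heuristics at issue, the following is a
minimal Python sketch of the kind of byte-signature autodetection that
Appendix F of XML 1.0 describes for entities that arrive without external
encoding information. It is an illustration only, not text from either
specification: the function name guess_encoding_family and the family
labels are hypothetical, and the UCS-4 unusual octet orders without a byte
order mark are omitted for brevity.

```python
def guess_encoding_family(prefix: bytes) -> str:
    """Guess a coarse encoding family from the first bytes of an XML entity.

    Hypothetical helper: a partial sketch of the Appendix F signature table.
    """
    signatures = [
        # With a byte order mark. Four-byte patterns come first because
        # FF FE 00 00 (UCS-4) begins with the UTF-16 mark FF FE.
        (b"\x00\x00\xfe\xff", "UCS-4, big-endian"),
        (b"\xff\xfe\x00\x00", "UCS-4, little-endian"),
        (b"\x00\x00\xff\xfe", "UCS-4, unusual octet order"),
        (b"\xfe\xff\x00\x00", "UCS-4, unusual octet order"),
        (b"\xef\xbb\xbf", "UTF-8"),
        (b"\xfe\xff", "UTF-16, big-endian"),
        (b"\xff\xfe", "UTF-16, little-endian"),
        # Without a byte order mark: the characters "<?xm" (or just "<")
        # as they would appear in each encoding family.
        (b"\x00\x00\x00\x3c", "UCS-4, big-endian"),
        (b"\x3c\x00\x00\x00", "UCS-4, little-endian"),
        (b"\x00\x3c\x00\x3f", "UTF-16, big-endian"),
        (b"\x3c\x00\x3f\x00", "UTF-16, little-endian"),
        (b"\x3c\x3f\x78\x6d", "ASCII-compatible"),
        (b"\x4c\x6f\xa7\x94", "EBCDIC"),
    ]
    for signature, family in signatures:
        if prefix.startswith(signature):
            return family
    # No signature matched: the entity has no XML declaration the sketch
    # can recognize, so fall back to the XML default.
    return "UTF-8 (default)"


if __name__ == "__main__":
    print(guess_encoding_family(b"<?xml version='1.0' encoding='ISO-8859-1'?>"))
    print(guess_encoding_family("<?xml version='1.0'?>".encode("utf-16-le")))
```

A guess of this kind is only meant to be good enough to read the XML
declaration; the encoding declaration inside it would still be needed to
select the exact encoding within the family.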


Structured version of the comment:

<lc-comment
  visibility="public" status="pending"
  decision="pending" impact="substantive">
  <originator email="cmsmcq@acm.org" represents="XML Schema WG"
      >C. M. Sperberg-McQueen</originator>
  <charmod-section href='http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-EncodingIdent'
    >3.6.2</charmod-section>
  <title>Heuristics considered useful</title>
  <description>
    <comment>
      <dated-link date="2002-07-12"
        >Heuristics considered useful</dated-link>
      <para>The rule "[S] Specifications MUST NOT propose the use of heuristics to 
determine the encoding of data" appears to mean that the XML 1.0 
Recommendation does not conform to the character model spec, since
in its Appendix F (http://www.w3.org/TR/REC-xml#sec-guessing) it proposes 
the use of heuristics for recognizing the character encoding being
used well enough to bootstrap and read the XML declaration and any
encoding declaration included within it.  If this is the intent, we
believe this rule should be scaled back to something more like what was
in the first last-call spec: "Specifications MUST NOT require or encourage 
the use of unreliable heuristics."  If this is not the intent, we 
believe the rule needs to be rewritten to be clearer.

Either way, it would be useful to define "heuristics"; without a
definition, it's hard to know exactly what constraint is intended to
be expressed by this rule.

N.B. This comment is similar, but not identical, to C158
(http://www.w3.org/International/Group/2002/charmod-lc/Overview.html#C158),
which should be linked to 3.6.2, not 3.6.</para>
    </comment>
  </description>
</lc-comment>

Received on Friday, 12 July 2002 18:05:31 UTC