Spanish 'ch' is not a letter sequence

This is a last call comment from C. M. Sperberg-McQueen (cmsmcq@acm.org) on
the Character Model for the World Wide Web 1.0
(http://www.w3.org/TR/2002/WD-charmod-20020430/).

Semi-structured version of the comment:

Submitted by: C. M. Sperberg-McQueen (cmsmcq@acm.org)
Submitted on behalf of (maybe empty): 
Comment type: substantive
Chapter/section the comment applies to: 3.1.5 Units of collation
The comment will be visible to: public
Comment title: Spanish 'ch' is not a letter sequence
Comment:
Section 3.1.5 says "EXAMPLE: In traditional Spanish sorting, the letter sequences 'ch' and 'll' are treated as atomic collation units. Although Spanish sorting, and to some extent Spanish everyday use, treat 'ch' as a single unit, current digital encodings treat it as two letters, and keyboards do the same (the user types 'c', then 'h')."

This is not what I learned in grade school.  Sra. Robles was quite
clear, and rather strict about it (and so of course I am sure that
she is right and your informants must be wrong).

I believe the paragraph would be more accurate and clearer if it 
read "EXAMPLE: In traditional Spanish sorting, the character sequences 'ch' and 'll' are treated as single letters and as atomic collation units. Although Spanish sorting, and to some extent Spanish everyday use, treat 'ch' as a single unit, current digital encodings treat it as two characters, and keyboards do the same (the user types 'c', then 'h')."

I don't know of any digital encoding whose specification provides any 
definition of "letter", and thus I find it surprising and confusing
to read that most such encodings treat "ch" as two letters:  I don't
believe that any character set specifications or encodings can 
meaningfully be said to treat ANYTHING as ANY number of "letters",
since "letter" is a concept foreign to their universe of discourse.
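The following is a minimal Python sketch of the distinction the proposed wording draws. At the encoding level 'ch' is two characters. A traditional Spanish collation can still treat 'ch' and 'll' as single, atomic units. The alphabet table and the helpers `collation_units` and `traditional_key` are hypothetical illustrations, not part of any specification. The sketch assumes lowercase input that uses only the listed letters, with no accented vowels. It is not a full collation algorithm.

```python
# Hypothetical sketch: contrasts how an encoding stores Spanish 'ch' with how a
# simplified traditional Spanish collation orders it. Assumes lowercase input
# drawn only from the letters in TRADITIONAL_ORDER (no accented vowels).

# Code points: the encoding sees "ch" as two characters.
print([f"U+{ord(c):04X}" for c in "ch"])  # ['U+0063', 'U+0068']

# Assumed traditional order: 'ch' follows 'c' and 'll' follows 'l'.
TRADITIONAL_ORDER = [
    "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
    "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
]
RANK = {unit: i for i, unit in enumerate(TRADITIONAL_ORDER)}


def collation_units(word: str) -> list[str]:
    """Split a word into collation units, grouping 'ch' and 'll' greedily."""
    units = []
    i = 0
    while i < len(word):
        if word[i:i + 2] in ("ch", "ll"):
            units.append(word[i:i + 2])  # two characters, one collation unit
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units


def traditional_key(word: str) -> list[int]:
    """Sort key: rank of each collation unit (KeyError for unlisted letters)."""
    return [RANK[unit] for unit in collation_units(word)]


words = ["cuna", "chico", "dedo", "calle", "llave", "luz"]
print(sorted(words))
# code-point order: ['calle', 'chico', 'cuna', 'dedo', 'llave', 'luz']
print(sorted(words, key=traditional_key))
# traditional order: ['calle', 'cuna', 'chico', 'dedo', 'luz', 'llave']
```

Plain code-point order puts 'chico' before 'cuna' and 'llave' before 'luz'. The traditional key reverses both pairs, because 'ch' and 'll' each occupy a single slot in the assumed alphabet, even though each is stored as two characters.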



Structured version of the comment:

<lc-comment
  visibility="public" status="pending"
  decision="pending" impact="substantive">
  <originator email="cmsmcq@acm.org" represents="-"
      >C. M. Sperberg-McQueen</originator>
  <charmod-section href='http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-CollationUnits'
    >3.1.5</charmod-section>
  <title>Spanish 'ch' is not a letter sequence</title>
  <description>
    <comment>
      <dated-link date="2002-07-12"
        >Spanish 'ch' is not a letter sequence</dated-link>
      <para>Section 3.1.5 says "EXAMPLE: In traditional Spanish sorting, the letter sequences 'ch' and 'll' are treated as atomic collation units. Although Spanish sorting, and to some extent Spanish everyday use, treat 'ch' as a single unit, current digital encodings treat it as two letters, and keyboards do the same (the user types 'c', then 'h')."

This is not what I learned in grade school.  Sra. Robles was quite
clear, and rather strict about it (and so of course I am sure that
she is right and your informants must be wrong).

I believe the paragraph would be more accurate and clearer if it 
read "EXAMPLE: In traditional Spanish sorting, the character sequences 'ch' and 'll' are treated as single letters and as atomic collation units. Although Spanish sorting, and to some extent Spanish everyday use, treat 'ch' as a single unit, current digital encodings treat it as two characters, and keyboards do the same (the user types 'c', then 'h')."

I don't know of any digital encoding whose specification provides any 
definition of "letter", and thus I find it surprising and confusing
to read that most such encodings treat "ch" as two letters:  I don't
believe that any character set specifications or encodings can 
meaningfully be said to treat ANYTHING as ANY number of "letters",
since "letter" is a concept foreign to their universe of discourse.
</para>
    </comment>
  </description>
</lc-comment>

Received on Thursday, 11 July 2002 21:19:43 UTC