Give example of transcoding

This is a last call comment from Steven Pemberton (steven.pemberton@cwi.nl) on
the Character Model for the World Wide Web 1.0
(http://www.w3.org/TR/2002/WD-charmod-20020430/).

Semi-structured version of the comment:

Submitted by: Steven Pemberton (steven.pemberton@cwi.nl)
Submitted on behalf of (maybe empty): HTML WG
Comment type: editorial
Chapter/section the comment applies to: 3.3 Transcoding
The comment will be visible to: public
Comment title: Give example of transcoding
Comment:
It would be useful if 3.3 gave an example of where transcoding is used, since this is a frequently misunderstood point with regard to XML and HTML. People (and some UAs) think that the encoding also specifies the repertoire/CCS.

Something along the lines of:

"For example, in XML and HTML, documents are always in Unicode, but they may be delivered to a user agent in an encoding for another coded character set (indicated by the encoding attribute in XML, and the HTTP Content-Type header in HTML). The user agent then transcodes the characters of the incoming document stream into Unicode code points. For instance, a document delivered with encoding iso-8859-2 may contain the string "ő&#x0151;" where the first character (LATIN SMALL LETTER O WITH DOUBLE ACUTE) is at code point 0xf5 in iso-8859-2. This will be transcoded so that there will be two identical characters at code point 0x0151 in the document as processed by the user agent."

Note: the string "ő&#x0151;" is an actual character (an o with a double acute) followed by an NCR. Feel free to substitute any similar character if you wish.
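The transcoding step in the proposed text can be sketched in Python. This is only an illustration under stated assumptions: the function name transcode_and_expand is hypothetical, and html.unescape stands in for a user agent's numeric character reference processing, which in a real parser happens during markup parsing rather than as a separate pass.

```python
import html

# Bytes as they might arrive from the server, labelled iso-8859-2:
# the raw byte 0xF5 followed by the ASCII text of a numeric character reference.
raw = b"\xf5&#x0151;"


def transcode_and_expand(data: bytes, encoding: str) -> str:
    """Hypothetical helper: transcode to Unicode, then expand NCRs."""
    # Transcoding: map each byte of the delivery encoding to a Unicode code point.
    text = data.decode(encoding)
    # NCRs always refer to Unicode code points, whatever the delivery encoding was.
    return html.unescape(text)


result = transcode_and_expand(raw, "iso-8859-2")
code_points = [hex(ord(ch)) for ch in result]
```

Both characters in result have the same code point, 0x151, even though only one of them was written as a byte in iso-8859-2.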



Structured version of the comment:

<lc-comment
  visibility="public" status="pending"
  decision="pending" impact="editorial">
  <originator email="steven.pemberton@cwi.nl" represents="HTML WG"
      >Steven Pemberton</originator>
  <charmod-section href='http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-Transcoding'
    >3.3</charmod-section>
  <title>Give example of transcoding</title>
  <description>
    <comment>
      <dated-link date="2002-07-03"
        >Give example of transcoding</dated-link>
      <para>It would be useful if 3.3 gave an example of where transcoding is used, since this is a frequently misunderstood point with regard to XML and HTML. People (and some UAs) think that the encoding also specifies the repertoire/CCS.

Something along the lines of:

"For example, in XML and HTML, documents are always in Unicode, but they may be delivered to a user agent in an encoding for another coded character set (indicated by the encoding attribute in XML, and the HTTP Content-Type header in HTML). The user agent then transcodes the characters of the incoming document stream into Unicode code points. For instance, a document delivered with encoding iso-8859-2 may contain the string "ő&amp;#x0151;" where the first character (LATIN SMALL LETTER O WITH DOUBLE ACUTE) is at code point 0xf5 in iso-8859-2. This will be transcoded so that there will be two identical characters at code point 0x0151 in the document as processed by the user agent."

Note: the string "ő&amp;#x0151;" is an actual character (an o with a double acute) followed by an NCR. Feel free to substitute any similar character if you wish.
</para>
    </comment>
  </description>
</lc-comment>

Received on Wednesday, 3 July 2002 08:50:35 UTC