
Give example of transcoding

From: Steven Pemberton <steven.pemberton@cwi.nl>
Date: Wed, 3 Jul 2002 21:50 +0900
To: www-i18n-comments@w3.org
Cc: steven.pemberton@cwi.nl (Steven Pemberton)
Message-Id: <20020703125033.1FA16140D@toro.w3.mag.keio.ac.jp>

This is a last call comment from Steven Pemberton (steven.pemberton@cwi.nl) on
the Character Model for the World Wide Web 1.0
(http://www.w3.org/TR/2002/WD-charmod-20020430/).

Semi-structured version of the comment:

Submitted by: Steven Pemberton (steven.pemberton@cwi.nl)
Submitted on behalf of (maybe empty): HTML WG
Comment type: editorial
Chapter/section the comment applies to: 3.3 Transcoding
The comment will be visible to: public
Comment title: Give example of transcoding
Comment:
It would be useful if 3.3 gave an example of where transcoding is used, since this is a frequently misunderstood point with regard to XML and HTML. People (and some user agents) think that the encoding also specifies the repertoire/CCS.

Something along the lines of:

"For example, in XML and HTML, documents are always in Unicode, but they may be delivered to a user agent in an encoding for another coded character set (indicated by the encoding declaration in XML, and the charset parameter of the HTTP Content-Type header in HTML). The user agent then transcodes the characters of the incoming document stream into Unicode code points. For example, a document delivered with encoding iso-8859-2 may contain the string "ő&#x0151;", where the first character (LATIN SMALL LETTER O WITH DOUBLE ACUTE) is at code point 0xf5 in iso-8859-2. This will be transcoded so that there will be two identical characters at code point 0x0151 in the document as processed by the user agent."

Note: the string in the example is an actual character (o with a double acute) followed by an NCR for the same character. Feel free to substitute any similar character if you wish.
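
To make the proposed example concrete, here is a minimal sketch in Python of what a user agent would do with those bytes. It is only an illustration, not part of the suggested spec text: the variable names are mine, and Python's html.unescape stands in for whatever NCR resolution a real XML or HTML parser performs.

```python
import html

# Bytes as delivered on the wire: 0xF5 is LATIN SMALL LETTER O WITH DOUBLE
# ACUTE in iso-8859-2, followed by the ASCII bytes of a numeric character
# reference (NCR) to U+0151.
raw = b"\xf5&#x0151;"

# Step 1: transcode from the delivery encoding into Unicode code points.
text = raw.decode("iso-8859-2")

# Step 2: resolve the NCR. It always names a Unicode code point,
# regardless of the encoding the document was delivered in.
resolved = html.unescape(text)

print([hex(ord(c)) for c in resolved])  # ['0x151', '0x151']
```

The point of the sketch is that the NCR is never interpreted relative to iso-8859-2: only the raw byte 0xF5 goes through the encoding, and both characters end up at the same Unicode code point.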



Structured version of the comment:

<lc-comment
  visibility="public" status="pending"
  decision="pending" impact="editorial">
  <originator email="steven.pemberton@cwi.nl" represents="HTML WG"
      >Steven Pemberton</originator>
  <charmod-section href='http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-Transcoding'
    >3.3</charmod-section>
  <title>Give example of transcoding</title>
  <description>
    <comment>
      <dated-link date="2002-07-03"
        >Give example of transcoding</dated-link>
      <para>It would be useful if 3.3 gave an example of where transcoding is used, since this is a frequently misunderstood point with regard to XML and HTML. People (and some user agents) think that the encoding also specifies the repertoire/CCS.

Something along the lines of:

"For example, in XML and HTML, documents are always in Unicode, but they may be delivered to a user agent in an encoding for another coded character set (indicated by the encoding declaration in XML, and the charset parameter of the HTTP Content-Type header in HTML). The user agent then transcodes the characters of the incoming document stream into Unicode code points. For example, a document delivered with encoding iso-8859-2 may contain the string "ő&amp;#x0151;", where the first character (LATIN SMALL LETTER O WITH DOUBLE ACUTE) is at code point 0xf5 in iso-8859-2. This will be transcoded so that there will be two identical characters at code point 0x0151 in the document as processed by the user agent."

Note: the string in the example is an actual character (o with a double acute) followed by an NCR for the same character. Feel free to substitute any similar character if you wish.
</para>
    </comment>
  </description>
</lc-comment>
Received on Wednesday, 3 July 2002 08:50:35 GMT
