- From: Steven Pemberton <steven.pemberton@cwi.nl>
- Date: Wed, 3 Jul 2002 21:50 +0900
- To: www-i18n-comments@w3.org
- Cc: steven.pemberton@cwi.nl (Steven Pemberton)
This is a last call comment from Steven Pemberton (steven.pemberton@cwi.nl) on the Character Model for the World Wide Web 1.0 (http://www.w3.org/TR/2002/WD-charmod-20020430/).

Semi-structured version of the comment:

Submitted by: Steven Pemberton (steven.pemberton@cwi.nl)
Submitted on behalf of (maybe empty): HTML WG
Comment type: editorial
Chapter/section the comment applies to: 3.3 Transcoding
The comment will be visible to: public
Comment title: Give example of transcoding
Comment:
It would be useful if 3.3 gave an example of where transcoding is used, since this is a frequently misunderstood point with regard to XML and HTML. People (and some UAs) think that the encoding also specifies the repertoire/CCS. Something along the lines of:

"For example, in XML and HTML, documents are always in Unicode, but they may be delivered to a user agent in an encoding for another coded character set (indicated by the encoding attribute in XML, and the HTTP content-type header in HTML). The user agent then transcodes the characters of the incoming document stream into Unicode code points. For example, a document delivered with encoding iso-8859-2 may contain the string "&0x0151;&0x0151;" where the first character (LATIN SMALL LETTER O WITH DOUBLE ACUTE) is at code point 0xf5 in iso-8859-2. This will be transcoded so that there will be two identical characters at code point 0x0151 in the document as processed by the user agent."

Note: "&0x0151;&0x0151;" should look like "o&0x0151;" with a double acute on the o; i.e. an actual character followed by an NCR. Feel free to substitute any similar character if you wish.
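The transcoding step in the proposed example can be sketched in Python. This is only an illustration under stated assumptions: the function name `transcode_and_resolve` is hypothetical, the NCR is written in the standard `&#x0151;` form rather than the `&0x0151;` spelling used above, and `html.unescape` stands in for whatever reference resolution a real user agent performs.

```python
import html


def transcode_and_resolve(raw: bytes, encoding: str) -> str:
    """Hypothetical helper: decode a byte stream, then resolve character references."""
    # Step 1: transcode the incoming bytes from the delivery encoding
    # into Unicode code points.
    text = raw.decode(encoding)
    # Step 2: resolve numeric character references. These always name
    # Unicode code points, whatever encoding the document arrived in.
    return html.unescape(text)


# Byte 0xF5 is LATIN SMALL LETTER O WITH DOUBLE ACUTE in iso-8859-2;
# it is followed by an NCR for the same character.
delivered = b"\xf5&#x0151;"
processed = transcode_and_resolve(delivered, "iso-8859-2")

# Show the code point of each character in the processed document.
print([f"U+{ord(ch):04X}" for ch in processed])
```

Both characters end up at the same Unicode code point even though only one of them was written as a character reference, which is the point of the example: the encoding label describes how the bytes are delivered, not the repertoire of the document.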
Structured version of the comment:

<lc-comment visibility="public" status="pending" decision="pending" impact="editorial">
  <originator email="steven.pemberton@cwi.nl" represents="HTML WG" >Steven Pemberton</originator>
  <charmod-section href='http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-Transcoding' >3.3</charmod-section>
  <title>Give example of transcoding</title>
  <description>
    <comment>
      <dated-link date="2002-07-03" >Give example of transcoding</dated-link>
      <para>It would be useful if 3.3 gave an example of where transcoding is used, since this is a frequently misunderstood point with regard to XML and HTML. People (and some UAs) think that the encoding also specifies the repertoire/CCS. Something along the lines of: "For example, in XML and HTML, documents are always in Unicode, but they may be delivered to a user agent in an encoding for another coded character set (indicated by the encoding attribute in XML, and the HTTP content-type header in HTML). The user agent then transcodes the characters of the incoming document stream into Unicode code points. For example, a document delivered with encoding iso-8859-2 may contain the string "&0x0151;&0x0151;" where the first character (LATIN SMALL LETTER O WITH DOUBLE ACUTE) is at code point 0xf5 in iso-8859-2. This will be transcoded so that there will be two identical characters at code point 0x0151 in the document as processed by the user agent." Note: "&0x0151;&0x0151;" should look like "o&0x0151;" with a double acute on the o; i.e. an actual character followed by an NCR. Feel free to substitute any similar character if you wish. </para>
    </comment>
  </description>
</lc-comment>
Received on Wednesday, 3 July 2002 08:50:35 UTC