- From: Markus Scherer <duerst@w3.org>
- Date: Fri, 27 Feb 2004 18:20:40 -0500
- To: www-i18n-comments@w3.org
This is a last call comment from Markus Scherer (markus.scherer@jtcsv.com) on the Character Model for the World Wide Web 1.0 (http://www.w3.org/TR/2002/WD-charmod-20020430/).

Semi-structured version of the comment:

Submitted by: Markus Scherer (markus.scherer@jtcsv.com)
Submitted on behalf of (maybe empty):
Comment type: substantive
Chapter/section the comment applies to: Overall
The comment will be visible to: public
Comment title: charmod vs. UTF-16/32
Comment:
Comments on charmod:

- The names UTF-16 and UTF-32 are each used for an encoding form and an encoding scheme. charmod should mention this, and mention that the encoding scheme versions use Byte Order Marks (BOMs) while the encoding forms don't.

- It should be explicitly permissible to recognize that a document uses the UTF-16 encoding scheme by its BOM, if it is present. This is common practice for HTML and XML and has proven valuable because these encoding schemes are not compatible with ASCII byte streams.

- There are BOM-like signature byte sequences for other Unicode encodings as well, such as UTF-32 and SCSU. Justification as before; UTF-8 is not always the most desirable encoding.

- charmod C051/C052 prefers code point indexing (called "character string indexing"). This will lead to inefficiencies because most implementations will use UTF-16 strings. It would be better to recommend UTF-16 code unit indexing. (See UTN #12 http://www.unicode.org/notes/tn12/)

Best regards,
markus

Structured version of the comment:

<lc-comment visibility="public" status="pending" decision="pending" impact="substantive" id="LC-">
  <originator email="markus.scherer@jtcsv.com" >Markus Scherer</originator>
  <represents email="" >-</represents>
  <charmod-section href='http://www.w3.org/TR/2004/WD-charmod-20040225/' >Overall</charmod-section>
  <title>charmod vs. UTF-16/32</title>
  <description>
    <comment>
      <dated-link date="2004-02-27" href="http://www.w3.org/mid/791355868.20040227220245@toro.w3.mag.keio.ac.jp" >charmod vs. UTF-16/32</dated-link>
      <para>Comments on charmod:

- The names UTF-16 and UTF-32 are each used for an encoding form and an encoding scheme. charmod should mention this, and mention that the encoding scheme versions use Byte Order Marks (BOMs) while the encoding forms don't.

- It should be explicitly permissible to recognize that a document uses the UTF-16 encoding scheme by its BOM, if it is present. This is common practice for HTML and XML and has proven valuable because these encoding schemes are not compatible with ASCII byte streams.

- There are BOM-like signature byte sequences for other Unicode encodings as well, such as UTF-32 and SCSU. Justification as before; UTF-8 is not always the most desirable encoding.

- charmod C051/C052 prefers code point indexing (called "character string indexing"). This will lead to inefficiencies because most implementations will use UTF-16 strings. It would be better to recommend UTF-16 code unit indexing. (See UTN #12 http://www.unicode.org/notes/tn12/)

Best regards,
markus
      </para>
    </comment>
  </description>
</lc-comment>
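
For illustration only, the signature-based detection the comment asks charmod to permit could look roughly like the sketch below. It is not taken from charmod or from any particular HTML or XML implementation; the names `SIGNATURES` and `sniff_signature` are hypothetical, and the byte sequences are the standard Unicode signatures for UTF-8, UTF-16, UTF-32, and SCSU.

```python
import codecs

# Candidate signatures, checked longest-first: the UTF-32LE signature
# (FF FE 00 00) begins with the UTF-16LE signature (FF FE), so the
# 4-byte forms have to be tested before the 2-byte ones.
# SIGNATURES and sniff_signature are hypothetical names for this sketch.
SIGNATURES = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),   # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "UTF-32LE"),   # FF FE 00 00
    (b"\x0e\xfe\xff", "SCSU"),           # SCSU signature
    (codecs.BOM_UTF8, "UTF-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "UTF-16BE"),   # FE FF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),   # FF FE
]


def sniff_signature(data: bytes):
    """Return (encoding name, signature length), or (None, 0) if no signature.

    Caveat: FF FE 00 00 could also be a UTF-16LE stream that starts with
    U+FEFF followed by U+0000; this sketch assumes UTF-32LE in that case.
    """
    for signature, name in SIGNATURES:
        if data.startswith(signature):
            return name, len(signature)
    return None, 0
```

A caller would skip the reported number of signature bytes before decoding, and would fall back to other information (for example a protocol-level charset label) when no signature is found.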
Received on Saturday, 28 February 2004 06:07:16 UTC
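
As a rough illustration of the indexing point raised in the comment above: when text is stored as UTF-16 code units, a code point index cannot be used directly as a storage offset, because characters outside the BMP occupy two code units. The sketch below is an assumption-laden example, not part of charmod or UTN #12; the helper names are hypothetical, and Python's `str` (which indexes by code point) stands in for the abstract character string.

```python
def utf16_length(text: str) -> int:
    """Number of UTF-16 code units needed to store text."""
    return len(text.encode("utf-16-le")) // 2


def code_point_to_utf16_index(text: str, cp_index: int) -> int:
    """Translate a code point index into a UTF-16 code unit index.

    This takes a linear scan over the preceding characters: each
    supplementary character (above U+FFFF) counts as two code units.
    That scan is the cost the comment refers to for UTF-16 strings.
    """
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text[:cp_index])


# "a", U+10400 (a supplementary character), "b"
sample = "a\U00010400b"

code_points = len(sample)                         # counted in code points
code_units = utf16_length(sample)                 # counted in UTF-16 code units
b_offset = code_point_to_utf16_index(sample, 2)   # where "b" starts in UTF-16
```

Here the string has three code points but four UTF-16 code units, and the character at code point index 2 starts at code unit index 3.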