- From: Tolkin, Steve <Steve.Tolkin@FMR.COM>
- Date: Wed, 7 Feb 2001 12:54:00 -0500
- To: "'www-i18n-comments@w3.org'" <www-i18n-comments@w3.org>
- Cc: w3c-xml-query-wg@w3.org
Certain Unicode "characters" have the same hexadecimal value as the ASCII control characters. For the purposes of this email I use the term "control character" to mean certain special code points in Unicode. Examples of "control characters" are U+0000 to U+001F inclusive, except U+0009, U+000A, and U+000D.

Please clarify the proper way to handle these, e.g. with respect to string normalization.

Specifically, in

    Character Model for the World Wide Web 1.0
    W3C Working Draft 26 January 2001
    This version: http://www.w3.org/TR/2001/WD-charmod-20010126
    Latest version: http://www.w3.org/TR/charmod/

section 3.5 states:

    The specification MUST NOT arbitrarily restrict the range of characters that can be used, which must cover all Unicode code points from 0 to 0x10FFFF inclusive.

In contrast,

    Extensible Markup Language (XML) 1.0 (Second Edition)
    W3C Recommendation 6 October 2000
    This version: http://www.w3.org/TR/2000/REC-xml-20001006
    Latest version: http://www.w3.org/TR/REC-xml

section 2.2 states:

    Consequently, XML processors must accept any character in the range specified for Char. ...

    Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
        /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

The above implies that a processor may reject a document containing a control character. In fact, some processors do that, raising an error that the document is not well-formed.

The character model should be clear about how the "control characters" behave with respect to string normalization. Must they be left alone? May they be deleted by a conforming processor? Or should each one be replaced by a space, and further normalized? Or perhaps the Character Model specification should explicitly state that this decision is in the scope of the application.

Hopefully helpfully yours,
Steve
-- 
Steven Tolkin    steve.tolkin@fmr.com    617-563-0516
Fidelity Investments    82 Devonshire St. V10D    Boston MA 02109
There is nothing so practical as a good theory.
Comments are by me, not Fidelity Investments, its subsidiaries or affiliates.
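
To make the mismatch concrete, the Char production quoted from XML 1.0 section 2.2 can be written as a small range check. This is only a sketch: the function names (is_xml10_char, excluded_c0_controls, find_disallowed) are hypothetical, chosen for illustration, and the code says nothing about how normalization should treat the excluded code points, which is the open question raised in this message.

```python
from typing import List, Tuple


def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production
    as quoted in the message (section 2.2, Second Edition)."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def excluded_c0_controls() -> List[int]:
    """Code points in U+0000..U+001F that Char does not admit."""
    return [cp for cp in range(0x20) if not is_xml10_char(cp)]


def find_disallowed(text: str) -> List[Tuple[int, str]]:
    """List (index, 'U+XXXX') for each character of text outside Char."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if not is_xml10_char(ord(ch))
    ]


if __name__ == "__main__":
    # Tab, line feed, and carriage return are admitted; the rest of
    # U+0000..U+001F is not.
    print([f"U+{cp:04X}" for cp in excluded_c0_controls()])
    print(find_disallowed("a\x00b\x0bc"))
```

A check like this only reports which code points fall outside Char. Whether such a code point should be left alone, deleted, or replaced by a space is exactly the decision the message asks the Character Model to address.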
Received on Wednesday, 7 February 2001 13:08:26 UTC