- From: Tolkin, Steve <Steve.Tolkin@FMR.COM>
- Date: Wed, 7 Feb 2001 12:54:00 -0500
- To: "'www-i18n-comments@w3.org'" <www-i18n-comments@w3.org>
- Cc: w3c-xml-query-wg@w3.org
Certain Unicode "characters" have the same numeric values
as the ASCII control characters.
For the purposes of this email I use the term "control character"
to mean certain special code points in Unicode.
Examples of "control characters" are U+0000 to U+001F inclusive,
except U+0009, U+000A, and U+000D.
Please clarify the proper way to handle these, e.g. with respect to
string normalization.
Specifically, in Character Model for the World Wide Web 1.0
W3C Working Draft 26 January 2001
This version: http://www.w3.org/TR/2001/WD-charmod-20010126
Latest version: http://www.w3.org/TR/charmod/
section 3.5 states:
The specification MUST NOT arbitrarily restrict the range of characters
that can be used, which must cover all Unicode code points from 0 to
0x10FFFF inclusive.
In contrast,
Extensible Markup Language (XML) 1.0 (Second Edition)
W3C Recommendation 6 October 2000
This version: http://www.w3.org/TR/2000/REC-xml-20001006
Latest version: http://www.w3.org/TR/REC-xml
section 2.2 states:
Consequently, XML processors must accept any character in the range
specified for Char. ...
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
[#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate
blocks, FFFE, and FFFF. */
The Char production quoted above implies that a processor may reject
a document containing a control character. In fact, some processors do so,
reporting an error that the document is not well-formed.
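As one illustration, Python's standard-library parser (xml.etree.ElementTree, built on expat) rejects such a document. The helper name is_well_formed is hypothetical; the sketch only shows that a tab passes while U+0001 fails, whether it appears literally or as a character reference.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """True if the expat-based ElementTree parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<a>tab\there</a>"))     # True: U+0009 matches Char
print(is_well_formed("<a>ctrl\x01here</a>"))  # False: U+0001 does not
print(is_well_formed("<a>ref&#1;here</a>"))   # False: nor as a character reference
```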
The character model should be clear about how the "control
characters" behave with respect to
string normalization. Must they be left alone?
May they be deleted by a conforming processor?
Or should each one be replaced by a space and then further normalized?
Or perhaps the Character Model specification should explicitly
state that this decision is in the scope of the application.
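The first three options can be sketched as string functions. This is a minimal Python sketch, not anything the Character Model specifies; the function names are hypothetical, and reading "further normalized" as collapsing runs of spaces and trimming the ends (as XML does when normalizing non-CDATA attribute values) is an assumption.

```python
import re

# The "control characters" defined at the start of this message:
# U+0000..U+001F except U+0009 (tab), U+000A (LF) and U+000D (CR).
CONTROL_RE = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F]")


def leave_alone(s: str) -> str:
    """Option 1: control characters pass through untouched."""
    return s


def delete_controls(s: str) -> str:
    """Option 2: a processor deletes every control character."""
    return CONTROL_RE.sub("", s)


def replace_and_normalize(s: str) -> str:
    """Option 3: each control character becomes a space; then (an assumption)
    runs of spaces collapse to one and leading/trailing spaces are trimmed."""
    spaced = CONTROL_RE.sub(" ", s)
    return re.sub(" {2,}", " ", spaced).strip(" ")


sample = "\x01a\x02\x03b"
print(repr(leave_alone(sample)))            # '\x01a\x02\x03b'
print(repr(delete_controls(sample)))        # 'ab'
print(repr(replace_and_normalize(sample)))  # 'a b'
```

The fourth option, leaving the decision to the application, would amount to letting each application choose among functions like these.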
Hopefully helpfully yours,
Steve
--
Steven Tolkin steve.tolkin@fmr.com 617-563-0516
Fidelity Investments 82 Devonshire St. V10D Boston MA 02109
There is nothing so practical as a good theory. Comments are by me,
not Fidelity Investments, its subsidiaries or affiliates.
Received on Wednesday, 7 February 2001 13:08:26 UTC