- From: Rick Jelliffe <ricko@topologi.com>
- Date: Wed, 16 Apr 2003 01:11:44 +1000
- To: <www-tag@w3.org>
Tim Bray wrote:
> I think we're kind of stuck with the C1 chars based on them having
> been allowed in XML 1.0.

I think this is not what Tim meant to write, but nevertheless:

1) The C1 characters in Unicode 3.1 are not the same characters as in
Unicode 2. Now that characters have been allocated to those code points,
the choice of whether to *adopt* them comes down to the architectural
issue of which layer XML belongs to:

-- is it textual (something capable of being text/*, telnet, etc.) or is
it text (something that needs to be base64 encoded, for example, when
sent over protocols expecting textual data)?

-- is imaginary error detection acceptable for mission-critical data?

2) It was entirely legitimate for XML processors to strip out the
controls before they even reached the parser, at the transcoding stage,
because they belong to protocols, not text. As I mentioned, MSXML 4 did
this with encoding 8859-1, because the controls are not defined as part
of 8859-1 (i.e. it is perfectly fine, and indeed good, if a transcoder
strips them from incoming 8859-1 text).

Even if XML 1.1 says it allows control characters, it cannot guarantee
or specify that transcoders should not strip them (from byte-based
encodings), because that is the business of the encoding and the text
protocol's details.

Cheers
Rick Jelliffe
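P.S. A minimal sketch (in Python, purely illustrative, and not MSXML's
actual code) of the kind of stripping transcoder described in point 2,
assuming the C0 controls other than TAB/LF/CR and the whole C1 range
0x80-0x9F are treated as undefined in incoming 8859-1 text:

    # Control bytes a transcoder may legitimately drop from 8859-1 input:
    # the C0 controls except TAB, LF, CR (not XML 1.0 characters anyway),
    # plus the C1 range 0x80-0x9F, which 8859-1 proper does not define.
    STRIP = set(range(0x00, 0x20)) - {0x09, 0x0A, 0x0D} | set(range(0x80, 0xA0))

    def transcode_8859_1(raw: bytes) -> str:
        # In 8859-1 each byte maps one-to-one to the same Unicode code
        # point, so stripping can happen byte-wise, before the bytes
        # ever reach the XML parser.
        return bytes(b for b in raw if b not in STRIP).decode("iso-8859-1")

    transcode_8859_1(b"abc\x85def")   # 0x85 (NEL, a C1 control) is silently
                                      # removed, yielding 'abcdef'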
Received on Tuesday, 15 April 2003 11:07:51 UTC