- From: Richard Tobin <richard@inf.ed.ac.uk>
- Date: Wed, 10 Sep 2008 15:05:23 +0100 (BST)
- To: xml-editor@w3.org
- Cc: John Boyer <boyerj@ca.ibm.com>
(I'm replying to a rather old message here, it's
http://lists.w3.org/Archives/Public/xml-editor/2008JanMar/0004.html)

> Some have commented that they believed the sentence "XML processors MUST
> accept the UTF-8 and UTF-16 encodings of Unicode 3.1" meant that encodings
> for characters not in Unicode 3.1 were not allowed.

I don't think this sentence is about what characters are allowed. It's about what encodings must be supported. It originally (in 1st edition) said "the UTF-8 and UTF-16 encodings of 10646", where "of 10646" was I think just to fully name them - at that time many people were unfamiliar with Unicode and wouldn't have known that they were encodings of the 10646/Unicode character set.

"10646" was changed to "Unicode 3.1" in third edition; again I don't think this was intended to imply anything about the allowed characters, but was simply an updated reference.

The allowed characters are specified in production 2 immediately above, and are the same in 5th edition as they were in 1st. All Unicode code points except for most C0 controls, surrogates, and FFFE/FFFF are allowed. That is, all non-C0 potential Unicode characters have always been allowed, and we have not and are not changing that.

The right change would be to remove the phrase "of Unicode X.Y" and if desired replace it with references to the definitions of those encodings.

-- Richard

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
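For illustration, the character rule described above can be sketched as a small Python check. This is only a sketch, not normative text: the function name is_xml10_char is made up for this example, and the ranges are my reading of production 2 (Char) of XML 1.0, which does not depend on any particular Unicode version.

```python
def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp is allowed by XML 1.0 production 2 (Char).

    Hypothetical helper for illustration; the ranges are assumed to be
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF].
    """
    return (
        cp in (0x9, 0xA, 0xD)          # the only C0 controls allowed: tab, LF, CR
        or 0x20 <= cp <= 0xD7FF        # everything up to the surrogate block
        or 0xE000 <= cp <= 0xFFFD      # skips surrogates, stops before FFFE/FFFF
        or 0x10000 <= cp <= 0x10FFFF   # all supplementary planes
    )
```

Under these ranges a code point in the supplementary planes such as 0x1F600 passes regardless of which Unicode version assigned it, while 0x0, 0xD800, 0xFFFE and 0xFFFF are rejected.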
Received on Wednesday, 10 September 2008 14:09:27 UTC