- From: John Cowan <cowan@mercury.ccil.org>
- Date: Sat, 8 Sep 2012 02:42:42 -0400
- To: James Clark <jjc@jclark.com>
- Cc: public-microxml@w3.org
James Clark scripsit:

> I find the case for excluding non-characters pretty compelling.

Excellent.

> d) There's nothing from a Unicode perspective I know of that distinguishes
> the two members that XML 1.0 excludes from the other members of the class.

When XML 1.0 1e was written, U+FFFE and U+FFFF were the only non-character
codepoints, so it's just a historical question. There is no difference
any more.

> On the down side, it makes the spec a bit longer (although we could make up
> a notation to make it shorter).

The list is already in the Editor's Draft as "discouraged", so there is
no increase in length.

> The situation with control codes seems a bit murkier to me, particularly as
> regards #x85. We don't allow #xC (form-feed), although it's defined in
> Unicode, so why should we allow #x85? It seems to me that the more
> consistent policy would be only to allow control codes that we define to be
> white-space.

I agree. #x85 should have been discouraged in XML 1.0; the reason it was
not is that the list was pulled out of XML 1.1, where #x85 was a newline
character (as Unicode says it is). I'll add it to the banned list, unless
there are objections.

--
John Cowan cowan@ccil.org http://ccil.org/~cowan
If I have not seen as far as others, it is because giants were standing
on my shoulders. --Hal Abelson
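
For readers following the thread, here is a minimal sketch of the two character classes under discussion. It is illustrative only and is not text from the Editor's Draft. The function names are hypothetical, and the set of whitespace controls (tab, LF, CR) is an assumption about which control codes the spec defines as white-space. The noncharacter ranges are the ones Unicode defines: U+FDD0..U+FDEF plus the last two code points of every plane.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the 66 Unicode noncharacter code points."""
    # U+FDD0..U+FDEF: a contiguous run of 32 noncharacters in the BMP.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # U+nFFFE and U+nFFFF: the last two code points of each of the 17 planes.
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


# Assumption: tab, line feed and carriage return are the only control
# codes treated as white-space, and so the only ones allowed.
WHITESPACE_CONTROLS = frozenset({0x9, 0xA, 0xD})


def is_banned_control(cp: int) -> bool:
    """Return True for a C0/C1 control code (or DEL) that is not white-space."""
    is_control = 0 <= cp <= 0x1F or 0x7F <= cp <= 0x9F
    return is_control and cp not in WHITESPACE_CONTROLS


if __name__ == "__main__":
    for cp in (0xC, 0x85, 0xA, 0xFFFE, 0xFDD0, 0x10FFFF, 0xFFFD):
        print(f"U+{cp:04X}", is_noncharacter(cp), is_banned_control(cp))
```

Under this policy #x85 falls out the same way #xC does: both are control codes that are not white-space, so both are rejected, which is the consistency argument made above.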
Received on Saturday, 8 September 2012 06:43:05 UTC