- From: John Cowan <cowan@mercury.ccil.org>
- Date: Sat, 8 Sep 2012 02:42:42 -0400
- To: James Clark <jjc@jclark.com>
- Cc: public-microxml@w3.org
James Clark scripsit:
> I find the case for excluding non-characters pretty compelling.
Excellent.
> d) There's nothing from a Unicode perspective I know of that distinguishes
> the two members that XML 1.0 excludes from the other members of the class.
When the first edition of XML 1.0 was written, U+FFFE and U+FFFF were
the only non-character codepoints, so the distinction is purely
historical. There is no longer any difference.
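For illustration, the full non-character class is small enough to test with two comparisons. The sketch below assumes the current Unicode definition (U+FDD0..U+FDEF plus the last two codepoints of each of the 17 planes); the function name `is_noncharacter` is hypothetical and not taken from any draft.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the 66 Unicode non-character codepoints.

    Hypothetical helper for illustration only; not part of any spec text.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode codepoint: {cp:#x}")
    # The contiguous block of 32 non-characters in the BMP.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two codepoints of every plane: U+nFFFE and U+nFFFF.
    return (cp & 0xFFFE) == 0xFFFE


# U+FFFE and U+FFFF (the two that XML 1.0 excludes) are handled by the
# same test as every other plane-final pair.
print(is_noncharacter(0xFFFE), is_noncharacter(0x1FFFF), is_noncharacter(0xFFFD))
```

Written this way, excluding the whole class costs no more than excluding the two members XML 1.0 singles out.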
> On the down side, it makes the spec a bit longer (although we could make up
> a notation to make it shorter).
The list is already in the Editor's Draft as "discouraged", so there is
no increase in length.
> The situation with control codes seems a bit murkier to me, particularly as
> regards #x85. We don't allow #xC (form-feed), although it's defined in
> Unicode, so why should we allow #x85? It seems to me that the more
> consistent policy would be only to allow control codes that we define to be
> white-space.
I agree. #x85 should have been discouraged in XML 1.0; the reason it
was not is that the list was pulled out of XML 1.1, where #x85 was a
newline character (as Unicode says it is). I'll add it to the banned
list, unless there are objections.
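As a rough sketch of the "only white-space control codes" policy discussed above: it assumes the control codes are the Unicode Cc range (#x0-#x1F and #x7F-#x9F) and that the white-space controls are #x9, #xA and #xD, as in the XML 1.0 S production. The names are hypothetical and this is not the draft's wording.

```python
# Control codes that XML 1.0 treats as white space (tab, LF, CR).
# Assumption: the same set is what "define to be white-space" means here.
WHITESPACE_CONTROLS = frozenset({0x09, 0x0A, 0x0D})


def is_control(cp: int) -> bool:
    """True for C0 controls, DEL, and C1 controls (Unicode category Cc)."""
    return cp <= 0x1F or 0x7F <= cp <= 0x9F


def control_policy_allows(cp: int) -> bool:
    """Allow a codepoint unless it is a control code that is not white space."""
    return not is_control(cp) or cp in WHITESPACE_CONTROLS


# Under this policy #x85 is rejected for the same reason as #xC (form-feed).
for cp in (0x09, 0x0A, 0x0C, 0x0D, 0x85):
    print(f"#x{cp:X}: {'allowed' if control_policy_allows(cp) else 'banned'}")
```

This check covers control codes only; non-characters and surrogates would need their own tests.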
--
John Cowan cowan@ccil.org http://ccil.org/~cowan
If I have not seen as far as others, it is because giants were standing
on my shoulders.
--Hal Abelson
Received on Saturday, 8 September 2012 06:43:05 UTC