Re: 12. Are C1 controls and Unicode non-characters disallowed?

James Clark scripsit:

> I find the case for excluding non-characters pretty compelling. 

Excellent.

> d) There's nothing from a Unicode perspective I know of that distinguishes
> the two members that XML 1.0 excludes from the other members of the class.

When the first edition of XML 1.0 was written, U+FFFE and U+FFFF were the
only non-character codepoints, so the distinction is purely historical.
There is no difference any more.
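
For concreteness, here is a minimal sketch of a test for the whole
non-character class as Unicode currently defines it: U+FDD0..U+FDEF plus
the last two codepoints of each of the 17 planes, 66 codepoints in all.
The function name is_noncharacter is made up for illustration and does not
come from any spec or draft.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the 66 Unicode non-character codepoints.

    Hypothetical helper for illustration only.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode codepoint: %r" % (cp,))
    # The contiguous block U+FDD0..U+FDEF (32 codepoints).
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two codepoints of every plane: U+nFFFE and U+nFFFF.
    return (cp & 0xFFFE) == 0xFFFE
```

Under this test U+FFFE and U+FFFF are just the plane 0 instances of the
second rule, which is why nothing distinguishes them from the rest.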

> On the down side, it makes the spec a bit longer (although we could make up
> a notation to make it shorter).

The list is already in the Editor's Draft as "discouraged", so there is
no increase in length.

> The situation with control codes seems a bit murkier to me, particularly as
> regards #x85. We don't allow #xC (form-feed), although it's defined in
> Unicode, so why should we allow #x85?  It seems to me that the more
> consistent policy would be only to allow control codes that we define to be
> white-space.

I agree.  #x85 should have been discouraged in XML 1.0; the reason it
was not is that the list was pulled out of XML 1.1, where #x85 was a
newline character (as Unicode says it is).  I'll add it to the banned
list, unless there are objections.
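
To make the proposed policy concrete, here is a rough sketch of "allow
only the control codes we define to be white-space".  It assumes the
white-space set is the one XML 1.0 uses (#x9, #xA, #xD, #x20) and takes
"control code" to mean the C0 range, DEL, and the C1 range; the names are
made up for illustration and the draft may draw the lines differently.

```python
# Assumption: white-space as in XML 1.0 (tab, LF, CR, space).
XML_WHITESPACE = frozenset({0x09, 0x0A, 0x0D, 0x20})


def is_control(cp: int) -> bool:
    """True for C0 controls (#x0-#x1F), DEL (#x7F), and C1 controls (#x80-#x9F)."""
    return 0x00 <= cp <= 0x1F or 0x7F <= cp <= 0x9F


def is_banned_control(cp: int) -> bool:
    """True if cp is a control code that is not defined to be white-space."""
    return is_control(cp) and cp not in XML_WHITESPACE
```

With this rule #xC and #x85 are treated alike: both are control codes,
neither is white-space, so both are banned.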

-- 
John Cowan  cowan@ccil.org  http://ccil.org/~cowan
If I have not seen as far as others, it is because giants were standing
on my shoulders.
        --Hal Abelson

Received on Saturday, 8 September 2012 06:43:05 UTC