On Tue, Sep 11, 2012 at 4:28 AM, John Cowan <cowan@mercury.ccil.org> wrote:
>
> We can say "U+FFFE and U+FFFF are banned in documents because that's
> what XML says."
>
> Or we can say "Unicode non-characters are banned in documents."

I think these both correspond to reasonable positions.

The first position amounts to saying that we think XML made a mistake in
disallowing U+FFFE and U+FFFF; it should have allowed all code points
except surrogates. Therefore we shouldn't make the situation any worse by
increasing the number of disallowed code points. MicroXML disallows U+FFFE
and U+FFFF only because XML did.

The second position says that XML was right to disallow U+FFFE and U+FFFF.
Since XML was designed, Unicode has designated further code points as
noncharacters, the same category as U+FFFE and U+FFFF. We should now fix
MicroXML so that it is consistent with the current version of Unicode (and
with all future versions, since the Unicode stability policy guarantees
that the set of noncharacters will not change).
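
To make the difference concrete, here is a rough sketch in Python of what
each position would accept. The function names are made up for
illustration, and the sketch deliberately ignores every other restriction
on document characters (control characters and so on); it only models the
noncharacter question. The noncharacter set used is the one Unicode
defines: U+FDD0..U+FDEF plus the last two code points of each of the 17
planes, 66 code points in all.

```python
MAX_CODE_POINT = 0x10FFFF


def is_surrogate(cp: int) -> bool:
    """True for the UTF-16 surrogate range U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp: int) -> bool:
    """True for Unicode noncharacters.

    These are U+FDD0..U+FDEF and every code point whose low 16 bits
    are FFFE or FFFF (U+FFFE, U+FFFF, U+1FFFE, ..., U+10FFFF).
    """
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def allowed_position_one(cp: int) -> bool:
    """First position (hypothetical name): ban only what XML banned,
    i.e. surrogates plus U+FFFE and U+FFFF."""
    return (
        0 <= cp <= MAX_CODE_POINT
        and not is_surrogate(cp)
        and cp not in (0xFFFE, 0xFFFF)
    )


def allowed_position_two(cp: int) -> bool:
    """Second position (hypothetical name): ban surrogates plus every
    Unicode noncharacter."""
    return (
        0 <= cp <= MAX_CODE_POINT
        and not is_surrogate(cp)
        and not is_noncharacter(cp)
    )
```

The two checks agree on U+FFFE and U+FFFF and disagree on the other 64
noncharacters: U+FDD0 and U+1FFFE, for instance, pass the first check but
fail the second.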

James