Re: 12. Are C1 controls and Unicode non-characters disallowed?

On Mon, September 10, 2012 2:49 am, James Clark wrote:
...
> So my question for Tony would be: what is the difference between
>
> - 0xFFFE - 0xFFFF, and
> - the other 64 noncharacters
>
> that justifies forbidding the former but not the latter?

Nothing.  If I were using the other non-characters instead and had stated
that I would find it personally inconvenient if they were eventually
disallowed by the tools that I wanted to use, then you could ask the same
question the other way around just as easily.

> You could argue that the right approach for noncharacters is to recommend
> against their use for interchange rather than forbid them, but given that
> XML 1.0 has forbidden U+FFFE-U+FFFF, it seems to me that the cleanest
> approach is to forbid all noncharacters.

Without arguing for or against the inclusion of non-characters, I don't
understand the motivation for forbidding them.  If the goal is radical
simplicity, then it would be simpler to allow the whole slew of
characters.  If the goal is to "complement rather than replace XML, JSON
and HTML" [1] then if one of the three disallows them (I don't know about
JSON), they should be forbidden.
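
For reference, the split being discussed can be sketched in a few lines of
Python. This is only an illustration, not anything from the MicroXML draft:
the function name is_noncharacter and the variable names are hypothetical.
It assumes the Unicode definition of the 66 noncharacters (U+FDD0..U+FDEF
plus the last two code points of each of the 17 planes).

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the 66 Unicode noncharacters.

    Hypothetical helper for illustration; not part of any MicroXML tool.
    """
    # U+FDD0..U+FDEF: a contiguous run of 32 noncharacters in the BMP.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of every plane: U+FFFE, U+FFFF, U+1FFFE, ...
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


# Every noncharacter in the Unicode code space.
noncharacters = [cp for cp in range(0x110000) if is_noncharacter(cp)]

# The two that XML 1.0 excludes, as noted in the quoted message.
xml10_forbidden = [0xFFFE, 0xFFFF]

# "The other 64 noncharacters" from the question above.
others = [cp for cp in noncharacters if cp not in xml10_forbidden]

print(len(noncharacters), len(others))
```

A parser that forbids all noncharacters would reject every code point for
which such a check is true; one that follows XML 1.0 would reject only the
two in xml10_forbidden.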

I don't know whether this has been discussed, but while the current draft
specifies UTF-8 only, another way to simplify the character processing
(post-parser) would be to also specify Normalization Form C [2][3], which
would mean there would be only one way in MicroXML documents to represent
particular characters.
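
As a small illustration of what Normalization Form C buys, here is a sketch
using Python's standard unicodedata module. The helper names to_nfc and
is_nfc are hypothetical, and nothing here is taken from the MicroXML draft;
it only shows two canonically equivalent spellings of the same character
collapsing into one.

```python
import unicodedata

# Two canonically equivalent ways to write "e with acute accent".
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE


def to_nfc(text: str) -> str:
    """Return text converted to Normalization Form C."""
    return unicodedata.normalize("NFC", text)


def is_nfc(text: str) -> bool:
    """Return True if text is already in Normalization Form C."""
    return to_nfc(text) == text


# Before normalization the two strings compare unequal, code point by
# code point; after NFC they are identical.
print(decomposed == precomposed)
print(to_nfc(decomposed) == to_nfc(precomposed))
print(is_nfc(decomposed), is_nfc(precomposed))
```

If documents were required to be in NFC, a processor could compare names
and content code point by code point without normalizing first, at the cost
of either checking or trusting that the input really is normalized.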

Regards,


Tony Graham                                   tgraham@mentea.net
Consultant                                 http://www.mentea.net
Mentea       13 Kelly's Bay Beach, Skerries, Co. Dublin, Ireland
 --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --
    XML, XSL-FO and XSLT consulting, training and programming

[1] http://www.w3.org/community/microxml/wiki/Editor%27s_Draft
[2] http://www.unicode.org/reports/tr15/#Norm_Forms
[3] http://www.w3.org/TR/charmod-norm/#sec-ChoiceNFC

Received on Wednesday, 12 September 2012 13:17:07 UTC