Re: 12. Are C1 controls and Unicode non-characters disallowed? from Mike Sokolov on 2012-09-08 (public-microxml@w3.org from September 2012)

From: Mike Sokolov <sokolov@falutin.net>
Date: Sat, 08 Sep 2012 10:55:42 -0400
To: James Clark <jjc@jclark.com>
Cc: John Cowan <cowan@mercury.ccil.org>, public-microxml@w3.org
Message-id: <504B5C6E.2060405@falutin.net>

On 9/8/2012 1:14 AM, James Clark wrote:
>
> I find the case for excluding non-characters pretty compelling. I 
> would state it like this:
Just for the sake of completeness, would you mind explaining what's 
compelling about it?  My initial reaction was: if we don't *need* to 
restrict the code-point set, why would we?  Is it a benefit in that tool 
chains will catch invalid characters further upstream than they might 
otherwise? I understand Unicode bans these code points, but if someone 
puts them in a file and then processes them as uXML, where's the harm?  
Is there some difficulty encoding these as UTF-8 or another Unicode encoding?

-Mike

Received on Saturday, 8 September 2012 14:56:19 UTC