RE: 12. Are C1 controls and Unicode non-characters disallowed?

Ignorant question:
What do prevailing (yes, I know that's not precise) XML parsers and processors do today when they encounter this range of discouraged Unicode characters?
Secondly, is it a goal that:
1) If a MicroXML parser successfully parses a MicroXML document, then an "off the shelf" XML parser should also successfully parse the same document? (By "parse" here I mean not abort or generate fatal errors.)

Conversely:
2) If an "off the shelf" XML parser parses a MicroXML document, then all MicroXML parsers should also parse that document without failure? I guess this one is self-referential.
What I am getting at is: "What would the user see that is bad if we allowed these discouraged characters?"
It's too late now, but I feel it was a mistake for XML to ban valid Unicode characters (like control characters such as FF) just because there was no practical definition of what they meant. For XML compatibility we can't undo that, but do we need to go further and ban other Unicode code points that do not explicitly cause parse errors in XML? Why?
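
One rough way to probe the first question, as a minimal sketch rather than a survey of parsers: the Python snippet below compares the XML 1.0 Char production and the ranges that XML 1.0 (Fifth Edition) lists as discouraged against what a single off-the-shelf parser (the expat-based xml.etree.ElementTree in the Python standard library) does with a literal character in element content. The helper names is_xml10_char, is_discouraged and parser_accepts are hypothetical, made up for this illustration, and other parsers may well behave differently.

```python
import xml.etree.ElementTree as ET


def is_xml10_char(cp):
    """True if the code point matches the XML 1.0 Char production."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_discouraged(cp):
    """True if XML 1.0 (Fifth Edition) allows the code point but discourages it."""
    return (0x7F <= cp <= 0x84
            or 0x86 <= cp <= 0x9F          # C1 controls, except NEL (U+0085)
            or 0xFDD0 <= cp <= 0xFDEF      # BMP non-characters
            or (cp >= 0x10000 and (cp & 0xFFFF) >= 0xFFFE))  # plane-final pairs


def parser_accepts(cp):
    """Assumption: one expat-based parser stands in for 'off the shelf'."""
    doc = "<doc>" + chr(cp) + "</doc>"
    try:
        ET.fromstring(doc)
    except (ET.ParseError, ValueError):
        return False
    return True


if __name__ == "__main__":
    # Form feed, a C1 control, NEL, a BMP non-character, a plane-1 non-character.
    for cp in (0x0C, 0x80, 0x85, 0xFDD0, 0x1FFFE):
        print("U+%04X char=%s discouraged=%s accepted=%s" % (
            cp, is_xml10_char(cp), is_discouraged(cp), parser_accepts(cp)))
```

Under these assumptions, form feed is the only one of the five that falls outside Char; the C1 controls and non-characters are legal XML 1.0 content that is merely discouraged, which is the gap the question is about.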



-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
dlee@marklogic.com
Phone: +1 650-287-2531
Cell:  +1 812-630-7622
www.marklogic.com



-----Original Message-----
From: John Cowan [mailto:cowan@ccil.org] On Behalf Of John Cowan
Sent: Monday, September 10, 2012 10:13 AM
To: Michael Kay
Cc: public-microxml@w3.org
Subject: Re: 12. Are C1 controls and Unicode non-characters disallowed?

Michael Kay scripsit:

> Problem is (a) many developers don't see it as a benefit to be told 
> that incoming content over which they have no control is unacceptable, 
> and (b) they shoot the messenger - that is, the piece of software that 
> gives them the bad news.

Fortunately, MicroXML doesn't have draconian error recovery.

-- 
Principles.  You can't say A is         John Cowan <cowan@ccil.org>
made of B or vice versa.  All mass      http://www.ccil.org/~cowan
is interaction.  --Richard Feynman

Received on Monday, 10 September 2012 18:10:15 UTC