RE: 12. Are C1 controls and Unicode non-characters disallowed?

How does adding to the list of characters a parser can handle simplify the language?
To my reading, that makes both the spec and the language more complex: it has a higher information count, because it takes more rules to define what not to do. Those restrictions could simply be removed (i.e., the spec simplified) without complication.

-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
dlee@marklogic.com
Phone: +1 650-287-2531
Cell:  +1 812-630-7622
www.marklogic.com

From: Uche Ogbuji [mailto:uche@ogbuji.net]
Sent: Monday, September 10, 2012 12:36 PM
To: public-microxml@w3.org
Subject: Re: 12. Are C1 controls and Unicode non-characters disallowed?

On Mon, Sep 10, 2012 at 12:09 PM, David Lee <David.Lee@marklogic.com> wrote:
2) If an "off the shelf" XML parser parses a MicroXML document, then all MicroXML parsers should also parse that document without failure. I guess this one is self-referential.
What I am getting at is: "What would the user see that is bad if we allowed these discouraged characters?"
It's too late now, but I feel it was a mistake for XML to ban valid Unicode characters (like control chars) just because there was no practical definition of what they meant (say, FF). For XML compatibility we can't undo that, but do we need to go further and ban other Unicode code points that do not explicitly cause parse errors in XML? Why?

I think James has several times restated the reasons why.  In short, it provides a clarity that is missing in XML ("do not use non-characters").  I think doing so very much simplifies the language, which is the entire point of MicroXML.
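
To make the two positions concrete, here is a rough Python sketch of the character classes under discussion. The ranges come from the XML 1.0 Char production and the Unicode definition of noncharacters; the function names, and in particular allowed_under_stricter_rule, are hypothetical and only illustrate the stricter rule being debated in this thread, not the text of any spec.

```python
# Sketch only. Function names are hypothetical; ranges are from XML 1.0
# (Char production) and the Unicode Standard (noncharacters).

def is_xml10_char(cp: int) -> bool:
    """True if the code point matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def is_c1_control(cp: int) -> bool:
    """C1 controls: U+0080 through U+009F."""
    return 0x80 <= cp <= 0x9F

def is_noncharacter(cp: int) -> bool:
    """Unicode noncharacters: U+FDD0..U+FDEF plus the last two
    code points of every plane (U+nFFFE and U+nFFFF)."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def allowed_under_stricter_rule(cp: int) -> bool:
    """Assumed reading of the stricter proposal: XML 1.0 Char minus
    C1 controls and noncharacters."""
    return (
        is_xml10_char(cp)
        and not is_c1_control(cp)
        and not is_noncharacter(cp)
    )
```

Under this sketch, form feed (U+000C) is already rejected by XML 1.0, while U+0085 and U+FDD0 pass the XML 1.0 check and are excluded only by the stricter rule. That gap is the set of "discouraged" characters the question is about.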


--
Uche Ogbuji                       http://uche.ogbuji.net
Founding Partner, Zepheira        http://zepheira.com
http://wearekin.org
http://www.thenervousbreakdown.com/author/uogbuji/
http://copia.ogbuji.net
http://www.linkedin.com/in/ucheogbuji
http://twitter.com/uogbuji

Received on Monday, 10 September 2012 21:24:53 UTC