RE: 12. Are C1 controls Unicode non-characters disallowed?

But no one ever successfully made any bad XML files :)


-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
dlee@marklogic.com
Phone: +1 650-287-2531
Cell:  +1 812-630-7622
www.marklogic.com


From: Uche Ogbuji [mailto:uche@ogbuji.net]
Sent: Monday, September 10, 2012 6:44 PM
To: public-microxml@w3.org
Subject: Re: 12. Are C1 controls Unicode non-characters disallowed?

On Mon, Sep 10, 2012 at 7:39 PM, Michael Sokolov <sokolov@falutin.net> wrote:
Yes - some kind of recovery process would be a boon; +1 for allowing parsers to replace these disallowed codepoints with the special Unicode character reserved to mean "unknown or unrepresentable character": U+FFFD.

Yes. MicroXML's policy on error handling states that the parser must report that the document is not a MicroXML document, but having done so, it is free to recover as it pleases.
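
To make the "report, then recover" idea concrete, here is a minimal sketch in Python. It assumes the disallowed set is exactly the C1 controls (U+0080 to U+009F) plus the Unicode noncharacters, which is what the subject line asks about; the set MicroXML finally settles on may differ. The names is_disallowed and recover are hypothetical and not taken from any MicroXML implementation.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def is_disallowed(ch: str) -> bool:
    """Assumed disallowed set: C1 controls and Unicode noncharacters."""
    cp = ord(ch)
    if 0x80 <= cp <= 0x9F:        # C1 control characters
        return True
    if 0xFDD0 <= cp <= 0xFDEF:    # contiguous block of 32 noncharacters
        return True
    # The last two code points of every plane (U+nFFFE and U+nFFFF).
    return (cp & 0xFFFE) == 0xFFFE


def recover(text: str):
    """Return (cleaned_text, errors).

    errors holds (offset, code_point) pairs. A non-empty list means the
    input must be reported as not being a MicroXML document, even though
    a recovered version of the text is returned alongside it.
    """
    errors = []
    out = []
    for offset, ch in enumerate(text):
        if is_disallowed(ch):
            errors.append((offset, ord(ch)))
            out.append(REPLACEMENT)
        else:
            out.append(ch)
    return "".join(out), errors
```

A caller would check the errors list first, report the document as non-conforming if it is non-empty, and only then decide whether to carry on with the cleaned text.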

David's point and yours here are good ones, and they reflect the lesson I think most have learned the hard way from XML's experiment with draconian error handling. I think most would agree that experiment failed (just as Postel's Law predicted it would ;) )


--
Uche Ogbuji                       http://uche.ogbuji.net
Founding Partner, Zepheira        http://zepheira.com
http://wearekin.org
http://www.thenervousbreakdown.com/author/uogbuji/
http://copia.ogbuji.net
http://www.linkedin.com/in/ucheogbuji
http://twitter.com/uogbuji

Received on Tuesday, 11 September 2012 01:56:41 UTC