- From: Paul Grosso <pgrosso@arbortext.com>
- Date: Wed, 09 Apr 2003 08:51:16 -0500
- To: <www-tag@w3.org>
At 20:26 2003 04 09 +1000, Rick Jelliffe wrote:
>For encoding error-detection, XML 1.1 takes one small step backwards
>(by opening up the characters used in names) but then takes a very large
>step forwards (by not allowing most C1 control characters directly).
>(The C1 controls are roughly U+0080-U+009F: reserving these is enough
>to detect many common encoding errors, in particular mislabelling
>character sets --such as Big 5 or Win 1252 "ANSI"-- as ISO 8859-1.)

The XML Core WG has not resolved this open issue yet, so I for one wouldn't mind understanding this better.

The current text in the XML 1.1 CR disallows the C1 control characters directly in well-formed XML (instead, they must be escaped using numeric character references). This is the only thing in XML 1.1 that prevents certain potential (if rare) well-formed XML 1.0 documents from being turned into well-formed XML 1.1 documents by merely changing the version number in the XML declaration.

I am unclear on the benefits of this. In exchange for making some well-formed XML 1.0 documents no longer well-formed XML 1.1, what exactly are we getting? I gather the answer is greater "encoding error detection," that is, the ability to reject yet more documents.

I'm not yet sure what I think of this, and the XML Core WG has members on both sides of this issue. If someone could make a clear cost/benefit argument here, it might help some of us on the fence.

paul

[speaking for myself, not the XML Core WG]
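
To make the "encoding error detection" argument concrete, here is a minimal Python sketch of the case Rick describes: Windows-1252 text mislabelled as ISO 8859-1. The helper name `find_c1_controls` is hypothetical, and the range check uses the "roughly U+0080-U+009F" range from the quoted message; it does not model any exceptions the specification itself may make.

```python
# Illustrative sketch only; find_c1_controls is a hypothetical helper,
# not part of any XML parser or of the XML 1.1 specification.
C1_START, C1_END = 0x80, 0x9F  # "roughly U+0080-U+009F", per the quoted message


def find_c1_controls(text):
    """Return (index, code point) pairs for C1 controls present literally in text."""
    return [(i, ord(ch)) for i, ch in enumerate(text)
            if C1_START <= ord(ch) <= C1_END]


# Curly quotes are bytes 0x93 and 0x94 in Windows-1252.
raw = "\u201cquoted\u201d".encode("cp1252")

# Mislabelled as ISO 8859-1: every byte decodes without error,
# but 0x93 and 0x94 silently become the C1 controls U+0093 and U+0094.
mislabelled = raw.decode("iso-8859-1")
hits = find_c1_controls(mislabelled)

# Correctly labelled: the same bytes yield no C1 controls.
correct = raw.decode("cp1252")
clean = find_c1_controls(correct)
```

A processor that rejects literal C1 controls would flag the mislabelled document at the first hit, whereas one that accepts them would pass the wrong characters through unnoticed. A document that genuinely needs such a character could still carry it as a numeric character reference such as `&#x93;`.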
Received on Wednesday, 9 April 2003 09:56:34 UTC