Maintaining robust encoding-error detection in XML

I have put up a page (draft attached) at 
  http://www.topologi.com/public/XML_Naming_Rules.html
which looks in detail at how XML 1.0's naming rules provide robust error detection in many important cases.  I believe this robustness is an essential and integral part of XML which must be maintained.

The purpose of the page is to see whether it is possible to derive simple, Unicode-version-independent rules for allowed characters (in names and in documents) that maintain or enhance this robustness.

I believe this is the first time anyone has tried to analyse this issue or to formulate rules based on rational considerations. Indeed, there has been no need until now because the strict naming rules of XML 1.0 took care of this issue to a great extent.

I believe these rules offer significant benefit over the rules currently considered in the XML 1.1 WD, and they accord with the WG's desire to be version-independent of Unicode for well-formedness.

I believe that maintaining the current robustness, while moving name-checking to being some kind of validation issue, boils down to the following rules:

1) NUL (U+0000) cannot be allowed in documents.
2) C1 control characters (U+0080 to U+009F) cannot be allowed in documents, except NEL (U+0085).
3) U+00A0 to U+00BF, U+00D7 and U+00F7 cannot be allowed in names.
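
As a rough illustration only, the three rules above might be checked along the following lines. This is a sketch in Python, not code from the page or from any parser; the function names (forbidden_in_document, forbidden_in_name, first_violation) are hypothetical, and it implements only these three exclusions, not the full XML name productions.

```python
NEL = 0x85  # the one C1 control that rule 2 exempts


def forbidden_in_document(cp: int) -> bool:
    """Rules 1 and 2: NUL, and C1 controls (U+0080..U+009F) except NEL."""
    if cp == 0x00:
        return True
    return 0x80 <= cp <= 0x9F and cp != NEL


def forbidden_in_name(cp: int) -> bool:
    """Rule 3, applied on top of the document-level rules."""
    return (
        forbidden_in_document(cp)
        or 0xA0 <= cp <= 0xBF
        or cp in (0xD7, 0xF7)
    )


def first_violation(text: str, in_name: bool = False):
    """Return (index, 'U+XXXX') for the first forbidden character, or None."""
    check = forbidden_in_name if in_name else forbidden_in_document
    for i, ch in enumerate(text):
        if check(ord(ch)):
            return i, f"U+{ord(ch):04X}"
    return None


# A name stored as UTF-8 but wrongly read as ISO 8859-1: the UTF-8
# continuation byte 0xA9 surfaces as U+00A9, which rule 3 rejects.
mislabelled_name = "café".encode("utf-8").decode("latin-1")
print(first_violation(mislabelled_name, in_name=True))

# Windows-1252 "smart quotes" wrongly read as ISO 8859-1 surface as
# C1 controls, which rule 2 rejects anywhere in the document.
mislabelled_text = b"\x93quoted\x94".decode("latin-1")
print(first_violation(mislabelled_text))
```

The two examples are assumed cases of mislabelled encodings, chosen because UTF-8 continuation bytes fall in 0x80 to 0xBF and Windows-1252 uses 0x80 to 0x9F for printable characters; those are exactly the ranges that rules 2 and 3 exclude.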

I ask the WG to put this robustness issue on the issue list. I expect that there will be other input on this issue from the I18n Working Group as well.


Rick Jelliffe

Received on Monday, 25 February 2002 03:22:25 UTC