
Re: Possible changes for XML 2nd Edition

From: Rick JELLIFFE <ricko@geotempo.com>
Date: Thu, 25 May 2000 04:04:41 +0800
Message-ID: <392C35D9.7961AFFA@geotempo.com>
To: xml-editor@w3.org, "xml-dev@xml.org" <xml-dev@xml.org>

John Cowan wrote:
 
> Issue PE28:
> 
> Currently the XML Recommendation is silent about the handling of
> documents that contain "impossible" bytes.  For example, the byte 0xFF
> cannot appear in any UTF-8 encoded document.  We are considering making
> such violations of the encoding a fatal error.
> 
> PRO: an improperly encoded document is not really a text document at all;
> nothing should be done on the basis of it.  XML's draconian error handling rule
> should lead to a "fatal error", which means the rest of the document must
> not be parsed.
> 
> CON: Some parsers may be relying on libraries supplied by the OS, which may
> not properly signal erroneous input.  Is it too great a burden on the
> parser implementor to impose this restriction?
 
I think this goes too far for basic well-formedness (WF).

Instead, I would propose another level of validity, "character validity",
which XML processors should be encouraged, but not required, to support,
or to support as far as they can. Unlike validity, which sits on top
of well-formedness, "character validity" sits more or less underneath
well-formedness, as XML's soft underbelly.

An XML document that was "character valid" would
 1) not have any impossible bytes in any entity
 2) not have a BOM if the encoding is "UTF-16LE" or "UTF-16BE" (and would
meet any other encoding constraints)
 3) have all names in markup follow the NAMECHAR conventions
 4) have all data Unicode-normalized
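
As a rough illustration of checks 1, 2 and 4 above, here is a minimal Python sketch. The function name and the problem messages are hypothetical, and NFC is only an assumed choice of normalization form, since the proposal does not name one. Check 3 (names) is left out here.

```python
import codecs
import unicodedata


def character_validity_problems(data: bytes, encoding: str) -> list:
    """Return a list of "character validity" problems found in one entity.

    Hypothetical helper: covers impossible bytes, a BOM in UTF-16LE/BE,
    and normalization (NFC is an assumption, not part of the proposal).
    """
    problems = []
    # Canonical codec name, e.g. "UTF-16LE" becomes "utf-16-le".
    name = codecs.lookup(encoding).name

    # Check 2: entities labelled UTF-16LE or UTF-16BE should carry no BOM.
    bom = {
        "utf-16-le": codecs.BOM_UTF16_LE,
        "utf-16-be": codecs.BOM_UTF16_BE,
    }.get(name)
    if bom is not None and data.startswith(bom):
        problems.append("BOM present in " + name)

    # Check 1: strict decoding rejects bytes that cannot occur in the encoding.
    try:
        text = data.decode(name, errors="strict")
    except UnicodeDecodeError as exc:
        problems.append("impossible byte at offset %d" % exc.start)
        return problems

    # Check 4: compare the text with its normalized form (NFC assumed).
    if unicodedata.normalize("NFC", text) != text:
        problems.append("text is not NFC-normalized")
    return problems
```

A processor that does not support "character validity" would simply skip this function; one that does could report the returned problems without treating them as well-formedness errors.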

This would keep a basic XML implementation that did not support
"character validity" simple:
 1) it can use any library for transcoding
 2) it does not have to have any special BOM handling for UTF-16LE or
UTF-16BE
 3) it can tokenize tags based on whitespace and delimiters rather than
NAMECHAR or NAMESTRT
 4) it does not have to check or enforce normalization
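
Point 3 can be sketched as follows, again in Python. Both function names are hypothetical, and the name check is only a crude stand-in built on Python's str.isalpha and str.isalnum, not the real NAMESTRT/NAMECHAR tables. For instance, it rejects combining characters that the real tables allow inside a name.

```python
import re

# Delimiters that end a tag name in the lax tokenizer: whitespace, '/' or '>'.
_DELIMS = re.compile(r"[\s/>]")


def lax_tag_name(start_tag: str) -> str:
    """Take whatever sits between '<' and the first whitespace, '/' or '>'."""
    if not start_tag.startswith("<"):
        raise ValueError("not a tag: %r" % start_tag)
    body = start_tag[1:]
    match = _DELIMS.search(body)
    return body if match is None else body[:match.start()]


def looks_like_name(name: str) -> bool:
    """Approximate name check; NOT the real XML character tables."""
    if not name:
        return False
    first = name[0]
    if not (first.isalpha() or first in "_:"):
        return False
    return all(ch.isalnum() or ch in "_:.-" for ch in name[1:])
```

A basic processor would stop at lax_tag_name, so it would accept a tag such as <9lives id="x"> and report the name "9lives". A character-validating processor would also run a check like looks_like_name and flag that name, because it starts with a digit.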

A character-validating processor should be the goal for any XML
processor not specifically aimed at ultra-lightweight uses.


Rick Jelliffe
Received on Wednesday, 24 May 2000 15:56:25 GMT
