Re: Possible changes for XML 2nd Edition from Steve Schafer on 2000-05-24 (xml-editor@w3.org from April to June 2000)

From: Steve Schafer <pandeng@telepath.com>
Date: Wed, 24 May 2000 16:01:40 -0500
To: xml-editor@w3.org, "xml-dev@xml.org" <xml-dev@xml.org>
Message-ID: <qggoiss58harki7jvkdb545fofchp8hpvi@4ax.com>

On Wed, 24 May 2000 14:51:43 -0400, you wrote:

>Currently the XML Recommendation is silent about the handling of
>documents that contain "impossible" bytes.  For example, the byte 0xFF
>cannot appear in any UTF-8 encoded document.  We are considering making
>such violations of the encoding a fatal error.

The way I handle it now is to try to determine the encoding using the
Appendix F heuristics (or the explicit encoding declaration, if it
exists), and then switch to a stream that understands that encoding
and spits out Unicode characters. If that stream subsequently
encounters such an "impossible" byte, then it throws an exception. The
parser per se never gets to see it.
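
A minimal sketch of that arrangement, assuming Python and its standard codecs module (this is an illustration, not the code actually used in the parser described above). The helper name open_decoded is hypothetical, and the encoding is assumed to have been determined already, either from the Appendix F heuristics or from the encoding declaration:

```python
import codecs
import io


def open_decoded(raw, encoding="utf-8"):
    """Wrap a byte stream in a strict decoder for an already-detected encoding.

    Hypothetical helper: the caller is assumed to have chosen `encoding`
    beforehand. With errors="strict", any byte sequence that is invalid
    in that encoding raises UnicodeDecodeError.
    """
    return codecs.getreader(encoding)(raw, errors="strict")


# Well-formed input decodes to Unicode text that a parser could consume.
good = open_decoded(io.BytesIO(b"<doc>ok</doc>")).read()

# The byte 0xFF can never appear in UTF-8, so the decoding stream raises
# before any parsing code sees the data.
try:
    open_decoded(io.BytesIO(b"<doc>\xff</doc>")).read()
    rejected = False
except UnicodeDecodeError:
    rejected = True
```

The point of the layering is that the encoding check lives entirely in the stream, so the parser only ever receives valid Unicode characters or an exception.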

>CON: Some parsers may be relying on libraries supplied by the OS, which may
>not properly signal erroneous input.  Is it too great a burden on the
>parser implementor to impose this restriction?

I don't see this as a significant issue. If I find that a library
function that I'm using is silently swallowing errors of this
magnitude, then I'm going to dump the library function and use
something else.
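
To illustrate what "silently swallowing errors" can look like, here is a small sketch, again assuming Python purely for illustration. The same invalid input is decoded with two lenient error handlers and then with the strict default; the function name decodes_strictly is hypothetical:

```python
data = b"<doc>\xff</doc>"  # 0xFF is never valid in UTF-8

# Lenient handlers hide the problem: "replace" substitutes U+FFFD and
# "ignore" drops the offending byte, and neither signals an error.
lenient = data.decode("utf-8", errors="replace")
dropped = data.decode("utf-8", errors="ignore")


def decodes_strictly(raw):
    """Return True only if `raw` is valid UTF-8 under the strict handler."""
    try:
        raw.decode("utf-8")  # errors="strict" is the default
        return True
    except UnicodeDecodeError:
        return False


ok = decodes_strictly(data)
```

A library that behaves like the first two calls with no way to request the third would be the kind worth replacing.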

-Steve Schafer

Received on Wednesday, 24 May 2000 17:00:20 UTC