- From: Steve Schafer <pandeng@telepath.com>
- Date: Wed, 24 May 2000 16:01:40 -0500
- To: xml-editor@w3.org, "xml-dev@xml.org" <xml-dev@xml.org>
On Wed, 24 May 2000 14:51:43 -0400, you wrote:

>Currently the XML Recommendation is silent about the handling of
>documents that contain "impossible" bytes. For example, the byte 0xFF
>cannot appear in any UTF-8 encoded document. We are considering making
>such violations of the encoding a fatal error.

The way I handle it now is to try to determine the encoding using the
Appendix F heuristics (or the explicit encoding declaration, if it
exists), and then switch to a stream that understands that encoding and
spits out Unicode characters. If that stream subsequently encounters
such an "impossible" byte, then it throws an exception. The parser per
se never gets to see it.

>CON: Some parsers may be relying on libraries supplied by the OS, which may
>not properly signal erroneous input. Is it too great a burden on the
>parser implementor to impose this restriction?

I don't see this as a significant issue. If I find that a library
function that I'm using is silently swallowing errors of this magnitude,
then I'm going to dump the library function and use something else.

-Steve Schafer
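
For illustration only, here is a rough Python sketch of the decode-then-parse arrangement described above. It is not the code from the message: the function names `guess_encoding` and `decode_strict` are hypothetical, and the detection step is a heavily simplified stand-in for the Appendix F heuristics (it looks only at byte-order marks, ignores the encoding declaration and UTF-32, and otherwise assumes UTF-8). The point is that the decoding layer is strict, so an "impossible" byte raises an exception before any parser sees the text.

```python
import io


def guess_encoding(data: bytes) -> str:
    """Hypothetical, simplified stand-in for Appendix F detection.

    Only byte-order marks are checked; anything else is assumed UTF-8.
    """
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"  # UTF-8 with a BOM; the codec strips the BOM
    if data.startswith(b"\xfe\xff") or data.startswith(b"\xff\xfe"):
        return "utf-16"     # the codec uses the BOM to pick the byte order
    return "utf-8"


def decode_strict(data: bytes) -> str:
    """Decode the whole document through a strict text stream.

    errors="strict" makes the stream raise UnicodeDecodeError on any byte
    sequence that is invalid in the chosen encoding, instead of silently
    replacing or dropping it.
    """
    stream = io.TextIOWrapper(
        io.BytesIO(data),
        encoding=guess_encoding(data),
        errors="strict",
        newline="",  # leave line endings untouched for the parser
    )
    return stream.read()


if __name__ == "__main__":
    print(decode_strict(b"<doc>ok</doc>"))
    try:
        decode_strict(b"<doc>\xff</doc>")  # 0xFF is never valid in UTF-8
    except UnicodeDecodeError as exc:
        print("fatal encoding error:", exc.reason)
```

A parser built on top of `decode_strict` only ever receives Unicode characters, so whether a bad byte is a fatal error becomes a property of the decoding layer rather than of the parser itself.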
Received on Wednesday, 24 May 2000 17:00:20 UTC