- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Mon, 23 Apr 2001 00:43:30 +0200
- To: Liam Quinn <liam@htmlhelp.com>
- Cc: www-validator@w3.org, Terje Bless <link@tss.no>
* Liam Quinn wrote:
>The WDG HTML Validator labels US-ASCII documents as ISO-8859-1 when
>passing off to lq-nsgmls, and so it considers that example document valid.
>And it is valid:
>
>  "An XML document is valid if it has an associated document type
>  declaration and if the document complies with the constraints expressed
>  in it." [1]

>The 8-bit character is an error, but it's an error in a similar way to
>including <a href="foo bar"> in an HTML document.

I don't agree here. XML 1.0 reads:

  "It is a fatal error [2] if an XML entity is determined (via default,
  encoding declaration, or higher-level protocol) to be in a certain
  encoding but contains octet sequences that are not legal in that
  encoding. It is also a fatal error if an XML entity contains no
  encoding declaration and its content is not legal UTF-8 or UTF-16." [1]

I'd say documents that have fatal errors can be neither well-formed nor
valid, but that's not in the spec. Instead, it defines a fatal error as

  "An error which a conforming XML processor must detect and report to
  the application. After encountering a fatal error, the processor may
  continue processing the data to search for further errors and may
  report such errors to the application. In order to support correction
  of errors, the processor may make unprocessed data from the document
  (with intermingled character data and markup) available to the
  application. Once a fatal error is detected, however, the processor
  must not continue normal processing (i.e., it must not continue to
  pass character data and information about the document's logical
  structure to the application in the normal way)." [2]

Anyway, undecodable documents (and documents with illegal octet sequences
are undecodable) cannot be parsed properly, so they cannot be checked for
validity or well-formedness. A validator must report such a fatal error
and may refuse further processing, IMO.

Btw., this is, as I'm sure you know, worse for HTML documents.
XML documents can be encoded in UTF-8 or UTF-16 without declaring it;
HTML documents cannot. You must always declare the encoding used, since
the user agent must not assume any default character encoding.

[1] http://www.w3.org/TR/REC-xml#NT-EncodingDecl
[2] http://www.w3.org/TR/REC-xml#dt-fatal
-- 
Björn Höhrmann { mailto:bjoern@hoehrmann.de } http://www.bjoernsworld.de
am Badedeich 7 } Telefon: +49(0)4667/981028 { http://bjoern.hoehrmann.de
25899 Dagebüll { PGP Pub. KeyID: 0xA4357E78 } http://www.learn.to/quote/
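
To illustrate the check argued for above, here is a minimal Python sketch
of a decoding step that treats illegal octet sequences as a fatal error
before any parsing happens. It is not how the WDG or W3C validators are
implemented; the names decode_entity and FatalEncodingError are
hypothetical, and detecting UTF-16 by its byte order mark alone is a
simplifying assumption.

```python
import codecs
from typing import Optional


class FatalEncodingError(Exception):
    """Hypothetical error: the entity is not legal in its encoding."""


def decode_entity(octets: bytes, declared: Optional[str] = None) -> str:
    """Decode an entity, or raise FatalEncodingError on illegal octets."""
    if declared is None:
        # No declaration: the content must be legal UTF-8 or UTF-16.
        # Assumption: UTF-16 is recognised only by its byte order mark.
        if octets.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
            encoding = "utf-16"
        else:
            encoding = "utf-8"
    else:
        encoding = declared
    try:
        return octets.decode(encoding)
    except UnicodeDecodeError as exc:
        # Report the error and stop instead of handing data to a parser.
        raise FatalEncodingError(
            f"illegal octet sequence at byte {exc.start} for {encoding}"
        ) from exc
```

With this sketch, decode_entity(b"caf\xe9", "us-ascii") raises
FatalEncodingError, while the same octets labeled "iso-8859-1" decode
without complaint, which is the relabeling effect discussed in this
thread.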
Received on Sunday, 22 April 2001 18:42:16 UTC