- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Thu, 24 Sep 2009 04:34:42 +0200
- To: ht@inf.ed.ac.uk (Henry S. Thompson)
- Cc: public-html-comments@w3.org
* Henry S. Thompson wrote:
>I don't think I have a problem with that, I can imagine an argument
>that it's broken (although http://www.ltg.ed.ac.uk/~ht/char_alias.xml
>is _not_ broken per the XML specification. . .), but I can't find
>anywhere in the HTML5 spec. which says so. Does it/should it?

It is not broken per the XML specification by the same reasoning that a PNG image is not broken per the XML specification. Procedurally, in both cases the XML processor determines some character encoding, attempts to decode the document, and then encounters byte sequences that do not have a well-defined meaning according to the encoding's specification. It is therefore not possible to restore the textual data the binary data represents, and the XML specification only defines conformance for processors and textual data objects.

Consider that the XML specification does not normatively define exactly how to determine the character encoding (and I am ignoring that you've used text/xml as the media type for the document, which has other theoretical considerations rarely met in practice), so you can easily define a new character encoding very-bogus-encoding as "Any sequence of bytes stands for the text <?xml version='1.0' encoding='very-bogus-encoding'?><x/>" and your document would be perfectly conforming if the processor does indeed support that encoding.

Cases like this do in fact exist in the real world. For example, with UTF-32 encoded documents the processor may not support UTF-32 and may instead detect UTF-16 or UTF-8 and encounter illegal byte sequences or disallowed characters. The only difference is in perception, as UTF-32 is widely recognized while very-bogus-encoding is not.

It is ultimately entirely irrelevant whether your document is broken per the XML specification, as it is, as far as common sense goes, broken per the US-ASCII specification.
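To make the decoding argument concrete, here is a minimal Python sketch of the two failure modes described above: bytes with no meaning in the declared encoding, and a UTF-32 document read by a processor that guesses UTF-8 instead. The sample documents and the helper name try_decode are hypothetical and chosen only for illustration; they are not taken from char_alias.xml or from any particular XML processor.

```python
def try_decode(data: bytes, encoding: str):
    """Strictly decode data; return (text, None) or (None, reason)."""
    try:
        return data.decode(encoding, errors="strict"), None
    except UnicodeDecodeError as exc:
        return None, f"{exc.reason} (byte offset {exc.start})"


# Hypothetical document: labelled US-ASCII but containing the byte 0xE9,
# which has no meaning in US-ASCII.
ascii_doc = b"<?xml version='1.0' encoding='US-ASCII'?><x>\xe9</x>"
ascii_text, ascii_error = try_decode(ascii_doc, "ascii")

# Hypothetical UTF-32 documents, with and without a byte order mark.
utf32_with_bom = b"\xff\xfe\x00\x00" + "<x/>".encode("utf-32-le")
utf32_no_bom = "<x/>".encode("utf-32-be")

# A processor without UTF-32 support that falls back to UTF-8:
# the BOM bytes form an illegal UTF-8 sequence, so decoding fails outright.
bom_text, bom_error = try_decode(utf32_with_bom, "utf-8")
# Without a BOM the bytes do decode, but the result is full of U+0000,
# a character that XML 1.0 does not allow in documents.
nul_text, nul_error = try_decode(utf32_no_bom, "utf-8")

print(ascii_error)
print(bom_error)
print(repr(nul_text))
```

In none of these cases can the intended text be recovered from the bytes, which is the sense in which the question of XML conformance never really arises.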
You might just as well have your web server send out malformed TCP datagrams or a malformed HTTP response and muse how that is or is not broken per unrelated specifications. Similarly, very-bogus-encoding is irrelevant because it violates what is considered common sense. http://xkcd.com/468/ comes to mind.

(The XML specification actually considers your case a fatal error, and those are errors which in turn are violations of the constraints of the specification. I've argued unsuccessfully against that in the past, as having specification violations dependent on processor capabilities is a violation of common sense.)

-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Received on Thursday, 24 September 2009 02:35:23 UTC