- From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
- Date: Wed, 18 Nov 2009 14:16:29 +0100
- To: Liam Quin <liam@w3.org>
- Cc: public-html@w3.org, public-xml-core-wg@w3.org
Liam Quin wrote:
> On Tue, Nov 17, 2009 at 08:26:16PM +0100, Lachlan Hunt wrote:
>> Liam Quin wrote:
>>> To amplify a little... the XML Spec says (in essence)
>>> that software that takes something (anything at all)
>>> that is not well-formed XML, can turn it into XML, but,
>>> if it does, it must not claim that the original input
>>> was XML.
>>
>> If that is really the case, then that is a problem because of the lack
>> of defined error recovery behaviour.
>
> No, not at all. The standard XML behaviour is that if it's got
> well-formedness errors in it, it's not XML. It's a fatal error
> to try and process such "document" as XML.
>
> But that doesn't mean you can't fix the error.

This seems to be turning into a circular argument. The issue is not
about whether or not they could fix the error, but rather *how* to fix
the error.

I've been trying to figure out where exactly the disagreement between us
lies, but I think we can all agree on the following:

1. There are applications that have the need and/or desire to implement
   non-draconian error recovery for documents created with the intention
   of being XML, but for whatever reason are not well-formed.

2. In order to achieve interoperability among such applications, it is
   necessary to have a specification that clearly defines how to parse
   documents intended to be XML and recover from any fatal errors.

3. The XML 1.0 specification only defines the format of a well-formed
   XML document. Anything else is left undefined, and the spec takes no
   position on how to process documents that are not well-formed, beyond
   requiring that the error be reported to the application and giving a
   vague requirement about not continuing normal processing.

I think the source of disagreement comes from a much deeper
philosophical difference here between the approaches taken by XML and
HTML.
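
As a rough illustration of that difference in practice (not taken from
either specification), here is a minimal Python sketch. It assumes
Python's standard-library xml.etree.ElementTree as a stand-in for a
draconian XML parser and html.parser as a stand-in for a forgiving HTML
parser. The latter is not an implementation of the HTML5 parsing
algorithm, and the names parse_as_xml, parse_as_html and TagCollector
are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Input with mismatched tags: <b> is never closed before </p>.
MALFORMED = "<p>unclosed <b>bold</p>"


def parse_as_xml(text):
    """Return the root element's tag, or None if the input is not well-formed.

    ElementTree stops at the first well-formedness error and raises
    ParseError; no partial tree is handed back to the caller.
    """
    try:
        return ET.fromstring(text).tag
    except ET.ParseError:
        return None


class TagCollector(HTMLParser):
    """Hypothetical helper: records every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def parse_as_html(text):
    """Return the start tags seen; malformed input does not stop the parser."""
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.start_tags


if __name__ == "__main__":
    print(parse_as_xml(MALFORMED))   # the XML parser gives up on this input
    print(parse_as_html(MALFORMED))  # the HTML parser still reports the tags
```

The XML side yields nothing usable for the malformed input, while the
HTML side keeps going. What the sketch cannot show is *which* recovery
an XML parser should apply, since that is exactly the part no
specification currently defines.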

The approach taken by the XML specifications is to define what
constitutes a well-formed document, while leaving undefined the question
of what the data is if, during parsing, it turns out not to be
well-formed; it is simply not XML. From a document format and
conformance perspective alone, I can understand the logic behind this.
However, this doesn't make as much sense from an implementation
perspective, where there is a need to process, in some way, any input
that is passed to an XML parser with the presumption of it being XML.

This differs from the approach taken by HTML5, which simply makes a
distinction between conforming and non-conforming HTML documents, while
still accepting that non-conforming documents are, for all intents and
purposes, HTML. This table roughly illustrates the difference:

Intended Resource Type | No Errors       | Syntax Errors
=======================+=================+======================
HTML                   | Conforming HTML | Non-conforming HTML
-----------------------+-----------------+----------------------
XML                    | Well-formed XML | Undefined

It seems that those people supporting the XML philosophy consider it
more of a feature that XML leaves non-well-formed data undefined,
whereas others, including myself, consider it to be a flaw in the design
of the XML specification, which the XML5 proposal is attempting to
rectify.

The current XML5 proposal focusses entirely on the parsing issue,
leaving the definition of what's considered to be a conforming,
well-formed XML document to XML 1.0. So, in this sense, it is fully
compatible with XML 1.0, and any conforming XML 1.0 parser will also be
a conforming XML5 parser, as the algorithm allows for either aborting or
applying the defined recovery procedure upon encountering a fatal error.

However, there have also been some suggestions to extend the list of
pre-defined entity references to all of those defined in HTML5 (which
includes the XHTML and MathML sets).
If this were done, then conforming XML 1.0 parsers would need to be
updated to recognise these entities in order to become conforming XML5
parsers.

--
Lachlan Hunt - Opera Software
http://lachy.id.au/
http://www.opera.com/

Received on Wednesday, 18 November 2009 13:17:10 UTC