- From: Doug Schepers <schepers@w3.org>
- Date: Tue, 14 Jul 2009 03:16:07 -0400
- To: "public-html@w3.org" <public-html@w3.org>
Hi, HTML WG-

There are advantages and disadvantages to both the strict ("draconian") and error-correcting parsing of markup. HTML evolved to have loose parsing with undefined and browser-specific error correction, and XML was designed and well-defined to have strict parsing (probably as a reaction to the chaotic HTML approach).

We have come full circle on the matter, and the HTML5 spec marries many of the advantages of both approaches, by offering a well-defined error-correction model. This has the advantage that it is sometimes easier to author (though it can make debugging more difficult), the more profound advantage that it hides problems from the reader, and the even more important advantage that it is more or less how browsers already parse HTML documents.

However, it cannot gracefully address all the situations in which strict parsing is an advantage:

* For authoring, it is often useful to know when you have validity or well-formedness errors, which helps debug script and CSS, and doing this on the fly in the browser is faster and easier while developing than reiterative validation with a separate tool;

* Strict markup works predictably for mashups and mixtures of different markup languages;

* Draconian error handling enforces structure and content models for mission-critical applications, such as the canonical "financial transactions" example, where the reader *wants* to know about problems in the markup [1], and for use cases that are low-tolerance for potential errors (such as the government and some industries).

To meet this need, I propose a new attribute, 'parsing', which, when placed on the document root, defines the type of parsing which a UA must use when parsing the document. The values would be "loose" and "strict", with loose parsing as the default (an omitted @parsing attribute would result in loose parsing).
When the parsing is loose, the error-correction algorithms defined in HTML5 must be applied; when the parsing is strict, there must be no error-correction (as is commonly the case for XHTML in most browsers).

This way, authors could optionally enforce strictness when they want or need to, and then change or remove the value when they are ready for publication, or when their needs change. It is possible that there would be instances where strict parsing makes it out of development and into production code, but this would have relatively few negative consequences (the kind of author who uses this would probably produce strict code anyway, and would know it if they didn't), and would be easily corrected.

And, quite frankly, some people simply prefer stricter parsing for aesthetic or other reasons, and this would provide them with that option while not imposing it on others. Had this option been available in XML from the beginning, many problems and community schisms might have been avoided. I believe that presenting the option for strict parsing may change how the various communities approach HTML5, and avoid further schisms.

I see this as having relatively low costs for the specification, and very little implementation cost, since browsers will already have both modes (even IE has a built-in XML parser, though it doesn't use it for XHTML). Please correct me if my assumption here is wrong.

I also believe that this is backwards-compatible, since the default will be loose parsing as is already applied, and forwards-compatible, since any alternate future parsing models (such as the proposed XML2 or XML5, or some use case we don't see today) can be specified as the value for @parsing in a later specification without changing how it would be used as defined in HTML5. It may lay the groundwork for a new formulation of error-correcting XML, as Anne proposed.
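To make the proposed switch concrete, here is a minimal sketch in Python of how a consumer might choose a parsing mode from the root element's 'parsing' attribute. It is an illustration under stated assumptions, not a description of any browser: the function names `parsing_mode` and `parse` are hypothetical, Python's `html.parser` is only a lenient stand-in for the HTML5 error-correction algorithms (it does not implement them), `xml.etree.ElementTree` stands in for a strict XML parser, and treating unrecognized attribute values as "loose" is my assumption rather than part of the proposal.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class _RootSniffer(HTMLParser):
    """Records the attributes of the first start tag (taken as the root)."""

    def __init__(self):
        super().__init__()
        self.root_attrs = None

    def handle_starttag(self, tag, attrs):
        if self.root_attrs is None:
            self.root_attrs = dict(attrs)


class _TagCollector(HTMLParser):
    """Lenient stand-in for an error-correcting parser: never raises on
    mismatched or missing end tags, just reports the start tags it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parsing_mode(markup):
    """Return "strict" or "loose" based on the root's parsing attribute.

    An omitted attribute means loose parsing, as in the proposal.
    Assumption: any value other than "strict" is also treated as loose.
    """
    sniffer = _RootSniffer()
    sniffer.feed(markup)
    sniffer.close()
    attrs = sniffer.root_attrs or {}
    return "strict" if attrs.get("parsing") == "strict" else "loose"


def parse(markup):
    """Return the element names found, using the mode the document asks for.

    In strict mode a well-formedness error raises ET.ParseError instead of
    being corrected; in loose mode the lenient parser carries on.
    """
    if parsing_mode(markup) == "strict":
        root = ET.fromstring(markup)  # no error correction
        return [element.tag for element in root.iter()]
    collector = _TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags
```

With this sketch, a document with an unclosed `<p>` is accepted when the root carries no 'parsing' attribute, while the same markup with `parsing="strict"` on the root raises `ET.ParseError`, which is the behavior an author would rely on during development.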
I'm hoping that the dust has sufficiently settled about the parsing debate that we can hold a logical discussion of this proposal on its merits.

(Meta: I chose the keywords of the attribute and values for brevity, and I'm not at all married to them; treat them as placeholders for the purposes of discussing this proposal; another option might be something like @error-correction="true | false". Please don't suggest different names quite yet unless they represent a functional difference to this proposal. Also, I've BCC'ed the TAG just so they know.)

[1] http://www.tbray.org/ongoing/When/200x/2004/01/11/PostelPilgrim

Regards-
-Doug Schepers
W3C Team Contact, SVG and WebApps WGs
Received on Tuesday, 14 July 2009 07:17:20 UTC