- From: Noah Mendelsohn <nrm@arcanedomain.com>
- Date: Tue, 16 Aug 2011 09:31:55 -0400
- To: David Carlisle <davidc@nag.co.uk>
- CC: "public-html-xml@w3.org" <public-html-xml@w3.org>
On 8/16/2011 7:31 AM, David Carlisle wrote:

> The view that html markup has errors which may be repaired (which was
> really the sgml/html4 view) doesn't really fit with the html(5) parsing
> model which is simply designed to parse anything and produce a document
> tree.

I disagree. The HTML specs make very clear the distinction between correct and erroneous content. They do provide standardized rules for processing both, and many user agents will indeed proceed more or less silently in the face of errors. On the other hand, validators and other such tools presumably work to enforce correct content, and there may be purposes other than browsing (automated data extraction?) for which the right tradeoff may be to reject erroneous data after all.

To John Cowan's point: yes, there tends to be a generalized parsing layer used for XML, and yes, it usually needs to be the application that decides whether it's OK to proceed in the face of errors. I don't think that means that the XML5 direction is broken: one can easily imagine XML5 parsers that provide, in addition to a DOM/Infoset, information about where fixups have been done. If for some application accepting such fixups is the wrong thing to do, then that application can warn or fail accordingly.

Noah
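
To make the "parser that reports its fixups" idea concrete, here is a rough sketch in Python. It is a toy, not XML5 and not any real parser's API: the names `parse_lenient`, `load`, `Fixup` and `FixupError` are all hypothetical, and the tokenizer only understands bare tags such as `<a>` and `</a>` (no attributes, comments or entities). The point is only the shape of the interface: the parser always produces a tree, records every repair it made, and leaves the accept/reject decision to the application.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


@dataclass
class Fixup:
    offset: int   # character offset in the input where the repair was made
    message: str  # human-readable description of the repair


class FixupError(ValueError):
    """Raised by load() when the caller asked for strict handling."""


# Toy tokenizer (assumption): bare open/close tags, or a run of text.
TOKEN = re.compile(r"<(/?)([A-Za-z][\w.-]*)>|([^<]+)")


def parse_lenient(markup):
    """Always return a tree, plus a list of the repairs that were needed."""
    root = Node("#document")
    stack = [root]
    fixups = []
    for m in TOKEN.finditer(markup):
        closing, name, text = m.groups()
        if text is not None:
            stack[-1].children.append(text)
        elif not closing:
            node = Node(name)
            stack[-1].children.append(node)
            stack.append(node)
        elif name not in [n.name for n in stack[1:]]:
            # Close tag with no matching open element: drop it, but say so.
            fixups.append(Fixup(m.start(), f"ignored stray </{name}>"))
        else:
            # Implicitly close anything left open inside the matching element.
            while stack[-1].name != name:
                fixups.append(Fixup(m.start(), f"auto-closed <{stack[-1].name}>"))
                stack.pop()
            stack.pop()
    while len(stack) > 1:
        fixups.append(Fixup(len(markup), f"auto-closed <{stack[-1].name}> at end of input"))
        stack.pop()
    return root, fixups


def load(markup, strict=False):
    """Application-level policy: accept repaired input, or fail on any fixup."""
    tree, fixups = parse_lenient(markup)
    if strict and fixups:
        first = fixups[0]
        raise FixupError(f"{len(fixups)} fixup(s); first at offset {first.offset}: {first.message}")
    return tree, fixups
```

A browser-like consumer would call `load(markup)` and perhaps log the returned fixups, while a data-extraction tool could call `load(markup, strict=True)` and refuse any input that needed repair. Both share the same parsing layer; only the policy differs.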
Received on Tuesday, 16 August 2011 13:32:23 UTC