Re: Suggested revised text for HTML/XML report intro from Noah Mendelsohn on 2011-08-16 (public-html-xml@w3.org from August 2011)

From: Noah Mendelsohn <nrm@arcanedomain.com>
Date: Tue, 16 Aug 2011 09:31:55 -0400
To: David Carlisle <davidc@nag.co.uk>
CC: "public-html-xml@w3.org" <public-html-xml@w3.org>
Message-ID: <4E4A714B.1050708@arcanedomain.com>

On 8/16/2011 7:31 AM, David Carlisle wrote:
> The view that html markup has errors which may be repaired (which was
> really the sgml/html4 view) doesn't really fit with the html(5) parsing
> model which is simply designed to parse anything and produce a document
> tree.

I disagree. The HTML specs make very clear the distinction between correct 
and erroneous content. They do provide standardized rules for processing 
both, and many user agents will indeed proceed more or less silently in the 
face of errors.  On the other hand, validators and other such tools 
presumably work to enforce correct content, and there may be purposes other 
than browsing (automated data extraction?) for which the right tradeoff may 
be to reject erroneous data after all.

To John Cowan's point: yes, there tends to be a generalized parsing layer 
used for XML, and yes, it usually needs to be the application that decides 
whether it's OK to proceed in the face of errors. I don't think that means 
that the XML5 direction is broken: one can easily imagine XML5 parsers that 
provide, in addition to a DOM/Infoset, information about where fixups have 
been done. If for some application accepting such fixups is the wrong thing 
to do, then that application can warn or fail accordingly.
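
As a rough illustration of the split described above, here is a minimal 
sketch in Python. It is not XML5 or the HTML parsing algorithm: the 
tag-only toy grammar and the names parse_with_fixups, Fixup, strict_load 
and lenient_load are all hypothetical, invented for this example. The 
parser always produces a tree and also reports where it repaired the 
input; each application then decides whether fixups are acceptable.

```python
import re
from dataclasses import dataclass, field

# Toy grammar (an assumption for illustration): bare start tags, bare end
# tags, and character data. No attributes, comments, or entities.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\s*>|([^<]+)")


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


@dataclass
class Fixup:
    offset: int   # character offset in the input where the repair happened
    message: str  # human-readable description of the repair


def parse_with_fixups(text):
    """Parse leniently; return (tree, fixups) instead of failing on errors."""
    root = Node("#document")
    stack = [root]
    fixups = []
    for m in TOKEN.finditer(text):
        closing, name, chars = m.group(1), m.group(2), m.group(3)
        if chars is not None:
            stack[-1].children.append(chars)
        elif not closing:
            node = Node(name)
            stack[-1].children.append(node)
            stack.append(node)
        elif any(n.name == name for n in stack[1:]):
            # Close any still-open descendants first, recording each repair.
            while stack[-1].name != name:
                fixups.append(Fixup(m.start(), f"implied </{stack[-1].name}>"))
                stack.pop()
            stack.pop()
        else:
            fixups.append(Fixup(m.start(), f"ignored stray </{name}>"))
    # Anything still open at end of input is closed implicitly.
    for node in reversed(stack[1:]):
        fixups.append(Fixup(len(text), f"implied </{node.name}> at end of input"))
    return root, fixups


class FixupError(ValueError):
    """Raised by applications that refuse repaired input."""


def strict_load(text):
    # Policy for, say, automated data extraction: reject repaired input.
    tree, fixups = parse_with_fixups(text)
    if fixups:
        first = fixups[0]
        raise FixupError(f"offset {first.offset}: {first.message}")
    return tree


def lenient_load(text, warn=print):
    # Policy for browsing-like use: warn about each repair and carry on.
    tree, fixups = parse_with_fixups(text)
    for f in fixups:
        warn(f"warning: offset {f.offset}: {f.message}")
    return tree
```

The point of the sketch is only that the parsing layer stays generic and 
never fails, while the warn-or-fail decision lives in the application.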

Noah

Received on Tuesday, 16 August 2011 13:32:23 UTC