Re: Suggested revised text for HTML/XML report intro from Noah Mendelsohn on 2011-08-17 (public-html-xml@w3.org from August 2011)

From: Noah Mendelsohn <nrm@arcanedomain.com>
Date: Wed, 17 Aug 2011 10:58:20 -0400
To: John Cowan <cowan@mercury.ccil.org>
CC: David Carlisle <davidc@nag.co.uk>, "public-html-xml@w3.org" <public-html-xml@w3.org>
Message-ID: <4E4BD70C.6030503@arcanedomain.com>

On 8/16/2011 6:45 PM, John Cowan wrote:
> What would be the benefit of lenient XML parsing in the XHTML context?
> You might just as well use the HTML syntax.

I >think< what it allows you to do is start migrating toward an XML stack 
and perhaps toward some new media type (maybe application/xhtml+xml5?). 
First of all, you are more likely to be able to do a mechanical mapping of 
your old broken HTML, perhaps just by serving it as-is under the new media 
type. Second, you get the debugging "benefits" of not having your entire 
page fail to render just because of mismatched quotes on one attribute (and 
yes, there's a debugging cost to more lenient error checking).

Finally, if people choose to go that route, you have a defined mapping that 
allows all of this somewhat broken data to be managed by existing XML 
tools. For example, you might have an XML-aware database. Today, you can 
use it to manage XHTML, but not tag soup. Assuming you are aware of the 
risks, you can now move to a content management system in which all your 
HTML is stored in such an XML database. Of course, where the input is not 
well-formed, some of the mappings may not be what you expect, and when 
re-serializing there will be questions about whether you expect to get back 
the tag soup or the fixed-up XML. Still, I think there is some value there, 
especially as a migration path toward unifying the stacks.
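To make the re-serialization question concrete, here is a minimal Python sketch of a hypothetical lenient mapping from tag soup to XML. The class `SoupToXML`, the function `soup_to_xml`, the synthetic `root` wrapper and the recovery rules are all assumptions for illustration; they are not the HTML5 parsing algorithm or any proposed W3C mapping. It uses only the standard library's `html.parser` and `xml.etree.ElementTree`.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML void elements: they never take content, so they are never left open.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class SoupToXML(HTMLParser):
    """Hypothetical lenient tag-soup-to-XML mapping (illustrative rules only).

    - An end tag closes the nearest open element with that name, implicitly
      closing anything opened inside it.
    - An end tag with no matching open element is dropped.
    - Elements still open at the end of input are closed there.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("root")  # synthetic wrapper element (assumption)
        self.stack = [self.root]        # currently open elements

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. <input disabled>) get an empty value.
        attrib = {k: (v if v is not None else "") for k, v in attrs}
        el = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID:
            self.stack.append(el)

    def handle_endtag(self, tag):
        # Search open elements from innermost outward, never popping the root.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return
        # No matching open element: silently ignore the stray end tag.

    def handle_data(self, data):
        # ElementTree keeps text after a child in that child's .tail.
        parent = self.stack[-1]
        if len(parent):
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def soup_to_xml(markup: str) -> str:
    """Map possibly broken HTML to a well-formed XML string."""
    parser = SoupToXML()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return ET.tostring(parser.root, encoding="unicode")
```

For `<b><i>x</b></i>` the sketch yields `<root><b><i>x</i></b></root>`: the result is well-formed, but re-serializing gives the fixed-up XML, not the original tag soup. For `<p>one<p>two` it nests the second paragraph inside the first, whereas an HTML parser closes the first `<p>` before opening the second. That is exactly the kind of mapping that may not be what you expect.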

Noah

Received on Wednesday, 17 August 2011 14:58:52 UTC