Re: Suggested revised text for HTML/XML report intro from John Cowan on 2011-08-17 (public-html-xml@w3.org from August 2011)

From: John Cowan <cowan@mercury.ccil.org>
Date: Wed, 17 Aug 2011 12:31:48 -0400
To: Noah Mendelsohn <nrm@arcanedomain.com>
Cc: David Carlisle <davidc@nag.co.uk>, "public-html-xml@w3.org" <public-html-xml@w3.org>
Message-ID: <20110817163148.GA26492@mercury.ccil.org>

Noah Mendelsohn scripsit:

> I >think< what it allows you to do is to start migrating toward use of an 
> XML-stack and perhaps use of some new media type (maybe
> application/xhtml+xml5?). First of all, you are more likely to be able to
> do a mechanical mapping of your old broken HTML, perhaps just by serving
> it as is under the new media type. You get the debugging "benefits" of 
> not having your entire page fail to render just due to mismatched quotes 
> on one attribute (and yes, there's a debugging cost to more lenient error 
> checking).

I agree that these are Good Things.  But surely they are achieved more
easily with an HTML5 to XML converter?  Given the existing HTML5 parsers,
it's easy to walk the resulting DOM and output XML directly.
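
To make the walk-and-serialize idea concrete, here is a minimal sketch in
Python.  It is only an illustration under stated assumptions: the standard
library's html.parser stands in for a real HTML5 parser and does not
implement the HTML5 tree-construction algorithm, the error recovery is
deliberately simplistic, and the names TreeBuilder, html_to_xml, and the
"root" wrapper element are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never take an end tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TreeBuilder(HTMLParser):
    """Build an ElementTree from lenient HTML (stand-in, not real HTML5)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("root")  # hypothetical wrapper element
        self.stack = [self.root]        # currently open elements

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. "disabled") become empty strings.
        # Note: names are not checked for XML validity in this sketch.
        elem = ET.SubElement(self.stack[-1], tag,
                             {k: v or "" for k, v in attrs})
        if tag not in VOID:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element;
        # a stray end tag with no match is ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):  # text following a child goes in that child's tail
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(source):
    """Parse lenient HTML, then serialize the resulting tree as XML."""
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return ET.tostring(builder.root, encoding="unicode")


# Unquoted attribute, unclosed <b>, void <br>, and a stray </i>.
print(html_to_xml("<p class=x>Hi<br>there<b>bold</p></i>"))
```

With a conforming HTML5 parser in place of the stand-in, only the tree
walk and the serializer would remain.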

> Finally, if people choose to go that route, you have a defined mapping 
> that allows all of this somewhat broken data to be managed by existing 
> XML tools. For example, you might have an XML-aware database. Today, you 
> can use it to manage XHTML, but not tag soup. Assuming you are aware of 
> the risks, you can now move to a content management system where all your 
> HTML is run on such an XML database. Of course, where the input is not 
> well formed, some of the mappings may not be what you expect, and there will
> be questions when re-serializing of whether you expect back the tag soup 
> or the fixed up XML. Still, there is some value there I think, especially 
> as a migration path toward unifying the stacks.

Again, I agree that these are Good Things, but I think they too are best
managed without adding still another loose parsing mode.

-- 
Real FORTRAN programmers can program FORTRAN    John Cowan
in any language.  --Ed Post                     cowan@ccil.org

Received on Wednesday, 17 August 2011 16:32:27 UTC