Re: XML Calabash 0.9.36 released from Zearin on 2011-10-05 (xproc-dev@w3.org from October 2011)

From: Zearin <zearin@gonk.net>
Date: Wed, 5 Oct 2011 11:05:45 -0400
To: Norman Walsh <ndw@nwalsh.com>
Cc: XProc Dev <xproc-dev@w3.org>
Message-Id: <F8394870-5194-4F56-A155-1320EC03739B@gonk.net>

On Oct 5, 2011, at 10:42 AM, Norman Walsh wrote:
>> On Oct 5, 2011, at 9:59 AM, Norman Walsh wrote:
>>> This should produce valid HTML5-as-XML.
>> 
>> …Impossible…
>> 
>> …Can it be?!?
> 
> It probably isn't what you think it is. That's an *input* option, not
> an *output* option.
> 
> It means if you scrape random sequences of characters off the
> floor^H^H^H^H^Hweb, you'll get an HTML5 DOM for them after you run
> p:unescape-markup.
> 
> *Producing* HTML5 output is going to be harder and I'm kinda sorta
> mostly waiting for an "html5" output method to magically appear in
> Saxon.

What I want more than anything is something to convert HTML5 to XHTML5.  (+100 points if there’s an option to convert it to polyglot XHTML5!)

Back in the day, htmltidy could convert uncivilized HTML into XHTML. Sure, it wasn’t always perfect, but it got you 90% of the way there. And once the document was in XHTML form, cleaning up any remaining cruft was (usually) trivial. Best of all, after I was done using the power of XML tools to work on the document, it was simple to transform it back into plain HTML again (for example, if I was working on something for somebody else who wanted vanilla HTML).

Is there any hope for a tool like that for HTML5?

—Tony

Received on Wednesday, 5 October 2011 15:06:22 UTC