- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Sat, 05 Nov 2011 01:16:34 +0100
- To: tidy-develop@lists.sourceforge.net
Hi,

http://lists.w3.org/Archives/Public/www-archive/2011Nov/0005.html has a patch that adds support for "HTML5" and "XHTML5" as per W3C's "Last Call" Working Draft <http://www.w3.org/TR/2011/WD-html5-20110525/>. The intended level of support is just "Does not corrupt or mark as invalid fully conforming documents". It is not intended to conform, say, to the "HTML5" parsing requirements in any way beyond that.

The patch breaks the public `tidyAttrIsProp` function, which is supposed to tell whether an attribute is proprietary. It is passed only the attribute, and that's not enough to answer the question, so now it always returns the same value. I doubt this affects anybody. I'll probably make it do so by returning just "no" instead of the current indirect method, and change AttributeVersions back into a static function. Breaking it is a side effect of removing the versions column from the attribute_defs table. As above, it's not useful to know which versions have a "type" attribute on one or more elements, as we have that information for all important document types and all their elements and attributes on a per-element basis.

This currently rejects "data-*" attributes; they need a special case in some place I haven't yet looked up. It also does not support inline SVG and MathML content. I am not entirely sure how to support those without breaking other content while not spending much effort on the problem. A simple example would be the SVG <title> element, which likely needs to be handled differently than the HTML <title> element.

So the patch mainly just updates the element and attribute tables, and I guessed some parsing approximations. For instance, <section> is parsed like a <div>, which is a good approximation, but others might not be so good. I also updated the "auto" doctype logic, so if you use "HTML5"-only markup and no non-"HTML5" markup you should get the appropriate doctype. There is no --doctype setting to force "HTML5" output.
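
To illustrate the "auto" doctype idea, here is a minimal sketch in C. It is not code from the patch: the SK_* constants, the sk_* functions and the three-row table are hypothetical names made up for this example, and the preference for the older doctype when several fit is an assumption. The point is only that each element narrows a bitmask of candidate document types, and "HTML5" is chosen when it is the only candidate left.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical version bits for this sketch; not Tidy's actual constants. */
#define SK_HTML4   0x1u
#define SK_XHTML1  0x2u
#define SK_HTML5   0x4u
#define SK_ALL     (SK_HTML4 | SK_XHTML1 | SK_HTML5)

/* One row per element: the document types in which it is allowed. */
typedef struct {
    const char *name;
    unsigned    versions;
} SkElemDef;

/* Tiny illustrative table; the real tables cover every element. */
static const SkElemDef sk_elem_defs[] = {
    { "div",     SK_ALL },
    { "section", SK_HTML5 },
    { "center",  SK_HTML4 | SK_XHTML1 },
};

/* Look up the version mask of an element; unknown elements fit nothing. */
unsigned sk_elem_versions(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof sk_elem_defs / sizeof sk_elem_defs[0]; ++i)
        if (strcmp(sk_elem_defs[i].name, name) == 0)
            return sk_elem_defs[i].versions;
    return 0u;
}

/* Intersect the masks of all elements used in the document. */
unsigned sk_constrain(const char *const *used, size_t count)
{
    unsigned versions = SK_ALL;
    size_t i;
    for (i = 0; i < count; ++i)
        versions &= sk_elem_versions(used[i]);
    return versions;
}

/* Pick a doctype label: "HTML5" only when nothing older still fits.
   Preferring the older types otherwise is an assumption of this sketch. */
const char *sk_auto_doctype(unsigned versions)
{
    if (versions == SK_HTML5)
        return "<!DOCTYPE html>";
    if (versions & SK_HTML4)
        return "HTML 4.01";
    if (versions & SK_XHTML1)
        return "XHTML 1.0";
    return NULL; /* no single document type allows all the markup used */
}
```

In this sketch a document using <div> and <section> narrows to the "HTML5" bit alone, a document using only <div> leaves every candidate open, and mixing <section> with <center> leaves no candidate at all.
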
I might add a "plain" value to --doctype; it is not the best choice, but "five" would be misleading due to the lack of version numbers.

I have not updated any of the already known elements in the tag_defs table. I am unsure how to handle <menu> there, for instance, which used to be CM_OBSOLETE but has been resurrected. <keygen> and <wbr> are similar. So those likely need some fine-tuning. Similarly, there may have been changes to the lexical space of some attribute values, which may lead Tidy to complain about values that haven't been allowed before. That too is fine-tuning that doesn't necessarily have to be done by me.

If there is enough interest in this that we get some test reports to the develop@lists.sourceforge.net mailing list, and people can't find major bugs, I might polish the patch and commit it.

regards,
--
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Received on Saturday, 5 November 2011 00:17:02 UTC