On 28 Dec 2022, at 09:49, Reece Dunn <msclrhd@googlemail.com> wrote:
Hi,
I'm wondering if it would make sense to construct a set of tests that:
a) exercise the different parts of the HTML parser algorithm;
b) exercise the different parts of the HTML tree construction algorithm (e.g. the addition of a missing html element);
c) exercise the various HTML entities;
d) exercise the various void elements;
e) cover the various HTML elements.
That's certainly useful, but it's significant work to do this from scratch. I'm hoping we will do a mixture of hand-generated tests for specific cases like this, and a larger set of "unfocussed" tests based on a sample of real HTML5.
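As a rough illustration of how the hand-generated part might be seeded without writing every case from scratch, here is a minimal Python sketch. It derives one entity test per named character reference from the standard library's `html.entities.html5` table, and keeps a short hand-written list for tree construction and void elements. The names `entity_cases` and `HAND_WRITTEN` and the tuple layout are assumptions made for this example, not part of any existing test suite, and the expected strings assume the result tree is serialized using the HTML serialization rules.

```python
from html.entities import html5  # named character references defined for HTML5


def entity_cases():
    """Yield (input_text, expected_characters) for each ';'-terminated entity.

    The table also holds legacy names without a trailing semicolon (e.g. 'amp');
    those are skipped here to keep the sketch simple.
    """
    for name, expected in sorted(html5.items()):
        if name.endswith(";"):
            yield (f"&{name}", expected)


# Hypothetical hand-written cases: (category, input HTML, expected serialization).
HAND_WRITTEN = [
    # Missing html, head and body elements are added, and the open p is closed.
    ("tree-construction", "<p>Hello",
     "<html><head></head><body><p>Hello</p></body></html>"),
    # br is a void element, so no end tag is expected or emitted.
    ("void-element", "<p>a<br>b",
     "<html><head></head><body><p>a<br>b</p></body></html>"),
]
```

Each pair from `entity_cases()` could be wrapped in a small document and fed to whichever parser is under test; the hand-written list would grow as specific parts of the algorithm are targeted.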
Note: For HTML5 support, I've used JSoup (https://github.com/jhy/jsoup) in various projects.
JSoup looks useful, but it seems to build its own object model and, as far as I can see, there's no export to XHTML, so you're left with the task of mapping the JSoup object model to XDM. That is doable, of course, but it involves work.
Michael Kay
Saxonica