Re: Testing parse-html

On 28 Dec 2022, at 09:49, Reece Dunn <msclrhd@googlemail.com> wrote:

Hi,

I'm wondering if it would make sense to construct a set of tests that:
a) exercise the different parts of the HTML parser algorithm;
b) exercise the different parts of the HTML tree construction algorithm (e.g. the addition of a missing html element);
c) exercise the various HTML entities;
d) exercise the various void elements;
e) cover the various HTML elements.

That's certainly useful, but it's significant work to do this from scratch. I'm hoping we will do a mixture of hand-generated tests for specific cases like this, and a larger set of "unfocussed" tests based on a sample of real HTML5.
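
As a rough illustration of the hand-generated side, here is a minimal sketch in Python of how cases for (c) and (d) might be tabulated. The function names and the sample lists are hypothetical and deliberately incomplete. The expected entity expansions are taken from Python's html.unescape, which follows the HTML5 character reference rules; the expected trees for the void element inputs would still have to be written by hand.

```python
import html

# Illustrative subset of void elements; the full list should be taken
# from the HTML specification rather than from this sketch.
VOID_ELEMENTS = ["br", "hr", "img", "input", "wbr"]

# Illustrative subset of named character references.
ENTITY_SAMPLES = ["&amp;", "&lt;", "&nbsp;", "&eacute;"]


def entity_cases():
    """Yield (input markup, expected text) pairs for character references.

    html.unescape supplies the expected expansion, so the table does not
    have to be typed in by hand.
    """
    for ref in ENTITY_SAMPLES:
        yield (f"<p>{ref}</p>", html.unescape(ref))


def void_element_cases():
    """Yield (test name, input markup) pairs for void elements.

    Each input omits the end tag; the expected tree is not generated here
    and must be supplied separately.
    """
    for name in VOID_ELEMENTS:
        yield (f"void-{name}", f"<div>a<{name}>b</div>")


if __name__ == "__main__":
    for markup, expected in entity_cases():
        print(repr(markup), "->", repr(expected))
    for test_name, markup in void_element_cases():
        print(test_name, repr(markup))
```

Generating the inputs from small tables like this keeps the focused tests cheap to extend, while the "unfocussed" tests would come from sampled real-world documents instead.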

Note: For HTML5 support, I've used JSoup (https://github.com/jhy/jsoup) in various projects.


JSoup looks useful but it seems to build its own object model and there's no export to XHTML as far as I can see, so you're left with the task of mapping the JSoup object model to XDM. Which is doable, of course, but involves work.

Michael Kay
Saxonica

Received on Wednesday, 28 December 2022 13:16:54 UTC