- From: Sasha Firsov <suns@firsov.net>
- Date: Fri, 23 Dec 2022 09:57:50 -0800
- To: Michael Kay <mike@saxonica.com>
- Cc: "public-xslt-40@w3.org" <public-xslt-40@w3.org>
- Message-ID: <CAL8k5D8aOBLzV0j3Kec9ToBfXqURXJuK3YjBfosEhq3HzFJUBg@mail.gmail.com>
> (a) Find a useful set of HTML files (around 1000, ideally)

There is a "brute force" approach. If you cannot get a set that is sufficient for the goal, take a superset that is guaranteed to cover everything. The Chromium sources have ~90K HTML files, most of which reside in test folders. I guess Firefox would have about the same. The old W3C set is still valid but not current. The union of all three would give you the assurance of integrity.

Of course it is time-consuming, but since the final result is needed only during release, it is perhaps justified.

-s

[image: image.png]

On Thu, Dec 22, 2022 at 3:24 PM Michael Kay <mike@saxonica.com> wrote:

> I guess one approach might be:
>
> (a) Find a useful set of HTML files (around 1000, ideally) that exhibit
> the right range of markup characteristics - remembering that we're not
> interested in script variations or CSS variations or interactive behaviour,
> only in markup
>
> (b) Put these through a test generator based on Henri Sivonen's HTML5
> parser, to generate (for each one) an XML document that has the same XDM
> representation as the HTML.
>
> (c) Have the test driver compare the XDM produced by parse-html() on the
> original document with the XDM produced by parse-xml() on the equivalent
> XML.
>
> I'm still not sure how best to achieve (a).
>
> This isn't ideal, because we're testing against a trusted implementation
> rather than against the specification. And it gets circular if the actual
> product-under-test is using the same HTML5 parser that was used to
> construct the tests. But it's a potential way forward.
>
> Michael Kay
> Saxonica
>
> On 22 Dec 2022, at 22:23, Sasha Firsov <suns@firsov.net> wrote:
>
> Michael,
> Not a real answer, but this could cover half of the needs.
>
> The test suite has a set of test samples and results to compare against.
> The second can be achieved by feeding the input string to an actual DOM
> engine (Chromium/Blink) and comparing your own parser results with the
> actual browser DOM.
> Going further, by utilizing *cross-browser testing* capabilities such as
> those of @web/test-runner-playwright, you would have a *parser browser
> compatibility matrix*.
>
> While the approach is not a test against "ideal" standards, it is more
> valuable in the web development world, as it shows cross-browser support,
> a criterion for accepting any JS library. The browsers themselves are not
> following W3C test suites anymore:
>
> Blink does not currently (4/2013) regularly import and run the W3C's
> tests
> <https://www.chromium.org/blink/blink-testing-and-the-w3c/#ideal-state>
>
> As for the first half of the question, on the parser test set, the
> Chromium (Blink) and FF sources include the parser tests.
> Extracting them is a bit of a challenge, though.
> Blink sources are in the chromium repo
> <https://github.com/chromium/chromium/blob/main/third_party/blink/renderer/core/html/parser>.
> Do not use `git clone`, as the repo is 30+ GB; get the latest snapshot instead:
>
> https://github.com/chromium/chromium/archive/refs/heads/main.zip
>
> -s
>
> PS I have used this approach for testing the TEMPLATE
> <https://github.com/chromium/chromium/blob/main/third_party/blink/web_tests/external/wpt/shadow-dom/slots.html#L8>
> tag shadowDOM simulation in css-chain
> <https://github.com/sashafirsov/css-chain-test/blob/main/src/slots-light-vs-shadow.html>
> and light-dom-element tests.
>
> On Wed, Dec 21, 2022 at 4:06 PM Michael Kay <mike@saxonica.com> wrote:
>
>> I've just been running a few new tests on our existing parse-html()
>> function on SaxonJ (built on TagSoup) and SaxonCS (built on
>> HtmlAgilityPack) and realising how different they are. I suspect that
>> getting a good level of interoperability (and tests to prove it) for
>> fn:parse-html is going to be challenging!
>>
>> Is there an HTML5 test suite we can build on?
>>
>> Michael Kay
>> Saxonica
>>
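Step (c) of the approach quoted above (comparing the tree from the HTML parser with the tree from parsing the equivalent XML) might look roughly like the sketch below. It is only an illustration under loud assumptions: Python's standard `html.parser` stands in for parse-html() and is not a conforming HTML5 parser, `xml.etree.ElementTree` stands in for parse-xml(), and the names `HtmlTreeBuilder`, `html_tree`, `xml_tree`, and `same_tree` are hypothetical. Namespaces, comments, and whitespace-only text are ignored, so this is far coarser than a real XDM comparison.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML void elements: they never take an end tag, so they are never pushed.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class HtmlTreeBuilder(HTMLParser):
    """Stand-in for parse-html(): builds nested (tag, attrs, children) tuples."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ("#document", {}, [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive as None; normalise them to "".
        node = (tag, {k: v or "" for k, v in attrs}, [])
        self.stack[-1][2].append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Whitespace-only text is dropped and other text is trimmed (an assumption).
        if data.strip():
            self.stack[-1][2].append(data.strip())


def html_tree(text):
    builder = HtmlTreeBuilder()
    builder.feed(text)
    builder.close()
    return builder.root[2]


def _from_xml(elem):
    # Convert an ElementTree element to the same (tag, attrs, children) shape.
    children = []
    if elem.text and elem.text.strip():
        children.append(elem.text.strip())
    for child in elem:
        children.append(_from_xml(child))
        if child.tail and child.tail.strip():
            children.append(child.tail.strip())
    return (elem.tag, dict(elem.attrib), children)


def xml_tree(text):
    """Stand-in for parse-xml() on the generated equivalent XML."""
    return [_from_xml(ET.fromstring(text))]


def same_tree(html_text, xml_text):
    return html_tree(html_text) == xml_tree(xml_text)


if __name__ == "__main__":
    print(same_tree('<p class=a>Hello<br>world</p>',
                    '<p class="a">Hello<br/>world</p>'))
```

A test driver along these lines would run `same_tree` once per (HTML, generated XML) pair from step (b) and report the pairs that differ. As the quoted message notes, the result is only as trustworthy as the parser used to generate the XML.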
Attachments
- image/png attachment: image.png
Received on Friday, 23 December 2022 17:58:16 UTC