- From: Jirka Kosek <jirka@kosek.cz>
- Date: Fri, 23 Dec 2022 23:31:00 +0100
- To: Michael Kay <mike@saxonica.com>, "public-xslt-40@w3.org" <public-xslt-40@w3.org>
- Message-ID: <47528d4c-a4b6-2642-294c-cd282590a257@kosek.cz>
On 22.12.2022 1:06, Michael Kay wrote:
> I've just been running a few new tests on our existing parse-html() function on SaxonJ (built on TagSoup) and SaxonCS (built on HtmlAgilityPack) and realising how different they are. I suspect that getting a good level of interoperability (and tests to prove it) for fn:parse-html is going to be challenging!
Hi,
I think it would be good to have parsing consistent with web browsers,
which means implementing the HTML5 parsing algorithm. I have been using
the following parser whenever I needed to process HTML5 input with XSLT:
https://about.validator.nu/htmlparser/
Perhaps switching from TagSoup to this parser would give better results,
provided that an HTML5-compliant parser is used in the .NET product as well.
Jirka
--
------------------------------------------------------------------
Jirka Kosek e-mail: jirka@kosek.cz http://xmlguru.cz
------------------------------------------------------------------
Professional XML and Web consulting and training services
DocBook/DITA customization, custom XSLT/XSL-FO document processing
------------------------------------------------------------------
Bringing you XML Prague conference http://xmlprague.cz
------------------------------------------------------------------
Received on Friday, 23 December 2022 22:31:16 UTC