- From: James Graham <jgraham@opera.com>
- Date: Tue, 22 Jan 2013 09:36:17 +0100
- To: public-html@w3.org
On 01/22/2013 08:16 AM, Henri Sivonen wrote:
> On Mon, Jan 21, 2013 at 4:56 PM, Sam Ruby <rubys@intertwingly.net> wrote:
>> I've been told that that will be solved over time. I've been told that over
>> a long period of time. So far, that has not proven to be true.
> ...
>> From my perspective, if but a fraction of the energy spent on
>> trying to stop this effort were instead spent on either improving parsing
>> tools like libxml2 or on determining what a simplified and more robust
>> subset of HTML5 would look like, then we could make better progress on this
>> issue.
>
> Mozilla and I don't have pressing own needs for that piece of
> software, so time for writing it has been starved by higher-priority
> items over and over again. (And now an HTML parser in Rust made its
> way into the work queue ahead of the libxml2-compatible thing…)
>
> If you need it solved faster, the best bet for making is to write code
> for a libxml2-compatible HTML-compliant parser yourself instead of
> spending time promoting polyglot. (In the case of the Validator.nu
> HTML Parser code base, the Gecko-specific stuff is already factored
> into CppType.java in the translator, so you could subclass CppTypes
> with something that returns value suitable for a libxml2
> API-compatible translation. Support for UTF-8 as the internal encoding
> will likely emerge as the side effect of the Rust effort.)

FWIW, if you don't want to have a dependency on Henri's Java code (which
I feel would be a reasonable position to take for something like
libxml2), writing a spec-compliant, non-scripting HTML parser from
scratch just isn't that hard. There is quite a lot of work, perhaps on
the order of a handful of man-weeks, but fortunately there is an
*excellent* specification :)

In my experience, when implementing in a browser, the *vast* majority of
the complexity comes from supporting scripting and document.write, which
isn't needed for libxml2.

Received on Tuesday, 22 January 2013 08:36:46 UTC