- From: Maciej Stachowiak <mjs@apple.com>
- Date: Mon, 17 Nov 2008 17:34:08 -0800
- To: elharo@metalab.unc.edu
- Cc: Jonas Sicking <jonas@sicking.cc>, "Henry S. Thompson" <ht@inf.ed.ac.uk>, noah_mendelsohn@us.ibm.com, Dean Edridge <dean@dean.org.nz>, public-html <public-html@w3.org>, www-tag@w3.org
On Nov 17, 2008, at 7:21 AM, Elliotte Harold wrote:

> Maciej Stachowiak wrote:
>
>> And finally, in my experience, it is not necessarily even true that
>> the error handling of HTML makes it harder to implement parsing
>> than for XML. In WebKit, the pieces of code implementing HTML and
>> XML parsing are close to the same size, and that is not even
>> including the libxml library that does most of the heavy lifting in
>> XML parsing.
>
> But is WebKit already fully implementing the HTML 5 parsing
> algorithm? That is what I suspect is going to complexify parsers
> beyond plausibility.

We don't fully implement the HTML5 parsing algorithm, but I believe that when we do, it will make our parsing code simpler rather than more complex. The HTML5 parsing algorithm makes a lot of things systematic that are messy ad-hoc cases in our current parsing code. As an example of this, we do have a pretty much HTML5-compatible tokenizer for our preload scanning, and it is a lot cleaner than the main tokenizer.

Part of my excitement about the HTML5 parsing algorithm, as a browser implementor, is that it will simplify our parsing code while also making it more compatible with content and more interoperable with other browsers. If I thought it would be a lot more complicated than our current code, I would probably be opposed to implementing it.

Regards,
Maciej
Received on Tuesday, 18 November 2008 01:34:51 UTC