- From: Edward O'Connor <hober0@gmail.com>
- Date: Wed, 6 Oct 2010 11:54:03 -0700
- To: Marco Rogers <marco.rogers@gmail.com>
- Cc: www-archive@w3.org
Hi,

[Taken off-list as this isn't really node-specific anymore.]

> @Edward, the html parser in libxml2 is very good. In some preliminary
> tests, I've done, it does pretty well even with crappy markup.

Fundamentally, I'm interested in DOM consistency. Given the same
sequence of bytes, does the libxml2 HTML parser generate the same DOM
that the major browsers do?

> When you say "browser-compatible"

When I say "browser-compatible," I mean
http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#parsing

> that doesn't mean much because each browser has their own parser, and
> when you dig in you'll find that there is quite a bit of difference
> between them.

All four browser engines are converging on the same parsing algorithm,
linked above. Which means that, going forward, all five major browsers
will produce the same DOM from the same arbitrary-pile-of-bytes that
passes for HTML on the web. Which means that there's really no reason
for people to implement or use other tag soup parsing algorithms.


Ted
Received on Wednesday, 6 October 2010 18:54:57 UTC