- From: Marco Rogers <marco.rogers@gmail.com>
- Date: Wed, 6 Oct 2010 14:59:28 -0400
- To: "Edward O'Connor" <hober0@gmail.com>
- Cc: www-archive@w3.org
- Message-ID: <AANLkTimU7iOTFNvZmUjZa=GVfMJe_9KiF7Sje6wgekbL@mail.gmail.com>
Interesting. I wouldn't put money on how long it'll be before you can
actually rely on this algorithm. But I get your point. Is there a
reference implementation of it?

:Marco

On Wed, Oct 6, 2010 at 2:54 PM, Edward O'Connor <hober0@gmail.com> wrote:

> Hi,
>
> [Taken off-list as this isn't really node-specific anymore.]
>
> > @Edward, the html parser in libxml2 is very good. In some preliminary
> > tests, I've done, it does pretty well even with crappy markup.
>
> Fundamentally, I'm interested in DOM consistency. Given the same
> sequence of bytes, does the libxml2 HTML parser generate the same DOM
> that the major browsers do?
>
> > When you say "browser-compatible"
>
> When I say "browser-compatible," I mean
>
> http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#parsing
>
> > that doesn't mean much because each browser has their own parser, and
> > when you dig in you'll find that there is quite a bit of difference
> > between them.
>
> All four browser engines are converging on the same parsing algorithm,
> linked above. Which means that, going forward, all five major browsers
> will produce the same DOM from the same arbitrary-pile-of-bytes that
> passes for HTML on the web.
>
> Which means that there's really no reason for people to implement or use
> other tag soup parsing algorithms.
>
>
> Ted
>

-- 
Marco Rogers
marco.rogers@gmail.com

Life is ten percent what happens to you and ninety percent how you respond to it.
- Lou Holtz
Received on Thursday, 7 October 2010 17:32:50 UTC