Re: [nodejs] Re: [ANN] W3C standards C++ XML DOM parser for NodeJS

From: Edward O'Connor <hober0@gmail.com>
Date: Wed, 6 Oct 2010 11:54:03 -0700
Message-ID: <AANLkTikkzOnQwei60H87yuqG_-HKsO49d2d3guKkjDBf@mail.gmail.com>
To: Marco Rogers <marco.rogers@gmail.com>
Cc: www-archive@w3.org
Hi,

[Taken off-list as this isn't really node-specific anymore.]

> @Edward, the html parser in libxml2 is very good.  In some preliminary
> tests, I've done, it does pretty well even with crappy markup.

Fundamentally, I'm interested in DOM consistency. Given the same
sequence of bytes, does the libxml2 HTML parser generate the same DOM
that the major browsers do?
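
To make "same DOM" concrete, here is a rough sketch of the kind of check
I have in mind: dump each tree into a canonical list of lines and compare
the dumps. Everything in it is an assumption for illustration only. The
names dump_tree and same_dom are hypothetical, and Python's standard
xml.dom.minidom stands in for the parsers actually under discussion
(the sketch does not invoke libxml2 or any browser).

```python
from xml.dom import minidom, Node


def dump_tree(node, depth=0):
    """Return a list of lines describing the subtree rooted at `node`."""
    pad = "  " * depth
    lines = []
    if node.nodeType == Node.ELEMENT_NODE:
        lines.append(f"{pad}<{node.tagName}>")
        # Sort attributes so that source order does not affect the dump.
        for name, value in sorted(node.attributes.items()):
            lines.append(f'{pad}  {name}="{value}"')
    elif node.nodeType == Node.TEXT_NODE:
        lines.append(f'{pad}"{node.data}"')
    elif node.nodeType == Node.COMMENT_NODE:
        lines.append(f"{pad}<!-- {node.data} -->")
    # Document nodes add no line of their own; only their children are dumped.
    for child in node.childNodes:
        lines.extend(dump_tree(child, depth + 1))
    return lines


def same_dom(doc_a, doc_b):
    """True if both trees produce identical canonical dumps."""
    return dump_tree(doc_a) == dump_tree(doc_b)


# Stand-in inputs: minidom only parses well-formed XML, so these are not
# tag soup. With real parsers, each tree would come from a different
# parser fed the same bytes.
a = minidom.parseString('<p class="x" id="y">hi<b>there</b></p>')
b = minidom.parseString('<p id="y" class="x">hi<b>there</b></p>')
c = minidom.parseString('<p id="y" class="x">hi<i>there</i></p>')

print(same_dom(a, b))
print(same_dom(a, c))
```

Attribute order is deliberately ignored, since it is not a structural
difference; element names, nesting, text, and comments all have to match.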

> When you say "browser-compatible"

When I say "browser-compatible," I mean

http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#parsing

> that doesn't mean much because each browser has their own parser, and
> when you dig in you'll find that there is quite a bit of difference
> between them.

All four browser engines are converging on the same parsing algorithm,
linked above. Which means that, going forward, all five major browsers
will produce the same DOM from the same arbitrary-pile-of-bytes that
passes for HTML on the web.

So there's really no reason for people to implement or use other tag
soup parsing algorithms.


Ted
Received on Wednesday, 6 October 2010 18:54:57 UTC