Re: [nodejs] Re: [ANN] W3C standards C++ XML DOM parser for NodeJS

Interesting.  I wouldn't put money on how long it'll be before you can
actually rely on this algorithm.  But I get your point.  Is there a
reference implementation of it?

:Marco

On Wed, Oct 6, 2010 at 2:54 PM, Edward O'Connor <hober0@gmail.com> wrote:

> Hi,
>
> [Taken off-list as this isn't really node-specific anymore.]
>
> > @Edward, the HTML parser in libxml2 is very good.  In some preliminary
> > tests I've done, it does pretty well even with crappy markup.
>
> Fundamentally, I'm interested in DOM consistency. Given the same
> sequence of bytes, does the libxml2 HTML parser generate the same DOM
> that the major browsers do?
>
> > When you say "browser-compatible"
>
> When I say "browser-compatible," I mean
>
> http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#parsing
>
> > that doesn't mean much because each browser has their own parser, and
> > when you dig in you'll find that there is quite a bit of difference
> > between them.
>
> All four browser engines are converging on the same parsing algorithm,
> linked above. Which means that, going forward, all five major browsers
> will produce the same DOM from the same arbitrary-pile-of-bytes that
> passes for HTML on the web.
>
> Which means that there's really no reason for people to implement or use
> other tag soup parsing algorithms.
>
> Ted
>

-- 
Marco Rogers
marco.rogers@gmail.com

Life is ten percent what happens to you and ninety percent how you respond
to it.
- Lou Holtz

Received on Thursday, 7 October 2010 17:32:50 UTC