Re: [nodejs] Re: [ANN] W3C standards C++ XML DOM parser for NodeJS from Edward O'Connor on 2010-10-06 (www-archive@w3.org from October 2010)

From: Edward O'Connor <hober0@gmail.com>
Date: Wed, 6 Oct 2010 12:20:29 -0700
To: Marco Rogers <marco.rogers@gmail.com>
Cc: www-archive@w3.org
Message-ID: <AANLkTimkW3eKY8PsUcgnUeFAjeFzPZAsJZ1jGW73bPUa@mail.gmail.com>

Hi,

> Interesting.  I wouldn't put money on how long it'll be before you can
> actually rely on this algorithm.

FF4 ships with it. (FF3.something shipped with it too, actually, but
it was turned off by default. Set the "html5.enable" pref to true to
use it in FF3.6.)

WebKit nightlies also ship with it. I can't remember if the current
releases of Safari and Chrome do too, but they will soon.

Dunno about Opera offhand. IE9's parser, while in some respects better
than what came before, isn't quite there.

All that said, the algorithm was (is) developed with backwards
compatibility in mind--the DOM it produces is usually close to what a
legacy browser would produce. Closer than some other random tag soup
parser, anyway.

> But I get your point.  Is there a reference implementation of it?

Not an official one, but there are some mature ones out there.
html5lib implements it in Python and Ruby. Henri Sivonen's Java
implementation powers validator.nu, and (mechanically translated
into C++) is the FF parser too. As I
mentioned on-list, node's own Aria Stewart has implemented it in
JavaScript. And (apparently because I hate myself) I'm working on an
implementation in Emacs Lisp.


HTH.
Ted

Received on Wednesday, 6 October 2010 19:21:17 UTC