- From: Sam Ruby <rubys@intertwingly.net>
- Date: Wed, 06 Dec 2006 10:22:35 -0500
Anne van Kesteren wrote:
> On Wed, 06 Dec 2006 15:13:26 +0100, Sam Ruby <rubys at intertwingly.net>
> wrote:
>> Count me in. This is actually closer to the original reason why I
>> originally subscribed to this list. If given a few tests, I could
>> convert them into a useful form, and this form could serve as a model
>> for future tests.
>>
>> My original interest was to write a replacement for Python's SGMLLIB,
>> i.e., one that was not based on the theoretical ideal of how SGML
>> vocabularies work, but one based on the practical notion of how HTML
>> actually is parsed.
>
> The HTMLTokenizer for such a project is mostly finished already:
>
> http://code.google.com/p/html5lib/
>
> (As in, it actually emits the tokens it has to. I'm quite happy about it!)
>
> James Graham has been working on the Tree Construction part of the
> process (called HTMLParser in parser.py) and Lachlan Hunt is working on
> an HTMLInputStream class which handles some of the specifics needed for
> the input stream.

I have no interest in participating in a project without test cases.

On the bright side, the license chosen for that work is fine, and -- if
there are test cases -- I have no interest in duplicating others' work.

- Sam Ruby
Received on Wednesday, 6 December 2006 07:22:35 UTC