
Re: Parsing and breaking an HTML document into pieces

From: Philip Taylor <pjt47@cam.ac.uk>
Date: Tue, 04 Dec 2007 08:21:07 +0000
Message-ID: <47550DF3.4000507@cam.ac.uk>
To: Karl Dubost <karl@w3.org>
CC: James Graham <jg307@cam.ac.uk>, www-archive@w3.org

> Hi James,
> 
> I kind of remember that you created a script to parse the HTML 5
> specification and break it into pieces? Using [html5lib][1] probably?

I think that was me, not James.

> Do you have a handy link for the script?

http://html5.googlecode.com/svn/trunk/spec-splitter/spec-splitter.py

> Did you choose to break on specific heading levels?

It breaks on <h2>, <h3>, and a few hard-coded extra headings (with ids
'video', 'the-canvas', 'the-command', 'tokenisation', 'tree-construction').
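For illustration, here is a minimal sketch of that splitting rule. It is not spec-splitter.py itself: it uses Python's standard-library html.parser instead of html5lib, it assumes the extra split points are matched by the element's id attribute, and the names SplitPointFinder, split_document, SPLIT_TAGS and EXTRA_IDS are hypothetical. It only finds where each piece starts and slices the source text there.

```python
from html.parser import HTMLParser

# Tags that always start a new piece (<h2> and <h3>, as described above).
SPLIT_TAGS = {"h2", "h3"}
# Extra hard-coded split points; matching them by id is an assumption here.
EXTRA_IDS = {"video", "the-canvas", "the-command", "tokenisation", "tree-construction"}


class SplitPointFinder(HTMLParser):
    """Records the source position of every start tag that begins a new piece."""

    def __init__(self):
        super().__init__()
        # (line, column) pairs from getpos(): line is 1-based, column 0-based.
        self.positions = []

    def handle_starttag(self, tag, attrs):
        element_id = dict(attrs).get("id")
        if tag in SPLIT_TAGS or element_id in EXTRA_IDS:
            # Inside handle_starttag, getpos() points at the tag's opening '<'.
            self.positions.append(self.getpos())


def split_document(source):
    """Split an HTML string into pieces, each starting at a split-point tag.

    Any text before the first split point becomes the first piece.
    """
    finder = SplitPointFinder()
    finder.feed(source)
    finder.close()

    # Map (line, column) pairs to absolute character offsets into source.
    line_starts = [0]
    for index, char in enumerate(source):
        if char == "\n":
            line_starts.append(index + 1)
    offsets = [line_starts[line - 1] + column for line, column in finder.positions]

    # Slice the original text between consecutive boundaries.
    boundaries = [0] + offsets + [len(source)]
    pieces = [source[start:end] for start, end in zip(boundaries, boundaries[1:])]
    return [piece for piece in pieces if piece]  # drop an empty leading piece
```

Because the sketch slices the raw source rather than re-serialising a parse tree, joining the pieces gives back the original input unchanged.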

> Best.

-- 
Philip Taylor
pjt47@cam.ac.uk
Received on Tuesday, 4 December 2007 08:22:13 GMT
