
anyone got a YACC HTML grammar?

From: <D.White@mcs.surrey.ac.uk>
Date: Wed, 19 Nov 1997 05:36:00 -0500 (EST)
Message-Id: <m0xY7TM-0002rNC@ainur.ee.surrey.ac.uk>
To: www-talk@w3.org
cc: D.White@mcs.surrey.ac.uk, S.Schuman@ee.surrey.ac.uk
Hello everyone,

A quick question: does anyone have a decent YACC grammar for HTML (any recent
version)?  If at all possible, I'd prefer not to get into SGML and DTD parsing,
as they seem substantially over-complex.

I'm particularly interested in an empty YACC grammar (ie. one with no actions
yet) to which I can add suitable abstract syntax tree construction calls, to
fully represent the structure of a single HTML file.

This is for a final year undergraduate project I'm supervising, looking at
building a set of flexible tools to do cross-document checking and updating
for a group of Web pages.  The sort of things I have in mind:

	-	"set a corporate style on all these pages"
		(eg. add a body background attribute, add a logo, add a button bar)

	-	"locate all misplaced headers"
		(eg. an h4 straight under an h2: where's the h3?)

	-	(re)number all headers.

	-	add link names to all headers.

	-	make a table of contents indexing all headers.

	-	build an index of all links for later checking.
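
As a rough illustration of the header checks in the list above, here is a
minimal sketch. It assumes Python's standard html.parser module in place of a
YACC grammar, and the names HeaderCollector, collect_headers,
misplaced_headers and number_headers are hypothetical. It also assumes the
first header in a file sets the baseline level, so only later jumps of more
than one level are reported.

```python
import re
from html.parser import HTMLParser


class HeaderCollector(HTMLParser):
    """Collect (level, text) for each h1..h6 element in document order."""

    def __init__(self):
        super().__init__()
        self.headers = []    # finished (level, text) pairs
        self._level = None   # level of the header currently open, if any
        self._text = []      # text fragments seen inside that header

    def handle_starttag(self, tag, attrs):
        match = re.fullmatch(r"h([1-6])", tag)
        if match:
            self._level = int(match.group(1))
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headers.append((self._level, "".join(self._text).strip()))
            self._level = None


def collect_headers(html):
    parser = HeaderCollector()
    parser.feed(html)
    parser.close()
    return parser.headers


def misplaced_headers(headers):
    """Return (level, text, previous_level) for headers that skip a level."""
    problems = []
    previous = None  # assumption: the first header is never reported
    for level, text in headers:
        if previous is not None and level > previous + 1:
            problems.append((level, text, previous))
        previous = level
    return problems


def number_headers(headers):
    """Return (label, text) pairs with dotted section numbers such as 1.2."""
    counters = [0] * 6
    numbered = []
    for level, text in headers:
        counters[level - 1] += 1
        for deeper in range(level, 6):   # reset counters below this level
            counters[deeper] = 0
        label = ".".join(str(c) for c in counters[:level])
        numbered.append((label, text))
    return numbered
```

The list returned by number_headers doubles as a flat table of contents. A
skipped level shows up as a 0 in the label, which is one more way to spot a
misplaced header.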

We'd like to start with a good HTML parser, build parse trees, store them and
then investigate tree-rewriting routines and tree-printing routines to build
up a reusable toolkit, perhaps being able to glue the tools together like
Unix pipelines.
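
The parse, rewrite, print idea might look something like the sketch below. It
again assumes Python's html.parser rather than a YACC-generated parser, and
every name in it (Node, TreeBuilder, parse, walk, set_body_attr,
add_header_anchors, to_html, pipeline) is hypothetical, as are the anchor
names sec1, sec2 and so on. It does not infer omitted end tags (eg. an
unclosed p or li), which a real HTML parser would have to do.

```python
from dataclasses import dataclass, field
from functools import reduce
from html import escape
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta"}
HEADERS = {"h1", "h2", "h3", "h4", "h5", "h6"}


@dataclass
class Node:
    tag: str
    attrs: list = field(default_factory=list)     # (name, value) pairs
    children: list = field(default_factory=list)  # Node or str items


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, list(attrs))
        self._stack[-1].children.append(node)
        if tag not in VOID:          # void elements never get children
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i].tag == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        self._stack[-1].children.append(data)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def walk(node):
    """Yield every element node in document order."""
    yield node
    for child in node.children:
        if isinstance(child, Node):
            yield from walk(child)


def set_body_attr(name, value):
    """Build a rewriting stage that sets one attribute on every body element."""
    def stage(root):
        for node in walk(root):
            if node.tag == "body":
                kept = [(k, v) for k, v in node.attrs if k != name]
                node.attrs = kept + [(name, value)]
        return root
    return stage


def add_header_anchors(root):
    """Wrap each header's content in a named anchor (sec1, sec2, ...)."""
    count = 0
    for node in walk(root):
        if node.tag in HEADERS:
            count += 1
            node.children = [Node("a", [("name", f"sec{count}")], node.children)]
    return root


def to_html(node):
    if isinstance(node, str):
        return escape(node, quote=False)
    inner = "".join(to_html(child) for child in node.children)
    if node.tag == "#document":
        return inner
    attrs = "".join(
        f" {k}" if v is None else f' {k}="{escape(v)}"' for k, v in node.attrs
    )
    if node.tag in VOID:
        return f"<{node.tag}{attrs}>"
    return f"<{node.tag}{attrs}>{inner}</{node.tag}>"


def pipeline(*stages):
    """Compose stages left to right, like commands in a Unix pipeline."""
    return lambda value: reduce(lambda acc, stage: stage(acc), stages, value)


restyle = pipeline(parse, set_body_attr("bgcolor", "#ffffff"),
                   add_header_anchors, to_html)
print(restyle("<body><h1>Intro</h1><p>Hi &amp; bye</p></body>"))
```

Each stage takes a tree and returns a tree, so stages can be reordered or
dropped freely. Only parse and to_html convert to and from text.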

cheers
duncan
Received on Thursday, 20 November 1997 19:43:17 GMT
