- From: <D.White@mcs.surrey.ac.uk>
- Date: Wed, 19 Nov 1997 05:36:00 -0500 (EST)
- To: www-talk@w3.org
- cc: D.White@mcs.surrey.ac.uk, S.Schuman@ee.surrey.ac.uk
Hello everyone,

A quick question: does anyone have a decent YACC grammar for HTML (any recent version)? If at all possible, I'd prefer not to get into SGML and DTD parsing, as they seem substantially over-complex.

I'm particularly interested in an empty YACC grammar to which I can add suitable Abstract Data Tree construction calls, to fully represent the structure of a single HTML file.

This is for a final-year undergraduate project I'm supervising, looking at building a set of flexible tools to do cross-document checking and updating for a group of Web pages. The sort of things I have in mind:

- "set a corporate style on all these pages" (e.g. add a body background tag, add a logo, add a button bar)
- "locate all misplaced headers" (e.g. an h4 straight under an h2! Where's the h3?)
- (re)number all headers.
- add link names to all headers.
- make a table of contents indexing all headers.
- build an index of all links for later checking.

We'd like to start with a good HTML parser, build parse trees, store them, and then investigate tree-rewriting and tree-printing routines to build up a reusable toolkit, perhaps being able to glue the tools together like Unix pipelines.

cheers
duncan
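To make the "parse, build a tree, then run checks over it" idea concrete, here is a minimal sketch of the "locate all misplaced headers" check. It is an illustration only: it swaps the requested YACC grammar for Python's standard-library `html.parser`, which is an assumption of this example and not something proposed in the message. The names `Node`, `TreeBuilder`, `parse`, `headers`, `text_of` and `misplaced_headers` are hypothetical, and the handling of unclosed tags is deliberately simplistic.

```python
from html.parser import HTMLParser
import re

# Elements that never have an end tag (partial list, enough for a sketch).
VOID = {"br", "hr", "img", "meta", "link", "input", "area", "base", "col"}


class Node:
    """One element in the parse tree; children are Nodes or text strings."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simple tree from start tag, end tag and text events."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def headers(node):
    """Yield h1..h6 nodes in document order."""
    for child in node.children:
        if isinstance(child, Node):
            if re.fullmatch(r"h[1-6]", child.tag):
                yield child
            yield from headers(child)


def text_of(node):
    """Concatenate all text beneath a node."""
    return "".join(c if isinstance(c, str) else text_of(c) for c in node.children)


def misplaced_headers(root):
    """Report headers that skip a level relative to the previous header."""
    problems = []
    previous = 0
    for h in headers(root):
        level = int(h.tag[1])
        if previous and level > previous + 1:
            problems.append((h.tag, text_of(h).strip()))
        previous = level
    return problems
```

For example, `misplaced_headers(parse("<h1>A</h1><h2>B</h2><h4>C</h4>"))` reports the `h4`, since it follows an `h2` with no `h3` in between. The other tasks in the list (numbering headers, building a table of contents, indexing links) could be written as further walks over the same tree, which is what would let such tools be chained together.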
Received on Thursday, 20 November 1997 19:43:17 UTC