Hello everyone,

A quick question: does anyone have a decent YACC grammar for HTML (any recent version)? If at all possible, I'd prefer not to get into SGML and DTD parsing, as they seem substantially over-complex.

I'm particularly interested in an empty YACC grammar to which I can add suitable Abstract Data Tree construction calls, to fully represent the structure of a single HTML file.

This is for a final-year undergraduate project I'm supervising, looking at building a set of flexible tools to do cross-document checking and updating for a group of Web pages. The sort of things I have in mind:

- "set a corporate style on all these pages" (e.g. add a body background tag, add a logo, add a button bar)
- "locate all misplaced headers" (e.g. an h4 straight under an h2! where's the h3?)
- (re)number all headers.
- add link names to all headers.
- make a table of contents indexing all headers.
- build an index of all links for later checking.

We'd like to start with a good HTML parser, build parse trees, store them, and then investigate tree-rewriting and tree-printing routines to build up a reusable toolkit, perhaps being able to glue the tools together like Unix pipelines.

cheers
duncan

Received on Thursday, 20 November 1997 19:43:17 GMT
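As a rough illustration of the "locate all misplaced headers" check mentioned in the message, here is a minimal sketch. It does not use a YACC grammar; it swaps in Python's standard `html.parser` module just to show the shape of the check. The names `HeadingCollector`, `collect_headings` and `misplaced_headings` are hypothetical, and the sketch assumes headers are not nested inside one another.

```python
import re
from html.parser import HTMLParser

# Matches the tag names h1..h6 (HTMLParser reports tag names in lower case).
HEADING = re.compile(r"h([1-6])$")


class HeadingCollector(HTMLParser):
    """Collects (level, text) pairs for every h1..h6 element, in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None  # level of the header currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        match = HEADING.match(tag)
        if match:
            self._level = int(match.group(1))
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def collect_headings(html):
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return parser.headings


def misplaced_headings(headings):
    """Returns (previous_level, level, text) for each header that skips a level."""
    problems = []
    previous = None  # the first header in the document is not checked
    for level, text in headings:
        if previous is not None and level > previous + 1:
            problems.append((previous, level, text))
        previous = level
    return problems
```

For example, feeding `<h1>Intro</h1><h2>Scope</h2><h4>Detail</h4>` to `collect_headings` and passing the result to `misplaced_headings` would report the h4 that sits straight under an h2. The same list of headers could also serve as a starting point for the numbering and table-of-contents ideas, though a full parse tree would be needed for the tree-rewriting tools.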
This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 27 October 2009 08:38:42 GMT