- From: Wayne Davison <wayned@users.sourceforge.net>
- Date: Fri, 1 Sep 2000 11:32:10 -0700 (PDT)
- To: jose.kahan@w3.org
- Cc: www-lib <www-lib@w3.org>
On Fri, 1 Sep 2000 jose.kahan@w3.org wrote:
> You mean you want something to take HTML that may not be ok and outputting
> valid HTML?

It's not really that the HTML is not OK, but that it's not well-formed
(that is, it doesn't have all the implied open and close tags present).
The basic idea is to allow any HTML document on the web to be input, and
to generate SAX-style events (or a DOM tree) for (essentially) the XHTML
version of the document, with all the implied tags present.

For example, browsers understand the following HTML page:

    <TITLE>foo</TITLE>
    This is under construction!

Using libxml's HTML parser, I get tag events for the following tag set:

    <html><head>
    <title>foo</title>
    </head><body>
    <p>This is under construction!</p>
    </body></html>

Another popular set of implied tags has to do with tables. If the user
uses a </table> inside a <td> element, several implied close tags are
generated.

> You may be interested in tidy which does just that.

I looked at tidy, but I wanted a SAX interface that would be less memory
intensive. Fortunately, libxml's HTML parser fits the bill nicely.

Thanks,

..wayne..
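
To make the implied-tag idea concrete, here is a minimal sketch in Python. It is not libxml's algorithm or API. It is a toy built on the standard library's html.parser, and the names ImpliedTagParser, implied_events, and to_markup are made up for illustration. It only knows about a handful of rules (implied html/head/body, an implied p around loose body text, and implied close tags when an outer element is closed), ignores attributes and void elements, and does no escaping.

```python
from html.parser import HTMLParser

# Elements this toy treats as belonging in <head> (an assumption, far from complete).
HEAD_ONLY = {"title", "style"}


class ImpliedTagParser(HTMLParser):
    """Collects SAX-style (kind, value) events, adding implied tags."""

    def __init__(self):
        super().__init__()
        self.events = []
        self.stack = []  # currently open elements, outermost first

    def _open(self, tag):
        self.stack.append(tag)
        self.events.append(("start", tag))

    def _close_through(self, tag):
        # Close every open element up to and including `tag`.
        while self.stack:
            top = self.stack.pop()
            self.events.append(("end", top))
            if top == tag:
                break

    def _ensure(self, section):
        # Make sure <html> and then <head> or <body> are open.
        if "html" not in self.stack:
            self._open("html")
        if section == "head":
            if "head" not in self.stack:
                self._open("head")
        else:
            if "head" in self.stack:
                self._close_through("head")
            if "body" not in self.stack:
                self._open("body")

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored in this sketch.
        if tag == "html":
            if "html" not in self.stack:
                self._open("html")
        elif tag in ("head", "body"):
            self._ensure(tag)
        elif tag in HEAD_ONLY and "body" not in self.stack:
            self._ensure("head")
            self._open(tag)
        else:
            self._ensure("body")
            self._open(tag)

    def handle_endtag(self, tag):
        # Closing an outer element implies closing everything inside it.
        if tag in self.stack:
            self._close_through(tag)

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Loose text directly under html/head/body gets an implied <p>.
        if not self.stack or self.stack[-1] in ("html", "head", "body"):
            self._ensure("body")
            self._open("p")
        self.events.append(("text", text))

    def close(self):
        super().close()  # flush any buffered text first
        while self.stack:
            self.events.append(("end", self.stack.pop()))


def implied_events(html):
    parser = ImpliedTagParser()
    parser.feed(html)
    parser.close()
    return parser.events


def to_markup(events):
    # Render the event stream back to markup (no escaping, for display only).
    parts = {"start": "<{}>", "end": "</{}>", "text": "{}"}
    return "".join(parts[kind].format(value) for kind, value in events)


if __name__ == "__main__":
    print(to_markup(implied_events("<TITLE>foo</TITLE>\nThis is under construction!\n")))
    print(to_markup(implied_events("<table><tr><td>x</table>")))
```

The first call reproduces the tag set from the example above, and the second shows the </table>-inside-<td> case, where end events for td and tr are generated before the one for table. A real parser such as libxml's handles many more cases than this.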
Received on Friday, 1 September 2000 14:32:48 UTC