
Re: generating well-formed HTML?

From: Wayne Davison <wayned@users.sourceforge.net>
Date: Fri, 1 Sep 2000 11:32:10 -0700 (PDT)
To: jose.kahan@w3.org
Cc: www-lib <www-lib@w3.org>
Message-ID: <Pine.LNX.4.21.0009011115110.3354-100000@phong.blorf.net>

On Fri, 1 Sep 2000 jose.kahan@w3.org wrote:
> You mean you want something to take HTML that may not be ok and outputting 
> valid HTML? 

It's not really that the HTML is not OK, but that it's not well-formed
(that is, not all of the implied open and close tags are present).  The
basic idea is to accept any HTML document on the web as input, and to
generate SAX-style events (or a DOM tree) for (essentially) the XHTML
version of the document, with all the implied tags present.  For example,
browsers understand the following HTML page:

<TITLE>foo</TITLE>
This is under construction!

Using libxml's HTML parser, I get tag events for the following tag set:

<html><head>
<title>foo</title>
</head><body>
<p>This is under construction!</p>
</body></html>
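
To make the idea concrete, here is a rough sketch of how implied open tags
could be turned into SAX-style events.  It is not libxml's algorithm: it is
a toy built on Python's standard html.parser (which does not add implied
tags by itself), and the names ImpliedTagParser, implied_events and
HEAD_ONLY are made up for this illustration.  The rules are assumptions
that cover only the small example above.

```python
from html.parser import HTMLParser

# Elements assumed to belong in <head> (a deliberately short, illustrative list).
HEAD_ONLY = {"title", "meta", "link", "base", "style"}


class ImpliedTagParser(HTMLParser):
    """Toy event recorder that synthesizes <html>, <head>, <body> and <p>."""

    def __init__(self):
        super().__init__()
        self.stack = []    # names of the currently open elements
        self.events = []   # recorded (kind, value) pairs

    def _open(self, tag):
        self.stack.append(tag)
        self.events.append(("start", tag))

    def _close_through(self, tag):
        # Close open elements, innermost first, up to and including `tag`.
        while self.stack:
            name = self.stack.pop()
            self.events.append(("end", name))
            if name == tag:
                break

    def _ensure_head(self):
        if not self.stack:
            self._open("html")
        if "head" not in self.stack and "body" not in self.stack:
            self._open("head")

    def _ensure_body(self):
        if not self.stack:
            self._open("html")
        if "head" in self.stack:
            self._close_through("head")
        if "body" not in self.stack:
            self._open("body")

    def handle_starttag(self, tag, attrs):
        if tag == "html":
            if "html" not in self.stack:
                self._open("html")
        elif tag == "head":
            self._ensure_head()
        elif tag == "body":
            self._ensure_body()
        else:
            if tag in HEAD_ONLY and "body" not in self.stack:
                self._ensure_head()
            else:
                self._ensure_body()
            self._open(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            self._close_through(tag)

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # ignore whitespace between tags
        in_head_child = "head" in self.stack and self.stack[-1] != "head"
        if not in_head_child:
            self._ensure_body()
            if self.stack[-1] == "body":
                self._open("p")  # bare text directly in <body> gets a paragraph
        self.events.append(("text", text))


def implied_events(markup):
    parser = ImpliedTagParser()
    parser.feed(markup)
    parser.close()
    while parser.stack:  # end of input closes whatever is still open
        parser.events.append(("end", parser.stack.pop()))
    return parser.events
```

Feeding the two-line page above to implied_events() yields the same event
sequence as feeding it the fully tagged version.  Void elements such as
<br> and most real-world recovery rules are ignored here.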

Another popular set of implied tags has to do with tables.  If the user
uses a </table> inside a <td> element, several implied close tags are
generated.
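
The table case can be sketched the same way.  Again, this is only an
illustration on top of Python's standard html.parser, not libxml's code,
and ImpliedCloseParser and sax_events are hypothetical names.  The
assumption is simply that an end tag closes every element still open
inside the one it names.

```python
from html.parser import HTMLParser


class ImpliedCloseParser(HTMLParser):
    """Toy event recorder: an end tag also closes any elements left open inside it."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of currently open tag names
        self.events = []         # recorded (kind, value) pairs

    def handle_starttag(self, tag, attrs):
        self.open_elements.append(tag)
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        if tag not in self.open_elements:
            return  # stray close tag with no matching open element: ignore it
        # Pop up to and including the matching element; every pop is a close
        # event, so the inner ones are the implied close tags.
        while self.open_elements:
            name = self.open_elements.pop()
            self.events.append(("end", name))
            if name == tag:
                break

    def handle_data(self, data):
        if data.strip():
            self.events.append(("text", data))


def sax_events(markup):
    parser = ImpliedCloseParser()
    parser.feed(markup)
    parser.close()
    return parser.events
```

For "<table><tr><td>cell</table>", this records close events for td and tr
before the one for table.  Void elements such as <br> are not special-cased
in this sketch.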

> You may be interested in tidy which does just that.

I looked at tidy, but I wanted a SAX interface that would be less memory
intensive.  Fortunately, libxml's HTML parser fits the bill nicely.

Thanks,

..wayne..
Received on Friday, 1 September 2000 14:32:48 GMT
