Re: generating well-formed HTML? from Wayne Davison on 2000-09-01 (www-lib@w3.org from July to September 2000)

From: Wayne Davison <wayned@users.sourceforge.net>
Date: Fri, 1 Sep 2000 11:32:10 -0700 (PDT)
To: jose.kahan@w3.org
Cc: www-lib <www-lib@w3.org>
Message-ID: <Pine.LNX.4.21.0009011115110.3354-100000@phong.blorf.net>

On Fri, 1 Sep 2000 jose.kahan@w3.org wrote:
> You mean you want something to take HTML that may not be ok and outputting 
> valid HTML? 

It's not really that it's not OK, but that it's not well-formed (with all
the implied open and close tags present).  The basic idea is to allow any
HTML document on the web to be input, and to generate SAX-style events (or
a DOM tree) for (essentially) the XHTML version of the document, with all
the implied tags present.  For example, browsers understand the following
HTML page:

<TITLE>foo</TITLE>
This is under construction!

Using libxml's HTML parser, I get SAX events for the following tags:

<html><head>
<title>foo</title>
</head><body>
<p>This is under construction!</p>
</body></html>
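
To make the idea of implied tags concrete, here is a toy sketch in Python.
It is not libxml's algorithm or API: it uses the standard library's
html.parser, and the class ImpliedTagEmitter, the helper sax_events and the
HEAD_ONLY set are names made up for this illustration.  It assumes the
input omits <html>, <head> and <body> entirely and it ignores void elements
such as <br>.

```python
from html.parser import HTMLParser

# Elements this toy treats as living in <head> (an assumption, not a full list).
HEAD_ONLY = {"title", "style", "script"}


class ImpliedTagEmitter(HTMLParser):
    """Collect SAX-style events, adding implied html/head/body/p tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []      # (kind, value) tuples: "start", "end", "text"
        self.stack = []       # names of the currently open elements
        self.state = "start"  # "start" -> "head" -> "body"

    def _open(self, tag):
        self.stack.append(tag)
        self.events.append(("start", tag))

    def _close(self, tag):
        # Pop open elements until `tag` itself has been closed.
        while self.stack:
            top = self.stack.pop()
            self.events.append(("end", top))
            if top == tag:
                break

    def _ensure_head(self):
        if self.state == "start":
            self._open("html")
            self._open("head")
            self.state = "head"

    def _ensure_body(self):
        self._ensure_head()
        if self.state == "head":
            self._close("head")
            self._open("body")
            self.state = "body"

    def handle_starttag(self, tag, attrs):
        if tag in HEAD_ONLY and self.state != "body":
            self._ensure_head()
        else:
            self._ensure_body()
        self._open(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:  # stray close tags are ignored in this sketch
            self._close(tag)

    def handle_data(self, data):
        if self.stack and self.stack[-1] in HEAD_ONLY:
            self.events.append(("text", data))
            return
        if not data.strip():
            return  # skip whitespace between elements
        self._ensure_body()
        if self.stack[-1] == "body":
            self._open("p")  # bare text in <body> gets an implied <p>
        self.events.append(("text", data.strip()))

    def close(self):
        super().close()
        while self.stack:  # close whatever is still open at end of input
            self.events.append(("end", self.stack.pop()))


def sax_events(markup):
    parser = ImpliedTagEmitter()
    parser.feed(markup)
    parser.close()
    return parser.events


for kind, value in sax_events("<TITLE>foo</TITLE>\nThis is under construction!\n"):
    print(kind, value)
```

A real HTML parser makes these decisions from the element rules in the
DTD; the fixed head/body state machine above only covers the two-line page
shown earlier.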

Another common source of implied tags is tables.  If the user puts a
</table> inside a <td> element, several implied close tags are generated
first.
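
The table case can be pictured as unwinding a stack of open elements.  The
following is a minimal sketch in Python, not how libxml implements it; the
function close_with_implied and the open_elements list are hypothetical
names used only for illustration.

```python
def close_with_implied(stack, tag):
    """Pop open elements down to `tag` and return the end events generated."""
    if tag not in stack:
        return []  # stray close tag: this sketch simply ignores it
    events = []
    while stack:
        top = stack.pop()
        events.append(("end", top))
        if top == tag:
            break
    return events


# A </table> arrives while <td> is still the innermost open element.
open_elements = ["html", "body", "table", "tr", "td"]
events = close_with_implied(open_elements, "table")
for kind, name in events:
    print(kind, name)
```

Because the loop pops from the top of the stack, a </table> closes only the
innermost open table, along with any cell and row still open inside it.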

> You may be interested in tidy which does just that.

I looked at tidy, but I wanted a SAX interface, which would be less
memory-intensive.  Fortunately, libxml's HTML parser fits the bill nicely.

Thanks,

..wayne..

Received on Friday, 1 September 2000 14:32:48 UTC