Re: generating well-formed HTML? from jose.kahan@w3.org on 2000-09-01 (www-lib@w3.org from July to September 2000)

From: <jose.kahan@w3.org>
Date: Fri, 1 Sep 2000 14:55:13 +0200 (MET DST)
To: Wayne Davison <wayne@clari.net>
CC: www-lib@w3.org
Message-Id: <200009011255.OAA13898@tuvalu.inrialpes.fr>

Hello Wayne,

In our previous episode, Wayne Davison said:

> From what I've been able to determine so far, it looks like the HTML
> parser in libwww doesn't have a means of converting HTML into
> well-formed syntax.  Correct?

Let's see if I understood your question.

You mean you want something that takes HTML that may not be valid and
outputs valid HTML?

As far as I can tell, the HTML parser ignores all tags and attributes that
it doesn't understand. I couldn't find any code that invokes user handlers
when tags are detected.

You may be interested in tidy, which does just that:

	http://www.w3.org/People/Raggett/tidy/

> I have an application that wants to get all the implied start-tag and
> end-tag events without having to implement my own custom
> understanding of the HTML 4 specs.  I'm
> currently planning to use gnome's libxml2 for this function, but I
> wanted to make sure that I wasn't missing anything first (such as
> some hidden expat functionality, or something).
 
If you're only interested in getting the start and end tags, you
can use expat for that. The Library/Examples/showxml.c module gives
some hints on how to use it inside libwww. Note that expat is a
non-validating parser: it only checks that an XML document is well formed.
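
To illustrate the kind of events expat delivers, here is a minimal sketch.
It is not taken from showxml.c and does not use libwww; it assumes Python's
standard xml.parsers.expat binding instead of the C API, and the function
names tag_events and is_well_formed are made up for the example.

```python
import xml.parsers.expat


def tag_events(document):
    """Return ("start" | "end", tag_name) tuples in document order.

    Raises xml.parsers.expat.ExpatError if the input is not well formed.
    """
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # expat calls these handlers as it encounters each start and end tag.
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.Parse(document, True)  # True marks this as the final chunk of input
    return events


def is_well_formed(document):
    """Report whether expat accepts the document (hypothetical helper)."""
    try:
        tag_events(document)
        return True
    except xml.parsers.expat.ExpatError:
        return False


if __name__ == "__main__":
    for kind, name in tag_events("<html><body><p>hi</p></body></html>"):
        print(kind, name)
    # A missing </p> is a well-formedness error, so expat rejects the input
    # rather than supplying the implied end tag.
    print(is_well_formed("<html><body><p>hi</body></html>"))
```

Because the parser is non-validating, element names it has never heard of
are accepted as long as the tags nest properly. Input that omits end tags,
as ordinary HTML often does, is reported as an error.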

Hope this helps,

-Jose

Received on Friday, 1 September 2000 08:55:16 UTC