- From: <jose.kahan@w3.org>
- Date: Fri, 1 Sep 2000 14:55:13 +0200 (MET DST)
- To: Wayne Davison <wayne@clari.net>
- CC: www-lib@w3.org
Hello Wayne,

In our previous episode, Wayne Davison said:
>
> From what I've been able to determine so far, it looks like the HTML
> parser in libwww doesn't have a means of converting HTML into
> well-formed syntax. Correct?

Let's see if I understood your question. Do you mean you want something that takes HTML that may not be valid and outputs valid HTML?

As far as I can tell, the HTML parser ignores all tags and attributes that it doesn't understand. I couldn't find any code for invoking user handlers when tags are detected.

You may be interested in tidy, which does just that.

http://www.w3.org/People/Ragget/tidy/

> I have an application that wants to get all the implied start-tag and
> end-tag events without having to implement my own custom
> understanding of the HTML 4 specs. I'm currently planning to use
> gnome's libxml2 for this function, but I wanted to make sure that I
> wasn't missing anything first (such as some hidden expat
> functionality, or something).

If you're only interested in getting the start and end tags, you can use expat for that. The Library/Examples/showxml.c module gives some hints on how to use it inside libwww. Note that expat is a non-validating parser: it only checks that an XML document is well formed, so it will not infer the implied tags of HTML that is not already well-formed XML.

Hope this helps,

-Jose
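
To illustrate the expat suggestion above, here is a minimal sketch of collecting start-tag and end-tag events. It is not taken from showxml.c and does not use libwww; it assumes Python's standard `xml.parsers.expat` binding to expat, and the helper name `tag_events` is hypothetical. As noted in the message, expat only checks well-formedness, so input with omitted or mismatched tags is rejected rather than repaired.

```python
import xml.parsers.expat


def tag_events(xml_text):
    """Return a list of ("start" | "end", tag_name) events for xml_text.

    Hypothetical helper for illustration only. Raises
    xml.parsers.expat.ExpatError if the input is not well-formed XML.
    """
    events = []
    parser = xml.parsers.expat.ParserCreate()

    # expat calls these handlers as it encounters each tag.
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))

    # The second argument marks this chunk as the final piece of input.
    parser.Parse(xml_text, True)
    return events


if __name__ == "__main__":
    for kind, name in tag_events("<html><body><p>Hi</p></body></html>"):
        print(kind, name)
```

An empty element such as `<b/>` produces a start event immediately followed by an end event. A document like `<p>unclosed` raises `ExpatError`, which is why a tool such as tidy or an HTML-aware parser is still needed when the implied tags have to be recovered.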
Received on Friday, 1 September 2000 08:55:16 UTC