Re: Question about libwww SGML/HTML parser...

Maciej Puzio writes:

> I'm not an SGML expert, but perhaps my answer will help you.
> I had a lot of problems with the SGML parsing (SGML.c and HTMLPDTD.c),
> and I introduced some changes in my copy of the library code. Unfortunately
> none of them has made it into the distribution version (that's partly my
> fault: I sent one patch to Henrik, got no result, and so didn't send any more).
> If you are interested, I can share my ideas with you. These include:
> 1. Better error recovery from ill-formed documents
> 2. Special handling for <P> tags (consider the HTML 2.0 construct <P ALIGN=...>)
> 3. Some other minor changes

I am sorry about the missing patch - it slipped off my to-do list for
unknown reasons. I'll put it up on the patch page for the Library.

> There is no HText module in the library (only the interface declaration).
> Perhaps you are thinking of the HTML.c module. I agree, it sometimes does
> strange things. This is the module I have changed the most.
> The original version handles only HTML 1.0 and is designed mainly for
> character-mode displays. My extensions made it capable of displaying
> HTML documents in a graphical environment pretty well (e.g. it handles
> nested styles). Unfortunately, I haven't implemented any of the complex
> HTML 2.0 features (e.g. forms, tables, etc.). If you are interested, I can
> give you the code and explanations.
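One common way to get nested styles right in a graphical renderer is a style stack: each start tag pushes the enclosing style combined with its own attribute, and the matching end tag pops back to the enclosing style. The sketch below illustrates that idea in C under assumed, hypothetical names (StyleStack, style_push, style_pop, STYLE_BOLD and so on); it is not Maciej's patch and not the HTML.c code.

```c
/* Hypothetical character-style flags; illustrative only, not from HTML.c. */
enum { STYLE_PLAIN = 0, STYLE_BOLD = 1, STYLE_ITALIC = 2, STYLE_FIXED = 4 };

#define STYLE_MAX_DEPTH 32

/* Each entry holds the combined style in effect after a start tag. */
typedef struct {
    int styles[STYLE_MAX_DEPTH];
    int depth;      /* number of stored entries */
    int overflow;   /* start tags seen beyond STYLE_MAX_DEPTH */
} StyleStack;

void style_init(StyleStack *s) { s->depth = 0; s->overflow = 0; }

/* The current style is the top entry, or plain text when the stack is empty. */
int style_current(const StyleStack *s) {
    return s->depth > 0 ? s->styles[s->depth - 1] : STYLE_PLAIN;
}

/* Start tag (e.g. <B> or <I>): merge its flag with the enclosing style. */
void style_push(StyleStack *s, int flag) {
    if (s->depth < STYLE_MAX_DEPTH) {
        s->styles[s->depth] = style_current(s) | flag;
        s->depth++;
    } else {
        s->overflow++;  /* too deep: keep the current style, just count the tag */
    }
}

/* Matching end tag: drop back to the enclosing style. */
void style_pop(StyleStack *s) {
    if (s->overflow > 0)
        s->overflow--;
    else if (s->depth > 0)
        s->depth--;
}
```

For `<B>a<I>b</I>c</B>`, pushing STYLE_BOLD and then STYLE_ITALIC makes "b" bold and italic; popping once restores plain bold for "c", and popping again returns to plain text. Storing the combined style in each entry, rather than recomputing it, keeps the lookup for each character constant-time.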

You are right that the HText interface is only declared in the Library; the
definition is left to the application. You can see an example of this in
GridText.c in the Line Mode Browser. You can short-circuit the SGML/HTML/HText
stream pipe completely by simply not setting it up in the list of converters.
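The split described above (the library declares, the application defines) and the effect of leaving a converter out can be sketched in a few lines of C. This is a minimal, self-contained illustration under assumed, hypothetical names: MiniText stands in for HText, and conversion_add/conversion_run stand in for the Library's converter list. None of them are the real libwww signatures.

```c
#include <stdlib.h>
#include <string.h>

/* ---- "Library" side: interface declared, not defined ------------------
 * Hypothetical stand-in for HText: the library only declares these
 * functions, and each application supplies the bodies. */
typedef struct MiniText MiniText;
MiniText   *MiniText_new(void);
void        MiniText_appendCharacter(MiniText *text, char c);
const char *MiniText_contents(const MiniText *text);
void        MiniText_free(MiniText *text);

/* ---- "Library" side: a minimal converter list -------------------------
 * Hypothetical, not libwww's API: a body of a given MIME type is only
 * pushed through a parser if a converter is registered for that type. */
typedef void (*Converter)(const char *body, MiniText *out);

#define MAX_CONVERTERS 8
static struct { const char *rep_in; Converter cvt; } conversions[MAX_CONVERTERS];
static int n_conversions = 0;

int conversion_add(const char *rep_in, Converter cvt) {
    if (n_conversions >= MAX_CONVERTERS) return -1;
    conversions[n_conversions].rep_in = rep_in;
    conversions[n_conversions].cvt = cvt;
    n_conversions++;
    return 0;
}

/* Runs the registered converter and returns 1, or returns 0 when none is
 * registered, in which case the caller handles the raw body itself. */
int conversion_run(const char *rep_in, const char *body, MiniText *out) {
    for (int i = 0; i < n_conversions; i++) {
        if (strcmp(conversions[i].rep_in, rep_in) == 0) {
            conversions[i].cvt(body, out);
            return 1;
        }
    }
    return 0;
}

/* A toy "HTML parser": drops everything between '<' and '>'. */
void strip_tags(const char *body, MiniText *out) {
    int in_tag = 0;
    for (const char *p = body; *p; p++) {
        if (*p == '<') in_tag = 1;
        else if (*p == '>') in_tag = 0;
        else if (!in_tag) MiniText_appendCharacter(out, *p);
    }
}

/* ---- "Application" side: the definitions ------------------------------ */
struct MiniText {
    char   buf[256];
    size_t len;
};

MiniText *MiniText_new(void) {
    return calloc(1, sizeof(MiniText));  /* zero-filled: empty string */
}

void MiniText_appendCharacter(MiniText *text, char c) {
    if (text->len + 1 < sizeof text->buf) {  /* keep room for '\0' */
        text->buf[text->len++] = c;
        text->buf[text->len] = '\0';
    }
}

const char *MiniText_contents(const MiniText *text) { return text->buf; }

void MiniText_free(MiniText *text) { free(text); }
```

Registering strip_tags for "text/html" routes HTML bodies through the toy parser into the application's MiniText object. If nothing is registered for a type, conversion_run returns 0 and no parsing happens, which mirrors the short-circuit of the stream pipe mentioned in the reply.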

Woup - here it is - you can get the patch from


Thanks again!


Henrik Frystyk Nielsen, <frystyk@w3.org>
World-Wide Web Consortium, MIT/LCS NE43-356
545 Technology Square, Cambridge MA 02139, USA