Re: Question about libwww SGML/HTML parser...
To: Maciej Puzio <firstname.lastname@example.org>
Subject: Re: Question about libwww SGML/HTML parser...
From: Henrik Frystyk Nielsen <email@example.com>
Date: Thu, 18 Jan 1996 11:34:23 -0500
Cc: "'Markku Savela'" <firstname.lastname@example.org>, "'WWW Library Mailing List'" <email@example.com>
From firstname.lastname@example.org Thu Jan 18 11:35:10 1996
Reply-To: Henrik Frystyk Nielsen <email@example.com>
X-Mailer: exmh version 1.6.2 7/18/95
Maciej Puzio writes:
> I'm not an SGML expert, but perhaps my answer will help you.
> I had a lot of problems with the SGML parsing (SGML.c and HTMLPDTD.c),
> and I introduced some changes in my copy of the library code. Unfortunately
> none of them has been introduced into the distribution version (that's partly my
> fault: I have sent one patch to Henrik - no result, so I haven't sent any more).
> If you are interested, I can share my ideas with you. These include:
> 1. Better error recovery from ill-formed documents
> 2. Special handling for <P> tags (consider HTML 2.0 construct: <P ALIGN=...> )
> 3. Some other minor changes
I am sorry about the missing patch - it slipped off my working list for
unknown reasons. I'll put it up on the patch page for the Library.
> There is no HText module in the library (only the interface declaration).
> Perhaps you are thinking of the HTML.c module. I agree, it sometimes does
> strange things. This is the module I have changed to the biggest extent.
> The original version handles only HTML 1.0 and is designed mainly for
> character mode displays. My extensions made it capable of displaying
> HTML documents in a graphical environment pretty well (e.g. it handles
> nested styles). I haven't introduced any HTML 2.0 complex features
> (e.g. forms, tables etc), unfortunately. If you are interested, I can give you
> the code and explanations.
You are right that the HText interface is only declared in the Library; the
definition is left to the application. You can see an example of this in
GridText.c in the Line Mode Browser. You can short circuit the SGML/HTML/HText
stream pipe completely by simply not setting it up in the list of converter
Woup - here it is - you can get the patch from
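
To illustrate the "declared in the Library, defined by the application"
arrangement described above, here is a minimal sketch in C. It does not use
the real HText declarations: DemoText, DemoText_new, DemoText_appendText,
DemoText_contents and demo_parser_feed are all hypothetical names made up for
this example, and the fixed-size buffer is just an assumption to keep it short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* "Library" side: the interface is declared but not defined here.
 * All names are hypothetical; this is not the real HText API. */
typedef struct DemoText DemoText;          /* opaque to the library */
DemoText *DemoText_new(void);
void DemoText_appendText(DemoText *text, const char *str);
const char *DemoText_contents(const DemoText *text);

/* Library code can drive the interface without knowing its layout. */
void demo_parser_feed(DemoText *text, const char *words[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        DemoText_appendText(text, words[i]);
}

/* "Application" side: supplies the definition it wants. */
struct DemoText {
    char   buf[256];   /* assumed fixed capacity, for brevity */
    size_t len;
};

DemoText *DemoText_new(void)
{
    /* calloc zeroes the struct, so buf starts as an empty string */
    return calloc(1, sizeof(DemoText));
}

void DemoText_appendText(DemoText *text, const char *str)
{
    assert(text != NULL && str != NULL);
    size_t room = sizeof text->buf - 1 - text->len;
    size_t n = strlen(str);
    if (n > room)
        n = room;                          /* truncate rather than overflow */
    memcpy(text->buf + text->len, str, n);
    text->len += n;
    text->buf[text->len] = '\0';
}

const char *DemoText_contents(const DemoText *text)
{
    return text->buf;
}
```

An application would create an object with DemoText_new(), hand it to
demo_parser_feed() together with an array of strings, read the result back
with DemoText_contents(), and release it with free(). A different application
could link in a completely different struct DemoText without touching the
library side.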
Henrik Frystyk Nielsen, <firstname.lastname@example.org>
World-Wide Web Consortium, MIT/LCS NE43-356
545 Technology Square, Cambridge MA 02139, USA