Re: HTML comments

Klaus Weide wrote:
> Can you substantiate that claim that "Lynx does comments incorrectly"?
> I believe that you are either wrong, or your observation is based
> on an old version of Lynx or on unfamiliarity with the functions bound
> (by default) to the <`> and <'> keys:
> '           HISTORICAL    toggle historical vs. valid/minimal comment parsing
> `           MINIMAL       toggle minimal vs. valid comment parsing

I'm looking at Lynx 2.5. MINIMAL is on by default. When it's on,
comments are parsed incorrectly. So the default behaviour is incorrect.

> That Lynx has to support various versions of "comment parsing", and the
> fact that "valid comment parsing" is not the recommended installation 
> default, is not Lynx's fault.  It just tries to cope with realities
> inflicted on the Web by other, "major" browsers.

It is Lynx's fault. There are better ways of handling it.
A method I like is to parse comments correctly, but if you reach
the end of the document and you're still inside a comment, re-parse
from the start of that comment, ending the comment at the first '>'.
This may not go too well with the way that libwww is written though.
(Although since the comment is being stored in an HTChunk anyway,
it should be quite possible.) You can probably get away with just
treating any run of dashes as a single '--', but I haven't
tested that.
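For illustration, here is a minimal sketch of that "parse properly, fall back at end of document" idea. It assumes the whole rest of the document is already in one buffer, which is not how libwww's character-at-a-time stream works, and `comment_decl_end` is a hypothetical helper, not Lynx or libwww code. The first pass treats each "--" as opening or closing a comment and accepts '>' only outside a comment. If that pass runs off the end of the buffer, it re-scans from the same "<!" and stops at the first '>'.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not Lynx or libwww code): buf[start] is the "<!"
 * that opens a comment declaration. Returns the index just past the '>'
 * that closes it, or len if there is no '>' at all.
 *
 * Pass 1 (strict): each "--" toggles in/out of a comment, and '>' ends the
 * declaration only when it appears outside a comment. Text between comments
 * is skipped; strict SGML would flag non-whitespace there, this sketch does not.
 *
 * Pass 2 (fallback): only if pass 1 reaches end of input while still inside
 * the declaration, re-scan from the same "<!" and stop at the first '>'.
 */
static size_t comment_decl_end(const char *buf, size_t len, size_t start)
{
    size_t i;
    int in_comment = 0;

    assert(start + 2 <= len && buf[start] == '<' && buf[start + 1] == '!');

    for (i = start + 2; i < len; ) {
        if (buf[i] == '-' && i + 1 < len && buf[i + 1] == '-') {
            in_comment = !in_comment;   /* "--" opens or closes a comment */
            i += 2;
        } else if (!in_comment && buf[i] == '>') {
            return i + 1;               /* correctly terminated declaration */
        } else {
            i++;                        /* comment text, or stray text between comments */
        }
    }

    /* End of document reached inside the declaration: re-parse from the
     * start of the comment, ending it at the first '>'. */
    for (i = start + 2; i < len; i++) {
        if (buf[i] == '>')
            return i + 1;
    }
    return len;                         /* no '>' anywhere: swallow the rest */
}
```

With `<!-- foo -- bar --> visible`, the strict pass reopens a comment at the second "--" and runs off the end, so the fallback ends the comment at the '>' after "bar --" and " visible" is still shown. With `<!-- a -- >shown<!-- b -->`, the strict pass ends at the '>' after "a --", so "shown" is displayed, whereas a parser that simply stops at the first "-->" would hide it.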

I'm rather dubious about:

  if (end_comment && !isspace(c))
      end_comment = FALSE;

too. I can't think of any situations in which that will help,
but again I haven't tested it.

> In any case, the "libwww" part of Lynx code is based on an earlier
> version of the reference library, but has changed so much since then
> that forming opinions about the Library based upon Lynx's behaviour
> (or the other way around) can be only misleading.

Indeed. I didn't form any opinions about the Library based upon
Lynx's behaviour. If you re-read my original message, you will
see that I was *contrasting* the behaviour of Lynx and libwww.

Lynx makes a *fairly* good job of comments. libwww doesn't
make any attempt at comments.

> > I thought the idea of libwww was to have some kind of
> > reference implementation? It's not much use if it's
> > not right ;-).
> I think "reference implementation" means something different from
> "plug and play" ;)

There's a difference between not everything being done for you,
and the things which are done for you being done wrong.


\  //    Jon Ribbens    // 10MB virtual-hosted // www.oaktree.co.uk
 \// jon@oaktree.co.uk // web space for 49UKP //