Re: WWWLIB Parser, is there going to be any update on it?
Markku Savela writes:
> I have talked about this issue earlier, but couldn't find my message
> concerning it from the archives (someone deleted it?)
No, but we have had some problems with the mail archives. The software we use
is not stable :-(
> I think the "SGML.c" in the library attempts to be too clever and
> trips over. The control tables (HTMLPDTD.*) are not really sufficient
> for full SGML parsing and SGML.c parser should not try to be such.
It's actually an overstatement to call the parser in SGML.c an SGML parser; it's
not one! However, we are moving the parsing effort to our new Amaya client, which
has been released to W3C members. It will later become publicly available
according to the normal W3C rules. This is why we are not putting any
more resources into the SGML/HTML/HText interface.
You can find information about Amaya at
The interface will stay the same in that the Amaya parser is a normal libwww
stream, which can handle the data just like the old SGML stream. However, the
HTML and HText interfaces will change completely.
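To illustrate why the stream interface can stay the same while the parser behind it changes, here is a minimal sketch of a stream as a table of function pointers. All names (`Stream`, `StreamClass`, `put_block`, `feed`, the counting consumer) are hypothetical and only loosely modeled on the idea of a libwww stream; they are not the actual libwww declarations.

```c
#include <stddef.h>

/* Hypothetical stream interface (not the real libwww headers): any
   consumer, such as an old or a new parser, implements these methods. */
typedef struct Stream Stream;

typedef struct {
    const char *name;
    int (*put_block)(Stream *me, const char *buf, size_t len);
    int (*flush)(Stream *me);
} StreamClass;

struct Stream {
    const StreamClass *isa;  /* which implementation handles the data */
    size_t bytes_seen;       /* example per-stream state */
};

/* A trivial stand-in consumer that only counts the bytes it receives. */
static int counter_put_block(Stream *me, const char *buf, size_t len)
{
    (void)buf;
    me->bytes_seen += len;
    return 0;
}

static int counter_flush(Stream *me)
{
    (void)me;
    return 0;
}

static const StreamClass CounterClass = {
    "Counter", counter_put_block, counter_flush
};

/* The producer talks only to the interface, so replacing the parser
   means pointing the stream at a different StreamClass. */
static int feed(Stream *s, const char *buf, size_t len)
{
    int rc = s->isa->put_block(s, buf, len);
    return rc ? rc : s->isa->flush(s);
}
```

Swapping in another parser would only require a second `StreamClass` instance; the code that calls `feed` would not change.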
> My suggestion is that SGML.c should be stripped down into a simple
> "SGML-tokenizer". It would produce technically the same output as it
> does now (structured stream with elements, content and entities), but
> it should not attempt any "fixing" or "checking" of the HTML.
Making the SGML parser a simple tokenizer is a good idea; I know that Dan has been working on that for some time. You can find documentation on this at
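As a rough illustration of the "tokenizer only" approach, the sketch below reports start tags, end tags, text runs, and entity references through a callback, and deliberately performs no fixing or checking (mismatched or misnested tags are passed through as-is). The function `sgml_tokenize`, the `TokenKind` values, and the counting helper are hypothetical; this is not Dan's design or any libwww API, and it ignores comments, declarations, and quoted `>` inside attribute values.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds reported to the consumer. */
typedef enum { TOK_START_TAG, TOK_END_TAG, TOK_TEXT, TOK_ENTITY } TokenKind;

typedef void (*TokenCallback)(TokenKind kind, const char *text, size_t len,
                              void *ctx);

/* Scan buf once and report what is there; tags are never inserted,
   dropped, or reordered. Attributes stay unparsed in the tag text. */
void sgml_tokenize(const char *buf, size_t len, TokenCallback cb, void *ctx)
{
    size_t i = 0;
    while (i < len) {
        if (buf[i] == '<') {
            size_t j = i + 1;
            while (j < len && buf[j] != '>')
                j++;
            if (j == len) {               /* unterminated tag: pass as text */
                cb(TOK_TEXT, buf + i, len - i, ctx);
                return;
            }
            if (i + 1 < j && buf[i + 1] == '/')
                cb(TOK_END_TAG, buf + i + 2, j - i - 2, ctx);
            else
                cb(TOK_START_TAG, buf + i + 1, j - i - 1, ctx);
            i = j + 1;
        } else if (buf[i] == '&') {
            size_t j = i + 1;
            while (j < len && (isalnum((unsigned char)buf[j]) || buf[j] == '#'))
                j++;
            if (j == i + 1) {             /* bare '&': report as text */
                cb(TOK_TEXT, buf + i, 1, ctx);
                i++;
                continue;
            }
            cb(TOK_ENTITY, buf + i + 1, j - i - 1, ctx);
            if (j < len && buf[j] == ';')
                j++;                      /* consume optional ';' */
            i = j;
        } else {
            size_t j = i;
            while (j < len && buf[j] != '<' && buf[j] != '&')
                j++;
            cb(TOK_TEXT, buf + i, j - i, ctx);
            i = j;
        }
    }
}

/* Example consumer (hypothetical): count tokens of each kind. */
typedef struct { int counts[4]; } TokenCounts;

static void count_tokens(TokenKind kind, const char *text, size_t len,
                         void *ctx)
{
    (void)text;
    (void)len;
    ((TokenCounts *)ctx)->counts[kind]++;
}
```

Feeding `<p>Fish &amp; chips</P>` through `count_tokens` yields one start tag, one end tag, one entity, and two text runs; the case mismatch between `p` and `P` is left for a later stage to judge.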
Henrik Frystyk Nielsen, <firstname.lastname@example.org>
World Wide Web Consortium, MIT/LCS NE43-356
545 Technology Square, Cambridge MA 02139, USA