Re: WWWLIB Parser, is there going to be any update on it?

Markku Savela writes:
> I have talked about this issue earlier, but couldn't find my message
> concerning it from the archives (someone deleted it?)

No, but we have had some problems with the mail archives. The software we use 
is not stable :-(

> I think the "SGML.c" in the library attempts to be too clever and
> trips over. The control tables (HTMLPDTD.*) are not really sufficient
> for full SGML parsing and SGML.c parser should not try to be such.

It's actually too much to call the parser in SGML.c an SGML parser - it's 
not! However, we are moving the parsing effort to our new Amaya client, which 
has been released to W3C members. It will later become publicly available 
according to the normal W3C rules. This is the reason for not putting any 
more resources into the SGML/HTML/HText interface.

You can find information about Amaya at


The interface will be the same in that the Amaya parser is a normal libwww 
stream which can handle the data just like the old SGML stream. However, the 
HTML and HText interfaces will change completely.

> My suggestion is, that SGML.c should be stripped into simple
> "SGML-tokenizer". It would produce technically the same output as it
> does now (structured stream with elements, content and entities), but
> it should not attempt any "fixing" or "checking" of the HTML.

Making the SGML parser a simple tokenizer is a good idea. I know that Dan has 
been working on that for some time. You can find documentation on this at

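To make the tokenizer idea concrete, here is a minimal sketch in C of what 
such a stripped-down pass could look like. It is not the code in SGML.c and 
not Dan's work; all the names (TokKind, TokSink, tokenize, Trace, trace_sink) 
are hypothetical and chosen for illustration only. It assumes the whole 
document is in one buffer, whereas a real libwww stream would receive the 
data in pieces.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token kinds; these are not libwww names. */
typedef enum { TOK_TEXT, TOK_START_TAG, TOK_END_TAG, TOK_ENTITY } TokKind;

/* The sink receives a slice of the input; the slice is NOT NUL-terminated. */
typedef void (*TokSink)(void *ctx, TokKind kind, const char *s, size_t len);

/* Split buf into tokens. No DTD lookup, no nesting checks, no "fixing":
 * whatever is between '<' and '>' is reported as-is (attributes included).
 * Simplification: a '>' inside a quoted attribute value ends the tag early. */
static void tokenize(const char *buf, size_t n, TokSink sink, void *ctx)
{
    size_t i = 0;
    while (i < n) {
        if (buf[i] == '<') {
            const char *gt = memchr(buf + i, '>', n - i);
            if (gt == NULL) {               /* unterminated tag: pass as text */
                sink(ctx, TOK_TEXT, buf + i, n - i);
                return;
            }
            size_t start = i + 1;
            size_t end = (size_t)(gt - buf);
            if (start < end && buf[start] == '/')
                sink(ctx, TOK_END_TAG, buf + start + 1, end - start - 1);
            else
                sink(ctx, TOK_START_TAG, buf + start, end - start);
            i = end + 1;
        } else if (buf[i] == '&') {
            size_t j = i + 1;
            while (j < n && (isalnum((unsigned char)buf[j]) || buf[j] == '#'))
                j++;
            if (j > i + 1 && j < n && buf[j] == ';') {
                sink(ctx, TOK_ENTITY, buf + i + 1, j - i - 1);
                i = j + 1;
            } else {                        /* a bare '&' is ordinary text */
                sink(ctx, TOK_TEXT, buf + i, 1);
                i++;
            }
        } else {
            size_t j = i;
            while (j < n && buf[j] != '<' && buf[j] != '&')
                j++;
            sink(ctx, TOK_TEXT, buf + i, j - i);
            i = j;
        }
    }
}

/* Example sink: appends a compact trace such as "S:P|T:hello|/:P|". */
typedef struct { char buf[256]; size_t used; } Trace;

static void trace_sink(void *ctx, TokKind kind, const char *s, size_t len)
{
    static const char tag[] = { 'T', 'S', '/', 'E' };  /* indexed by TokKind */
    Trace *t = ctx;
    size_t room = sizeof t->buf - t->used;
    int w = snprintf(t->buf + t->used, room, "%c:%.*s|",
                     tag[kind], (int)len, s);
    if (w > 0 && (size_t)w < room)
        t->used += (size_t)w;
}
```

Calling tokenize() with trace_sink on a zero-initialised Trace gives a flat 
record of start tags, end tags, text and entities in document order. Any 
decision about what the elements mean, or whether they nest correctly, is 
left to whoever consumes the tokens.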

Henrik Frystyk Nielsen, <frystyk@w3.org>
World Wide Web Consortium, MIT/LCS NE43-356
545 Technology Square, Cambridge MA 02139, USA
