- From: Henrik Frystyk Nielsen <frystyk@w3.org>
- Date: Wed, 14 Aug 1996 10:12:45 -0400
- To: msa@hemuli.tte.vtt.fi (Markku Savela)
- cc: www-lib@w3.org, Vincent.Quint@imag.fr, connolly@w3.org
Markku Savela writes:

> I have talked about this issue earlier, but couldn't find my message
> concerning it from the archives (someone deleted it?)

No, but we have had some problems with the mail archives. The software
we use is not stable :-(

> I think the "SGML.c" in the library attempts to be too clever and
> trips over. The control tables (HTMLPDTD.*) are not really sufficient
> for full SGML parsing and SGML.c parser should not try to be such.

It's actually too much to call the parser in SGML.c an SGML parser -
it's not! However, we are moving the parsing effort to our new Amaya
client, which has been released to W3C members. It will later become
publicly available according to the normal W3C rules. This is the
reason for not putting any more resources into the SGML/HTML/HText
interface. You can find information about Amaya at

    http://www.w3.org/pub/WWW/Amaya/

The interface will be the same in that the Amaya parser is a normal
libwww stream which can handle the data just like the old SGML stream.
However, the HTML and HText interfaces will change completely.

> My suggestion is, that SGML.c should be stripped into simple
> "SGML-tokenizer". It would produce technically the same output as it
> does now (structured stream with elements, content and entities), but
> it should not attempt any "fixing" or "checking" of the HTML.

Having the SGML parser be simply a tokenizer is a good idea; I know
that Dan has been working on that for some time. You can find
documentation on this at

    http://www.w3.org/pub/WWW/MarkUp/SGML/sgml-lex/sgml-lex

--
Henrik Frystyk Nielsen, <frystyk@w3.org>
World Wide Web Consortium, MIT/LCS NE43-356
545 Technology Square, Cambridge MA 02139, USA
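
For illustration only, the "tokenizer without fixing or checking" idea
discussed above might look roughly like the following Python sketch. It
is not taken from SGML.c or from sgml-lex; the function name `tokenize`,
the token tuple shapes and the regular expressions are all assumptions
made for this example, and they cover only a small subset of SGML
(start tags, end tags, entity references, comments and text).

```python
import re

# Hypothetical token grammar: one alternative per token kind. Input that
# matches no markup pattern falls through to the TEXT alternative, so a
# stray "<" or "&" is passed along as text instead of being "fixed".
TOKEN_RE = re.compile(
    r"<!--(?P<comment>.*?)-->"
    r"|</(?P<end>[A-Za-z][A-Za-z0-9.-]*)\s*>"
    r"|<(?P<start>[A-Za-z][A-Za-z0-9.-]*)(?P<attrs>[^>]*)>"
    r"|&(?P<entity>#?[A-Za-z0-9]+);"
    r"|(?P<text>[^<&]+|[<&])",
    re.DOTALL,
)

# Attribute name with an optional quoted or unquoted value.
ATTR_RE = re.compile(
    r"""([A-Za-z][A-Za-z0-9.-]*)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+))?"""
)


def parse_attrs(raw):
    """Turn the raw attribute string of a start tag into a dict."""
    attrs = {}
    for m in ATTR_RE.finditer(raw):
        name, value = m.group(1), m.group(2)
        if value is not None and value[0] in "\"'":
            value = value[1:-1]  # drop the surrounding quotes
        attrs[name] = value      # None for a bare attribute such as COMPACT
    return attrs


def tokenize(source):
    """Yield tokens in document order, with no validation or repair."""
    for m in TOKEN_RE.finditer(source):
        if m.group("comment") is not None:
            yield ("COMMENT", m.group("comment"))
        elif m.group("end") is not None:
            yield ("END", m.group("end"))
        elif m.group("start") is not None:
            yield ("START", m.group("start"), parse_attrs(m.group("attrs")))
        elif m.group("entity") is not None:
            yield ("ENTITY", m.group("entity"))
        else:
            yield ("TEXT", m.group("text"))


if __name__ == "__main__":
    # The unmatched </B> is reported as-is; nesting is left to the consumer.
    for token in tokenize('<P>Tom &amp; Jerry<A HREF="x.html">link</A></B>'):
        print(token)
```

Note that the sketch reports an unmatched end tag exactly as it appears
and keeps element names in their original case; any decision about
nesting, omitted tags or the DTD is left to whatever consumes the token
stream.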
Received on Wednesday, 14 August 1996 10:12:48 UTC