Re: about SGML parser

From: Don Park <donpark@telewise.com>
Date: Tue, 23 Jan 1996 11:38:07 -0800 (PST)
Message-Id: <199601231938.LAA15349@gw.quake.net>
To: Rainer Klute <klute@nads.de>
Cc: www-lib@w3.org
How about putting SP in a thread and having it go to sleep whenever it
starves for input?  I am working on an HTML scanner which uses this technique
to support both push and pull usage models.  Looking at the SP source code, I
see that some modifications will still be necessary.
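
To make the sleep-when-starved idea concrete, here is a rough sketch in
standard C++.  It assumes nothing about SP's real interfaces: StarvingInput,
count_tag_opens and parse_pushed_chunks are hypothetical names made up for
illustration, and count_tag_opens merely stands in for a pull-style parser.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper (not part of SP or libwww): a byte queue that lets a
// pull-style reader sleep until a push-style writer supplies more data.
class StarvingInput {
public:
    // Push side: called whenever a chunk of input arrives.
    void push(const std::string& chunk) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_ && "push after close");
            buffer_.insert(buffer_.end(), chunk.begin(), chunk.end());
        }
        ready_.notify_one();
    }

    // Push side: signals that no more input will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_one();
    }

    // Pull side: blocks while starved; returns false at end of input.
    bool get(char& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !buffer_.empty() || closed_; });
        if (buffer_.empty()) return false;  // closed and fully drained
        out = buffer_.front();
        buffer_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<char> buffer_;
    bool closed_ = false;
};

// Stand-in for a pull-style parser: it requests characters one at a time
// and simply counts '<' characters.
std::size_t count_tag_opens(StarvingInput& in) {
    std::size_t n = 0;
    char c;
    while (in.get(c)) {
        if (c == '<') ++n;
    }
    return n;
}

// Runs the pull-style consumer in its own thread while the caller's thread
// pushes chunks at it, then returns the consumer's result.
std::size_t parse_pushed_chunks(const std::vector<std::string>& chunks) {
    StarvingInput input;
    std::size_t result = 0;
    std::thread parser([&] { result = count_tag_opens(input); });
    for (const auto& chunk : chunks) input.push(chunk);
    input.close();
    parser.join();  // join makes the write to result visible here
    return result;
}
```

The caller sees a push interface (push/close) while the parser thread sees a
pull interface (get), so neither side has to be restructured around the
other, and no intermediate file is needed.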

BTW, my scanner is not finished and it is NOT a parser, so please don't pelt
me with handitoverson comments.  It is just a tokenizer which returns a
stream of high-level tokens, which are objects representing different types
of HTMLElements.  It does not even know that a start tag and an end tag are
related.  Also, it is based on MFC, so it will not fit in with libwww.
However, it will support new tags at runtime, so that you can decide to be
pure HTML 3, Netscape friendly, or whatever.  I will describe its design in
the near future so W3C can take advantage of it in its own HTML parser design.
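
Until then, here is a very rough illustration of the general shape: a
tokenizer with a tag table that can grow at runtime and that never pairs
start tags with end tags.  This is NOT my MFC code; TagScanner, Token and
TokenKind are hypothetical names, it uses plain standard C++, and it ignores
attributes, comments and entities entirely.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

enum class TokenKind { Text, StartTag, EndTag, UnknownTag };

struct Token {
    TokenKind kind;
    std::string value;  // tag name (lowercased) or raw text
};

// Hypothetical sketch: a tokenizer whose set of known tags grows at runtime.
class TagScanner {
public:
    // Adds a tag name to the known set; names are case-insensitive.
    void register_tag(const std::string& name) {
        assert(!name.empty());
        known_.insert(lower(name));
    }

    // Splits the input into text and tag tokens. Start and end tags are
    // emitted independently; no nesting or pairing is tracked.
    std::vector<Token> scan(const std::string& html) const {
        std::vector<Token> out;
        std::size_t i = 0;
        while (i < html.size()) {
            if (html[i] == '<') {
                std::size_t close = html.find('>', i);
                if (close == std::string::npos) {
                    // Unterminated tag: treat the remainder as text.
                    out.push_back({TokenKind::Text, html.substr(i)});
                    break;
                }
                bool is_end = (i + 1 < close && html[i + 1] == '/');
                std::size_t start = i + (is_end ? 2 : 1);
                std::size_t end = start;
                // The tag name stops at whitespace; attributes are dropped.
                while (end < close &&
                       !std::isspace(static_cast<unsigned char>(html[end]))) {
                    ++end;
                }
                std::string name = lower(html.substr(start, end - start));
                TokenKind kind = TokenKind::UnknownTag;
                if (known_.count(name) != 0) {
                    kind = is_end ? TokenKind::EndTag : TokenKind::StartTag;
                }
                out.push_back({kind, name});
                i = close + 1;
            } else {
                std::size_t next = html.find('<', i);
                if (next == std::string::npos) next = html.size();
                out.push_back({TokenKind::Text, html.substr(i, next - i)});
                i = next;
            }
        }
        return out;
    }

private:
    static std::string lower(std::string s) {
        for (char& c : s) {
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        }
        return s;
    }

    std::set<std::string> known_;
};
```

Registering, say, "blink" after construction turns what was an UnknownTag
token into a StartTag or EndTag token on the next scan, which is the kind of
runtime choice between dialects I have in mind.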

A comment regarding use of Arena's HTML parser in libwww:

After looking at Arena's parser, I think using SP in libwww is the better
and safer route, although SP is quite a bit bottom heavy and requires C++
template support.

I guess the question now is whether to have libwww take the full leap into
C++ by using templates, namespaces, dynamic-casting, etc., or stay by the
seaside :).  <-- (that is a period, not a mole)

Don


>>I know that there are several free SGML parsers on
>>the internet such as SP, YSP etc, why don't we
>>integrate one of them into W3C Lib so that W3C
>>Lib will support SGML and we don't worry about
>>the parser whenever a new version of HTML appears.
>
>We are doing just that. However, due to the nature of SP's input
>model (it is a requesting stream, not a driven one) we have to rape
>the stream concept a bit in our first approach. The HTML input is
>completely dumped into a file. :-( SP will read from this file,
>parse it and push the tokens further down the stream stack.
>
>Best regards
>Rainer Klute
>
>  Dipl.-Inform. Rainer Klute        NADS - Advertising on nets
>  NADS GmbH
>  Emil-Figge-Str. 80                Tel.: +49 231 9742570
>D-44227 Dortmund                    Fax:  +49 231 9742573
>
>            <http://www.nads.de/~klute/>
>
>
>
Received on Tuesday, 23 January 1996 14:39:30 GMT