
Re: More about structured streams, SGML/HTML parser...



In message <199601171449.PAA15796@heike.nads.de>, Rainer Klute writes:
>
>We are currently considering experimenting with Jim Clark's SGML
>parser SP in order to munge it into a libwww converter and to do
>some other things with it. Unfortunately it is still without
>documentation. Does anyone have experience with it? You are invited
>to join us in guessing around!

Er... it's no small task. SP uses a "suck the data from the stream"
(pull) model, whereas libwww converters use the "blow the data at
the converter" (push) model. I know one development organization that
implemented coroutines in order to bridge the impedance mismatch.
Can you say assembler code?

And that wasn't the worst of it, from what I heard.

Granted, much of those folks' experience got fed back to James
Clark, so by now you may get the benefit of it.
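To make the "suck" versus "blow" mismatch concrete, here is a minimal Python sketch of the coroutine trick, with generators standing in for hand-written assembler. The pull parser asks for one character at a time via `getc()`; the adapter offers push-style `put_block()` and `free()` calls. Those method names are only loosely inspired by libwww's stream methods, and the toy tag tokenizer is hypothetical too; none of this is SP's or libwww's real API. Whenever the parser runs dry it suspends at a `yield`, and the next `put_block()` resumes it exactly where it stopped, even in the middle of a tag.

```python
from typing import Callable, Generator, List, Optional, Tuple

Event = Tuple[str, str]  # ("tag", name) or ("text", data)


class _Input:
    """Character source for the pull parser (hypothetical helper).

    getc() is itself a generator: when its buffer is empty it suspends with a
    bare ``yield`` and waits for the push side to send() the next block.
    """

    def __init__(self) -> None:
        self.buf = ""
        self.pos = 0
        self.eof = False

    def getc(self) -> Generator[None, Optional[str], Optional[str]]:
        while self.pos >= len(self.buf):
            if self.eof:
                return None            # no more input, ever
            chunk = yield              # suspend: "suck" the next block
            if chunk is None:
                self.eof = True
            else:
                self.buf, self.pos = chunk, 0
        c = self.buf[self.pos]
        self.pos += 1
        return c


def pull_parser(inp: _Input, emit: Callable[[Event], None]):
    """Toy pull-style tokenizer: splits input into tags and character data."""
    text: List[str] = []
    while True:
        c = yield from inp.getc()
        if c is None:
            break
        if c != "<":
            text.append(c)
            continue
        if text:
            emit(("text", "".join(text)))
            text = []
        name: List[str] = []
        while True:
            c = yield from inp.getc()
            if c is None or c == ">":  # a tag cut off by EOF is emitted as-is
                break
            name.append(c)
        emit(("tag", "".join(name)))
    if text:
        emit(("text", "".join(text)))


class PushAdapter:
    """Push-style front end ("blow the data at the converter")."""

    def __init__(self, emit: Callable[[Event], None]) -> None:
        self._co = pull_parser(_Input(), emit)
        next(self._co)                 # run until the parser first asks for input

    def put_block(self, data: str) -> None:
        self._co.send(data)            # resume; returns when the parser wants more

    def free(self) -> None:
        try:
            self._co.send(None)        # end of input: the parser runs to completion
        except StopIteration:
            pass


if __name__ == "__main__":
    events: List[Event] = []
    p = PushAdapter(events.append)
    for block in ["Hello <b", "old>wor", "ld</bold>!"]:
        p.put_block(block)
    p.free()
    print(events)
    # [('text', 'Hello '), ('tag', 'bold'), ('text', 'world'), ('tag', '/bold'), ('text', '!')]
```

Python generators are stackless, so every function on the parser's call path that might need input has to be a generator called with `yield from`. Letting unmodified pull code suspend deep inside a call requires stackful coroutines (or a thread per parser), which in C generally means switching stacks by hand: the kind of low-level work described above.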

But libwww will have the functionality you want before too long.

I've got the lexical part done, and I'm working on the structural
part. I'm borrowing some design ideas from the Python community:
basically, each element type is modeled as a class that implements
a regular expression. Compose them together just right, and you've
effectively got a validating parser (SGML inclusion/exclusion exceptions
may complicate things, but...)
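One possible reading of "each element type is modeled as a class that implements a regular expression" is sketched below in Python. Each content-model operator (sequence `,`, choice `|`, and the occurrence indicators `?`, `*`, `+`) is a small class, and composing instances yields a checker for an element's list of children. The class names, the `ends()`/`validates()` methods, and the example model are illustrative assumptions rather than the libwww design, and the sketch ignores SGML's `&` connector, `#PCDATA`, and omitted-tag inference.

```python
from typing import FrozenSet, Sequence

Kids = Sequence[str]  # element names of an element's children, in order


class Model:
    """Base class: a content model acts like a regular expression over names."""

    def ends(self, kids: Kids, i: int) -> FrozenSet[int]:
        """Every position j such that kids[i:j] matches this model."""
        raise NotImplementedError

    def validates(self, kids: Kids) -> bool:
        return len(kids) in self.ends(kids, 0)


class Elem(Model):
    """A single element type, e.g. ``p``."""

    def __init__(self, name: str) -> None:
        self.name = name

    def ends(self, kids, i):
        ok = i < len(kids) and kids[i] == self.name
        return frozenset({i + 1}) if ok else frozenset()


class Seq(Model):
    """SGML ``(a, b, ...)``: every part, in order."""

    def __init__(self, *parts: Model) -> None:
        self.parts = parts

    def ends(self, kids, i):
        pos = frozenset({i})
        for part in self.parts:
            pos = frozenset(e for j in pos for e in part.ends(kids, j))
        return pos


class Alt(Model):
    """SGML ``(a | b | ...)``: any one part."""

    def __init__(self, *parts: Model) -> None:
        self.parts = parts

    def ends(self, kids, i):
        return frozenset(e for part in self.parts for e in part.ends(kids, i))


class Opt(Model):
    """``a?``: zero or one."""

    def __init__(self, part: Model) -> None:
        self.part = part

    def ends(self, kids, i):
        return frozenset({i}) | self.part.ends(kids, i)


class Star(Model):
    """``a*``: zero or more, computed as a fixpoint over reachable positions."""

    def __init__(self, part: Model) -> None:
        self.part = part

    def ends(self, kids, i):
        seen, frontier = {i}, {i}
        while frontier:
            frontier = {e for j in frontier for e in self.part.ends(kids, j)} - seen
            seen |= frontier
        return frozenset(seen)


class Plus(Model):
    """``a+``: one or more, i.e. ``(a, a*)``."""

    def __init__(self, part: Model) -> None:
        self.part = part

    def ends(self, kids, i):
        return Seq(self.part, Star(self.part)).ends(kids, i)


# Hypothetical content model, roughly (h1, (p | ul)*).
BODY = Seq(Elem("h1"), Star(Alt(Elem("p"), Elem("ul"))))

if __name__ == "__main__":
    print(BODY.validates(["h1", "p", "ul", "p"]))  # True
    print(BODY.validates(["p", "h1"]))             # False
```

Tracking the set of reachable child positions keeps the matcher simple and avoids the worst-case blowup of naive backtracking; a production validator would more likely compile each content model into a finite automaton once per DTD. This is also where the inclusion/exclusion caveat bites: those exceptions change what is allowed depending on an element's ancestors, so no single element's model can check them in isolation.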

Stay tuned...

Dan

