
More about structured streams, SGML/HTML parser...



Continuing my earlier thread about HTML parsing and structured streams.

I wanted support for something that would parse HTML into a
structured stream and would also supply the missing end_element
calls in the proper places, so that the actual browser part could
concentrate on presentation instead of having to deal with syntactic
matters that belong in the DTD. That is, move the sometimes heuristic
methods of dealing with invalid HTML, omitted-end-tag rules, and
different versions of HTML into a separate "unifier" stream.

I am considering writing a structured stream that would be activated
with something like this (to give you an idea of the context):

	SGML_new(&HTMLP_dtd,
	  HTML_Normalize(request, NULL,
			 input_format, output_format,
			 output_stream));

HTML_Normalize would pass the normal content data (put_character,
put_string, put_block) as is to the output_stream, but would
automatically generate extra end_elements (and perhaps even
start_elements) on the output_stream (as outlined in my original
message).
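
To make the idea concrete, here is a minimal sketch of the end_element
bookkeeping such a normalizer might do. Everything in it is an
assumption for illustration: the Normalizer type, the norm_* functions,
the three element numbers, and the two omitted-end-tag rules are
hypothetical, and a text log stands in for the real output_stream.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical element numbers; a real DTD table would supply these. */
enum { EL_UL, EL_LI, EL_P };
static const char *const el_names[] = { "UL", "LI", "P" };

#define MAX_DEPTH 32

/* Hypothetical normalizer state: the stack of currently open elements,
   plus a text log standing in for the downstream structured stream. */
typedef struct {
    int  stack[MAX_DEPTH];
    int  depth;
    char log[256];
} Normalizer;

void norm_init(Normalizer *n)
{
    n->depth = 0;
    n->log[0] = '\0';
}

/* Record a start ("<") or end ("</") element call in the log. */
static void emit(Normalizer *n, const char *open, int el)
{
    size_t used = strlen(n->log);
    snprintf(n->log + used, sizeof n->log - used, "%s%s>", open, el_names[el]);
}

/* Pop the innermost open element and emit its end_element. */
static void close_top(Normalizer *n)
{
    assert(n->depth > 0);
    n->depth--;
    emit(n, "</", n->stack[n->depth]);
}

/* Content data is passed through unchanged. */
void norm_put_string(Normalizer *n, const char *s)
{
    strncat(n->log, s, sizeof n->log - strlen(n->log) - 1);
}

/* Assumed rules: a new LI ends any LI or P left open on top of the
   stack; a new P ends a P left open on top of the stack. */
void norm_start_element(Normalizer *n, int el)
{
    if (el == EL_LI) {
        while (n->depth > 0 && (n->stack[n->depth - 1] == EL_LI ||
                                n->stack[n->depth - 1] == EL_P))
            close_top(n);
    } else if (el == EL_P) {
        if (n->depth > 0 && n->stack[n->depth - 1] == EL_P)
            close_top(n);
    }
    assert(n->depth < MAX_DEPTH);
    n->stack[n->depth++] = el;
    emit(n, "<", el);
}

/* An explicit end tag first closes everything still open inside it. */
void norm_end_element(Normalizer *n, int el)
{
    int i = n->depth - 1;
    while (i >= 0 && n->stack[i] != el)
        i--;
    if (i < 0)
        return;                 /* stray end tag: ignored in this sketch */
    while (n->depth > i)
        close_top(n);
}
```

Feeding it start UL, start LI, "a", start LI, "b", end UL leaves
"<UL><LI>a</LI><LI>b</LI></UL>" in the log, so the consumer sees
balanced calls even though the input never closed either LI.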

	The question: how is HTML_Normalize to test whether the
	output stream actually is a structured stream or not? (I don't
	want the code to crash just because someone put an ordinary
	stream there.)

	Do I just check whether the stream class name has a "/" in it?
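
For comparison, the two obvious ways of making that test could be
sketched as below. The StreamClass and Stream types, the is_structured
flag, and the class names used are hypothetical stand-ins, not the
library's actual declarations; the sketch only assumes that every
stream points at a class descriptor carrying a name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical class descriptor. A real one would also hold the
   method pointers (put_character, put_string, ...). */
typedef struct {
    const char *name;
    int         is_structured;  /* assumed explicit marker */
} StreamClass;

/* Hypothetical stream object: just a pointer to its class. */
typedef struct {
    const StreamClass *isa;
} Stream;

/* Naming-convention test: treat a "/" in the class name as the
   sign of a structured stream. */
int looks_structured_by_name(const Stream *s)
{
    assert(s != NULL && s->isa != NULL);
    return s->isa->name != NULL && strchr(s->isa->name, '/') != NULL;
}

/* Explicit-flag test: does not depend on how classes are named. */
int is_structured(const Stream *s)
{
    assert(s != NULL && s->isa != NULL);
    return s->isa->is_structured != 0;
}
```

The name test needs no change to existing class descriptors, but it
breaks as soon as an ordinary stream is given a name containing "/".
The flag test is robust, but only if every class descriptor can be
extended to carry the flag.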

Of course, if writing HTML_Normalize is not a sensible project, I would
like to hear about it. (However, I don't see it as a big job either;
it would be just a small tool for separating syntax issues from the
presentation.)

--
Markku Savela (msa@hemuli.tte.vtt.fi),     Technical Research Centre of Finland
Multimedia Systems, P.O.Box 1203,FIN-02044 VTT,http://www.vtt.fi/tte/staff/msa/

