How to save the copy of source data & RE: stream return codes in HTTee from Maciej Puzio on 1996-01-12 (www-lib@w3.org from January to March 1996)

From: Maciej Puzio <puzio@zodiac1.mimuw.edu.pl>
Date: Fri, 12 Jan 1996 19:05:57 +-100
To: "'Henrik Frystyk Nielsen'" <frystyk@w3.org>
Cc: "'WWW Library Mailing List'" <www-lib@w3.org>
Message-Id: <01BAE121.02A5CA60@pc4180a.mimuw.edu.pl>
Hi Henrik,

[
  To www-lib@w3.org readers:
  If you don't know what I am writing about, please read previous mails.
  This is a fifth level reply, so I had to cut all the original text to make 
  this mail readable. :-)
]
 
> > But the upstream module can then check for both codes and then decide what to 
> > do... The important thing is that both codes (or in general all codes in case 
> > of multiple T streams) get propagated upstream.
> >
> > The resolving function could also be a direct part of the 
> > upstream module which will be activated each time the T stream returns. As I 
> > see it, both ways will do, but maybe a callback function is a more flexible 
> > solution.
>
> I've got a point of view on it, but I need to think a little.
> I'll write to you soon.

I've now thought about it a little. I've also found another problem, which is interesting
in itself but is also a good argument in our discussion about HTTee.

BTW: I'm not sure whether the HTTee problem deserves such a long discussion.
It's getting more and more interesting for me, but if you are bored or don't have time,
please let me know. This will save work and time for both of us. :-)

The problem:

In my simple WWW browser, all HTML documents were handled by a simple
stream stack that led from the network through the MIME parser to the HTML
module presenting the document to the user. This stack had only one Tee,
before the MIME parser, which pushed a copy of the data to the cache. One day,
however, I decided to implement a "Save as HTML" menu command in my
browser. For that I needed a copy of the unparsed source data (with the MIME
headers stripped). I didn't want to rely on the data from the cache, because the
user of my browser can disable it.
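To make the stream stack concrete: a Tee is simply a stream that forwards every chunk it receives to two downstream streams. The sketch below is a minimal illustration in plain C, assuming hypothetical `Stream`, `Buffer` and `TeeTargets` types; these are not libwww's `HTStream` or `HTTee`, only a simplified model of the idea. The "either branch failed" policy in `tee_write` is a placeholder, since choosing the right combined result is exactly what this thread is about.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stream type: NOT libwww's HTStream.
   A stream is a write function plus a private context pointer. */
typedef struct Stream Stream;
struct Stream {
    int (*write)(Stream *self, const char *buf, size_t len);
    void *ctx;
};

/* A sink that appends everything it receives to a fixed-size,
   NUL-terminated buffer (stands in for the cache or the MIME parser). */
typedef struct {
    char data[256];
    size_t used;
} Buffer;

static int buffer_write(Stream *self, const char *buf, size_t len)
{
    Buffer *b = self->ctx;
    if (b->used + len >= sizeof b->data)
        return -1;                      /* no room left: report failure */
    memcpy(b->data + b->used, buf, len);
    b->used += len;
    b->data[b->used] = '\0';
    return 0;
}

/* The two downstream streams a tee feeds. */
typedef struct {
    Stream *first;
    Stream *second;
} TeeTargets;

/* Tee: forward every chunk to both targets. */
static int tee_write(Stream *self, const char *buf, size_t len)
{
    TeeTargets *t = self->ctx;
    assert(t->first != NULL && t->second != NULL);
    int r1 = t->first->write(t->first, buf, len);
    int r2 = t->second->write(t->second, buf, len);
    /* Naive placeholder policy: report failure if either branch failed. */
    return (r1 < 0 || r2 < 0) ? -1 : 0;
}
```

Writing `"<html>"` and then `"</html>"` to a tee built over two `Buffer` sinks leaves `"<html></html>"` in both; in the browser described above, one branch would be the cache and the other the MIME parser.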

Below is my solution. I'm not sure there isn't a simpler one; perhaps this
can be done in some really trivial way?

The solution:

What I needed was a Tee after the MIME parser. But how could I insert it into the
stream stack, which is created automatically from the set of converters assigned
to the request? To do that I created a "converter tee":

/* Converter tee: presents the HTML and, in parallel, saves a copy of the
   source data (after the MIME parser) by tee-ing two converters' streams. */
HTStream* HTMLPresentAndSave (
        HTRequest*  request,
        void*       param,
        HTFormat    input_format,
        HTFormat    output_format,
        HTStream*   output_stream)
{
    return HTTee
        (HTMLPresent (request, param, input_format, output_format, output_stream),
         HTSaveAndCallBack (request, param, input_format, output_format,
                            output_stream));
}

Then I used this new converter instead of HTMLPresent:

    HTConversion_add(c, "text/html", "www/present", HTMLPresentAndSave,
                     1.0, 0.0, 0.0);

I did a similar trick for the HTPlainPresent converter.

BTW: Here I used HTSaveAndCallBack to save the copy of the source data to
a file. I know it has been removed from the library, but I reintroduced it in my copy.
This converter is really useful. I know that using streams everywhere is better than
passing data through a file, but the operating system doesn't know this and
in some cases requires a file name. HTSaveAndCallBack is also useful as a
temporary file writer (that is the role it plays above; HTFWriter is not suitable
because it's not a converter).

Instead of writing to a file, I could have used the HTXParse converter, which would
put the data in a memory buffer.

This solution works perfectly, but has some drawbacks:

1. I'm not able to write a generic "converter tee". That is, I have to define a very
similar converter tee function for every pair of converters I want to tee.
This is because HTConversion_add only accepts a function pointer, so
I can't pass the converter any parameters.

2. I don't have any access to the results of the converter's work. For example, the
only way I can get the file name HTSaveAndCallBack used for saving the data
is through the callback function I assign to the request. As a result, I can
have only one callback function per request. In my application I use only one
HTSaveAndCallBack per request, so this doesn't matter, but it's possible to
have several HTSaveAndCallBacks used for different purposes in one request.
What happens then? It's even worse in the HTXParse case. To return the pointer to
the memory buffer, it uses a single callback function defined in the HTEPtoCl module,
so only one such callback can be defined in the whole application.
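The usual C way around a single per-request (or global) callback is to give each converter instance its own callback plus an opaque user-data pointer. The sketch below assumes a hypothetical `SaveConverter` struct and `OnSaved` callback type; it is not libwww's HTSaveAndCallBack or HTEPtoCl interface, only an illustration of how two "save" converters in one request could report their file names to different owners.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical completion callback: NOT libwww's request callback. */
typedef void (*OnSaved)(const char *filename, void *user_data);

/* Hypothetical per-instance state for a "save to file" converter. */
typedef struct {
    const char *filename;   /* file this instance wrote its copy to */
    OnSaved on_saved;       /* this instance's own callback (may be NULL) */
    void *user_data;        /* handed back unchanged to on_saved */
} SaveConverter;

/* Called when the save stream is closed: report to this instance's owner. */
static void save_stream_close(const SaveConverter *sc)
{
    assert(sc != NULL);
    if (sc->on_saved != NULL)
        sc->on_saved(sc->filename, sc->user_data);
}

/* Example callback: store the file name in a caller-supplied slot. */
static void remember_name(const char *filename, void *user_data)
{
    *(const char **) user_data = filename;
}
```

Two instances sharing `remember_name` but given different `user_data` slots each deliver their own file name, so neither the request nor the application is limited to one callback.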

How to improve this?

My proposition:

Converters are now functions returning streams. I propose to make them objects.

Below I use C++ to make my ideas more readable. Of course, it would have
to be converted to plain C before being used in the library. Please treat the
following as pseudocode.

class HTConverter
{
public:
    virtual ~HTConverter () {}    // converters are deleted through base pointers

    virtual HTStream* CreateStream (
            HTRequest*  request,
            void*       param,
            HTFormat    input_format,
            HTFormat    output_format,
            HTStream*   output_stream) = 0;
};

Some examples of derived classes:

class HTSaveAndCallBack : public HTConverter
{
public:
    HTSaveAndCallBack ()  { ...initialize... }

    HTStream* CreateStream ( ...parameters as above... )
        { ...do what HTSaveAndCallBack function does now... }

    // callback for the stream, we can also make it a parameter to the constructor
    virtual void OnStreamClose (HTRequest* request, char* filename);
};

class HTConverterTee : public HTConverter
{
    HTConverter* converter1;
    HTConverter* converter2;
public:
    HTConverterTee (HTConverter* conv1, HTConverter* conv2)
        { ...initialize... }

    HTStream* CreateStream ( ...parameters as above... )
        { return HTTee (
            converter1->CreateStream ( ...arguments... ),
            converter2->CreateStream ( ...arguments... )); }

    // callback to resolve Tee result codes, we can also make it a parameter
    // to the constructor
    virtual int ResolveResults (int result1, int result2)
        { ...default implementation... }
};

These converter objects would be created before being assigned to the request
and would be destroyed together with the request.

Drawback: this looks beautiful in C++, but in C? I'd rather not think about it. :-)
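For comparison, one way the converter objects could be expressed in plain C is a struct holding a create function plus a context pointer; the context is what lets a single generic tee function serve every pair of converters. This is a minimal sketch under stated assumptions: `Stream`, `Converter`, `TeePair`, `named_create` and `tee_create` are hypothetical names, not libwww API, and the real converter arguments (request, param, formats, output stream) are left out to keep it short.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stream node: NOT libwww's HTStream.
   A leaf has no branches; a tee has two. */
typedef struct Stream {
    const char *name;
    struct Stream *left, *right;
} Stream;

static Stream *stream_new(const char *name, Stream *left, Stream *right)
{
    Stream *s = malloc(sizeof *s);
    if (s != NULL) {
        s->name = name;
        s->left = left;
        s->right = right;
    }
    return s;
}

static void stream_free(Stream *s)
{
    if (s == NULL)
        return;
    stream_free(s->left);
    stream_free(s->right);
    free(s);
}

/* A converter "object" in C: a create function plus its own context.
   The context pointer is what a bare function pointer cannot carry. */
typedef struct Converter {
    Stream *(*create)(struct Converter *self);
    void *ctx;
} Converter;

/* Leaf converter: its context is the name of the stream it creates. */
static Stream *named_create(Converter *self)
{
    assert(self->ctx != NULL);
    return stream_new((const char *) self->ctx, NULL, NULL);
}

/* Generic converter tee: the context holds the pair of converters to
   combine, so one function serves every pair (cf. HTConverterTee). */
typedef struct {
    Converter *c1;
    Converter *c2;
} TeePair;

static Stream *tee_create(Converter *self)
{
    TeePair *p = self->ctx;
    Stream *a = p->c1->create(p->c1);
    Stream *b = p->c2->create(p->c2);
    Stream *t = (a != NULL && b != NULL) ? stream_new("tee", a, b) : NULL;
    if (t == NULL) {        /* on failure, release whatever was built */
        stream_free(a);
        stream_free(b);
    }
    return t;
}
```

Because a `TeePair` may itself point at another tee converter, tees nest without writing a new function per combination. The price, as feared, is manual bookkeeping of contexts and lifetimes that C++ would handle through constructors and destructors.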

Now let me go back to the HTTee result codes problem.
The "converter tee" is an example of a situation where the code pumping data
into the stream doesn't know whether it's pumping it into a Tee or into something else.
That's why it can't decide which result is more important, or what to do if it gets
a success and a failure at the same time. That's the argument for passing the resolving
function to the tee creation function as a parameter (a callback). It would also be passed
to the converter tee (as a callback). In the example above it is a virtual member function
(to be overridden), but in C it will be difficult to program, I think.
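In C, the resolving function could be passed as a plain function pointer stored in the tee, so that whoever builds the tee, rather than the code pumping data into it, decides how two results combine. The sketch below assumes hypothetical `Stream`, `Tee` and `Resolver` types and `RC_OK`/`RC_ERROR` codes; it is not libwww's HTTee signature or its real return codes, and the two policies are only examples.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes: NOT libwww's real return codes. */
enum { RC_OK = 0, RC_ERROR = -1 };

/* Hypothetical stream: a write function plus a private context. */
typedef struct Stream Stream;
struct Stream {
    int (*write)(Stream *self, const char *buf, size_t len);
    void *ctx;
};

/* Resolver supplied by whoever builds the tee: merges both branch
   results into the single code returned upstream. */
typedef int (*Resolver)(int result1, int result2);

typedef struct {
    Stream *s1;
    Stream *s2;
    Resolver resolve;
} Tee;

static int tee_write(Stream *self, const char *buf, size_t len)
{
    Tee *t = self->ctx;
    assert(t->resolve != NULL);
    int r1 = t->s1->write(t->s1, buf, len);
    int r2 = t->s2->write(t->s2, buf, len);
    return t->resolve(r1, r2);
}

/* Lenient policy: the first branch decides; the second is best effort. */
static int resolve_first_wins(int r1, int r2)
{
    (void) r2;
    return r1;
}

/* Strict policy: any failing branch fails the whole tee. */
static int resolve_any_error(int r1, int r2)
{
    if (r1 < 0)
        return r1;
    if (r2 < 0)
        return r2;
    return RC_OK;
}

/* Trivial branches for trying the policies out. */
static int ok_write(Stream *self, const char *buf, size_t len)
{
    (void) self; (void) buf; (void) len;
    return RC_OK;
}

static int fail_write(Stream *self, const char *buf, size_t len)
{
    (void) self; (void) buf; (void) len;
    return RC_ERROR;
}
```

With a presenting branch that succeeds and a saving branch that fails, `resolve_first_wins` lets the page still display while `resolve_any_error` aborts; for a "present and save" converter tee the lenient policy is probably the natural default.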

Thanks a lot if you've managed to get to this line!  :-)


Maciej Puzio
puzio@laser.mimuw.edu.pl
Received on Friday, 12 January 1996 12:56:32 UTC