
Re: Getting both a chunk and HText callbacks

From: Joel Young <jdy@godel.cs.brown.edu>
Date: Fri, 15 Jun 2001 19:52:38 -0400
Message-Id: <200106152352.TAA00086@godel.cs.brown.edu>
To: www-lib@w3.org
cc: jdy@cs.brown.edu

Since I seem to be talking to myself on this list, let me continue the
conversation by myself.

Yes, Joel, that is a pretty clever adaptation of the other guy's work,
but you need to be careful with what you are doing.  Are you sure you
want to add that conversion every time you make a request?  Did you
remember to remove it when it isn't needed?

Probably not.

Well, if you want to do it for just one particular request, you could
simply do:

        HTList* mylist = HTList_new();
        HTConversion_add( 
            mylist,
            "text/html", 
            "www/present", 
            (HTConverter*) &HTMLPresentAndChunk, 
            1.0, 0.0, 0.0);
        HTRequest_setConversion(request, mylist, YES);

instead of adding the conversion directly to the HTFormat_conversion
list.  Seems to work a little better.  

**** It still gets -902 (interrupted) errors on long pages.  The page
seems to parse just fine and to be complete in the chunk, but the error
arises even so.  Note that it doesn't arise when using only the HText
callbacks, without the above conversion.

As libwww doesn't seem to be actively maintained, nor do questions get
answered much on this list, maybe it is time to give libcurl/libghttp
and libxml2 a try.  They seem quite functional and not quite so overly
recherche (1970 I. Murdoch sense).


Joel
--------
From: Joel Young <jdy@godel.cs.brown.edu>
Date: Fri, 25 May 2001 17:14:36 -0400
  To: www-lib@w3.org
  Cc: jdy@cs.brown.edu
Subj: Re: Getting both a chunk and HText callbacks 


I figured out a hack using a hint from the archive:

http://lists.w3.org/Archives/Public/www-lib/msg00377.html

By adding a pointer to "chunk" as an element of the request context, 
the following "converter tee" (Maciej Puzio) does the trick.

//////
static HTStream* HTMLPresentAndChunk(
    HTRequest*      request,
    void*           param,
    HTFormat        input_format,
    HTFormat        output_format,
    HTStream*       output_stream)
{
    requestcontext_t* context =
        reinterpret_cast<requestcontext_t*>(HTRequest_context(request));
    return HTTee(
        HTMLPresent(request, param, input_format, output_format, output_stream),
        HTStreamToChunk(request, &context->chunk, -1), 0);
}
//////

When combined with

//////
HTRequest* request = HTRequest_new();
requestcontext_t* context = new requestcontext_t(this);
HTRequest_setContext(request,context);

HTNet_addAfter(&term_handler, 0, 0, HT_ALL, HT_FILTER_LAST);
HText_registerCDCallback(&RHText_new,&RHText_delete);
HText_registerTextCallback(&add_text);
HText_registerLinkCallback(&found_link);

HTAlert_setInteractive(NO);
HTHost_setEventTimeout(15000); // abort if the load stalls for 15 seconds (value is in ms)

HTAnchor* anchor = HTAnchor_findAddress(url);

HTConversion_add(
    HTFormat_conversion(),
    "text/html",
    "www/present",
    (HTConverter*)&HTMLPresentAndChunk,
    1.0, 0.0, 0.0);

HTLoadAnchor(anchor, request);

HTEventList_newLoop();
//////

and now context->chunk contains the page.
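
For anyone trying to follow the idea rather than the libwww details, here
is a minimal sketch of what a "tee" does, written without libwww so it
can be compiled on its own.  Everything in it (Sink, BufferSink,
CountingSink, TeeSink, tee_demo) is a made-up name for illustration and
is not part of libwww; BufferSink only plays the role of the chunk and
CountingSink only stands in for the HText callbacks.  It assumes each
block of data is simply handed unchanged to both sides, which is how I
understand HTTee to behave.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sink interface for illustration; this is not libwww's HTStream.
struct Sink {
    virtual ~Sink() = default;
    virtual void put(const char* data, std::size_t len) = 0;
};

// Accumulates every block it sees, roughly the role the chunk plays above.
struct BufferSink : Sink {
    std::string data;
    void put(const char* d, std::size_t n) override { data.append(d, n); }
};

// Stand-in for a parser with callbacks: it only counts what passes through.
struct CountingSink : Sink {
    std::size_t bytes = 0;
    std::size_t calls = 0;
    void put(const char*, std::size_t n) override { bytes += n; ++calls; }
};

// The tee: forwards each block, unchanged, to both downstream sinks.
struct TeeSink : Sink {
    Sink& first;
    Sink& second;
    TeeSink(Sink& a, Sink& b) : first(a), second(b) {}
    void put(const char* d, std::size_t n) override {
        first.put(d, n);
        second.put(d, n);
    }
};

// Pushes a page through a tee in two blocks and returns what the buffer captured.
std::string tee_demo() {
    BufferSink buffer;
    CountingSink counter;
    TeeSink tee(buffer, counter);
    const std::string head = "<html><body>";
    const std::string tail = "</body></html>";
    tee.put(head.data(), head.size());
    tee.put(tail.data(), tail.size());
    // Both sinks saw the same bytes.
    assert(counter.bytes == buffer.data.size());
    return buffer.data;
}
```

Calling tee_demo() from a main() returns the concatenated page, while the
counting side has seen the same bytes in two calls.  The point is only
that the tee sits where a single output stream would normally go, so the
upstream code does not need to know there are two consumers.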

Does this make sense?  

Why doesn't HTRequest_conversion(request) work instead of the
HTFormat_conversion()?  

Why doesn't RHText_delete ever get called?  Since the "system" doesn't
call it, where should I call it from?

How does one know that that is the correct place to do the
HTConversion_add?  Is there a better place?

I am still missing the "big picture" with this library.

Thanks,

Joel
jdy@cs.brown.edu
--------
From: Joel Young <jdy@godel.cs.brown.edu>
Date: Fri, 25 May 2001 15:30:24 -0400
To: www-lib@w3.org
Cc: jdy@cs.brown.edu
Subj: Getting both a chunk and HText callbacks

I am trying to use HTTee so that a single HTLoad both loads the web
page into a chunk and calls my HText callbacks.  Here is the code I am
using:

HTRequest* request = HTRequest_new();
HTNet_addAfter(&term_handler, 0, 0, HT_ALL, HT_FILTER_LAST);
HTAlert_setInteractive(NO);
HTHost_setEventTimeout(15000); // abort if the load stalls for 15 seconds (value is in ms)

HTAnchor* anchor = HTAnchor_findAddress(url);
HTRequest_setAnchor(request, anchor);

HTChunk* chunkchunk = 0; 
//  HTRequest_setOutputFormat(request,WWW_SOURCE);
HTStream* chunkstream = HTStreamToChunk(request,&chunkchunk,-1);

HText_registerCDCallback(&RHText_new,&RHText_delete);
HText_registerTextCallback(&add_text);
HText_registerLinkCallback(&found_link);

HTStream* target = HTTee(chunkstream, HTRequest_outputStream(request),0);

HTRequest_setOutputStream(request, target);

HTLoad(request, NO);

HTEventList_newLoop();

std::cerr << HTChunk_size(chunkchunk) << std::endl;
char* strchunk = HTChunk_toCString(chunkchunk);
std::cerr << strchunk << std::endl;


Depending on whether the HTRequest_setOutputFormat line is commented
out, I get either the HText callbacks or the chunk, but I can't seem to
get both.

Any suggestions?

Joel
jdy@cs.brown.edu
Received on Friday, 15 June 2001 19:52:39 GMT
