Re:[Q] Things in an Anchor? from James Gallagher on 1996-06-08 (www-lib@w3.org from April to June 1996)

From: James Gallagher <jimg@dcz.cvo.oneworld.com>
Date: Sat, 8 Jun 96 10:47:54 PDT
To: www-lib@w3.org
Message-Id: <9606081747.AA26730@dcz.cvo.oneworld.com>
 >"Sacha" == Sacha  <sacha@clip.dia.fi.upm.es> writes:

 > Hi,

 > What are the various fields in an Anchor as defined in HTAncMan.h?
 > Also, can someone point out to me which are fields designed to be
 > set by the application and which are designed to be set after fetching
 > the document?

I can help with one...

 > physical address - is this the URI where the document came from?
 > absolute address - is this the IP address? Why is there no function to
 >                    access it in HTAnchor.h? Is it for internal use only?
 > content encoding - can someone give examples of what this might be? It
 > 		   seems to always be NULL. 

Content-Encoding is set after the document is fetched. However, if you use
HTLoadRelative(), it is not set in the Anchor object you pass to that call.
Instead, it is set in the Anchor object that the library binds to the Request
object *inside* HTLoadRelative(); the Stream object then fills in that
Anchor's content-encoding field. So even if you explicitly bind an Anchor to
a request yourself, you must read the MIME fields after loading from the
Anchor that wwwlib bound to the Request (HTRequest_anchor(request)), not from
the Anchor you bound.

I think that this is the case for *all* the HTLoad* functions *except*
HTLoadAnchor().

[ Henrik, is that right? Can you explain why the library uses a different
Anchor from the one passed in? ]

Here is some code (C++; object members start with underscores, and the code
assumes that the library has been initialized):

    HTRequest *request = HTRequest_new();

    HTRequest_setContext (request, this); // Bind THIS to request 

    HTRequest_setOutputFormat(request, WWW_SOURCE);

    // Set timeout on sockets
    HTEvent_registerTimeout(_tv, request, timeout_handler, NO);

    HTRequest_setAnchor(request, (HTAnchor *)_anchor);

    HTRequest_setOutputStream(request, HTFWriter_new(request, stream, YES));

    status = HTLoadRelative((const char *)url, _anchor, request);

    if (status != YES) {
        if (SHOW_MSG) cerr << "Can't access resource." << endl;
        HTRequest_delete(request);  // don't leak the request on failure
        return false;
    }

    // HTLoadRelative() binds its own anchor to the request, replacing the one
    // set above. Read the MIME fields from the request's anchor afterwards.
    HTEncoding enc = HTAnchor_encoding(HTRequest_anchor(request));
    if (enc)    // NULL if no content encoding was recorded
        _encoding = get_encoding((char *)HTAtom_name(enc));

    HTRequest_delete(request);
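
To make the pointer-identity issue concrete, here is a toy model in plain
C++. None of these types or functions are libwww's: Anchor, Request,
load_relative() and encoding_after_load() are hypothetical stand-ins that
only mimic the behaviour described above as I understand it.

```cpp
#include <string>

// Toy stand-ins; these are NOT the libwww types.
struct Anchor  { std::string encoding; };       // empty = never filled in
struct Request { Anchor *anchor = nullptr; };

// Models a loader that resolves the URL to its own anchor, rebinds the
// request to that anchor, and records the response metadata there.
bool load_relative(Anchor &resolved, Request &req, const std::string &encoding)
{
    req.anchor = &resolved;         // replaces whatever the caller bound
    resolved.encoding = encoding;   // metadata lands on the resolved anchor
    return true;
}

// Read the encoding the way the code above does: through the request.
std::string encoding_after_load(const Request &req)
{
    return req.anchor ? req.anchor->encoding : std::string();
}
```

In this model, an anchor the caller bound before the load keeps an empty
encoding, while encoding_after_load(req) returns the value, which is the
same symptom as the "always NULL" field in the question.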

 > cte              - what is the difference between this and content
 >                    encoding? I presume this is set depending on the
 >                    document that is fetched?
 > content language - the comment says this should be a list - why? Could
 >                    a document be in multiple languages?

I believe that this is for the Accept header sent to the server. I don't
think it gets set for a response. Look in HTMIME.c:parseheader(...) to see
what gets parsed and where it goes.

 > charset          - this also always seems to be NULL after fetching a
 >                    document. Is it supposed to be set before fetching?
 > level            - ditto
 > derived from     - is this a relationship between anchors or something?
 > version          - of what? The anchor? the document? the library? the
 >                    protocol?

 > Lots of questions I know, but I can't find many answers in the documentation.
 > Hang on - I just found a bit about derived from and version, ok... And about
 > charset/level... are these NULL because neither of them is used by the HTML
 > parser (as stated in Library/User/Using/MIME.html)? Would it be very
 > difficult to extract them, seeing as you are getting content_type anyway?

Fields that are not parsed by wwwlib can be parsed by your own handler, which
you register with HTHeader_addParser() (see the end of this message).

Here's what I do to get the Content-Description header:

    int 
    header_handler(HTRequest *request, const char *token)
    {
	String field, value;
	istrstream line(token);
	line >> field; field.downcase();
	line >> value; value.downcase();

	if (field == "content-description:") {
	    DBG2(cerr << "Found content-description header" << endl);
	    Connect *me = (Connect *)HTRequest_context(request);
	    me->_type = get_type(value);
	}
    #if 0
	// Parsed by HTMIME.c (wwwlib).
	else if (field == "content-encoding:") {
	    DBG(cerr << "Found content-encoding header" << endl);
	    Connect *me = (Connect *)HTRequest_context(request);
	    me->_encoding = get_encoding(value);
	}
    #endif
	else {
	    if (SHOW_MSG)
		cerr << "Unknown header: " << token << endl;
	}

	return HT_OK;
    }

Then in the function I use to initialize the library:

    // Register our own MIME header handler for extra headers
    HTHeader_addParser("*", NO, header_handler);
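
The handler above relies on the libg++ String class and istrstream. For
anyone without those, the parsing step could look roughly like this in
standard C++. This is only a sketch: to_lower() and split_header() are names
I made up for illustration, and it covers just the splitting and
lower-casing, not the libwww callback itself.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <utility>

// Return a lower-cased copy of s (ASCII only).
std::string to_lower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Split one header line into (field, first word of value), both lower-cased.
// Like the handler above, the field keeps its trailing ':' and only the
// first whitespace-delimited word of the value is read.
std::pair<std::string, std::string> split_header(const char *token)
{
    std::istringstream line(token ? token : "");
    std::string field, value;
    line >> field >> value;
    return { to_lower(field), to_lower(value) };
}
```

A handler would then compare the first element of the returned pair against
"content-description:" and pass the second element on, as the code above
does with field and value.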

Hope this helps.

James
Received on Saturday, 8 June 1996 13:48:15 UTC