Re: Trouble with message/rfc822 parser from Henrik Frystyk Nielsen on 1996-01-25 (www-lib@w3.org from January to March 1996)

From: Henrik Frystyk Nielsen <frystyk@w3.org>
Date: Thu, 25 Jan 1996 17:20:38 -0500
To: kennykb@cobweb.crd.ge.com
Cc: www-lib@w3.org
Message-Id: <9601252220.AA14329@www20>

kennykb@cobweb.crd.ge.com writes:
> I'm still struggling somewhat fruitlessly to hack up an RFC1867 parser
> using www-lib.  My latest problem is that a part within a multipart/* message
> that has no Content-Type: line seems always to get a Content-Type of
> multipart/* and inherit its parent's boundary, which confuses the parser
> horribly.  The default for MIME parts is supposed to be
> text/plain;charset="us-ascii".

I am aware that the current implementation of MIME multipart parsing is 
not complete. One of the things missing is the inheritance of content types. 
However, it is not a good idea to simply assign the content type, as you may 
get empty bodies, and in that case the stream stack will still be created 
because it has a valid content type.
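
One way to read the concern above is that a default type should only take 
effect once a body part actually delivers data. The following is a minimal 
sketch of that idea, not libwww code: BodyPart, part_init, part_set_type, 
part_body_data and the stack_built flag are all hypothetical names made up 
for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* MIME default for a body part that carries no Content-Type header. */
#define DEFAULT_PART_TYPE "text/plain; charset=us-ascii"

/* Hypothetical per-part state; not a libwww structure. */
typedef struct {
    const char *content_type;  /* NULL until a header or the default sets it */
    int stack_built;           /* nonzero once a converter stack would exist */
} BodyPart;

void part_init(BodyPart *p)
{
    assert(p != NULL);
    p->content_type = NULL;
    p->stack_built = 0;
}

/* Called when an explicit Content-Type header is parsed. */
void part_set_type(BodyPart *p, const char *type)
{
    assert(p != NULL);
    p->content_type = type;
}

/* Called for each chunk of body data. The default type is applied, and the
   (imaginary) stream stack is built, only when there is data to convert. */
void part_body_data(BodyPart *p, const char *buf, size_t len)
{
    assert(p != NULL);
    (void)buf;                 /* a real part would feed buf downstream */
    if (len == 0)
        return;                /* empty body: no type assigned, no stack */
    if (p->content_type == NULL)
        p->content_type = DEFAULT_PART_TYPE;
    p->stack_built = 1;
}
```

With this arrangement an empty part never gets a type or a stack, while a 
part without a header that does carry data falls back to the MIME default.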

Another problem - and this in fact affects the first problem as well - is that 
there is only one anchor representing all body parts of a multipart message. 
This means that you cannot refer to one specific body part - this anchor will 
always refer to the whole lot. It may be a good idea to include a means of 
referring to each individual body part; however, as there is no valid URL for 
a multipart body part, it will cause trouble if we make it a parent anchor, 
even though it does contain a full object. Maybe the solution would be to come 
up with an _internal_ naming scheme for the individual body parts and then 
link them together using the anchor links. Hmm - I don't have a good solution 
to this problem right now. I hope that this may help you to get some more 
ideas...

> in the `parseheader' function in HTMIME.c.  I'm not sure, though, what
> the implications might be, since there are obviously a number of other
> places that call HTMIME.c.  Is there a better way to handle the
> problem?
> 
> Also, can someone advise me on an appropriate code fragment to launch
> the converter?  I'm running in a CGI environment; that is, filedescriptor
> zero is a socket to the client, from which I'm supposed to read the form
> data, and the environment variable CONTENT_LENGTH contains the number of
> bytes of data.  I thought of using HTLoadSocket, but it
> doesn't seem to have any way to constrain the
> amount of data to read.

The way to do this is to return HT_LOADED from _your_ stream when you have 
read the number of bytes that you expect. This is then passed back to the 
socket read loop, which will then terminate. You can see an example of this in 
the HTMIME.c module where, in order to support persistent connections, we 
return HT_LOADED when we have read content-length bytes.
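
The byte-counting idea described above can be sketched roughly as follows. 
This is a standalone illustration, not the HTMIME.c code: LimitedStream, 
limited_init and limited_put_block are hypothetical names, and SKETCH_OK and 
SKETCH_LOADED merely stand in for the library's own HT_OK and HT_LOADED 
status codes, whose real values are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the library's status codes (assumed values). */
#define SKETCH_OK      0
#define SKETCH_LOADED  1

/* Hypothetical stream state: how many body bytes are still expected. */
typedef struct {
    size_t remaining;   /* initialised from CONTENT_LENGTH */
    size_t consumed;    /* bytes accepted so far */
} LimitedStream;

void limited_init(LimitedStream *s, size_t content_length)
{
    assert(s != NULL);
    s->remaining = content_length;
    s->consumed = 0;
}

/* Accept up to `len` bytes handed over by the read loop. Returns
   SKETCH_LOADED once the expected number of bytes has been seen, so the
   caller can stop reading; otherwise SKETCH_OK to ask for more. Bytes
   beyond the expected length are ignored. */
int limited_put_block(LimitedStream *s, const char *buf, size_t len)
{
    size_t take;
    assert(s != NULL);
    (void)buf;          /* a real stream would pass buf[0..take) downstream */
    take = len < s->remaining ? len : s->remaining;
    s->remaining -= take;
    s->consumed += take;
    return s->remaining == 0 ? SKETCH_LOADED : SKETCH_OK;
}
```

In the CGI case the count would come from the CONTENT_LENGTH environment 
variable, and the read loop would stop as soon as the stream reports that it 
is loaded.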

-- 

Henrik Frystyk Nielsen, <frystyk@w3.org>
World-Wide Web Consortium, MIT/LCS NE43-356
545 Technology Square, Cambridge MA 02139, USA

Received on Thursday, 25 January 1996 17:20:47 UTC