Re: Content-encoding and external message converters? from Henrik Frystyk Nielsen on 1996-01-11 (www-lib@w3.org from January to March 1996)

From: Henrik Frystyk Nielsen <frystyk@w3.org>
Date: Thu, 11 Jan 1996 18:25:11 -0500
To: kennykb@cobweb.crd.ge.com
Cc: Henrik Frystyk Nielsen <frystyk@w3.org>, www-lib@w3.org
Message-Id: <9601112325.AA12975@www20>
> 
> frystyk@w3.org (Henrik Frystyk Nielsen) said:
> > The MIME parser stream takes out the metainformation and adds it to
> > the anchor. When all the metainformation is parsed, the stream goes
> > into transparent mode and lets the body pass untouched.
> 
> This comment raises something that I don't understand with the stream
> stack protocols.  There doesn't seem to be any means of handling
> conversions that relate to encodings and other envelope information.
> I'd like to expand the MIME parser to handle headers like
>
> 	Content-Transfer-Encoding: base64

The Library does not currently have any encoding/decoding algorithms, but the
encoding is registered as part of the anchor. The actual decoder stream can
then be inserted as part of the pipe stream. A good example of how to insert a
stream is the stream_pipe function in the HTTP module. It would be nice to
have encodings as part of the stream stack, and this may be a new feature to
add :-)
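
As a rough illustration of what such a decoder stream could look like, here is
a minimal sketch of a base64 decoder that keeps its state between calls and
forwards decoded bytes to the next stream in the pipe. The Stream struct and
the sink_put_block, b64_value and b64_put_block names are made up for this
example; they are not the Library's HTStream interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stream interface for illustration only; this is NOT the
 * Library's HTStream definition. */
typedef struct Stream Stream;
struct Stream {
    void (*put_block)(Stream *self, const char *buf, size_t len);
    Stream *target;        /* next stream in the pipe, NULL for a sink */
    unsigned long acc;     /* decoder: bits collected so far */
    int nbits;             /* decoder: number of valid bits in acc */
    char out[256];         /* sink: collected output */
    size_t outlen;         /* sink: number of bytes stored in out */
};

/* Sink: stores whatever it receives (silently truncating when full). */
static void sink_put_block(Stream *self, const char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len && self->outlen < sizeof self->out; i++)
        self->out[self->outlen++] = buf[i];
}

/* Map one base64 character to its 6-bit value, or -1 if it is not one.
 * Assumes an ASCII-compatible character set. */
static int b64_value(int c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decoder: state lives in the stream, so input may arrive in any chunking. */
static void b64_put_block(Stream *self, const char *buf, size_t len)
{
    size_t i;
    assert(self->target != NULL);
    for (i = 0; i < len; i++) {
        int v = b64_value((unsigned char) buf[i]);
        if (v < 0) continue;          /* skip '=', CR, LF and other noise */
        self->acc = (self->acc << 6) | (unsigned long) v;
        self->nbits += 6;
        if (self->nbits >= 8) {
            char byte;
            self->nbits -= 8;
            byte = (char) ((self->acc >> self->nbits) & 0xFF);
            self->acc &= (1UL << self->nbits) - 1UL;   /* keep leftover bits */
            self->target->put_block(self->target, &byte, 1);
        }
    }
}
```

To use it, point the decoder's target at the stream that should receive the
body and feed it blocks as they arrive from the network; the chunk boundaries
do not have to line up with the base64 quantum.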

> 	Content-Type: message/partial

MIME multipart is handled in the Library in a stream of its own, implemented
in HTBound.c. This stream is a converter just like the rfc822 message parser
(HTMIME.c). When the full header of a document has been parsed and a content
type found, the stream stack is called to find a converter that handles this
type. In the case of MIME multipart, the converter goes through the stream and
identifies the individual body parts. Each body part of the multipart message
is then in turn passed to a new RFC822 message parser, and so on. This
mechanism also works for nested multipart messages.
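
The boundary handling can be sketched roughly as follows. This is a
simplified illustration and not the code in HTBound.c: it works on a complete
NUL-terminated buffer rather than on a stream, and the names
split_multipart, PartHandler, PartList and remember_part are invented for the
example. Each body part found is handed to a callback, which in the Library's
setup would correspond to feeding a new message parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Called once per body part with a pointer into the message and a length. */
typedef void PartHandler(const char *part, size_t len, void *context);

/* Example handler: remember where each body part starts and how long it is. */
typedef struct {
    const char *start[8];
    size_t len[8];
    int n;
} PartList;

static void remember_part(const char *part, size_t len, void *context)
{
    PartList *list = (PartList *) context;
    if (list->n < 8) {
        list->start[list->n] = part;
        list->len[list->n] = len;
        list->n++;
    }
}

/* Scan msg line by line for "--boundary" delimiters and report the text
 * between them. The preamble and the epilog are ignored. Returns the number
 * of body parts reported. */
static int split_multipart(const char *msg, const char *boundary,
                           PartHandler *handler, void *context)
{
    size_t blen;
    const char *line = msg;
    const char *part = NULL;      /* start of the current body part, if any */
    int count = 0;

    assert(msg != NULL && boundary != NULL && handler != NULL);
    blen = strlen(boundary);
    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        const char *next = eol ? eol + 1 : line + strlen(line);
        if (line[0] == '-' && line[1] == '-' &&
            strncmp(line + 2, boundary, blen) == 0) {
            const char *tail = line + 2 + blen;
            int closing = (tail[0] == '-' && tail[1] == '-');
            int plain = (tail[0] == '\r' || tail[0] == '\n' || tail[0] == '\0');
            if (closing || plain) {
                if (part != NULL) {
                    /* The line break before a delimiter belongs to it. */
                    const char *end = line;
                    if (end > part && end[-1] == '\n') end--;
                    if (end > part && end[-1] == '\r') end--;
                    handler(part, (size_t) (end - part), context);
                    count++;
                }
                if (closing) break;   /* "--boundary--" ends the multipart */
                part = next;
            }
        }
        line = next;
    }
    return count;
}
```

For nested multipart messages the handler would parse the headers of the part
it receives and, on finding another multipart content type, call
split_multipart again with the inner boundary.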

> I'd also like to see support for the RFC1867 header line:
> 	Content-Disposition: form-data; name="xxx"; filename="yyy"
> but that looks easy to do using HTMIME_register.

There are two ways of handling new headers: a heavyweight one and a
lightweight one. The first is to write a new converter stream that can be
registered in the stream stack. This is often the situation when handling a
new content type, for example

	message/external
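
To make the heavyweight route a bit more concrete, here is a minimal sketch of
a table of registered conversions and the lookup by content type. It is only
an illustration of the idea: the Conversion struct, the type_matches and
find_conversion names, and the "present" output format are assumptions made
for this example and not the Library's stream stack API. A string label stands
in for the function that would create the converter stream, and the comparison
is case-sensitive for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One registered conversion (hypothetical layout). */
typedef struct {
    const char *from;        /* input content type; the subtype may be '*' */
    const char *to;          /* output format */
    const char *converter;   /* label standing in for a stream constructor */
} Conversion;

/* Does the registered pattern cover the given content type? */
static int type_matches(const char *pattern, const char *type)
{
    size_t n = strlen(pattern);
    if (n >= 2 && pattern[n - 2] == '/' && pattern[n - 1] == '*')
        return strncmp(pattern, type, n - 1) == 0;  /* compare major type */
    return strcmp(pattern, type) == 0;
}

/* Return the first registered conversion from type to the wanted output
 * format, or NULL if nothing is registered for it. */
static const Conversion *find_conversion(const Conversion *table, size_t count,
                                         const char *type, const char *to)
{
    size_t i;
    assert(table != NULL && type != NULL && to != NULL);
    for (i = 0; i < count; i++)
        if (type_matches(table[i].from, type) && strcmp(table[i].to, to) == 0)
            return &table[i];
    return NULL;
}
```

Supporting a new content type then amounts to writing the converter and adding
one more entry to the table.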

The other is to use the existing extension mechanism in the MIME parser. The 
API for this is described in the HTHeader file or in the User's guide:

	http://www.w3.org/pub/WWW/Library/User/Using/MIME.html
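
The lightweight route boils down to registering a callback for a header name
and having the parser call it when that header shows up. The sketch below
shows that pattern only; HeaderRegistry, register_header, dispatch_header and
save_disposition are hypothetical names, and the actual registration calls and
their signatures are the ones documented at the URL above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_HANDLERS 16

/* Callback invoked with the value of a matching header line. */
typedef void HeaderHandler(const char *value, void *context);

typedef struct {
    const char *name[MAX_HANDLERS];
    HeaderHandler *handler[MAX_HANDLERS];
    int count;
} HeaderRegistry;

/* Header names are case-insensitive, so compare without regard to case. */
static int name_matches(const char *name, const char *line, size_t n)
{
    size_t i;
    if (strlen(name) != n) return 0;
    for (i = 0; i < n; i++)
        if (tolower((unsigned char) name[i]) != tolower((unsigned char) line[i]))
            return 0;
    return 1;
}

/* Register a handler for a header name. Returns 1 on success, 0 if full. */
static int register_header(HeaderRegistry *reg, const char *name,
                           HeaderHandler *handler)
{
    assert(reg != NULL && name != NULL && handler != NULL);
    if (reg->count >= MAX_HANDLERS) return 0;
    reg->name[reg->count] = name;
    reg->handler[reg->count] = handler;
    reg->count++;
    return 1;
}

/* Dispatch one unfolded header line of the form "Name: value".
 * Returns 1 if a registered handler took it, 0 otherwise. */
static int dispatch_header(const HeaderRegistry *reg, const char *line,
                           void *context)
{
    const char *colon = strchr(line, ':');
    const char *value;
    int i;
    if (colon == NULL) return 0;
    value = colon + 1;
    while (*value == ' ' || *value == '\t') value++;
    for (i = 0; i < reg->count; i++) {
        if (name_matches(reg->name[i], line, (size_t) (colon - line))) {
            reg->handler[i](value, context);
            return 1;
        }
    }
    return 0;
}

/* Example handler: remember the value of a Content-Disposition header. */
static void save_disposition(const char *value, void *context)
{
    *(const char **) context = value;
}
```

Registering save_disposition under "Content-Disposition" and passing each
header line to dispatch_header is then enough to pick up the RFC1867 header
without touching the parser itself.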

I have included an example from the MIME spec (RFC 1521) that contains nested
multipart messages. The complete stream setup will then look something like:

	MIME
	  L____ Multipart
	            L__________ MIME
	            |             L____ text/plain
	            |
	            L__________ MIME
	            |             L____ Multipart
	            |                       L__________ MIME
	            |                       |             L____ audio/basic
	            |                       |
	            |                       L__________ MIME
	            |                                     L____ image/gif
	            L__________ MIME
	            |             L____ text/plain

	etc...

The example looks like:

MIME-Version: 1.0
From: Nathaniel Borenstein <nsb@bellcore.com>
To: Ned Freed <ned@innosoft.com>
Subject: A multipart example
Content-Type: multipart/mixed;
     boundary=unique-boundary-1

This is the preamble area of a multipart message.
Mail readers that understand multipart format
should ignore this preamble.
If you are reading this text, you might want to
consider changing to a mail reader that understands
how to properly display multipart messages.
--unique-boundary-1

   ...Some text appears here...
[Note that the preceding blank line means
no header fields were given and this is text,
with charset US ASCII.  It could have been
done with explicit typing as in the next part.]

--unique-boundary-1
Content-type: text/plain; charset=US-ASCII

This could have been part of the previous part,
but illustrates explicit versus implicit
typing of body parts.

--unique-boundary-1
Content-Type: multipart/parallel;
     boundary=unique-boundary-2


--unique-boundary-2
Content-Type: audio/basic
Content-Transfer-Encoding: base64

   ... base64-encoded 8000 Hz single-channel
 mu-law-format audio data goes here....

--unique-boundary-2
Content-Type: image/gif
Content-Transfer-Encoding: base64

   ... base64-encoded image data goes here....

--unique-boundary-2--
THIS IS AN EPILOG
--unique-boundary-1
Content-Type: text/html

<HTML>
<HEAD>
<TITLE>Title</TITLE>
</HEAD>
This is some HTML text
</BODY>
</HTML>

--unique-boundary-1
Content-Type: message/rfc822

From: (mailbox in US-ASCII)
To: (address in US-ASCII)
Subject: (subject in US-ASCII)
Content-Type: Text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: Quoted-printable

   ... Additional text in ISO-8859-1 goes here ...

--unique-boundary-1
Content-Type: text/html

<HTML>
<HEAD>
<TITLE>Title</TITLE>
</HEAD>
This is some MORE HTML text
</BODY>
</HTML>

--unique-boundary-1--

-- 

Henrik Frystyk Nielsen, <frystyk@w3.org>
World-Wide Web Consortium, MIT/LCS NE43-356
545 Technology Square, Cambridge MA 02139, USA
Received on Thursday, 11 January 1996 18:25:45 UTC