Re: Content-encoding and external message converters?
To: kennykb@cobweb.crd.ge.com
Subject: Re: Content-encoding and external message converters?
From: Henrik Frystyk Nielsen <frystyk@w3.org>
Date: Thu, 11 Jan 1996 18:25:11 -0500
Cc: Henrik Frystyk Nielsen <frystyk@w3.org>, www-lib@w3.org
From frystyk@w3.org Thu Jan 11 18:25:45 1996
Message-Id: <9601112325.AA12975@www20>
Reply-To: Henrik Frystyk Nielsen <frystyk@w3.org>
X-Mailer: exmh version 1.6.2 7/18/95

>
> frystyk@w3.org (Henrik Frystyk Nielsen) said:
> > The MIME parser stream takes out the metainformation and adds it to
> > the anchor. When all the metainformation is parsed, the stream goes
> > into transparent mode and lets the body pass untouched.
>
> This comment raises something that I don't understand with the stream
> stack protocols. There doesn't seem to be any means of handling
> conversions that relate to encodings and other envelope information.
> I'd like to expand the MIME parser to handle headers like
>
> Content-Transfer-Encoding: base64
The Library does not currently have any encoding/decoding algorithms, but the
encoding is registered as part of the anchor. The actual decoder stream can
then be inserted as part of the pipe stream. A good example of how to insert a
stream is the stream_pipe function in the HTTP module. It would be nice to
have encodings be part of the stream stack, and this may be a new feature to
add :-)
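Since the Library has no decoder yet, here is a minimal sketch, under stated assumptions, of what a base64 decoder stage inserted into the pipe might look like. It does not use the Library's real HTStream interface: the sink callback type `sink_fn` and the names `b64_decoder`, `b64_init`, `b64_put_block`, `byte_buffer` and `buffer_sink` are all hypothetical. It decodes incrementally because a stream receives its input in arbitrary blocks, so a four-character group may be split between two calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical downstream stage that receives decoded bytes. It stands in
   for "the next stream in the pipe"; it is NOT the Library's HTStream. */
typedef void (*sink_fn)(void *ctx, const unsigned char *buf, size_t len);

/* State of an incremental base64 decoder that can be fed arbitrary chunks. */
typedef struct {
    sink_fn target;       /* where decoded bytes go */
    void *target_ctx;
    unsigned long quad;   /* pending 6-bit values, packed together */
    int nquad;            /* number of pending values (0..3) */
    int done;             /* set once '=' padding has been seen */
} b64_decoder;

static int b64_value(unsigned char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;            /* CR, LF, spaces and other non-alphabet bytes */
}

void b64_init(b64_decoder *d, sink_fn target, void *ctx)
{
    assert(d != NULL && target != NULL);
    memset(d, 0, sizeof *d);
    d->target = target;
    d->target_ctx = ctx;
}

/* Decode one block of base64 text and forward the bytes to the target. */
void b64_put_block(b64_decoder *d, const char *s, size_t len)
{
    unsigned char out[192];
    size_t nout = 0;
    for (size_t i = 0; i < len && !d->done; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c == '=') {                  /* padding: flush the partial group */
            if (d->nquad == 2) {
                out[nout++] = (unsigned char)(d->quad >> 4);
            } else if (d->nquad == 3) {
                out[nout++] = (unsigned char)(d->quad >> 10);
                out[nout++] = (unsigned char)(d->quad >> 2);
            }
            d->done = 1;
            break;
        }
        int v = b64_value(c);
        if (v < 0)
            continue;                    /* ignore line breaks etc. */
        d->quad = (d->quad << 6) | (unsigned long)v;
        if (++d->nquad == 4) {           /* 24 bits become 3 bytes */
            out[nout++] = (unsigned char)(d->quad >> 16);
            out[nout++] = (unsigned char)(d->quad >> 8);
            out[nout++] = (unsigned char)d->quad;
            d->quad = 0;
            d->nquad = 0;
        }
        if (nout + 3 > sizeof out) {     /* keep room for the next group */
            d->target(d->target_ctx, out, nout);
            nout = 0;
        }
    }
    if (nout > 0)
        d->target(d->target_ctx, out, nout);
}

/* Trivial sink for illustration: appends bytes to a fixed buffer. */
typedef struct {
    unsigned char data[256];
    size_t len;
} byte_buffer;

void buffer_sink(void *ctx, const unsigned char *buf, size_t len)
{
    byte_buffer *b = ctx;
    size_t room = sizeof b->data - b->len;
    if (len > room)
        len = room;                      /* truncate instead of overflowing */
    memcpy(b->data + b->len, buf, len);
    b->len += len;
}
```

Feeding `"SGVs"`, `"bG8"` and `"=\r\n"` in three separate calls delivers the five bytes `Hello` to the sink. In a real pipe the sink would be whatever stream was chosen for the decoded content type.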
> Content-Type: message/partial
MIME multipart is handled in the Library by a stream of its own, implemented in
HTBound.c. This stream is a converter just like the RFC 822 message parser
(HTMIME.c). When the full header of a document has been parsed and a content
type found, the stream stack is called to find a converter that handles this
type. In the case of MIME multipart, this is a converter like any other: it
goes through the stream and identifies the individual body parts. Each body
part of the multipart message is then in turn passed to a new RFC 822 message
parser, and so on. This mechanism also works for nested multipart messages.
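To make the boundary handling concrete, here is a minimal sketch of splitting a multipart body into its parts. It is not HTBound.c: it assumes the whole body is already in memory, whereas a real stream must also cope with a delimiter split across two blocks. `part_span` and `multipart_split` are hypothetical names; each returned span (the part's own headers plus its body) is what would be handed to a new RFC 822 message parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Location of one body part inside the multipart body (hypothetical type). */
typedef struct {
    size_t offset;   /* first byte of the part's own header block */
    size_t len;      /* bytes up to, not including, the line break before the next delimiter */
} part_span;

/* Return the first "--boundary" that starts a line at or after `from`, or NULL. */
static const char *find_delimiter(const char *body, const char *end,
                                  const char *from, const char *boundary)
{
    size_t blen = strlen(boundary);
    for (const char *p = from; (size_t)(end - p) >= blen + 2; p++) {
        int line_start = (p == body) || (p[-1] == '\n');
        if (line_start && p[0] == '-' && p[1] == '-' &&
            memcmp(p + 2, boundary, blen) == 0)
            return p;
    }
    return NULL;
}

/* Split a complete multipart body into at most max_parts spans.
   Preamble and epilogue are skipped. Returns the number of parts found,
   or -1 if the body contains no delimiter at all. */
int multipart_split(const char *body, size_t len, const char *boundary,
                    part_span *parts, int max_parts)
{
    assert(body != NULL && boundary != NULL && parts != NULL);
    const char *end = body + len;
    size_t blen = strlen(boundary);
    const char *delim = find_delimiter(body, end, body, boundary);
    if (delim == NULL)
        return -1;

    int count = 0;
    while (count < max_parts) {
        const char *after = delim + 2 + blen;
        /* "--boundary--" is the close delimiter; what follows is epilogue. */
        if (end - after >= 2 && after[0] == '-' && after[1] == '-')
            break;
        /* The part begins on the line following the delimiter. */
        const char *nl = memchr(after, '\n', (size_t)(end - after));
        if (nl == NULL)
            break;
        const char *start = nl + 1;
        const char *next = find_delimiter(body, end, start, boundary);
        if (next == NULL)
            break;                       /* unterminated part: ignore it */
        /* The line break before a delimiter belongs to the delimiter. */
        const char *stop = next;
        if (stop > start && stop[-1] == '\n') stop--;
        if (stop > start && stop[-1] == '\r') stop--;
        parts[count].offset = (size_t)(start - body);
        parts[count].len = (size_t)(stop - start);
        count++;
        delim = next;
    }
    return count;
}
```

For a nested message, a part whose own header declares another multipart type is simply split again with its own boundary, which mirrors the recursion described above.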
> I'd also like to see support for the RFC1867 header line:
> Content-Disposition: form-data; name="xxx"; filename="yyy"
> but that looks easy to do using HTMIME_register.
There are two ways of handling new headers: a heavyweight one and a
lightweight one. The first is to write a new converter stream that can be
registered in the stream stack. This is often the situation when handling a
new content type, for example
message/external-body
The other is to use the existing extension mechanism in the MIME parser. The
API for this is described in the HTHeader file or in the User's Guide:
http://www.w3.org/pub/WWW/Library/User/Using/MIME.html
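Whichever mechanism is used, a handler for the RFC 1867 `Content-Disposition` line has to pull the parameters out of the header value. The registration call itself is documented in HTHeader and is not reproduced here; `header_param` below is a hypothetical, simplified parameter extractor (it does not handle backslash escapes inside quoted strings).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy the value of parameter `param` from a header value such as
       form-data; name="xxx"; filename="yyy"
   into out (NUL-terminated, truncated to fit). Returns 1 if found, else 0.
   Parameter names are matched case-insensitively. */
int header_param(const char *value, const char *param, char *out, size_t outlen)
{
    assert(value != NULL && param != NULL && out != NULL && outlen > 0);
    size_t plen = strlen(param);
    const char *p = strchr(value, ';');  /* skip the disposition type */
    while (p != NULL) {
        p++;                             /* step over ';' */
        while (*p == ' ' || *p == '\t')
            p++;
        /* Compare the parameter name. */
        size_t i = 0;
        while (i < plen && p[i] != '\0' &&
               tolower((unsigned char)p[i]) == tolower((unsigned char)param[i]))
            i++;
        if (i == plen && p[plen] == '=') {
            const char *v = p + plen + 1;
            size_t n = 0;
            if (*v == '"') {             /* quoted-string value */
                v++;
                while (v[n] != '\0' && v[n] != '"')
                    n++;
            } else {                     /* token value */
                while (v[n] != '\0' && v[n] != ';' && v[n] != ' ' && v[n] != '\t')
                    n++;
            }
            if (n >= outlen)
                n = outlen - 1;
            memcpy(out, v, n);
            out[n] = '\0';
            return 1;
        }
        /* Not the one we want: skip this parameter, honouring quotes. */
        int quoted = 0;
        while (*p != '\0' && (quoted || *p != ';')) {
            if (*p == '"')
                quoted = !quoted;
            p++;
        }
        p = (*p == ';') ? p : NULL;
    }
    return 0;
}
```

With the header from the question, asking for `"filename"` returns 1 and leaves `yyy` in the output buffer; asking for a parameter that is absent returns 0.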
I have included an example of nested multipart messages from the MIME spec
(RFC 1521). The complete stream setup for it will look something like this:
MIME
L____ Multipart
      L__________ MIME
      |           L____ text/plain
      |
      L__________ MIME
      |           L____ Multipart
      |                 L__________ MIME
      |                 |           L____ audio/basic
      |                 |
      |                 L__________ MIME
      L__________ MIME              L____ image/gif
      |           L____ text/plain
      etc...
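Each MIME node in the tree above picks its child by asking the stream stack for a converter that matches the part's content type. A minimal sketch of that lookup follows, assuming a hypothetical static table: the pattern strings, the converter names and `find_converter` are illustrative only, not the Library's registrations. Exact matches are preferred over `type/*` wildcards, and those over the catch-all `*/*`; media types are compared case-insensitively, which matters for the `Text/plain` part in the example message.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical converter table entry (illustrative names only). */
typedef struct {
    const char *pattern;    /* exact type, wildcard subtype, or catch-all */
    const char *converter;  /* name of the stream that would be created */
} converter_entry;

static const converter_entry table[] = {
    { "multipart/*",    "multipart" },
    { "message/rfc822", "rfc822" },
    { "text/plain",     "plain text" },
    { "*/*",            "save to file" },
};

/* Case-insensitive comparison of the first n characters of a and b. */
static int ci_equal(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Return the best converter for a Content-Type value (parameters such as
   "; charset=..." are ignored), or NULL if the value has no '/'. */
const char *find_converter(const char *content_type)
{
    assert(content_type != NULL);
    size_t len = strcspn(content_type, "; \t");   /* bare media type */
    const char *slash = memchr(content_type, '/', len);
    if (slash == NULL)
        return NULL;
    size_t tlen = (size_t)(slash - content_type);
    const char *sub = slash + 1;
    size_t slen = len - tlen - 1;

    const char *best = NULL;
    int best_score = -1;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        const char *pat = table[i].pattern;
        const char *pslash = strchr(pat, '/');
        size_t ptlen = (size_t)(pslash - pat);
        const char *psub = pslash + 1;
        int score;
        if (ptlen == 1 && pat[0] == '*')
            score = 0;                            /* catch-all */
        else if (ptlen != tlen || !ci_equal(pat, content_type, tlen))
            continue;                             /* wrong top-level type */
        else if (strcmp(psub, "*") == 0)
            score = 1;                            /* wildcard subtype */
        else if (strlen(psub) == slen && ci_equal(psub, sub, slen))
            score = 2;                            /* exact match */
        else
            continue;
        if (score > best_score) {
            best_score = score;
            best = table[i].converter;
        }
    }
    return best;
}
```

Here `find_converter("multipart/parallel; boundary=unique-boundary-2")` picks the multipart entry, while `find_converter("image/gif")` falls through to the catch-all. Scoring by specificity, rather than relying on table order, keeps the result independent of the order in which converters were registered.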
The example looks like:
MIME-Version: 1.0
From: Nathaniel Borenstein <nsb@bellcore.com>
To: Ned Freed <ned@innosoft.com>
Subject: A multipart example
Content-Type: multipart/mixed;
     boundary=unique-boundary-1

This is the preamble area of a multipart message.
Mail readers that understand multipart format
should ignore this preamble.
If you are reading this text, you might want to
consider changing to a mail reader that understands
how to properly display multipart messages.
--unique-boundary-1

...Some text appears here...
[Note that the preceding blank line means
no header fields were given and this is text,
with charset US ASCII. It could have been
done with explicit typing as in the next part.]

--unique-boundary-1
Content-type: text/plain; charset=US-ASCII

This could have been part of the previous part,
but illustrates explicit versus implicit
typing of body parts.

--unique-boundary-1
Content-Type: multipart/parallel;
     boundary=unique-boundary-2

--unique-boundary-2
Content-Type: audio/basic
Content-Transfer-Encoding: base64

... base64-encoded 8000 Hz single-channel
mu-law-format audio data goes here....

--unique-boundary-2
Content-Type: image/gif
Content-Transfer-Encoding: base64

... base64-encoded image data goes here....

--unique-boundary-2--

THIS IS AN EPILOG

--unique-boundary-1
Content-Type: text/html

<HTML>
<HEAD>
<TITLE>Title</TITLE>
</HEAD>
<BODY>
This is some HTML text
</BODY>
</HTML>

--unique-boundary-1
Content-Type: message/rfc822

From: (mailbox in US-ASCII)
To: (address in US-ASCII)
Subject: (subject in US-ASCII)
Content-Type: Text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: Quoted-printable

... Additional text in ISO-8859-1 goes here ...

--unique-boundary-1
Content-Type: text/html

<HTML>
<HEAD>
<TITLE>Title</TITLE>
</HEAD>
<BODY>
This is some MORE HTML text
</BODY>
</HTML>

--unique-boundary-1--
--
Henrik Frystyk Nielsen, <frystyk@w3.org>
World-Wide Web Consortium, MIT/LCS NE43-356
545 Technology Square, Cambridge MA 02139, USA