Re: Multiple Content-Location headers

Jacob,

> I am not very happy with changing an existing and already implemented
> IETF proposed standard in such a radical way. But maybe it is necessary.
> Let us examine the differences between how MHTML and HTTP uses Content-
> Location to see if they really need to be split into two different
> header fields.

The reason for changing from my previous align MHTML with HTTP 1.1 position
(which you also enunciate), to my employ a new header field position, was
because I was concerned that the MIME folding algorithm we apply to header
fields containing invalid URIs would be incompatible with HTTP 1.1. HTTP
1.1 can outlaw invalid URIs, MHTML has to both make them RFC822/MIME safe
and cope with them. The problematic text is:

4.4 Encoding and decoding of URIs in MIME header fields

4.4.1 Encoding of URIs containing inappropriate characters

Some documents may contain URIs with characters that are inappropriate for
an RFC 822 header, either because the URI itself has an incorrect syntax
according to [URL] or the URI syntax standard has been changed to allow
characters not previously allowed in MIME headers. These URIs cannot be
sent directly in a message header. If such a URI occurs, all spaces and
other illegal characters in it must be encoded using one of the methods
described in [MIME3] section 4. This encoding MUST only be done in the
header, not in the HTML text. Receiving clients must decode the [MIME3]
encoding in the heading before comparing URIs in body text to URIs in
Content-Location headers.

The charset parameter value "US-ASCII" SHOULD be used if the URI contains
no octets outside of the 7-bit range. If such octets are present, the
correct charset parameter value (derived e.g. from information about the
HTML document the URI was found in) SHOULD be used. If this cannot be
safely established, the value "UKNOWN-8BIT" [RFC 1428] MUST be used.

Note, that for the matching of URIs in text/html body parts to URIs in
Content-Location headers, the value of the charset parameter is irrelevant,
but that it may be relevant for other purposes, and that incorrect labeling
MUST, therefore, be avoided. Warning: Irrelevance of the charset parameter
may not be true in the future, if different character encodings of the same
non-English filename are used in HTML.


4.4.2 Folding of long URIs

Since MIME header fields have a limited length and long URIs can result in
Content-Location and Content-Base headers that exceed this length ,
Content-Location and Content-Base headers may have to be folded.

Encoding as discussed in clause 4.4.1 MUST be done before such folding.
After that, the folding can be done, using the algorithm defined in
[URLBODY] section 3.1.

4.4.3 Unfolding and decoding of received URLs in MIME header fields

Upon receipt, folded MIME header fields should be unfolded, and then any
MIME encoding should be removed, to retrieve the original URI.

Nick

Received on Friday, 16 January 1998 04:12:34 UTC