elaboration on MIME and HTTP issue

I thought I would elaborate on the issues I raised.  Roy and I have a
strong disagreement on what the 'consensus of the working group' might
be on this issue. I would like, therefore, to ask you all to respond.

Paul Hoffman has proposed some modified wording for the sections under
contention. I thought it was fine, except that it didn't go far
enough. Paul's proposed rewrite is marked with >, my comments are
unmarked, and my proposed rewrites are marked with #.

================================================================
> 1.4 HTTP and MIME

> HTTP/1.0 reuses many of the constructs defined for Multipurpose
> Internet Mail Extensions (RFC 1521, MIME [5]). Appendix C describes
> the differences between HTTP's use of Internet Media Types and those
> in RFC 1521, and the rationale for the differences. That appendix
> should be of interest to anyone who is implementing both HTTP and MIME
> applications, and anyone implementing gateways between HTTP and MIME
> environments.

I think 'reuses' should be changed to 'uses'. Appendix C describes how
HTTP uses those media types. As is becoming clear in the proposed
revisions to MIME, there will be a separation between Internet Media
Type registration and mail transport, so talking about 'the difference
between HTTP and MIME' is confusing and misleading.  I would prefer:

# HTTP/1.0 uses many of the constructs defined for MIME, as specified in
# RFC 1521. Appendix C describes the ways in which the context of HTTP
# allows Internet Media Types to be used differently than is typical
# when they are transported by Internet mail, and gives the rationale
# for those differences.

================================================================
> 3.6  Media Types

> HTTP uses Internet Media Types [13] in the Content-Type header field
> (Section 10.5) in order to provide open and extensible data typing.
> The use of Internet Media Types in HTTP is similar to that used in
> MIME [5].

There's no need to distinguish (here) HTTP's use of Internet Media
Types. The sentence 'The use of Internet Media Types in HTTP ...' can
and should be dropped.

================================================================
>    media-type     = type "/" subtype *( ";" parameter )
>    type           = token
>    subtype        = token

> Parameters may follow the type/subtype in the form of
> attribute/value pairs.

>    parameter      = attribute "=" value
>    attribute      = token
>    value          = token | quoted-string

> The type, subtype, and parameter attribute names are
> case-insensitive. Parameter values may or may not be
> case-sensitive, depending on the semantics of the parameter name.
> LWS must not be generated between the type and subtype, nor between
> an attribute and its value.

> Some older HTTP applications do not recognize media type parameters.
> HTTP/1.0 applications should only use media type parameters when they
> are necessary to define the content of a message.

This is all fine.
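
To make the grammar quoted above concrete, here is a minimal Python
sketch of a parser for it. This is only an illustration, not part of
anyone's proposed text. The function name parse_media_type is
hypothetical, and the quoted-string pattern is a simplification that
accepts any characters other than a double quote.

```python
import re

# token: one or more characters that are not CTLs or tspecials.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# quoted-string, simplified (assumption): anything except a double quote.
QUOTED = r'"[^"]*"'

MEDIA_TYPE_RE = re.compile(rf"({TOKEN})/({TOKEN})")
# No whitespace is allowed between an attribute, "=", and its value.
PARAM_RE = re.compile(rf"\s*;\s*({TOKEN})=({TOKEN}|{QUOTED})")


def parse_media_type(value):
    """Return (type, subtype, params) or raise ValueError."""
    m = MEDIA_TYPE_RE.match(value)
    if m is None:
        raise ValueError(f"not a media type: {value!r}")
    # Type and subtype are case-insensitive, so fold them to lower case.
    mtype, subtype = m.group(1).lower(), m.group(2).lower()
    params = {}
    pos = m.end()
    while pos < len(value):
        p = PARAM_RE.match(value, pos)
        if p is None:
            raise ValueError(f"bad parameter in: {value!r}")
        # Attribute names are case-insensitive; values are left as sent,
        # since their case sensitivity depends on the parameter.
        attr, val = p.group(1).lower(), p.group(2)
        if val.startswith('"'):
            val = val[1:-1]  # strip the surrounding quotes
        params[attr] = val
        pos = p.end()
    return mtype, subtype, params
```

For example, parse_media_type("Text/HTML; Charset=ISO-8859-1") yields
the type "text", the subtype "html", and a single parameter "charset"
whose value keeps its original case.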

================================================================
> If a given media-type value has been registered by the IANA, any
> use of that value must be indicative of the registered data format.
> Although HTTP allows the use of non-registered media types, such
> usage must not conflict with the IANA registry. Data providers are
> strongly encouraged to register their media types with IANA via the
> procedures outlined in RFC 1590 [13]. However, data providers are
> even more strongly encouraged to use registered types in order to
> prevent non-interoperability.

This is needlessly confusing. Non-standard media types are
non-standard. Standard media types are standard. Unregistered types
are not registered.  Interoperability suffers if senders send types to
recipients who don't understand what the types mean. There's no need
to tie ourselves into knots about the whole 'x-' issue, when we can
say:

# Media-type values are registered with the Internet Assigned Numbers
# Authority (IANA). The media type registration process is outlined in
# RFC 1590 [13]. Use of non-registered media types is discouraged.

================================================================

> 3.6.1 Canonicalization and Text Defaults

> Media types are registered in a canonical form. In general, entity
> bodies transferred via HTTP must be represented in the appropriate
> canonical form prior to transmission. If the body has been encoded
> via a Content-Encoding, the data must be in canonical form prior to
> that encoding.

> HTTP defines a canonical form for text media. Text media are any media
> of primary type "text", as well "application" types consisting of
> text-like records. The HTTP canonical form for text media allows three
> different octet sequences to indicate a text line break: CRLF, CR, or LF.
> The CRLF form is the preferred form.

As I stated earlier, there can only be one 'canonical' form for a
single object. It would seem to be a bad idea to disallow translation
of a text/ object from CR to CRLF encoding as a valid alternate
representation of the object.

> In addition to the preferred form of CRLF, HTTP applications must
> accept a bare CR or LF alone as representing a single line break in
> text media. 

The CRLF convention for end of line is not necessarily 'preferred'; it
is just 'canonical'.

>               Furthermore, if the text media is represented in a
> character set which does not use octets 13 and 10 for CR and LF
> respectively, as is the case for some multi-byte character sets, HTTP
> allows the use of whatever octet sequence(s) is defined by that
> character set to represent the equivalent of CRLF, bare CR, and bare
> LF. It is assumed that any recipient capable of using such a character
> set will know the appropriate octet sequence for representing line
> breaks within that character set.

Whether or not you might go along with this, it certainly doesn't
correspond to current practice.

>   Note: This interpretation of line breaks applies only to the
>   contents of an Entity-Body and only after any
>   Content-Encoding has been removed. All other HTTP constructs
>   use CRLF exclusively to indicate a line break. Content
>   codings define their own line break requirements.

> HTTP defines the default character set for text media in an entity
> body to be "ISO-8859-1". If a textual media type defines a charset
> parameter with a registered default value of "US-ASCII", an HTTP
> program changes the default to be "ISO-8859-1". Since the ISO-8859-1
> [18] character set is a superset of US-ASCII [17], this has no effect
> upon the interpretation of entity bodies which only contain octets
> within the US-ASCII set (0 - 127). The presence of a charset parameter
> value in a Content-Type header field overrides the default.

> It is recommended that the character set of an entity body be
> labelled as the lowest common denominator of the character codes
> used within a document, with the exception that no label is
> preferred over the labels US-ASCII or ISO-8859-1.

As we've seen, current practice is really that the recipient guesses
the character set if it isn't labelled. And we've already recommended
that parameters not be used, so we shouldn't then turn around and
recommend that the character set be labelled.

# Internet media types are registered with a canonical form.  In
# general, Entity-Bodies transferred via HTTP must be represented in
# the appropriate canonical form prior to the application of
# Content-Encoding, if any, and transmission.

# Media of top-level type "text" use CRLF as the text line break when
# in canonical form. However, HTTP allows the transport of text media
# not only in the canonical form with CRLF line breaks, but also with
# CR or LF alone, used consistently within the Entity-Body. This
# flexibility applies only to Entity-Bodies and not to HTTP or
# multipart headers.
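
As an illustration of what this asks of a recipient, here is a minimal
Python sketch that reports which line-break conventions a body uses and
converts the body to the canonical CRLF form. The function names are
hypothetical, and the sketch assumes a character set in which CR and LF
are octets 13 and 10.

```python
import re

# Match CRLF first, so a CRLF pair is never counted as CR plus LF.
_LINE_BREAK = re.compile(rb"\r\n|\r|\n")


def line_break_styles(body: bytes) -> set:
    """Return the set of line-break sequences found in the body."""
    return set(_LINE_BREAK.findall(body))


def to_canonical_line_breaks(body: bytes) -> bytes:
    """Rewrite CRLF, bare CR, and bare LF all as CRLF."""
    return _LINE_BREAK.sub(b"\r\n", body)
```

A body uses one convention consistently when line_break_styles returns
at most one element. The conversion is idempotent, so applying it to a
body that is already canonical leaves the body unchanged.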

I would then add something to the following effect:

# Internet media types of primary type "text" are defined to have
# a default charset parameter of "US-ASCII"; any other character set
# should be labelled explicitly. In practice, HTTP servers frequently
# send text data without a charset parameter, and expect clients to
# guess the character set of the result.
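
To show where the choice of default actually bites, here is a minimal
Python sketch of a client decoding a text body. It is a sketch under
stated assumptions, not a description of any existing client: the
function name is hypothetical, params is assumed to be a dictionary of
lower-cased parameter names, and the fallback of ISO-8859-1 is only one
possible stand-in for the guessing that clients really do.

```python
def decode_text_body(body: bytes, params: dict,
                     fallback: str = "iso-8859-1") -> str:
    """Decode a text body, honouring a charset label when one is present."""
    # An explicit charset parameter overrides whatever default is chosen.
    charset = params.get("charset", fallback)
    try:
        return body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown label, or octets that do not fit the label: fall back
        # to the default instead of failing (an assumption of this sketch).
        return body.decode(fallback, errors="replace")
```

With no label, the octet 0xE9 decodes as "é" under the ISO-8859-1
fallback, whereas a default of US-ASCII would reject it. That is the
practical difference between the two defaults under discussion.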

================================================================
> 3.6.2 Multipart Types

> MIME provides for a number of "multipart" types -- encapsulations of
> several entities within a single message's Entity-Body. The multipart
> types registered by IANA [15] do not have any special meaning for
> HTTP, though user agents may need to understand each type in order to
> correctly interpret the purpose of each body-part. An HTTP user agent
> should follow the same or similar behavior as a MIME user agent does
> upon receipt of a multipart type. HTTP servers should not assume that
> all HTTP clients are prepared to handle multipart types.

> All multipart types share a common syntax and must include a boundary
> parameter as part of the media type value. The message body is itself
> a protocol element and must therefore use only CRLF to represent line
> breaks between body-parts. Multipart body-parts may contain HTTP
> header fields which are significant to the meaning of that part.

This isn't entirely accurate, since many HTTP servers do attach a
special meaning to 'multipart' types, e.g., multipart/form-data for
file upload. I guess this is OK, though.

================================================================
> 10.12  MIME-Version

> As described in Appendix C, HTTP differs from MIME in some ways.
> HTTP/1.0 messages may include a single MIME-Version general-header
> field to indicate what version of the MIME protocol was used to
> construct the message. Use of the MIME-Version header field should
> indicate that the message is in full compliance with the MIME
> protocol (as defined in [5]).

> HTTP/1.0 applications must only use MIME-Version when the message is
> fully MIME-compliant. Unfortunately, some older versions of HTTP/1.0
> clients and servers use this field indiscriminately, and thus
> recipients must not take it for granted that the message is indeed in
> full compliance with MIME just because it contains this field. Proxies
> and gateways are responsible for ensuring this compliance (where
> possible) when exporting HTTP messages to strict MIME environments.

>    MIME-Version   = "MIME-Version" ":" 1*DIGIT "." 1*DIGIT

Why don't we just say that some HTTP/1.0 applications send a
MIME-Version field, but it is meaningless? (And say this in the
appendix).

================================================================
I have numerous comments about Appendix C ("Relationship to MIME"),
but they all follow from the changes proposed above for the body of the
specification. If we can get some clarity on the main issues, perhaps
we can work the appendix into shape, too.

Received on Thursday, 18 January 1996 16:53:13 UTC