Round 4: moving HTTP 1.0 to informational

The following is what I really, really hope is the last round on the
HTTP/1.0 draft. Everything here has already been around the WG, and I've
tried to synthesize where there was eventual agreement. I admit that I
may have made some mistakes in choosing what was agreed on, but please
don't go into this looking for new things to haggle about.

The goal is to get a document out that someone can use to create
HTTP/1.0 software that can interoperate with most of today's existing
HTTP/1.0 software. Roy is ready to start incorporating the changes
into the master document. I'd like to have him start on this soon, so
please make known any places where I missed (or misstated) the
consensus of the WG.

A final note: there was some confusion at the end of the thread about
what kind of document this will be. It will be an "Informational" document,
not a "Best Current Practice" document. We moved away from BCP many
months ago.

--Paul Hoffman
--Internet Mail Consortium

*****The following sentences replace the last two sentences ("This
specification reflects..." and "This specification is not...") in the first
paragraph of section 1.1:*****

This specification describes the features that seem to be consistently
implemented in most HTTP/1.0 clients and servers. The specification is
split into two parts. Features of HTTP for which implementations are
usually consistent are described in the main body of this document.
Features that have few implementations, or whose implementations are
inconsistent, are listed in Appendix D.

*****Section 1.4 is added:*****

1.4 HTTP and MIME

HTTP/1.0 uses many of the constructs defined for MIME in RFC 1521.
Appendix C describes the ways in which the context of HTTP allows
Internet Media Types to be used differently than they typically are in
Internet mail, and gives the rationale for those differences.

*****Section 3.6 and its subsections are changed to:*****

3.6  Media Types

HTTP uses Internet Media Types [13] in the Content-Type header field
(Section 10.5) in order to provide open and extensible data typing.

   media-type     = type "/" subtype *( ";" parameter )
   type           = token
   subtype        = token

Parameters may follow the type/subtype in the form of
attribute/value pairs.

   parameter      = attribute "=" value
   attribute      = token
   value          = token | quoted-string

The type, subtype, and parameter attribute names are
case-insensitive. Parameter values may or may not be
case-sensitive, depending on the semantics of the parameter name.
LWS must not be generated between the type and subtype, nor between
an attribute and its value. Upon receipt of a media type with an
unrecognized parameter, a user agent should treat the media type
as if the unrecognized parameter and its value were not present.
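For illustration, a receiver might apply the grammar and rules above roughly as follows. This is a Python sketch, not reference code. The character classes are our reading of the token and quoted-string rules defined elsewhere in this specification, and `known` is a hypothetical argument naming the parameter attributes the caller understands. Names are lowercased because they are case-insensitive, values are left alone, and unrecognized parameters are dropped.

```python
import re

# Assumed reading of "token": printable US-ASCII minus the tspecials.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# Assumed reading of "quoted-string": no CTLs other than HT, no inner quote.
QUOTED = r'"[^"\x00-\x08\x0a-\x1f\x7f]*"'

# No LWS is accepted between type and subtype or around "=".
_TYPE_RE = re.compile(rf"\s*({TOKEN})/({TOKEN})")
_PARAM_RE = re.compile(rf"\s*;\s*({TOKEN})=({TOKEN}|{QUOTED})")


def parse_media_type(value, known=None):
    """Return (type, subtype, params) parsed from a Content-Type value.

    type, subtype and attribute names are lowercased (case-insensitive);
    parameter values are kept as received. If `known` is given, any
    parameter whose attribute is not in it is discarded.
    """
    m = _TYPE_RE.match(value)
    if m is None:
        raise ValueError(f"not a media-type: {value!r}")
    mtype, subtype = m.group(1).lower(), m.group(2).lower()
    params = {}
    pos, end = m.end(), len(value.rstrip())
    while pos < end:
        pm = _PARAM_RE.match(value, pos)
        if pm is None:
            raise ValueError(f"bad parameter at offset {pos}: {value!r}")
        attr, raw = pm.group(1).lower(), pm.group(2)
        if known is None or attr in known:
            # Strip the surrounding double quotes of a quoted-string.
            params[attr] = raw[1:-1] if raw.startswith('"') else raw
        pos = pm.end()
    return mtype, subtype, params
```

For example, `parse_media_type('Text/HTML; Level=2; charset="ISO-8859-1"', known={"charset"})` yields `("text", "html", {"charset": "ISO-8859-1"})`. Rejecting LWS around "/" and "=" is stricter than some receivers may want to be.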

Some older HTTP applications do not recognize media type parameters.
HTTP/1.0 applications should only use media type parameters when they
are necessary to define the content of a message.

Media-type values are registered with the Internet Assigned Numbers
Authority (IANA). The media type registration process is outlined in
RFC 1590 [13]. Use of non-registered media types is discouraged.

3.6.1 Canonicalization and Text Defaults

Internet media types are registered with a canonical form.  In
general, Entity-Bodies transferred via HTTP must be represented in
the appropriate canonical form prior to transmission. If the body
has been encoded with a Content-Encoding, the underlying data should
be in canonical form prior to being encoded.

Media subtypes of the "text" type use CRLF as the text line break when
in canonical form.  However, HTTP allows the transport of text media
with plain CR or LF alone representing a line break when used consistently
within the Entity-Body. HTTP applications must accept CRLF, plain CR, and
plain LF as being representative of a line break in text media received
via HTTP.

In addition, if the text media is represented in a character set that
does not use octets 13 and 10 for CR and LF respectively, as is the
case for some multi-byte character sets, HTTP allows the use of
whatever octet sequences are defined by that character set to
represent the equivalent of CR and LF for line breaks. Because of
this, HTTP software should only convert CR, LF, and CRLF to the local
encoding for line breaks if the software recognizes the charset
parameter. This flexibility regarding line breaks applies only to text
media in the Entity-Body; a bare CR or LF should not be substituted
for CRLF within any of the HTTP control structures (such as header
fields and multipart boundaries).
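A minimal Python sketch of the receiving side of the two paragraphs above. The set of recognized charsets is an assumption standing in for whatever a real implementation knows about; for any other charset the body is returned untouched, since octets 13 and 10 may not mean CR and LF there. Header fields and multipart boundaries are not handled here; they always use CRLF.

```python
import re

# Assumption: charsets this program recognizes as encoding CR and LF as
# octets 13 and 10. A real implementation would supply its own list.
RECOGNIZED_CHARSETS = {"us-ascii", "iso-8859-1", "iso-8859-2"}

# CRLF is tried first so that it counts as one line break, not two.
_LINE_BREAK = re.compile(rb"\r\n|\r|\n")


def normalize_line_breaks(body, charset, eol=b"\n"):
    """Rewrite every CRLF, bare CR, and bare LF in a text Entity-Body as eol."""
    if charset.lower() not in RECOGNIZED_CHARSETS:
        # Unknown charset: leave the octets alone rather than corrupt them.
        return body
    return _LINE_BREAK.sub(eol, body)
```

Passing `eol=b"\r\n"` instead produces the canonical CRLF form, which is what a gateway to a strict MIME environment would want (Appendix C.1).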

The "charset" parameter is used with some media types to define the
character set (Section 3.4) of the data.  When no explicit charset
parameter is provided by the sender, media subtypes of the "text"
type are defined to have a default charset value of "ISO-8859-1"
when received via HTTP.  Data in character sets other than
"ISO-8859-1" or its subsets must be labelled with an appropriate
charset value in order to be consistently interpreted by user agents.

Note: Many current HTTP servers provide data using charsets other than
"ISO-8859-1" without proper labelling.  This situation reduces
interoperability and is not recommended. To compensate for this, some
HTTP user agents provide a configuration option to allow the user to
change the default interpretation of the media type character set when
no charset parameter is given.
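The defaulting rule, including the user-configurable override mentioned in the note, might be sketched in Python like this. `params` is assumed to be a dict of parsed parameters with lowercased attribute names, and `user_default` is a hypothetical name for the configuration option.

```python
def effective_charset(media_type, params, user_default="ISO-8859-1"):
    """Return the charset to decode with, or None for non-text types.

    An explicit charset parameter always wins. Otherwise subtypes of
    "text" fall back to ISO-8859-1, unless the user has configured a
    different default (hypothetical option).
    """
    if "charset" in params:
        return params["charset"]
    if media_type.split("/", 1)[0].strip().lower() == "text":
        return user_default
    return None
```

The returned name can be passed directly to Python's `bytes.decode`, which accepts "ISO-8859-1" as a codec name.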

3.6.2 Multipart Types

MIME provides for a number of "multipart" types -- encapsulations of
several entities within a single message's Entity-Body. The multipart types
registered by IANA [15] do not have any special meaning for HTTP/1.0, though
user agents may need to understand each type in order to correctly
interpret the purpose of each body-part. An HTTP user agent should follow
the same or similar behavior as a MIME user agent does upon receipt of a
multipart type. HTTP servers should not assume that all HTTP clients are
prepared to handle multipart types.

All multipart types share a common syntax and must include a boundary
parameter as part of the media type value. The message body is itself a
protocol element and must therefore use only CRLF to represent line breaks
between body-parts. Multipart body-parts may contain HTTP header fields
which are significant to the meaning of that part.
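Because the delimiters are protocol elements, a receiver can locate body-parts by searching only for CRLF followed by "--" and the boundary. The Python sketch below assumes the boundary value has already been taken from the media type's boundary parameter. It skips the preamble and epilogue and leaves each part's header fields unparsed.

```python
def split_multipart(body, boundary):
    """Return the raw body-parts (header fields plus content) of a multipart body.

    A delimiter is CRLF, "--", then the boundary; the CRLF belongs to the
    delimiter, not to the preceding part. A bare LF before "--" is not
    treated as part of a delimiter.
    """
    delimiter = b"\r\n--" + boundary
    # Prefix CRLF so that a delimiter at the very start of the body is found.
    pieces = (b"\r\n" + body).split(delimiter)
    parts = []
    for piece in pieces[1:]:  # pieces[0] is the preamble
        if piece.startswith(b"--"):
            break  # close delimiter; whatever follows is the epilogue
        # Skip any padding after the boundary, up to the CRLF ending the line.
        eol = piece.find(b"\r\n")
        if eol < 0:
            raise ValueError("delimiter line not terminated by CRLF")
        parts.append(piece[eol + 2:])
    return parts
```

With a hypothetical boundary of `XYZ`, a body-part separated by a bare LF rather than CRLF is not split off, which is the intended consequence of the rule above.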

*****Section 9.3 has two additions:*****

At the end of each of the 301 and 302 status code descriptions, the
following paragraph is added:

Note: When automatically redirecting a POST request after receiving a
[301|302] status code, some HTTP/1.0 user agents will erroneously
change the method of the request to GET.

*****Section 10.12 is moved to Appendix D.2.7*****

*****Section 12.5 is added:*****

12.5  Attacks Based On File and Path Names

Implementations of HTTP origin servers should be careful to
restrict the documents returned by HTTP requests to be only those that
were intended by the administrators. If an HTTP server translates HTTP
URIs directly into file system calls, the server must take special
care not to serve files that were not intended to be delivered to HTTP
clients. For example, Unix, Microsoft Windows, and other operating
systems use ".." as a path component to indicate a directory level
above the current one. On such a system, an HTTP server must disallow
any such construct in the Request-URI if it would otherwise allow
access to a resource outside those intended to be accessible via the
HTTP server. Similarly, files intended for reference only internally
to the server (such as access control files, configuration files, and
script code) must be protected from inappropriate retrieval, since
they might contain sensitive information. Experience has shown that
minor bugs in such HTTP server implementations have turned into
security risks.
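A minimal Python sketch of the kind of check described above, assuming a Unix-like file system and a server that maps the path part of the Request-URI directly under a document root. It percent-decodes before looking for "..", and it refuses every ".." segment, which is stricter than the text requires. Treating names that begin with ".ht" as internal files is a hypothetical convention, and symbolic links are not considered.

```python
import posixpath
from urllib.parse import unquote

# Hypothetical convention: names starting with ".ht" are server-internal
# (access control and similar files) and must never be served.
PROTECTED_PREFIXES = (".ht",)


def resolve_request_path(doc_root, uri_path):
    """Map the path part of a Request-URI to a file under doc_root.

    Returns the file-system path, or None if the request must be refused.
    """
    # Decode first, so that "%2e%2e" is seen as ".." before it is checked.
    path = unquote(uri_path)
    if "\x00" in path or "\\" in path:
        return None
    segments = [s for s in path.split("/") if s not in ("", ".")]
    # Stricter than necessary: refuse every "..", even harmless ones.
    if ".." in segments:
        return None
    if any(s.startswith(PROTECTED_PREFIXES) for s in segments):
        return None
    full = posixpath.join(doc_root, *segments)
    # Defence in depth: the normalized result must stay inside doc_root.
    root = posixpath.normpath(doc_root)
    if posixpath.commonpath([root, posixpath.normpath(full)]) != root:
        return None
    return full
```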

*****Appendix C is changed to:*****

C.  Relationship to MIME

HTTP/1.0 uses many of the constructs defined for Internet Mail (RFC 822
[7]) and the Multipurpose Internet Mail Extensions (RFC 1521, MIME [5]) to
allow entities to be transmitted in an open variety of representations and
with extensible mechanisms. However, RFC 1521 discusses email, and HTTP has
a few features that differ from those described in RFC 1521. These
differences were carefully chosen to optimize performance over 8-bit
networks, to give greatest freedom for creating new media-types, to make
date comparisons easier, and to acknowledge the practice of some early HTTP
servers and clients.

At the time this document was written, RFC 1521 was expected to be
revised. The revisions may include some of the practices found in HTTP/1.0
but not in RFC 1521.

This appendix describes specific areas where HTTP differs from RFC 1521.
Proxies and gateways to strict MIME environments should be aware of these
differences and provide the appropriate conversions where necessary.
Proxies and gateways from MIME environments to HTTP also need to be aware
of the differences because some conversions may be required.

C.1 Canonical Form and Line Breaks

RFC 1521 requires that an email entity be converted to canonical form prior
to being transferred, as described in Appendix G of RFC 1521 [5]. Section
3.6.1 of this document describes the forms allowed for subtypes of the
"text" media type when transmitted over HTTP.

RFC 1521 requires that content that has the primary media type "text"
represent line breaks as CRLF and forbids the use of CR or LF outside of
line break sequences. HTTP allows CRLF, bare CR, and bare LF to indicate a
line break within text content when a message is transmitted over HTTP.

Where it is possible, a proxy or gateway from HTTP to a strict RFC 1521
environment should translate all line breaks within the text media
types described in section 3.6.1 of this document to the RFC 1521 canonical
form of CRLF. Note, however, that this may be complicated by the presence
of HTTP content encoding and by the fact that HTTP allows the use of some
character sets which do not use octets 13 and 10 to represent CR and LF, as
is the case for some multi-byte character sets.

C.2  Conversion of Date Formats

HTTP/1.0 uses a small set of date formats to simplify the process of date
comparison; these are described in section 3.3 of this document. RFC 1521
allows a larger set of date formats. Proxies and gateways from other
protocols to HTTP should ensure that any Date header field present in a
message conforms to one of the HTTP/1.0 formats and rewrite the date if
necessary.
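A Python sketch of such a rewrite, assuming the gateway emits the RFC 1123 form (for example "Sun, 06 Nov 1994 08:49:37 GMT") and relying on the standard library's `email.utils` to parse the incoming RFC 822-style date. Dates that cannot be parsed raise an exception; what a real gateway should do with them is left open here.

```python
from datetime import timezone
from email.utils import format_datetime, parsedate_to_datetime


def to_http_date(rfc822_date):
    """Rewrite an RFC 822-style date as an RFC 1123 date in GMT."""
    dt = parsedate_to_datetime(rfc822_date)
    if dt.tzinfo is None:
        # "-0000" (no zone information) parses as naive; assume UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    # usegmt=True requires a UTC datetime and writes the zone as "GMT".
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)
```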

C.3  Introduction of Content-Encoding

RFC 1521 does not include any concept equivalent to HTTP/1.0's
Content-Encoding header field. Since this acts as a modifier on the media
type, proxies and gateways from HTTP to MIME-compliant protocols must
either change the value of the Content-Type header field or decode the
Entity-Body before forwarding the message. (Some experimental applications
of Content-Type for Internet mail have used a media-type parameter of
";conversions=<content-coding>" to perform a function equivalent to
Content-Encoding. However, this parameter is not part of RFC 1521.)
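A Python sketch of the second option, decoding the Entity-Body. Only the gzip coding is handled (the sketch treats "gzip" and "x-gzip" alike); "x-compress" would need an LZW decoder, which Python's standard library does not include. Modeling header fields as a plain dict is an assumption of the sketch.

```python
import gzip


def decode_content_encoding(headers, body):
    """Remove an HTTP Content-Encoding before passing a message to MIME.

    Returns (headers, body) with the Content-Encoding field dropped and
    the Entity-Body decoded. Only the gzip coding is supported here.
    """
    # Header field names are case-insensitive.
    key = next((k for k in headers if k.lower() == "content-encoding"), None)
    if key is None:
        return headers, body
    coding = headers[key].strip().lower()
    if coding not in ("x-gzip", "gzip"):
        raise ValueError(f"no decoder for content-coding {coding!r}")
    rest = {k: v for k, v in headers.items() if k != key}
    return rest, gzip.decompress(body)
```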

C.4  No Content-Transfer-Encoding

HTTP/1.0 does not use the Content-Transfer-Encoding (CTE) field of RFC
1521. Proxies and gateways from MIME-compliant protocols to HTTP must
remove any non-identity CTE ("quoted-printable" or "base64") encoding prior
to delivering the response message to an HTTP client.

Proxies and gateways from HTTP to MIME-compliant protocols are responsible
for ensuring that the message is in the correct format and encoding for
safe transport on that protocol. "Safe transport" is defined by the
limitations of the protocol being used. At a minimum, the CTE field of

Content-Transfer-Encoding: binary

should be added by the HTTP-to-MIME proxy or gateway if the gateway is
unwilling to apply a content transfer encoding.
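Both directions of this section might look like this in Python, with header fields again modeled as a plain dict. On the way into HTTP the identity labels "7bit", "8bit", and "binary" are simply removed. On the way out the sketch applies only the minimum "binary" label, so a gateway to a 7-bit transport would need to choose a real encoding instead.

```python
import base64
import quopri


def _find(headers, name):
    """Return the actual key for a header field, matched case-insensitively."""
    return next((k for k in headers if k.lower() == name), None)


def mime_to_http(headers, body):
    """Undo any Content-Transfer-Encoding and drop the CTE field."""
    key = _find(headers, "content-transfer-encoding")
    if key is None:
        return headers, body
    cte = headers[key].strip().lower()
    if cte == "base64":
        body = base64.b64decode(body)  # discards the CRLFs between lines
    elif cte == "quoted-printable":
        body = quopri.decodestring(body)
    elif cte not in ("7bit", "8bit", "binary"):
        raise ValueError(f"unknown Content-Transfer-Encoding {cte!r}")
    return {k: v for k, v in headers.items() if k != key}, body


def http_to_mime(headers):
    """Label an unencoded HTTP entity as binary for the MIME side."""
    if _find(headers, "content-transfer-encoding") is None:
        headers = {**headers, "Content-Transfer-Encoding": "binary"}
    return headers
```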

C.5  HTTP Header Fields in Multipart Body-Parts

In RFC 1521, header fields in multipart body-parts are generally
ignored unless the field name begins with "Content-". In HTTP/1.0,
multipart body-parts may contain HTTP header fields which are
significant to the meaning of that part.

*****Appendix D is added:*****

D.  Additional Features

This appendix documents protocol elements used by some existing HTTP
implementations, but not consistently and correctly across most HTTP/1.0
applications.  Implementors should be aware of these features, but cannot
rely upon their presence in, or interoperability with, other HTTP/1.0
applications.

D.1 Additional Request Methods

D.1.1 PUT

The PUT method requests that the enclosed entity be stored under the
supplied Request-URI. If the Request-URI refers to an already existing
resource, the enclosed entity should be considered as a modified version
of the one residing on the origin server. If the Request-URI does not
point to an existing resource, and that URI is capable of being defined
as a new resource by the requesting user agent, the origin server can
create the resource with that URI.

The fundamental difference between the POST and PUT requests is reflected
in the different meaning of the Request-URI. The URI in a POST request
identifies the resource that will handle the enclosed entity as data to be
processed. That resource may be a data-accepting process, a gateway to some
other protocol, or a separate entity that accepts annotations. In contrast,
the URI in a PUT request identifies the entity enclosed with the request --
the user agent knows what URI is intended and the server must not attempt
to apply the request to some other resource.
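To make the contrast concrete, here is a toy in-memory origin server in Python. Everything in it is hypothetical, including the choice of status codes (201 for a newly created resource, 200 otherwise). It only shows that PUT stores the entity under exactly the Request-URI, while POST hands the entity to whatever resource the URI names.

```python
# Hypothetical in-memory origin server, used only to contrast PUT and POST.
resources = {}  # Request-URI -> stored entity
handlers = {}   # Request-URI -> callable that processes POSTed data


def handle(method, uri, entity):
    """Return an illustrative (status, body) pair for a PUT or POST."""
    if method == "PUT":
        # The entity is stored under exactly this URI, never elsewhere.
        status = 200 if uri in resources else 201
        resources[uri] = entity
        return status, b""
    if method == "POST":
        # The URI names the resource that will process the entity.
        handler = handlers.get(uri)
        if handler is None:
            return 404, b""
        return 200, handler(entity)
    return 501, b""
```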

D.1.2 DELETE

The DELETE method requests that the origin server delete the
resource identified by the Request-URI.

D.1.3 LINK

The LINK method establishes one or more Link relationships between
the existing resource identified by the Request-URI and other
existing resources.

D.1.4 UNLINK

The UNLINK method removes one or more Link relationships from the
existing resource identified by the Request-URI.

D.2  Additional Header Field Definitions

D.2.1  Accept

The Accept request header field can be used to indicate a list of
media ranges which are acceptable as a response to the request. The
asterisk "*" character is used to group media types into ranges, with
"*/*" indicating all media types and "type/*" indicating all subtypes
of that type. The set of ranges given by the client should represent
what types are acceptable given the context of the request.
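A Python sketch of the range matching described above. Field parameters (including any quality values) are ignored, and the field value is assumed to be a simple comma-separated list, since this appendix does not define the full syntax.

```python
def range_matches(media_range, media_type):
    """True if media_type falls inside media_range ("*/*", "type/*", or exact)."""
    # Drop any parameters and compare case-insensitively.
    r_type, _, r_sub = media_range.split(";", 1)[0].strip().lower().partition("/")
    t_type, _, t_sub = media_type.split(";", 1)[0].strip().lower().partition("/")
    if (r_type, r_sub) == ("*", "*"):
        return True
    if r_sub == "*":
        return r_type == t_type
    return (r_type, r_sub) == (t_type, t_sub)


def is_acceptable(accept_value, media_type):
    """Check a media type against a comma-separated Accept field value."""
    return any(range_matches(r, media_type)
               for r in accept_value.split(",") if r.strip())
```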

D.2.2  Accept-Charset

The Accept-Charset request header field can be used to indicate a
list of preferred character set encodings other than the default
US-ASCII and ISO-8859-1. This field allows clients capable of
understanding more comprehensive or special-purpose character set
encodings to signal that capability to a server which is capable of
representing documents in those character set encodings.

D.2.3  Accept-Encoding

The Accept-Encoding request header field is similar to Accept, but
restricts the content-coding values which are acceptable in the
response.

D.2.4  Accept-Language

The Accept-Language request header field is similar to Accept, but
restricts the set of natural languages that are preferred as a
response to the request.

D.2.5  Content-Language

The Content-Language entity header field describes the natural
language(s) of the intended audience for the enclosed entity. Note
that this may not be equivalent to all the languages used within the
entity.

D.2.6  Link

The Link entity header field provides a means for describing a
relationship between the entity and some other resource. An entity may
include multiple Link values. Links at the metainformation level
typically indicate relationships like hierarchical structure and
navigation paths.

D.2.7 MIME-Version

HTTP messages may include a single MIME-Version general-header field
to indicate what version of the MIME protocol was used to construct
the message. Use of the MIME-Version header field, as defined by RFC
1521 [5], should indicate that the message is MIME-conformant.
Unfortunately, some older HTTP/1.0 servers send it indiscriminately,
and thus this field should be ignored.

D.2.8  Retry-After

The Retry-After response header field can be used with a 503
(service unavailable) response to indicate how long the service is
expected to be unavailable to the requesting client. The value of
this field can be either an HTTP-date or an integer number of
seconds (in decimal) after the time of the response.
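A Python sketch of interpreting the field. It assumes the caller supplies the time of the response (for example from its Date header) as a timezone-aware datetime, which matters only when the value is an HTTP-date. Parsing the date is delegated to the standard library's `email.utils`.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value: str, response_date: datetime) -> int:
    """Return how many seconds to wait, according to a Retry-After value."""
    value = value.strip()
    if re.fullmatch(r"[0-9]+", value):
        # Decimal number of seconds after the time of the response.
        return int(value)
    when = parsedate_to_datetime(value)
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    # A date already in the past means the client need not wait.
    return max(0, int((when - response_date).total_seconds()))
```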

D.2.9  Title

The Title entity header field indicates the title of the entity.

D.2.10  URI

The URI entity header field may contain some or all of the Uniform
Resource Identifiers (Section 3.2) by which the Request-URI
resource can be identified. There is no guarantee that the resource
can be accessed using the URI(s) specified.

Received on Friday, 16 February 1996 09:12:59 UTC