Round 3: moving HTTP 1.0 to informational

The following is what I hope is the last round on the HTTP/1.0 draft.
Everything here *except for the Appendix C material* has already been
around the WG with no objection. The material from Appendix C is my
own rewrite that (hopefully) matches the material that Larry posted
last week about HTTP and RFC 1521.

Just as a brief (but pointed) reminder, this draft is supposed to say
approximately what is common to HTTP/1.0 software. Please limit your
comments to what is true, not what we would have wanted to be true. If
there is little new discussion, it can go to a final I-D in the middle of
next week, which would be nice for all of us.

--Paul Hoffman
--Internet Mail Consortium



*****The following paragraph is added after the first paragraph of section 1.1 (Purpose):*****

This specification reflects the approximate state of those features which
are found in most HTTP/1.0 implementations. The specification is split into
two sections. Those features of HTTP for which implementations are usually
consistent are described in the main body of this document. Those features
which have few implementations, or whose implementations are inconsistent,
are listed in Appendix D.

*****Section 1.4 is added:*****

1.4 HTTP and MIME

HTTP/1.0 uses many of the constructs defined for MIME in RFC 1521.
Appendix C describes the ways in which the context of HTTP allows Internet
Media Types to be used differently than they typically are when transported
over Internet mail, and gives the rationale for those differences.

*****Section 3.6 and its subsections are changed to:*****

3.6  Media Types

HTTP uses Internet Media Types [13] in the Content-Type header field
(Section 10.5) in order to provide open and extensible data typing.

   media-type     = type "/" subtype *( ";" parameter )
   type           = token
   subtype        = token

Parameters may follow the type/subtype in the form of
attribute/value pairs.

   parameter      = attribute "=" value
   attribute      = token
   value          = token | quoted-string

The type, subtype, and parameter attribute names are
case-insensitive. Parameter values may or may not be
case-sensitive, depending on the semantics of the parameter name.
LWS must not be generated between the type and subtype, nor between
an attribute and its value.
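As an illustration of the grammar above, here is a minimal Python sketch of a media-type parser. The function names are hypothetical, and the sketch assumes a lenient receiver: it rejects LWS between type and subtype but tolerates LWS around "=" in parameters, which senders must not generate. Type, subtype, and attribute names are lowercased because they are case-insensitive; parameter values are returned unchanged.

```python
def _split_params(value):
    """Split a header value on ";" except inside quoted-strings."""
    parts, buf, in_quotes, escaped = [], [], False, False
    for ch in value:
        if escaped:
            escaped = False
        elif ch == "\\" and in_quotes:
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == ";" and not in_quotes:
            parts.append("".join(buf))
            buf = []
            continue
        buf.append(ch)
    parts.append("".join(buf))
    return parts


def _unquote(value):
    """Strip the quotes from a quoted-string and undo backslash escapes."""
    if len(value) >= 2 and value[0] == value[-1] == '"':
        out, escaped = [], False
        for ch in value[1:-1]:
            if escaped:
                out.append(ch)
                escaped = False
            elif ch == "\\":
                escaped = True
            else:
                out.append(ch)
        return "".join(out)
    return value


def parse_media_type(value):
    """Return (type, subtype, params) for a media-type value."""
    parts = _split_params(value)
    mtype, sep, subtype = parts[0].strip().partition("/")
    # LWS is not allowed between type and subtype.
    if (not sep or not mtype or not subtype
            or mtype != mtype.rstrip() or subtype != subtype.lstrip()):
        raise ValueError(f"malformed media type: {value!r}")
    params = {}
    for part in parts[1:]:
        if not part.strip():
            continue  # tolerate a stray trailing ";"
        attr, sep, val = part.strip().partition("=")
        if not sep or not attr.strip():
            raise ValueError(f"malformed parameter: {part!r}")
        # Attribute names are case-insensitive; values are kept as sent.
        params[attr.strip().lower()] = _unquote(val.strip())
    return mtype.lower(), subtype.lower(), params
```

For example, parse_media_type('Text/HTML; Charset="ISO-8859-1"') returns ('text', 'html', {'charset': 'ISO-8859-1'}).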

Some older HTTP applications do not recognize media type parameters.
HTTP/1.0 applications should only use media type parameters when they
are necessary to define the content of a message.

Media-type values are registered with the Internet Assigned Numbers
Authority (IANA). The media type registration process is outlined in
RFC 1590 [13]. Use of non-registered media types is discouraged.

3.6.1 Canonicalization and Text Defaults

Internet media types are registered with a canonical form.  In
general, Entity-Bodies transferred via HTTP must be represented in
the appropriate canonical form prior to the application of
Content-Encoding, if any, and transmission.

Media types of "text/*" use CRLF as the text line break when in canonical
form. However, HTTP allows the transport of text media not only in the
canonical form with CRLF line breaks, but also with CR or LF alone, used
consistently within the Entity-Body. This flexibility applies only to
Entity-Bodies, not to HTTP or multipart headers.
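The "used consistently" rule can be checked mechanically. This is a Python sketch with a hypothetical function name; it assumes a character set in which CR and LF are the single octets 13 and 10 (not true of every multi-byte charset).

```python
def line_break_style(body: bytes):
    """Return b"\r\n", b"\r", or b"\n" for a text Entity-Body, or None if
    it has no line breaks. Raise ValueError if conventions are mixed."""
    styles = set()
    i, n = 0, len(body)
    while i < n:
        octet = body[i]
        if octet == 0x0D:  # CR
            if i + 1 < n and body[i + 1] == 0x0A:
                styles.add(b"\r\n")  # CRLF pair counts as one break
                i += 2
                continue
            styles.add(b"\r")  # bare CR
        elif octet == 0x0A:
            styles.add(b"\n")  # bare LF
        i += 1
    if len(styles) > 1:
        raise ValueError("inconsistent line breaks in Entity-Body")
    return styles.pop() if styles else None
```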

Media types of "text/*" are defined to have a default charset parameter of
"US-ASCII"; text in any other character set should be labelled with an
explicit charset parameter. In practice, HTTP servers frequently send text
data without a charset parameter and expect clients to guess the character
set of the result. This has caused a great deal of confusion and lack of
interoperability among HTTP/1.0 clients and servers.
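A small Python sketch of the documented default, using the standard library's email.message.Message only to parse the Content-Type value. The function name is hypothetical, and, as the paragraph above warns, applying this default to unlabelled text from real servers may still produce the wrong character set.

```python
from email.message import Message
from typing import Optional


def declared_or_default_charset(content_type: str) -> Optional[str]:
    """Return the labelled charset (lowercased), "us-ascii" for unlabelled
    text/* content, or None for non-text media types."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # None if no charset parameter
    if charset is not None:
        return charset
    if msg.get_content_maintype() == "text":
        return "us-ascii"  # the documented default for text/*
    return None
```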

3.6.2 Multipart Types

MIME provides for a number of "multipart" types -- encapsulations of
several entities within a single message's Entity-Body. The multipart types
registered by IANA [15] do not have any special meaning for HTTP, though
user agents may need to understand each type in order to correctly
interpret the purpose of each body-part. An HTTP user agent should follow
the same or similar behavior as a MIME user agent does upon receipt of a
multipart type. HTTP servers should not assume that all HTTP clients are
prepared to handle multipart types.

All multipart types share a common syntax and must include a boundary
parameter as part of the media type value. The message body is itself a
protocol element and must therefore use only CRLF to represent line breaks
between body-parts. Multipart body-parts may contain HTTP header fields
which are significant to the meaning of that part.
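To make the boundary and CRLF rules concrete, here is a simplified Python sketch that splits a multipart Entity-Body into its body-parts. The function name is hypothetical; it assumes the boundary never appears inside part content (as RFC 1521 requires) and does not validate the boundary line beyond skipping to its CRLF.

```python
def split_multipart(body: bytes, boundary: str):
    """Return the body-parts (header fields + content) as a list of bytes."""
    delim = b"--" + boundary.encode("ascii")
    # The first delimiter may start the body with no preceding CRLF.
    if body.startswith(delim):
        body = b"\r\n" + body
    chunks = body.split(b"\r\n" + delim)
    parts = []
    # chunks[0] is the preamble and is discarded.
    for chunk in chunks[1:]:
        if chunk.startswith(b"--"):
            break  # close delimiter; anything after it is the epilogue
        # Skip optional padding and the CRLF that ends the delimiter line.
        line_end = chunk.find(b"\r\n")
        if line_end == -1:
            raise ValueError("delimiter line not terminated by CRLF")
        parts.append(chunk[line_end + 2:])
    else:
        raise ValueError("missing close delimiter")
    return parts
```

Each returned part begins with its own header fields (for example, Content-Type), which in HTTP may be significant to the meaning of that part.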

*****Section 10.12 is changed to:*****

10.12  MIME-Version

Some HTTP/1.0 applications send a MIME-Version field in the following
format:

   MIME-Version   = "MIME-Version" ":" 1*DIGIT "." 1*DIGIT

However, this field has not been well-defined and should be ignored.


*****Section 12.5 is added:*****

12.5  Attacks Based On File and Path Names

Implementations of HTTP servers should be careful to restrict the
documents returned by HTTP requests to be only those that were intended
by the administrators. If an HTTP server translates HTTP URIs directly
into file system calls, the server must take special care not to serve
files that were not intended to be delivered to HTTP clients. For
example, Unix, Microsoft Windows, and other operating systems use ".."
as a path component to indicate a directory level above the current one.
On such a system, an HTTP server must disallow any such construct in the
Request-URI if it would otherwise allow access to a resource outside
those intended to be accessible via the HTTP server. Similarly, files
intended for reference only internally to the server (such as access
control files, configuration files, and script code) must be protected
from inappropriate retrieval, since they might contain sensitive
information. Experience has shown that minor bugs in such HTTP server
implementations have turned into security risks.
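One way a server that maps Request-URIs to files might apply these rules is sketched below in Python. It assumes a POSIX file system and a hypothetical policy of refusing names beginning with ".ht" (NCSA-style access control files); the function and constant names are hypothetical too. Decoding %XX escapes before checking matters, since "%2e%2e" would otherwise slip past a literal ".." test. The sketch does not handle symbolic links.

```python
import posixpath
from urllib.parse import unquote

# Hypothetical policy: names that must never be served (e.g. ".htaccess").
PROTECTED_PREFIXES = (".ht",)


def map_request_path(doc_root: str, abs_path: str) -> str:
    """Map a Request-URI abs_path to a file under doc_root, or raise
    PermissionError."""
    # Drop any query, then decode %XX escapes *before* checking segments.
    path = unquote(abs_path.split("?", 1)[0])
    segments = [s for s in path.split("/") if s not in ("", ".")]
    for seg in segments:
        if seg == ".." or "\\" in seg or "\x00" in seg:
            raise PermissionError(f"refusing path segment {seg!r}")
        if seg.startswith(PROTECTED_PREFIXES):
            raise PermissionError(f"refusing protected file {seg!r}")
    candidate = posixpath.join(doc_root, *segments)
    # Defence in depth: the normalized result must stay inside doc_root.
    root = posixpath.normpath(doc_root)
    if posixpath.commonpath([root, posixpath.normpath(candidate)]) != root:
        raise PermissionError("path escapes document root")
    return candidate
```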

*****Appendix C is changed to:*****

C.  Relationship to MIME

HTTP/1.0 uses many of the constructs defined for Internet Mail (RFC 822
[7]) and the Multipurpose Internet Mail Extensions (RFC 1521, MIME [5]) to
allow entities to be transmitted in an open variety of representations and
with extensible mechanisms. However, RFC 1521 discusses email, and HTTP has
a few features that are different from those described in RFC 1521. These
differences were carefully chosen to optimize performance over 8-bit
networks, to give greatest freedom for creating new media-types, to make
date comparisons easier, and to acknowledge the practice of some early HTTP
servers and clients.

At the time this document was written, RFC 1521 was expected to be
revised. The revisions may include some of the practices found in HTTP/1.0
but not in RFC 1521.

This appendix describes specific areas where HTTP differs from RFC 1521.
Proxies and gateways to strict MIME environments should be aware of these
differences and provide the appropriate conversions where necessary.
Proxies and gateways from MIME environments to HTTP also need to be aware
of the differences because some conversions may be required.

C.1 Canonical Form and Line Breaks

RFC 1521 requires that an email entity be converted to canonical form prior
to being transferred, as described in Appendix G of RFC 1521 [5]. Section
3.6.1 of this document describes the forms allowed for "text/*" media types
when transmitted over HTTP.

RFC 1521 requires that content that has the primary media type "text"
represent line breaks as CRLF and forbids the use of CR or LF outside of
line break sequences. HTTP allows CRLF, bare CR, and bare LF to indicate a
line break within text content when a message is transmitted over HTTP.

Where possible, a proxy or gateway from HTTP to a protocol in a strict
RFC 1521 environment should translate all line breaks within the text media
types described in section 3.6.1 of this document to the RFC 1521 canonical
form of CRLF. Note, however, that this may be complicated by the presence
of HTTP content encoding and by the fact that HTTP allows the use of some
character sets which do not use octets 13 and 10 to represent CR and LF, as
is the case for some multi-byte character sets. If HTTP-to-MIME
canonicalization is performed, the value of a Content-Length header field
of the HTTP data must be updated to reflect the new body length.
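A Python sketch of that translation for the simple case, assuming no Content-Encoding is present and the charset uses octets 13 and 10 for CR and LF. Header fields are modelled as a plain dict, and the function names are hypothetical.

```python
import re


def to_canonical_crlf(body: bytes) -> bytes:
    # CRLF is tried first so an existing CRLF is not doubled.
    return re.sub(rb"\r\n|\r|\n", b"\r\n", body)


def canonicalize_for_mime(headers: dict, body: bytes):
    """Return (headers, body) with CRLF line breaks and an updated
    Content-Length."""
    if any(name.lower() == "content-encoding" for name in headers):
        raise ValueError("decode the content-coding before canonicalizing")
    new_body = to_canonical_crlf(body)
    new_headers = {}
    for name, value in headers.items():
        if name.lower() == "content-length":
            value = str(len(new_body))  # length changes when LF -> CRLF
        new_headers[name] = value
    return new_headers, new_body
```

For instance, a body of b"one\ntwo\n" (8 octets) becomes b"one\r\ntwo\r\n", and Content-Length is rewritten to "10".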

C.2  Conversion of Date Formats

HTTP/1.0 uses a small set of date formats to simplify the process of date
comparison; these are described in section 3.3 of this document. RFC 1521
allows a larger set of date formats. Proxies and gateways from other
protocols to HTTP should ensure that any Date header field present in a
message conforms to one of the HTTP/1.0 formats and rewrite the date if
necessary.
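As an example of such a rewrite, this Python sketch converts an RFC 822 date (with any zone offset) to the RFC 1123 form in GMT using email.utils from the standard library. The function name is hypothetical, and treating a "-0000" zone (which parses to a naive datetime) as UTC is an assumption.

```python
from datetime import timezone
from email.utils import format_datetime, parsedate_to_datetime


def to_http_date(rfc822_date: str) -> str:
    """Rewrite an RFC 822 date as an RFC 1123 date in GMT."""
    dt = parsedate_to_datetime(rfc822_date)
    if dt.tzinfo is None:
        # "-0000" means "zone unknown"; this sketch assumes UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)
```

For example, to_http_date("Thu, 25 Jan 1996 15:10:32 -0500") returns "Thu, 25 Jan 1996 20:10:32 GMT".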

C.3  Introduction of Content-Encoding

RFC 1521 does not include any concept equivalent to HTTP/1.0's
Content-Encoding header field. Since this acts as a modifier on the media
type, proxies and gateways from HTTP to MIME-compliant protocols must
either change the value of the Content-Type header field or decode the
Entity-Body before forwarding the message. (Some experimental applications
of Content-Type for Internet mail have used a media-type parameter of
";conversions=<content-coding>" to perform a function equivalent to
Content-Encoding. However, this parameter is not part of RFC 1521.)
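A Python sketch of the "decode the Entity-Body" option, handling only gzip ("x-gzip" or "gzip"); the Python standard library has no decoder for the Unix compress format used by "x-compress", so that coding is rejected here. Header fields are a plain dict and the function name is hypothetical.

```python
import gzip


def decode_for_mime(headers: dict, body: bytes):
    """Remove an HTTP content-coding before forwarding to a MIME protocol."""
    coding_name = next(
        (n for n in headers if n.lower() == "content-encoding"), None)
    if coding_name is None:
        return dict(headers), body  # nothing to undo
    coding = headers[coding_name].strip().lower()
    if coding in ("x-gzip", "gzip"):
        body = gzip.decompress(body)
    else:
        raise ValueError(f"unsupported content-coding {coding!r}")
    # Drop Content-Encoding and recompute the length of the decoded body.
    new = {k: v for k, v in headers.items()
           if k.lower() not in ("content-encoding", "content-length")}
    new["Content-Length"] = str(len(body))
    return new, body
```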

C.4  No Content-Transfer-Encoding

HTTP/1.0 does not use the Content-Transfer-Encoding (CTE) field of RFC
1521. Proxies and gateways from MIME-compliant protocols to HTTP must
remove any non-identity CTE ("quoted-printable" or "base64") encoding prior
to delivering the response message to an HTTP client.
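The removal step might look like the following Python sketch, which uses the standard base64 and quopri modules and treats "7bit", "8bit", and "binary" as identity encodings. Removing the Content-Transfer-Encoding field and recomputing Content-Length are assumptions of the sketch; the function name is hypothetical.

```python
import base64
import quopri


def remove_cte(headers: dict, body: bytes):
    """Undo a MIME Content-Transfer-Encoding before delivery to HTTP."""
    cte_name = next(
        (n for n in headers if n.lower() == "content-transfer-encoding"), None)
    if cte_name is None:
        return dict(headers), body
    cte = headers[cte_name].strip().lower()
    if cte == "base64":
        body = base64.b64decode(body)  # ignores line breaks in the input
    elif cte == "quoted-printable":
        body = quopri.decodestring(body)
    elif cte not in ("7bit", "8bit", "binary"):
        raise ValueError(f"unknown Content-Transfer-Encoding {cte!r}")
    new = {k: v for k, v in headers.items()
           if k.lower() not in ("content-transfer-encoding", "content-length")}
    new["Content-Length"] = str(len(body))
    return new, body
```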

Proxies and gateways from HTTP to MIME-compliant protocols are responsible
for ensuring that the message is in the correct format and encoding for
safe transport on that protocol. "Safe transport" is defined by the
limitations of the protocol being used. At a minimum, the CTE field of

   Content-Transfer-Encoding: binary

should be added by the HTTP-to-MIME proxy or gateway if the gateway is
unwilling to apply a content transfer encoding.

An HTTP client may include a Content-Transfer-Encoding as an extension
Entity-Header in a POST request when it knows the destination of that
request is a proxy or gateway to a MIME-compliant protocol.

C.5  HTTP Header Fields in Multipart Body-Parts

In RFC 1521, the header fields in multipart body-parts are generally
ignored. In HTTP/1.0, multipart body-parts may contain HTTP header fields
which are significant to the meaning of that part.

*****Appendix D is added:*****

D.  Additional Features

This appendix documents features for which there was not strong consensus
in the IETF HTTP Working Group, or for which there was not a sufficient
number of interoperable implementations. In some cases, there was strong
consensus that the feature was needed but disagreement about how it
should be implemented. In other cases, there was no general agreement on
the feature. Implementors who add the features in this appendix should be
aware that software using these features is less likely to be
interoperable than software using the features from the main part of
this specification.

The specifications in this appendix are shorter than they were in earlier
drafts of the HTTP/1.0 specification. Some implementations of the features
in this appendix are based on the fuller descriptions in those drafts.

D.1 Additional Request Methods

D.1.1 PUT

The PUT method requests that the enclosed entity be stored under the
supplied Request-URI. If the Request-URI refers to an already existing
resource, the enclosed entity should be considered as a modified version
of the one residing on the origin server. If the Request-URI does not
point to an existing resource, and that URI is capable of being defined
as a new resource by the requesting user agent, the origin server can
create the resource with that URI.

The fundamental difference between the POST and PUT requests is reflected
in the different meaning of the Request-URI. The URI in a POST request
identifies the resource that will handle the enclosed entity as data to be
processed. That resource may be a data-accepting process, a gateway to some
other protocol, or a separate entity that accepts annotations. In contrast,
the URI in a PUT request identifies the entity enclosed with the request --
the user agent knows what URI is intended and the server must not attempt
to apply the request to some other resource.
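The difference in Request-URI meaning can be illustrated with a toy, in-memory origin server in Python. Everything here is hypothetical (class name, storage, handlers), and the choice of 201 for a newly created resource and 200 for a replaced one is an assumption, not something this appendix specifies.

```python
class ToyOriginServer:
    """Hypothetical in-memory origin server contrasting PUT and POST."""

    def __init__(self):
        self.resources = {}      # Request-URI -> stored entity (PUT targets)
        self.post_handlers = {}  # Request-URI -> callable that processes data

    def handle(self, method, request_uri, entity):
        if method == "PUT":
            # The Request-URI names the enclosed entity itself: store it
            # there and nowhere else.
            created = request_uri not in self.resources
            self.resources[request_uri] = entity
            return 201 if created else 200
        if method == "POST":
            # The Request-URI names a resource that processes the entity;
            # nothing is stored under that URI by this code.
            handler = self.post_handlers.get(request_uri)
            if handler is None:
                return 404
            return handler(entity)
        return 501  # method not implemented by this sketch
```

A PUT to "/notes/1" creates or replaces exactly that resource, while a POST to "/annotate" only hands the entity to whatever handler is registered there.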

D.1.2 DELETE

The DELETE method requests that the origin server delete the 
resource identified by the Request-URI.

D.1.3 LINK

The LINK method establishes one or more Link relationships between 
the existing resource identified by the Request-URI and other 
existing resources.

D.1.4 UNLINK

The UNLINK method removes one or more Link relationships from the 
existing resource identified by the Request-URI.

D.2  Additional Header Field Definitions

This section briefly describes additional HTTP/1.0 header fields. For
Entity-Header fields, both sender and 
recipient refer to either the client or the server, depending on 
who sends and who receives the entity.

D.2.1  Accept

The Accept header field can be used to indicate a list of media 
ranges which are acceptable as a response to the request. The 
asterisk "*" character is used to group media types into ranges, 
with "*/*" indicating all media types and "type/*" indicating all 
subtypes of that type. The set of ranges given by the client should 
represent what types are acceptable given the context of the 
request.
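The range rules can be sketched in Python as follows. The function names are hypothetical; the sketch ignores any parameters on a range (including quality values) and assumes the Accept value contains no quoted commas.

```python
def _type_pair(value):
    """Lowercased (type, subtype) of a media type or range, without params."""
    mtype, _, subtype = value.split(";", 1)[0].strip().lower().partition("/")
    return mtype, subtype


def range_matches(media_range, media_type):
    rtype, rsub = _type_pair(media_range)
    mtype, msub = _type_pair(media_type)
    if rtype == "*" and rsub == "*":
        return True                 # "*/*": all media types
    if rsub == "*":
        return rtype == mtype       # "type/*": all subtypes of that type
    return (rtype, rsub) == (mtype, msub)


def is_acceptable(accept_value, media_type):
    ranges = [r for r in accept_value.split(",") if r.strip()]
    return any(range_matches(r, media_type) for r in ranges)
```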

D.2.2  Accept-Charset

The Accept-Charset request header field can be used to indicate a 
list of preferred character set encodings other than the default
US-ASCII and ISO-8859-1. This field allows clients capable of 
understanding more comprehensive or special-purpose character set 
encodings to signal that capability to a server which is capable of 
representing documents in those character set encodings.

D.2.3  Accept-Encoding

The Accept-Encoding request header field is similar to Accept, but 
restricts the encoding-mechanism values which are acceptable in the 
response.

D.2.4  Accept-Language

The Accept-Language request header field is similar to Accept, but 
restricts the set of natural languages that are preferred as a 
response to the request.

D.2.5  Content-Language

The Content-Language field describes the natural language(s) of the 
intended audience for the enclosed entity. Note that this may not 
be equivalent to all the languages used within the entity.

D.2.6  Link

The Link header provides a means for describing a relationship 
between the entity and some other resource. An entity may include 
multiple Link values. Links at the metainformation level typically 
indicate relationships like hierarchical structure and navigation 
paths.

D.2.7  Retry-After

The Retry-After response header field can be used with a 503 
(service unavailable) response to indicate how long the service is 
expected to be unavailable to the requesting client. The value of 
this field can be either an HTTP-date or an integer number of 
seconds (in decimal) after the time of the response.
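A client could turn either form into a delay with a Python sketch like this one. The function name is hypothetical; it relies on email.utils.parsedate_to_datetime for the HTTP-date case and assumes the caller supplies the response time as an aware UTC datetime. Dates already in the past yield zero.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value: str, response_time: datetime) -> int:
    """Return how many seconds to wait, given a Retry-After value."""
    value = value.strip()
    if value and all(c in "0123456789" for c in value):
        return int(value)  # delta-seconds form
    when = parsedate_to_datetime(value)  # HTTP-date form
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0, int((when - response_time).total_seconds()))
```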

D.2.8  Title

The Title header field indicates the title of the entity.

D.2.9  URI

The URI header field may contain some or all of the Uniform 
Resource Identifiers (Section 3.2) by which the Request-URI 
resource can be identified. There is no guarantee that the resource 
can be accessed using the URI(s) specified. 

Received on Thursday, 25 January 1996 20:10:32 UTC