- From: Paul Hoffman <paulh@imc.org>
- Date: Fri, 16 Feb 1996 09:11:42 -0800
- To: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
The following is what I really, really hope is the last round on the HTTP/1.0 draft. Everything here has already been around the WG, and I've tried to synthesize where there was eventual agreement. I admit that I may have made some mistakes in choosing what was agreed on, but please don't go into this looking for new things to haggle about. The goal is to get a document out that someone can use to create HTTP/1.0 software that can interoperate with most of today's existing HTTP/1.0 software.

Roy is ready to start incorporating the changes into the master document. I'd like to have him start on this soon, so please make known any places where I missed (or misstated) the consensus of the WG.

A final note: there was some confusion at the end of the thread about what kind of document this will be. It will be an "Informational" document, not a "Best Current Practice" document. We moved away from BCP many months ago.

--Paul Hoffman
--Internet Mail Consortium

*****The following sentences replace the last two sentences ("This specification reflects..." and "This specification is not...") in the first paragraph of section 1.1:*****

This specification describes the features that seem to be consistently implemented in most HTTP/1.0 clients and servers. The specification is split into two sections. Those features of HTTP for which implementations are usually consistent are described in the main body of this document. Those features which have few implementations or inconsistent ones are listed in Appendix D.

*****Section 1.4 is added:*****

1.4 HTTP and MIME

HTTP/1.0 uses many of the constructs defined for MIME, as defined in RFC 1521. Appendix C describes the ways in which the context of HTTP allows for different use of Internet Media Types than is typically found in Internet mail, and gives the rationale for those differences.

*****Section 3.6 and its subsections are changed to:*****

3.6 Media Types

HTTP uses Internet Media Types [13] in the Content-Type header field (Section 10.5) in order to provide open and extensible data typing.

    media-type     = type "/" subtype *( ";" parameter )
    type           = token
    subtype        = token

Parameters may follow the type/subtype in the form of attribute/value pairs.

    parameter      = attribute "=" value
    attribute      = token
    value          = token | quoted-string

The type, subtype, and parameter attribute names are case-insensitive. Parameter values may or may not be case-sensitive, depending on the semantics of the parameter name. LWS must not be generated between the type and subtype, nor between an attribute and its value. Upon receipt of a media type with an unrecognized parameter, a user agent should treat the media type as if the unrecognized parameter and its value were not present.

Some older HTTP applications do not recognize media type parameters. HTTP/1.0 applications should only use media type parameters when they are necessary to define the content of a message.

Media-type values are registered with the Internet Assigned Numbers Authority (IANA). The media type registration process is outlined in RFC 1590 [13]. Use of non-registered media types is discouraged.

3.6.1 Canonicalization and Text Defaults

Internet media types are registered with a canonical form. In general, Entity-Bodies transferred via HTTP must be represented in the appropriate canonical form prior to transmission. If the body has been encoded with a Content-Encoding, the underlying data should be in canonical form prior to being encoded.

Media subtypes of the "text" type use CRLF as the text line break when in canonical form. However, HTTP allows the transport of text media with plain CR or LF alone representing a line break when used consistently within the Entity-Body. HTTP applications must accept CRLF, plain CR, and plain LF as being representative of a line break in text media received via HTTP.
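
As an illustration of the "must accept CRLF, plain CR, and plain LF" rule, here is a minimal Python sketch of a receiver that maps all three forms to one local line break. The function name `normalize_line_breaks` and its `newline` parameter are hypothetical, not part of the draft, and the sketch assumes a charset such as ISO-8859-1 in which octets 13 and 10 really are CR and LF.

```python
import re

# CRLF is listed first so that it is consumed as one line break
# rather than as a bare CR followed by a bare LF.
_LINE_BREAK = re.compile(rb"\r\n|\r|\n")


def normalize_line_breaks(body: bytes, newline: bytes = b"\n") -> bytes:
    """Replace CRLF, bare CR, and bare LF in a text Entity-Body with `newline`.

    Assumption: the body uses a charset where octet 13 is CR and octet 10
    is LF. It must not be applied to header fields or multipart boundaries.
    """
    return _LINE_BREAK.sub(newline, body)
```

A gateway that has to emit the canonical form could call the same function with `newline=b"\r\n"`.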

In addition, if the text media is represented in a character set that does not use octets 13 and 10 for CR and LF respectively, as is the case for some multi-byte character sets, HTTP allows the use of whatever octet sequences are defined by that character set to represent the equivalent of CR and LF for line breaks. Because of this, HTTP software should only convert CR, LF, and CRLF to the local encoding for line breaks if the software recognizes the charset parameter.

This flexibility regarding line breaks applies only to text media in the Entity-Body; a bare CR or LF should not be substituted for CRLF within any of the HTTP control structures (such as header fields and multipart boundaries).

The "charset" parameter is used with some media types to define the character set (Section 3.4) of the data. When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP. Data in character sets other than "ISO-8859-1" or its subsets must be labelled with an appropriate charset value in order to be consistently interpreted by user agents.

    Note: Many current HTTP servers provide data using charsets other than "ISO-8859-1" without proper labelling. This situation reduces interoperability and is not recommended. To compensate for this, some HTTP user agents provide a configuration option to allow the user to change the default interpretation of the media type character set when no charset parameter is given.

3.6.2 Multipart Types

MIME provides for a number of "multipart" types -- encapsulations of several entities within a single message's Entity-Body. The multipart types registered by IANA [15] do not have any special meaning for HTTP/1.0, though user agents may need to understand each type in order to correctly interpret the purpose of each body-part.
An HTTP user agent should follow the same or similar behavior as a MIME user agent does upon receipt of a multipart type. HTTP servers should not assume that all HTTP clients are prepared to handle multipart types.

All multipart types share a common syntax and must include a boundary parameter as part of the media type value. The message body is itself a protocol element and must therefore use only CRLF to represent line breaks between body-parts. Multipart body-parts may contain HTTP header fields which are significant to the meaning of that part.

*****Section 9.3 has two additions*****

At the end of each of the 301 and 302 status code descriptions, the following paragraph is added:

    Note: When automatically redirecting a POST request after receiving a [301|302] status code, some HTTP/1.0 user agents will erroneously change the method of the request to GET.

*****Section 10.12 is moved to Appendix D.2.7*****

*****Section 12.5 is added:*****

12.5 Attacks Based On File and Path Names

Implementations of HTTP origin servers should be careful to restrict the documents returned by HTTP requests to be only those that were intended by the administrators. If an HTTP server translates HTTP URIs directly into file system calls, the server must take special care not to serve files that were not intended to be delivered to HTTP clients. For example, Unix, Microsoft Windows, and other operating systems use ".." as a path component to indicate a directory level above the current one. On such a system, an HTTP server must disallow any such construct in the Request-URI if it would otherwise allow access to a resource outside those intended to be accessible via the HTTP server. Similarly, files intended for reference only internally to the server (such as access control files, configuration files, and script code) must be protected from inappropriate retrieval, since they might contain sensitive information.
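
To make the ".." rule concrete, here is a minimal Python sketch of how a server might map a Request-URI path onto a document root. The function name `resolve_request_path` and the `/srv/www` root used in the usage note are hypothetical. The sketch is deliberately stricter than the text requires: it rejects every ".." segment rather than only those that would escape the root, and it does not address symbolic links, access control files, or other platform-specific aliases.

```python
import posixpath
from typing import Optional
from urllib.parse import unquote


def resolve_request_path(document_root: str, request_path: str) -> Optional[str]:
    """Return a path under document_root, or None if the request is refused.

    Hypothetical helper: any ".." path component causes a refusal.
    """
    # Decode %-escapes first so that "%2e%2e" is recognized as "..".
    decoded = unquote(request_path)
    # Treat backslashes as separators so "..\" is caught on systems
    # that accept either separator.
    segments = decoded.replace("\\", "/").split("/")
    if ".." in segments:
        return None
    # Drop empty and "." segments, then join what is left under the root.
    safe_segments = [s for s in segments if s not in ("", ".")]
    return posixpath.join(document_root, *safe_segments)
```

For example, `resolve_request_path("/srv/www", "/a/%2e%2e/secret")` is refused, while `"/index.html"` maps to a file under the root.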
Experience has shown that minor bugs in such HTTP server implementations have turned into security risks.

*****Appendix C is changed to:*****

C. Relationship to MIME

HTTP/1.0 uses many of the constructs defined for Internet Mail (RFC 822 [7]) and the Multipurpose Internet Mail Extensions (RFC 1521, MIME [5]) to allow entities to be transmitted in an open variety of representations and with extensible mechanisms. However, RFC 1521 discusses email, and HTTP has a few features that are different from those described in RFC 1521. These differences were carefully chosen to optimize performance over 8-bit networks, to give greatest freedom for creating new media-types, to make date comparisons easier, and to acknowledge the practice of some early HTTP servers and clients.

At the time of this writing, it is expected that RFC 1521 will be revised. The revisions may include some of the practices found in HTTP/1.0 but not in RFC 1521.

This appendix describes specific areas where HTTP differs from RFC 1521. Proxies and gateways to strict MIME environments should be aware of these differences and provide the appropriate conversions where necessary. Proxies and gateways from MIME environments to HTTP also need to be aware of the differences because some conversions may be required.

C.1 Canonical Form and Line Breaks

RFC 1521 requires that an email entity be converted to canonical form prior to being transferred, as described in Appendix G of RFC 1521 [5]. Section 3.6.1 of this document describes the forms allowed for subtypes of the "text" media type when transmitted over HTTP.

RFC 1521 requires that content that has the primary media type "text" represent line breaks as CRLF and forbids the use of CR or LF outside of line break sequences. HTTP allows CRLF, bare CR, and bare LF to indicate a line break within text content when a message is transmitted over HTTP.

Where it is possible, a proxy or gateway from HTTP to a strict RFC 1521 environment should translate all line breaks within the text media types described in section 3.6.1 of this document to the RFC 1521 canonical form of CRLF. Note, however, that this may be complicated by the presence of HTTP content encoding and by the fact that HTTP allows the use of some character sets which do not use octets 13 and 10 to represent CR and LF, as is the case for some multi-byte character sets.

C.2 Conversion of Date Formats

HTTP/1.0 uses a small set of date formats to simplify the process of date comparison; these are described in section 3.3 of this document. RFC 1521 allows a larger set of date formats. Proxies and gateways from other protocols to HTTP should ensure that any Date header field present in a message conforms to one of the HTTP/1.0 formats and rewrite the date if necessary.

C.3 Introduction of Content-Encoding

RFC 1521 does not include any concept equivalent to HTTP/1.0's Content-Encoding header field. Since this acts as a modifier on the media type, proxies and gateways from HTTP to MIME-compliant protocols must either change the value of the Content-Type header field or decode the Entity-Body before forwarding the message. (Some experimental applications of Content-Type for Internet mail have used a media-type parameter of ";conversions=<content-coding>" to perform a function equivalent to Content-Encoding. However, this parameter is not part of RFC 1521.)

C.4 No Content-Transfer-Encoding

HTTP/1.0 does not use the Content-Transfer-Encoding (CTE) field of RFC 1521. Proxies and gateways from MIME-compliant protocols to HTTP must remove any non-identity CTE ("quoted-printable" or "base64") encoding prior to delivering the response message to an HTTP client.

Proxies and gateways from HTTP to MIME-compliant protocols are responsible for ensuring that the message is in the correct format and encoding for safe transport on that protocol. "Safe transport" is defined by the limitations of the protocol being used. At a minimum, a CTE field of "Content-Transfer-Encoding: binary" should be added by the HTTP-to-MIME proxy or gateway if the gateway is unwilling to apply a content transfer encoding.

C.5 HTTP Header Fields in Multipart Body-Parts

In RFC 1521, most header fields in multipart body-parts are generally ignored unless the field name begins with "Content-". In HTTP/1.0, multipart body-parts may contain HTTP header fields which are significant to the meaning of that part.

*****Appendix D is added:*****

D. Additional Features

This appendix documents protocol elements used by some existing HTTP implementations, but not consistently and correctly across most HTTP/1.0 applications. Implementors should be aware of these features, but cannot rely upon their presence in, or interoperability with, other HTTP/1.0 applications.

D.1 Additional Request Methods

D.1.1 PUT

The PUT method requests that the enclosed entity be stored under the supplied Request-URI. If the Request-URI refers to an already existing resource, the enclosed entity should be considered as a modified version of the one residing on the origin server. If the Request-URI does not point to an existing resource, and that URI is capable of being defined as a new resource by the requesting user agent, the origin server can create the resource with that URI.

The fundamental difference between the POST and PUT requests is reflected in the different meaning of the Request-URI. The URI in a POST request identifies the resource that will handle the enclosed entity as data to be processed. That resource may be a data-accepting process, a gateway to some other protocol, or a separate entity that accepts annotations. In contrast, the URI in a PUT request identifies the entity enclosed with the request -- the user agent knows what URI is intended and the server must not attempt to apply the request to some other resource.

D.1.2 DELETE

The DELETE method requests that the origin server delete the resource identified by the Request-URI.

D.1.3 LINK

The LINK method establishes one or more Link relationships between the existing resource identified by the Request-URI and other existing resources.

D.1.4 UNLINK

The UNLINK method removes one or more Link relationships from the existing resource identified by the Request-URI.

D.2 Additional Header Field Definitions

D.2.1 Accept

The Accept request header field can be used to indicate a list of media ranges which are acceptable as a response to the request. The asterisk "*" character is used to group media types into ranges, with "*/*" indicating all media types and "type/*" indicating all subtypes of that type. The set of ranges given by the client should represent what types are acceptable given the context of the request.

D.2.2 Accept-Charset

The Accept-Charset request header field can be used to indicate a list of preferred character set encodings other than the default US-ASCII and ISO-8859-1. This field allows clients capable of understanding more comprehensive or special-purpose character set encodings to signal that capability to a server which is capable of representing documents in those character set encodings.

D.2.3 Accept-Encoding

The Accept-Encoding request header field is similar to Accept, but restricts the content-coding values which are acceptable in the response.

D.2.4 Accept-Language

The Accept-Language request header field is similar to Accept, but restricts the set of natural languages that are preferred as a response to the request.

D.2.5 Content-Language

The Content-Language entity header field describes the natural language(s) of the intended audience for the enclosed entity. Note that this may not be equivalent to all the languages used within the entity.

D.2.6 Link

The Link entity header field provides a means for describing a relationship between the entity and some other resource.
An entity may include multiple Link values. Links at the metainformation level typically indicate relationships like hierarchical structure and navigation paths.

D.2.7 MIME-Version

HTTP messages may include a single MIME-Version general-header field to indicate what version of the MIME protocol was used to construct the message. Use of the MIME-Version header field, as defined by RFC 1521 [5], should indicate that the message is MIME-conformant. Unfortunately, some older HTTP/1.0 servers send it indiscriminately, and thus this field should be ignored.

D.2.8 Retry-After

The Retry-After response header field can be used with a 503 (service unavailable) response to indicate how long the service is expected to be unavailable to the requesting client. The value of this field can be either an HTTP-date or an integer number of seconds (in decimal) after the time of the response.

D.2.9 Title

The Title entity header field indicates the title of the entity.

D.2.10 URI

The URI entity header field may contain some or all of the Uniform Resource Identifiers (Section 3.2) by which the Request-URI resource can be identified. There is no guarantee that the resource can be accessed using the URI(s) specified.
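
As an illustration of the two Retry-After value forms described in D.2.8, here is a minimal Python sketch that turns either form into an absolute time. The function name `retry_after_to_datetime` is hypothetical. The sketch relies on the standard library's `email.utils.parsedate_to_datetime`, so it is only assumed to handle the RFC 822/1123 style of HTTP-date; the other date formats of section 3.3 would need additional parsing.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def retry_after_to_datetime(value: str, response_time: datetime) -> datetime:
    """Return the time at which the service is expected to be available.

    `value` is the Retry-After field value; `response_time` is the
    (timezone-aware) time of the response, used for the seconds form.
    """
    value = value.strip()
    # Seconds form: a decimal integer counted from the time of the response.
    if value.isascii() and value.isdigit():
        return response_time + timedelta(seconds=int(value))
    # Date form: assumed here to be an RFC 822/1123 style HTTP-date.
    parsed = parsedate_to_datetime(value)
    if parsed.tzinfo is None:
        # Treat a date without usable zone information as GMT.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```

A client would typically compare the returned time against its own clock before retrying the request.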
Received on Friday, 16 February 1996 09:12:59 UTC