ietf-http-wg-old@w3.org mailing list archive, January to April 1996

(MEDIATYPES) consensus.

From: <jg@w3.org>
Date: Fri, 29 Mar 96 16:27:23 -0500
Message-Id: <9603292127.AA22026@zorch.w3.org>
To: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Included below are the diffs made to the HTTP 1.0 specification between
Draft 4 (on which the HTTP 1.1 specification was based) and Draft 5 to
clarify its relationship to MIME. The same modifications will be
made to the HTTP 1.1 specification to bring its explanations up to
those that already exist in the 1.0 draft.

If you have any problems with this, please let me know; otherwise
I will assume consensus exists to make the same changes to the 1.1 draft
and close this issue.
				- Jim


***************
*** ???,??? ****
+ 1.4  HTTP and MIME
+ 
+    HTTP/1.0 uses many of the constructs defined for MIME, as defined 
+    in RFC 1521 [5]. Appendix C describes the ways in which the context 
+    of HTTP allows for different use of Internet Media Types than is 
+    typically found in Internet mail, and gives the rationale for those 
+    differences.
+ 
***************
*** 656,658 ****
  
!        pchar          = uchar | ":" | "@" | "&" | "="
         uchar          = unreserved | escape
--- 696,698 ----
  
!        pchar          = uchar | ":" | "@" | "&" | "=" | "+"
         uchar          = unreserved | escape
***************
*** 660,670 ****
  
!        escape         = "%" hex hex
!        hex            = "A" | "B" | "C" | "D" | "E" | "F"
!                       | "a" | "b" | "c" | "d" | "e" | "f" | DIGIT
! 
!        reserved       = ";" | "/" | "?" | ":" | "@" | "&" | "="
!        safe           = "$" | "-" | "_" | "." | "+"
         extra          = "!" | "*" | "'" | "(" | ")" | ","
!        national       = <any OCTET excluding CTLs, SP,
!                          ALPHA, DIGIT, reserved, safe, and extra>
  
--- 700,708 ----
  
!        escape         = "%" HEX HEX
!        reserved       = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+"
         extra          = "!" | "*" | "'" | "(" | ")" | ","
!        safe           = "$" | "-" | "_" | "."
!        unsafe         = CTL | SP | <"> | "#" | "%" | "<" | ">"
!        national       = <any OCTET excluding ALPHA, DIGIT,
!                         reserved, extra, safe, and unsafe>
  
***************
*** 790,791 ****
--- 828,834 ----
  
+        Note: This use of the term "character set" is more commonly 
+        referred to as a "character encoding." However, since HTTP 
+        and MIME share the same registry, it is important that the 
+        terminology also be shared.
+ 
     HTTP character sets are identified by case-insensitive tokens. The 
***************
*** 814,819 ****
  
!        Note: This use of the term "character set" is more commonly 
!        referred to as a "character encoding." However, since HTTP 
!        and MIME share the same registry, it is important that the 
!        terminology also be shared.
  
--- 857,862 ----
  
!    The character set of an entity body should be labelled as the 
!    lowest common denominator of the character codes used within that 
!    body, with the exception that no label is preferred over the labels 
!    US-ASCII or ISO-8859-1.
  
***************
*** 845,849 ****
         "gzip" (GNU zip) developed by Jean-loup Gailly. This format is 
!        typically a Lempel-Ziv coding (LZ77) with a 32 bit CRC. Gzip is 
!        available from the GNU project at 
!        <URL:ftp://prep.ai.mit.edu/pub/gnu/>.
  
--- 888,890 ----
         "gzip" (GNU zip) developed by Jean-loup Gailly. This format is 
!        typically a Lempel-Ziv coding (LZ77) with a 32 bit CRC.
  
***************
*** 863,871 ****
     field (Section 10.5) in order to provide open and extensible data 
!    typing. For mail applications, where there is no type negotiation 
!    between sender and recipient, it is reasonable to put strict limits 
!    on the set of allowed media types. With HTTP, where the sender and 
!    recipient can communicate directly, applications are allowed more 
!    freedom in the use of non-registered types. The following grammar 
!    for media types is a superset of that for MIME because it does not 
!    restrict itself to the official IANA and x-token types.
  
--- 904,906 ----
     field (Section 10.5) in order to provide open and extensible data 
!    typing.
  
***************
*** 886,908 ****
     LWS must not be generated between the type and subtype, nor between 
!    an attribute and its value.
  
!    Many current applications do not recognize media type parameters. 
!    Since parameters are a fundamental aspect of media types, this must 
!    be considered an error in those applications. Nevertheless, 
!    HTTP/1.1 applications should only use media type parameters when 
!    they are necessary to define the content of a message.
  
!    If a given media-type value has been registered by the IANA, any 
!    use of that value must be indicative of the registered data format. 
!    Although HTTP allows the use of non-registered media types, such 
!    usage must not conflict with the IANA registry. Data providers are 
!    strongly encouraged to register their media types with IANA via the 
!    procedures outlined in RFC 1590 [13].
  
-    All media-type's registered by IANA must be preferred over 
-    extension tokens. However, HTTP does not limit applications to the 
-    use of officially registered media types, nor does it encourage the 
-    use of an "x-" prefix for unofficial types outside of explicitly 
-    short experimental use between consenting applications.
- 
  3.6.1 Canonicalization and Text Defaults
--- 921,936 ----
     LWS must not be generated between the type and subtype, nor between 
!    an attribute and its value. Upon receipt of a media type with an 
!    unrecognized parameter, a user agent should treat the media type as 
!    if the unrecognized parameter and its value were not present.
  
!    Some older HTTP applications do not recognize media type 
!    parameters. HTTP/1.0 applications should only use media type 
!    parameters when they are necessary to define the content of a 
!    message.
  
!    Media-type values are registered with the Internet Assigned Numbers 
!    Authority (IANA [15]). The media type registration process is 
!    outlined in RFC 1590 [13]. Use of non-registered media types is 
!    discouraged.
  
  3.6.1 Canonicalization and Text Defaults
***************
*** 909,958 ****
  
!    Media types are registered in a canonical form. In general, entity 
!    bodies transferred via HTTP must be represented in the appropriate 
!    canonical form prior to transmission. If the body has been encoded 
!    via a Content-Encoding, the data must be in canonical form prior to 
!    that encoding. However, HTTP modifies the canonical form 
!    requirements for media of primary type "text" and for "application" 
!    types consisting of text-like records.
  
!    HTTP redefines the canonical form of text media to allow multiple 
!    octet sequences to indicate a text line break. In addition to the 
!    preferred form of CRLF, HTTP applications must accept a bare CR or 
!    LF alone as representing a single line break in text media. 
!    Furthermore, if the text media is represented in a character set 
!    which does not use octets 13 and 10 for CR and LF respectively, as 
     is the case for some multi-byte character sets, HTTP allows the use 
!    of whatever octet sequence(s) is defined by that character set to 
!    represent the equivalent of CRLF, bare CR, and bare LF. It is 
!    assumed that any recipient capable of using such a character set 
!    will know the appropriate octet sequence for representing line 
!    breaks within that character set.
  
!        Note: This interpretation of line breaks applies only to the 
!        contents of an Entity-Body and only after any 
!        Content-Encoding has been removed. All other HTTP constructs 
!        use CRLF exclusively to indicate a line break. Content 
!        codings define their own line break requirements.
  
!    A recipient of an HTTP text entity should translate the received 
!    entity line breaks to the local line break conventions before 
!    saving the entity external to the application and its cache; 
!    whether this translation takes place immediately upon receipt of 
!    the entity, or only when prompted by the user, is entirely up to 
!    the individual application.
  
-    HTTP also redefines the default character set for text media in an 
-    entity body. If a textual media type defines a charset parameter 
-    with a registered default value of "US-ASCII", HTTP changes the 
-    default to be "ISO-8859-1". Since the ISO-8859-1 [18] character set 
-    is a superset of US-ASCII [17], this has no effect upon the 
-    interpretation of entity bodies which only contain octets within 
-    the US-ASCII set (0 - 127). The presence of a charset parameter 
-    value in a Content-Type header field overrides the default.
- 
-    It is recommended that the character set of an entity body be 
-    labelled as the lowest common denominator of the character codes 
-    used within a document, with the exception that no label is 
-    preferred over the labels US-ASCII or ISO-8859-1.
- 
  3.6.2 Multipart Types
--- 937,978 ----
  
!    Internet media types are registered with a canonical form. In 
!    general, an Entity-Body transferred via HTTP must be represented in 
!    the appropriate canonical form prior to its transmission. If the 
!    body has been encoded with a Content-Encoding, the underlying data 
!    should be in canonical form prior to being encoded.
  
!    Media subtypes of the "text" type use CRLF as the text line break 
!    when in canonical form. However, HTTP allows the transport of text 
!    media with plain CR or LF alone representing a line break when used 
!    consistently within the Entity-Body. HTTP applications must accept 
!    CRLF, bare CR, and bare LF as being representative of a line break 
!    in text media received via HTTP.
! 
!    In addition, if the text media is represented in a character set 
!    that does not use octets 13 and 10 for CR and LF respectively, as 
     is the case for some multi-byte character sets, HTTP allows the use 
!    of whatever octet sequences are defined by that character set to 
!    represent the equivalent of CR and LF for line breaks. This 
!    flexibility regarding line breaks applies only to text media in the 
!    Entity-Body; a bare CR or LF should not be substituted for CRLF 
!    within any of the HTTP control structures (such as header fields 
!    and multipart boundaries).
  
!    The "charset" parameter is used with some media types to define the 
!    character set (Section 3.4) of the data. When no explicit charset 
!    parameter is provided by the sender, media subtypes of the "text" 
!    type are defined to have a default charset value of "ISO-8859-1" 
!    when received via HTTP. Data in character sets other than 
!    "ISO-8859-1" or its subsets must be labelled with an appropriate 
!    charset value in order to be consistently interpreted by the 
!    recipient.
  
!        Note: Many current HTTP servers provide data using charsets 
!        other than "ISO-8859-1" without proper labelling. This 
!        situation reduces interoperability and is not recommended. 
!        To compensate for this, some HTTP user agents provide a 
!        configuration option to allow the user to change the default 
!        interpretation of the media type character set when no 
!        charset parameter is given.
  
  3.6.2 Multipart Types
***************
*** 964,975 ****
     each type in order to correctly interpret the purpose of each 
!    body-part. Ideally, an HTTP user agent should follow the same or 
!    similar behavior as a MIME user agent does upon receipt of a 
!    multipart type.
  
!    As in MIME [5], all multipart types share a common syntax and must 
!    include a boundary parameter as part of the media type value. The 
!    message body is itself a protocol element and must therefore use 
!    only CRLF to represent line breaks between body-parts. Unlike in 
!    MIME, multipart body-parts may contain HTTP header fields which are 
!    significant to the meaning of that part.
  
--- 984,996 ----
     each type in order to correctly interpret the purpose of each 
!    body-part. An HTTP user agent should follow the same or similar 
!    behavior as a MIME user agent does upon receipt of a multipart 
!    type. HTTP servers should not assume that all HTTP clients are 
!    prepared to handle multipart types.
  
!    All multipart types share a common syntax and must include a 
!    boundary parameter as part of the media type value. The message 
!    body is itself a protocol element and must therefore use only CRLF 
!    to represent line breaks between body-parts. Multipart body-parts 
!    may contain HTTP header fields which are significant to the meaning 
!    of that part.
  
***************
*** 1085,1088 ****
         General-Header = Date                     ; Section 10.6
!                       | MIME-Version             ; Section 10.12
!                       | Pragma                   ; Section 10.13
  
--- 1106,1108 ----
         General-Header = Date                     ; Section 10.6
!                       | Pragma                   ; Section 10.12
  
***************
*** 1190,1192 ****
     The Request-URI is transmitted as an encoded string, where some 
!    characters may be escaped using the "% hex hex" encoding defined by 
     RFC 1738 [4]. The origin server must decode the Request-URI in 
--- 1210,1212 ----
     The Request-URI is transmitted as an encoded string, where some 
!    characters may be escaped using the "% HEX HEX" encoding defined by 
     RFC 1738 [4]. The origin server must decode the Request-URI in 
***************
*** 1333,1337 ****
     information about the response which cannot be placed in the 
!    Status-Line. These header fields are not intended to give 
!    information about an Entity-Body returned in the response, but 
!    about the server itself.
  
--- 1356,1360 ----
     information about the response which cannot be placed in the 
!    Status-Line. These header fields give information about the server 
!    and about further access to the resource identified by the 
!    Request-URI.
  
***************
*** 1648,1649 ****
--- 1673,1678 ----
  
+        Note: When automatically redirecting a POST request after 
+        receiving a 301 status code, some existing user agents will 
+        erroneously change it into a GET request.
+ 
     302 Moved Temporarily
***************
*** 1663,1664 ****
--- 1692,1697 ----
  
+        Note: When automatically redirecting a POST request after 
+        receiving a 302 status code, some existing user agents will 
+        erroneously change it into a GET request.
+ 
     304 Not Modified
***************
*** 2090,2112 ****
  
- 10.12  MIME-Version
- 
-    HTTP is not a MIME-compliant protocol (see Appendix C). However, 
-    HTTP/1.0 messages may include a single MIME-Version general-header 
-    field to indicate what version of the MIME protocol was used to 
-    construct the message. Use of the MIME-Version header field should 
-    indicate that the message is in full compliance with the MIME 
-    protocol (as defined in [5]). Unfortunately, some older versions of 
-    HTTP/1.0 clients and servers use this field indiscriminately, and 
-    thus recipients must not take it for granted that the message is 
-    indeed in full compliance with MIME. Proxies and gateways are 
-    responsible for ensuring this compliance (where possible) when 
-    exporting HTTP messages to strict MIME environments. Future 
-    HTTP/1.0 applications must only use MIME-Version when the message 
-    is fully MIME-compliant.
- 
-        MIME-Version   = "MIME-Version" ":" 1*DIGIT "." 1*DIGIT
- 
-    MIME version "1.0" is the default for use in HTTP/1.0. However, 
-    HTTP/1.0 message parsing and semantics are defined by this document 
-    and not the MIME specification.
- 
--- 2123,2123 ----
  
***************
*** 2427,2428 ****
--- 2444,2463 ----
  
+ 12.5  Attacks Based On File and Path Names
+ 
+    Implementations of HTTP origin servers should be careful to 
+    restrict the documents returned by HTTP requests to be only those 
+    that were intended by the server administrators. If an HTTP server 
+    translates HTTP URIs directly into file system calls, the server 
+    must take special care not to serve files that were not intended to 
+    be delivered to HTTP clients. For example, Unix, Microsoft Windows, 
+    and other operating systems use ".." as a path component to 
+    indicate a directory level above the current one. On such a system, 
+    an HTTP server must disallow any such construct in the Request-URI 
+    if it would otherwise allow access to a resource outside those 
+    intended to be accessible via the HTTP server. Similarly, files 
+    intended for reference only internally to the server (such as 
+    access control files, configuration files, and script code) must be 
+    protected from inappropriate retrieval, since they might contain 
+    sensitive information. Experience has shown that minor bugs in such 
+    HTTP server implementations have turned into security risks.
+ 
***************
*** 2633,2635 ****
  
!    HTTP/1.0 reuses many of the constructs defined for Internet Mail 
     (RFC 822 [7]) and the Multipurpose Internet Mail Extensions 
--- 2668,2670 ----
  
!    HTTP/1.0 uses many of the constructs defined for Internet Mail 
     (RFC 822 [7]) and the Multipurpose Internet Mail Extensions 
***************
*** 2636,2649 ****
     (MIME [5]) to allow entities to be transmitted in an open variety 
!    of representations and with extensible mechanisms. However, HTTP is 
!    not a MIME-compliant application. HTTP's performance requirements 
!    differ substantially from those of Internet mail. Since it is not 
!    limited by the restrictions of existing mail protocols and SMTP 
!    gateways, HTTP does not obey some of the constraints imposed by 
!    RFC 822 and MIME for mail transport.
  
!    This appendix describes specific areas where HTTP differs from 
!    MIME. Proxies/gateways to MIME-compliant protocols must be aware of 
!    these differences and provide the appropriate conversions where 
!    necessary.
  
  C.1  Conversion to Canonical Form
--- 2671,2691 ----
     (MIME [5]) to allow entities to be transmitted in an open variety 
!    of representations and with extensible mechanisms. However, 
!    RFC 1521 discusses mail, and HTTP has a few features that are 
!    different than those described in RFC 1521. These differences were 
!    carefully chosen to optimize performance over binary connections, 
!    to allow greater freedom in the use of new media types, to make 
!    date comparisons easier, and to acknowledge the practice of some 
!    early HTTP servers and clients.
  
!    At the time of this writing, it is expected that RFC 1521 will be 
!    revised. The revisions may include some of the practices found in 
!    HTTP/1.0 but not in RFC 1521.
  
+    This appendix describes specific areas where HTTP differs from RFC 
+    1521. Proxies and gateways to strict MIME environments should be 
+    aware of these differences and provide the appropriate conversions 
+    where necessary. Proxies and gateways from MIME environments to 
+    HTTP also need to be aware of the differences because some 
+    conversions may be required.
+ 
  C.1  Conversion to Canonical Form
***************
*** 2650,2733 ****
  
!    MIME requires that an entity be converted to canonical form prior 
!    to being transferred, as described in Appendix G of RFC 1521 [5]. 
!    Although HTTP does require media types to be transferred in 
!    canonical form, it changes the definition of "canonical form" for 
!    text-based media types as described in Section 3.6.1.
  
! C.1.1 Representation of Line Breaks
  
!    MIME requires that the canonical form of any text type represent 
!    line breaks as CRLF and forbids the use of CR or LF outside of line 
!    break sequences. Since HTTP allows CRLF, bare CR, and bare LF (or 
!    the octet sequence(s) to which they would be translated for the 
!    given character set) to indicate a line break within text content, 
!    recipients of an HTTP message cannot rely upon receiving 
!    MIME-canonical line breaks in text.
  
!    Where it is possible, a proxy or gateway from HTTP to a 
!    MIME-compliant protocol should translate all line breaks within 
!    text/* media types to the MIME canonical form of CRLF. However, 
!    this may be complicated by the presence of a Content-Encoding and 
!    by the fact that HTTP allows the use of some character sets which 
!    do not use octets 13 and 10 to represent CR and LF, as is the case 
!    for some multi-byte character sets. If canonicalization is 
!    performed, the Content-Length header field value must be updated to 
!    reflect the new body length.
  
! C.1.2 Default Character Set
  
!    MIME requires that all subtypes of the top-level Content-Type 
!    "text" have a default character set of US-ASCII [17]. In contrast, 
!    HTTP defines the default character set for "text" to be 
!    ISO-8859-1 [18] (a superset of US-ASCII). Therefore, if a text/* 
!    media type given in the Content-Type header field does not already 
!    include an explicit charset parameter, the parameter
  
!        ;charset="iso-8859-1"
  
!    should be added by the proxy/gateway if the entity contains any 
!    octets greater than 127.
  
! C.2  Conversion of Date Formats
  
!    HTTP/1.0 uses a restricted subset of date formats to simplify the 
!    process of date comparison. Proxies/gateways from other protocols 
!    should ensure that any Date header field present in a message 
!    conforms to one of the HTTP/1.0 formats and rewrite the date if 
!    necessary.
  
! C.3  Introduction of Content-Encoding
  
!    MIME does not include any concept equivalent to HTTP's 
!    Content-Encoding header field. Since this acts as a modifier on the 
!    media type, proxies/gateways to MIME-compliant protocols must 
!    either change the value of the Content-Type header field or decode 
!    the Entity-Body before forwarding the message.
  
!        Note: Some experimental applications of Content-Type for 
!        Internet mail have used a media-type parameter of 
!        ";conversions=<content-coding>" to perform an equivalent 
!        function as Content-Encoding. However, this parameter is not 
!        part of the MIME specification at the time of this writing.
  
! C.4  No Content-Transfer-Encoding
  
!    HTTP does not use the Content-Transfer-Encoding (CTE) field of 
!    MIME. Proxies/gateways from MIME-compliant protocols must remove 
!    any non-identity CTE ("quoted-printable" or "base64") encoding 
!    prior to delivering the response message to an HTTP client. 
!    Proxies/gateways to MIME-compliant protocols are responsible for 
!    ensuring that the message is in the correct format and encoding for 
!    safe transport on that protocol, where "safe transport" is defined 
!    by the limitations of the protocol being used. At a minimum, the 
!    CTE field of
  
!        Content-Transfer-Encoding: binary
  
!    should be added by the proxy/gateway if it is unwilling to apply a 
!    content transfer encoding.
  
!    An HTTP client may include a Content-Transfer-Encoding as an 
!    extension Entity-Header in a POST request when it knows the 
!    destination of that request is a proxy/gateway to a MIME-compliant 
!    protocol.
--- 2692,2877 ----
  
!    RFC 1521 requires that an Internet mail entity be converted to 
!    canonical form prior to being transferred, as described in Appendix 
!    G of RFC 1521 [5]. Section 3.6.1 of this document describes the 
!    forms allowed for subtypes of the "text" media type when 
!    transmitted over HTTP.
  
!    RFC 1521 requires that content with a Content-Type of "text" 
!    represent line breaks as CRLF and forbids the use of CR or LF 
!    outside of line break sequences. HTTP allows CRLF, bare CR, and 
!    bare LF to indicate a line break within text content when a message 
!    is transmitted over HTTP.
  
!    Where it is possible, a proxy or gateway from HTTP to a strict RFC 
!    1521 environment should translate all line breaks within the text 
!    media types described in Section 3.6.1 of this document to the RFC 
!    1521 canonical form of CRLF. Note, however, that this may be 
!    complicated by the presence of a Content-Encoding and by the fact 
!    that HTTP allows the use of some character sets which do not use 
!    octets 13 and 10 to represent CR and LF, as is the case for some 
!    multi-byte character sets.
  
! C.2  Conversion of Date Formats
! 
!    HTTP/1.0 uses a restricted set of date formats (Section 3.3) to 
!    simplify the process of date comparison. Proxies and gateways from 
!    other protocols should ensure that any Date header field present in 
!    a message conforms to one of the HTTP/1.0 formats and rewrite the 
!    date if necessary.
! 
! C.3  Introduction of Content-Encoding
! 
!    RFC 1521 does not include any concept equivalent to HTTP/1.0's 
!    Content-Encoding header field. Since this acts as a modifier on the 
!    media type, proxies and gateways from HTTP to MIME-compliant 
!    protocols must either change the value of the Content-Type header 
!    field or decode the Entity-Body before forwarding the message. 
!    (Some experimental applications of Content-Type for Internet mail 
!    have used a media-type parameter of ";conversions=<content-coding>" 
!    to perform an equivalent function as Content-Encoding. However, 
!    this parameter is not part of RFC 1521.)
! 
! C.4  No Content-Transfer-Encoding
! 
!    HTTP does not use the Content-Transfer-Encoding (CTE) field of RFC 
!    1521. Proxies and gateways from MIME-compliant protocols to HTTP 
!    must remove any non-identity CTE ("quoted-printable" or "base64") 
!    encoding prior to delivering the response message to an HTTP client.
! 
!    Proxies and gateways from HTTP to MIME-compliant protocols are 
!    responsible for ensuring that the message is in the correct format 
!    and encoding for safe transport on that protocol, where "safe 
!    transport" is defined by the limitations of the protocol being 
!    used. Such a proxy or gateway should label the data with an 
!    appropriate Content-Transfer-Encoding if doing so will improve the 
!    likelihood of safe transport over the destination protocol.
  
! C.5  HTTP Header Fields in Multipart Body-Parts
  
!    In RFC 1521, most header fields in multipart body-parts are 
!    generally ignored unless the field name begins with "Content-". In 
!    HTTP/1.0, multipart body-parts may contain any HTTP header fields 
!    which are significant to the meaning of that part.
  
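The Request-URI hunk changes the escape notation to "% HEX HEX" and keeps the requirement that the origin server decode the Request-URI. The following Python sketch shows one way to do that decoding; `decode_request_uri` is a hypothetical helper, and rejecting a "%" that is not followed by two hex digits is an assumption, since the draft does not say how malformed escapes should be treated. (The standard `urllib.parse.unquote_to_bytes` does the same decoding but passes malformed escapes through unchanged.)

```python
# Hypothetical helper: decode "%" HEX HEX escapes in a raw Request-URI.
HEX_DIGITS = b"0123456789ABCDEFabcdef"


def decode_request_uri(raw: bytes) -> bytes:
    """Replace each "%" HEX HEX escape with the octet it encodes.

    A "%" not followed by two hex digits is treated as an error
    (an assumption; the draft does not specify this case).
    """
    out = bytearray()
    i = 0
    while i < len(raw):
        if raw[i] == ord("%"):
            pair = raw[i + 1:i + 3]
            if len(pair) != 2 or any(b not in HEX_DIGITS for b in pair):
                raise ValueError(f"malformed escape at offset {i}")
            out.append(int(pair, 16))  # int() accepts bytes when a base is given
            i += 3
        else:
            out.append(raw[i])
            i += 1
    return bytes(out)


# Split on "/" before decoding, so an escaped "%2F" stays data inside
# its segment instead of becoming a path separator.
segments = [decode_request_uri(s) for s in b"/docs/a%2Fb/%7Ejg".split(b"/")]
print(segments)
```

The split-then-decode order in the usage line is a design choice worth keeping in any server that interprets path segments.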
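The revised URL grammar moves "+" from `safe` to `reserved`, adds it to `pchar`, introduces an explicit `unsafe` class, and defines `national` by exclusion. A small Python sketch of those octet classes, which could help check what a client must escape in a path segment. It assumes `unreserved = ALPHA | DIGIT | safe | extra | national`, a rule from the full draft rather than the quoted hunks; the function names are illustrative only.

```python
# Octet classes from the draft 5 URL grammar.
CTL = set(range(0, 32)) | {127}
ALPHA = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")
DIGIT = set(b"0123456789")
RESERVED = set(b";/?:@&=+")      # "+" moved here from "safe" in draft 5
EXTRA = set(b"!*'(),")
SAFE = set(b"$-_.")
UNSAFE = CTL | set(b' "#%<>')    # CTL | SP | <"> | "#" | "%" | "<" | ">"


def classify(octet: int) -> str:
    """Name the grammar class of an octet; whatever is left over is "national"."""
    if octet in ALPHA or octet in DIGIT:
        return "alphanum"
    for name, members in (("reserved", RESERVED), ("extra", EXTRA),
                          ("safe", SAFE), ("unsafe", UNSAFE)):
        if octet in members:
            return name
    return "national"


# Assumption: unreserved = ALPHA | DIGIT | safe | extra | national
# (from the full draft, not from the quoted hunks).
PCHAR_PUNCT = set(b":@&=+")


def allowed_in_pchar(octet: int) -> bool:
    """True if the octet may appear unescaped in a path segment (pchar)."""
    return (classify(octet) in {"alphanum", "safe", "extra", "national"}
            or octet in PCHAR_PUNCT)


print(classify(ord("+")), allowed_in_pchar(ord("+")), allowed_in_pchar(ord(";")))
```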
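The new Section 3.6 text says a user agent should treat a media type with an unrecognized parameter as if that parameter and its value were not present. A minimal Python sketch, assuming a hypothetical `KNOWN_PARAMETERS` table describing what one particular agent understands. The parser accepts a token or a quoted-string as the value and lowercases the type, subtype, and attribute names, which HTTP/1.0 treats as case-insensitive.

```python
import re

# ";" attribute "=" value, where value is a token or a quoted-string.
# Whitespace is tolerated around ";" but not around "=".
PARAM = re.compile(r'\s*;\s*([^\s;="]+)=("(?:[^"\\]|\\.)*"|[^\s;"]+)')


def parse_media_type(value: str):
    """Split a Content-Type value into ("type/subtype", {attribute: value})."""
    mtype, sep, rest = value.partition(";")
    rest = (sep + rest).rstrip()
    params = {}
    pos = 0
    while pos < len(rest):
        m = PARAM.match(rest, pos)
        if m is None:
            raise ValueError(f"malformed parameter near {rest[pos:]!r}")
        attr, val = m.group(1).lower(), m.group(2)
        if val.startswith('"'):
            val = re.sub(r'\\(.)', r'\1', val[1:-1])  # strip quotes, undo quoted-pairs
        params[attr] = val
        pos = m.end()
    return mtype.strip().lower(), params


# Hypothetical: the parameters this user agent understands, per media type.
KNOWN_PARAMETERS = {"text/plain": {"charset"}, "text/html": {"charset"}}


def effective_media_type(value: str):
    """Treat unrecognized parameters as if they were not present."""
    mtype, params = parse_media_type(value)
    known = KNOWN_PARAMETERS.get(mtype, set())
    return mtype, {a: v for a, v in params.items() if a in known}


print(effective_media_type('Text/HTML; charset="ISO-8859-1"; foo=bar'))
```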
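Section 3.6.1 now gives subtypes of "text" a default charset of "ISO-8859-1" when the sender supplies none, whereas RFC 1521 defaults to US-ASCII. The sketch below computes the charset a recipient would assume and, for the gateway direction, adds the explicit label that the old C.1.2 text (removed by this diff) recommended when the body contains octets above 127. Both function names are hypothetical, and the regular expressions are a simplification that assumes no other parameter value contains the text "charset=".

```python
import re


def default_charset(content_type: str):
    """Charset a recipient should assume for this Content-Type value."""
    m = re.search(r';\s*charset\s*=\s*"?([^";\s]+)"?', content_type, re.IGNORECASE)
    if m:
        return m.group(1)                 # an explicit label always wins
    if content_type.strip().lower().startswith("text/"):
        return "ISO-8859-1"               # HTTP default for text subtypes
    return None                           # no default for other types


def label_for_mime(content_type: str, body: bytes) -> str:
    """Gateway step from the removed C.1.2 text: make the HTTP default explicit."""
    if (content_type.strip().lower().startswith("text/")
            and not re.search(r';\s*charset\s*=', content_type, re.IGNORECASE)
            and any(b > 127 for b in body)):
        return content_type + ';charset="iso-8859-1"'
    return content_type


print(default_charset("text/plain"), label_for_mime("text/plain", b"caf\xe9"))
```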
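Section 3.6.1 lets text media travel with bare CR or bare LF line breaks, and Appendix C.1 asks a gateway into a strict RFC 1521 environment to translate them to CRLF where possible. A minimal Python sketch of that translation; it is valid only for character sets that use octets 13 and 10 for CR and LF, and only after any Content-Encoding has been removed. The function name is hypothetical. Because the body length can change, a gateway would also need to recompute Content-Length, as the removed C.1.1 text pointed out.

```python
import re

# CRLF must be tried first so that it counts as one line break, not two.
LINE_BREAK = re.compile(rb"\r\n|\r|\n")


def to_canonical_crlf(text_body: bytes) -> bytes:
    """Rewrite CRLF, bare CR, and bare LF as CRLF (the RFC 1521 canonical form).

    Assumes the charset uses octets 13 and 10 for CR and LF and that any
    Content-Encoding has already been removed.
    """
    return LINE_BREAK.sub(b"\r\n", text_body)


body = b"first line\nsecond line\rthird line\r\n"
canonical = to_canonical_crlf(body)
content_length = len(canonical)  # the body grew, so Content-Length must change
print(canonical, content_length)
```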
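The new notes on 301 and 302 warn that some user agents rewrite a redirected POST into a GET. The sketch below keeps the original method instead. It also encodes a rule from the surrounding 301/302 text of the published HTTP/1.0 specification (RFC 1945), which these hunks do not quote: a POST must not be redirected automatically unless the user confirms it. `redirect_method` is a hypothetical name.

```python
def redirect_method(status: int, method: str, user_confirmed: bool = False):
    """Method to reuse when following a 301 or 302, or None to stop and ask."""
    if status not in (301, 302):
        raise ValueError(f"{status} is not handled by this sketch")
    if method == "POST" and not user_confirmed:
        # Redirecting a POST without confirmation might change the
        # conditions under which the request was issued.
        return None
    # Reuse the original method; turning POST into GET here is the
    # erroneous behavior the new notes describe.
    return method


print(redirect_method(302, "POST"), redirect_method(302, "POST", user_confirmed=True))
```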
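The new Section 12.5 asks servers that map URIs to files to refuse ".." components that would escape the intended tree, and to protect internal files. A minimal Python sketch of such a mapping, assuming a POSIX-style document root; `DOC_ROOT`, `INTERNAL_NAMES`, and `map_request_path` are hypothetical, and the query string is assumed to be stripped already. Decoding happens per segment so that "%2e%2e" is caught as well as a literal "..", and the final containment check on the resolved path guards against symbolic links that point outside the root.

```python
import os
from urllib.parse import unquote

DOC_ROOT = "/srv/www"                         # hypothetical document root
INTERNAL_NAMES = {".htaccess", "httpd.conf"}  # hypothetical internal-only files


def map_request_path(path: str, root: str = DOC_ROOT) -> str:
    """Map the path of a Request-URI (query already removed) to a file under root."""
    parts = []
    for segment in path.split("/"):
        name = unquote(segment)  # decode after splitting, per segment
        if name in ("", "."):
            continue
        if name == ".." or "/" in name or "\\" in name or "\x00" in name:
            raise PermissionError(f"refusing path segment {segment!r}")
        if name in INTERNAL_NAMES:
            raise PermissionError(f"refusing internal file {name!r}")
        parts.append(name)
    root_real = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(root_real, *parts))
    # A symbolic link inside the tree could still lead outside it.
    if os.path.commonpath([root_real, candidate]) != root_real:
        raise PermissionError("resolved path escapes the document root")
    return candidate


print(map_request_path("/docs/index.html"))
```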
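Appendix C.2 asks proxies and gateways from other protocols to rewrite a Date header field into one of the HTTP/1.0 formats. A sketch using Python's standard `email.utils`, assuming the RFC 1123 form is the one to emit (the preferred HTTP/1.0 format, defined in Section 3.3 outside these hunks) and that a date with no zone, such as asctime output, is in GMT. `to_http_date` is a hypothetical name; `parsedate_to_datetime` also accepts the RFC 850 and asctime forms.

```python
from datetime import timezone
from email.utils import format_datetime, parsedate_to_datetime


def to_http_date(value: str) -> str:
    """Rewrite a parseable date as an RFC 1123 date in GMT."""
    dt = parsedate_to_datetime(value)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # no zone given: assume GMT
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


for raw in ("Fri, 29 Mar 96 16:27:23 -0500",    # this message's own Date field
            "Friday, 29-Mar-96 21:27:23 GMT",   # RFC 850 form
            "Fri Mar 29 21:27:23 1996"):        # asctime form
    print(to_http_date(raw))
```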
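Appendix C.4 requires a proxy or gateway from a MIME environment to remove any quoted-printable or base64 Content-Transfer-Encoding before delivering a message to an HTTP client. A minimal Python sketch using the standard `base64` and `quopri` modules; headers are modeled as a plain dict, and `remove_cte` is a hypothetical name. Recomputing Content-Length afterwards is an addition of this sketch, needed because decoding changes the body length.

```python
import base64
import quopri


def remove_cte(headers: dict, body: bytes):
    """Undo a base64 or quoted-printable Content-Transfer-Encoding."""
    headers = dict(headers)
    cte = None
    for name in list(headers):
        lowered = name.lower()
        if lowered == "content-transfer-encoding":
            cte = headers.pop(name).strip().lower()
        elif lowered == "content-length":
            del headers[name]                       # recomputed below
    if cte == "base64":
        body = base64.b64decode(body)               # line breaks are discarded
    elif cte == "quoted-printable":
        body = quopri.decodestring(body)
    elif cte not in (None, "7bit", "8bit", "binary"):  # identity encodings
        raise ValueError(f"unknown Content-Transfer-Encoding: {cte!r}")
    headers["Content-Length"] = str(len(body))
    return headers, body


print(remove_cte({"Content-Transfer-Encoding": "base64"}, b"aGVsbG8g\r\nd29ybGQ=\r\n"))
```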
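Appendix C.3 says a proxy or gateway from HTTP to a MIME-compliant protocol must either change the Content-Type or decode the Entity-Body when a Content-Encoding is present. This sketch takes the decoding route for the "gzip" coding described in the diff, treating "x-gzip" as equivalent (as HTTP/1.0 allows) and refusing anything else; `decode_for_mime` and that refusal policy are assumptions.

```python
import gzip


def decode_for_mime(headers: dict, body: bytes):
    """Remove a gzip Content-Encoding before handing an entity to a MIME system."""
    headers = dict(headers)
    coding = None
    for name in list(headers):
        lowered = name.lower()
        if lowered == "content-encoding":
            coding = headers.pop(name).strip().lower()
        elif lowered == "content-length":
            del headers[name]                   # recomputed below
    if coding in ("gzip", "x-gzip"):
        body = gzip.decompress(body)
    elif coding is not None:
        # Hypothetical policy: refuse codings this sketch cannot undo.
        raise ValueError(f"cannot decode content-coding {coding!r}")
    headers["Content-Length"] = str(len(body))
    return headers, body


print(decode_for_mime({"Content-Encoding": "x-gzip"}, gzip.compress(b"hello")))
```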
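Section 3.6.2 says a multipart message body is a protocol element that must use only CRLF between body-parts, and the new C.5 notes that each body-part may carry any HTTP header field significant to it. A minimal Python sketch that assembles such a body; `build_multipart` is hypothetical, and the boundary check is a simplification (the full boundary rules are in RFC 1521). The message's own Content-Type would carry the same value in its boundary parameter, e.g. `multipart/mixed; boundary=xyz`.

```python
def build_multipart(boundary: bytes, parts) -> bytes:
    """Join (headers, body) pairs into a multipart body using CRLF line breaks only.

    headers is a list of (name, value) string pairs; any HTTP header field
    significant to the part may appear, not just Content-* fields.
    """
    out = []
    for headers, body in parts:
        if b"--" + boundary in body:
            raise ValueError("boundary occurs inside a body-part")
        out.append(b"--" + boundary + b"\r\n")
        for name, value in headers:
            out.append(f"{name}: {value}\r\n".encode("latin-1"))
        out.append(b"\r\n")   # blank line ends the part's header fields
        out.append(body)
        out.append(b"\r\n")   # CRLF preceding the next delimiter
    out.append(b"--" + boundary + b"--\r\n")
    return b"".join(out)


print(build_multipart(b"xyz", [([("Content-Type", "text/plain")], b"hi")]))
```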
Received on Friday, 29 March 1996 13:44:55 EST