Re: Multiple Content-Location headers from Jim Gettys on 1998-01-15 (ietf-http-wg@w3.org from January to March 1998)

From: Jim Gettys <jg@pa.dec.com>
Date: Thu, 15 Jan 1998 12:57:56 -0800
To: Jacob Palme <jpalme@dsv.su.se>
Cc: Nick Shelness <shelness@lotus.com>, Jim Gettys <jg@pa.dec.com>, IETF working group on HTML in e-mail <mhtml@segate.sunet.se>, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <9801152057.AA21866@pachyderm.pa.dec.com>
>  From: Jacob Palme <jpalme@dsv.su.se>
>  Date: Thu, 15 Jan 1998 20:55:42 +0100
>  To: Nick Shelness <shelness@lotus.com>, jg@pa.dec.com (Jim Gettys)
>  Cc: IETF working group on HTML in e-mail <mhtml@SEGATE.SUNET.SE>,
>          http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
>  Subject: Re: Multiple Content-Location headers
>
>  At 17.21 +0000 98-01-15, Nick_Shelness@motorcity2.lotus.com wrote:
>  > Could I suggest that to break this impasse, that MHTML switches to a new
>  > header field Content-Label to replace its use of Content-Location. This
>  > would better capture the MHTML role of the header field, and would also
>  > allow the simplifications I argued for last week on the MHTML list to
>  > proceed. I.e., Content-Label could only specify an absolute URI, and would
>  > not establish a base.
>
>  I am not very happy with changing an existing and already implemented
>  IETF proposed standard in such a radical way. But maybe it is necessary.
>  Let us examine the differences between how MHTML and HTTP uses Content-
>  Location to see if they really need to be split into two different
>  header fields.
>
>  HTTP 1.1 spec says                  MHTML spec says (I have removed
>                                      the controversial text allowing
>                                      multiple Content-Location headers,
>                                      since we all agree to remove
>                                      this.)
>
>  In HTTP, multipart body-parts MAY   A Content-Location header
>  contain header fields which are     specifies an URI that labels the
>  significant to the meaning of that  content of a body part in whose
>  part. A Content-Location header     heading it is placed. Its value
>  field SHOULD be included in the     CAN be an absolute or a relative
>  body-part of each enclosed entity   URI.
>  that can be identified by a URL.
>                                      A Content-Location header field is
>                                      allowed in any message or content
>                                      heading, in addition to one
>                                      Content-ID header (as specified in
>                                      [MIME1]) and, in Message headings,
>                                      one Message-ID (as specified in
>                                      [RFC822])
>
>  The Content-Location entity-header  An URI in a Content-Location
>  field MAY be used to supply the     header need not refer to an
>  resource location for the entity    resource which is globally
>  enclosed in the message  when that  available for retrieval using this
>  entity is accessible from a         URI (after resolution of relative
>  location separate from the          URIs). However, URI-s in
>  requested resource's URI.           Content-Location headers (if
>                                      absolute, or resolvable to
>                                      absolute URIs) SHOULD still be
>                                      globally unique.
>
>  A cache cannot assume that an       When processing (rendering) a
>  entity with a Content-Location      text/html body part in an MHTML
>  different from the URI used to      multipart/related structure, all
>  retrieve it can be used to respond  URIs in that text/html body part
>  to later requests on that Content-  which reference subsidiary
>  Location URI. However, the Content- resources within the same
>  Location can be used to             multipart/related structure SHALL
>  differentiate between multiple      be satisfied by those resources
>  entities retrieved from a single    and not by resources from any
>  requested resource, as described    another local or remote source.
>  in section Caching Negotiated
>  Responses.                          Therefore, If a sender wishes a
>                                      recipient to always retrieve an
>  ...                                 URI referenced resource from its
>                                      source, an URI labeled copy of
>  If a single server supports         that resource MUST NOT be included
>  multiple organizations that do not  in the same multipart/related
>  trust one another, then it must     structure.
>  check the values of Location and
>  Content-Location headers in         In addition, since the source of a
>  responses that are generated under  resource received in
>  control of said organizations to    multipart/related structure can be
>  make sure that they do not attempt  misrepresented (see 12.1 above),
>  to invalidate resources over which  if a resource received in
>  they have no authority.             multipart/related structure is
>                                      stored in a cache, it MUST NOT be 
>                                      retrieved from that cache other 
>                                      than by a reference contained in 
a >                                      body part of the same 
>                                      multipart/related structure. 
>                                      Failure to honor this directive 
>                                      will allow a multipart/related 
>                                      structure to be employed as a 
>                                      Trojan Horse. For example, to 
>                                      inject bogus resources (i.e. a 
>                                      misrepresentation of a 
>                                      competitor's Web site) into a 
>                                      recipient's generally accessible 
>                                      Web cache. 
>  
>  My feeling is that the use of Content-Location as defined in the HTTP 
>  and MHTML spec is not so different as to require us to use different 
>  headers. But could the HTTP people please examine the quotes above 
>  and check what you feel about this. 
> 

The problem we have is syntax and implementation, not semantics.
Let's clear this hurdle before we get into the meat of what you are trying
to achieve, and whether your suggestion fits into the architecture of the
Web; my apologies for jumping into the meat in some of my early messages
on this topic.

Roy Fielding's point is that the syntax change required to let the
Content-Location header carry multiple values (a comma-separated list, since
that is what proxies typically produce when they find multiple headers of the
same name) is a problem, and one that may well break existing
implementations.  It is also possible, even likely, that this would break
existing applications of HTTP, particularly clients and proxies.  To include
the URIs in a comma-separated list would require quoting them, as Roy points
out; parsers may not be coded correctly to deal with this.  It is quite likely
that existing implementations will get the wrong answer, or even die, if they
are handed multiple Content-Location headers, or that they will not
understand the quoting this would require.  And then there are the proxy
issues....

To quote from section 4.2 of the HTTP spec:

"Multiple message-header fields with the same field-name may be present in 
a message if and only if the entire field-value for that header field is 
defined as a comma-separated list [i.e., #(values)]. It MUST be possible 
to combine the multiple header fields into one "field-name: field-value" 
pair, without changing the semantics of the message, by appending each 
subsequent field-value to the first, each separated by a comma. The order 
in which header fields with the same field-name are received is therefore 
significant to the interpretation of the combined field value, and thus 
a proxy MUST NOT change the order of these field values when a message is 
forwarded."
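
To make the rule quoted above concrete, here is a minimal Python sketch of
what a proxy may legitimately do with repeated fields, and what a parser
that splits on commas without understanding quoting would then see.  The
function names and the example.com URLs are hypothetical, chosen only for
illustration; this is not taken from any existing implementation.

```python
def combine_fields(headers, name):
    """Fold repeated header fields into one field-value, in the order
    received, separated by commas (the combining rule of section 4.2)."""
    values = [value for (field, value) in headers
              if field.lower() == name.lower()]
    return ", ".join(values)


def naive_split(value):
    """Split a combined field-value on every comma, ignoring quoting.
    This is the kind of parser that would be confused by the change."""
    return [item.strip() for item in value.split(",")]


# Hypothetical message carrying two Content-Location fields.  The first
# URI contains a comma, which is a legal character in a URI.
headers = [
    ("Content-Location", "http://example.com/maps/10,20.gif"),
    ("Content-Location", "http://example.com/logo.gif"),
]

combined = combine_fields(headers, "Content-Location")
print(combined)
print(naive_split(combined))
```

Two URIs go in, but the naive split hands back three items, none of which
is the first URI.  Avoiding that requires quoting each URI, and a client
that expects exactly one unquoted URI in Content-Location has no reason to
understand such quoting.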

These are the cruxes of the problem.  So we're trying to follow the doctor's 
maxim "first, do no harm". We aren't worrying (yet) about the semantic issues 
that may or may not exist between how Content-Location is defined in the 
two different specs, but pointing out that allowing multiple
Content-Location headers is an incompatible change which may break
implementations, and we have no data which shows this change is harmless.

So until it is shown to be harmless, we must presume harm.  IETF process
attempts to avoid regression; we're worried that existing, deployed software
would stop working, possibly in significant ways.

So, please, as in my previous message, either present data showing that
the change doesn't break implementations, or don't argue about the name;
I think that will let us all make faster progress.  Otherwise we're going
to continue to bog down.

I hope this clarifies where the difficulty lies.

			- Jim Gettys


--
Jim Gettys
Industry Standards and Consortia
Digital Equipment Corporation
Visiting Scientist, World Wide Web Consortium, M.I.T.
http://www.w3.org/People/Gettys/
jg@w3.org, jg@pa.dec.com
Received on Thursday, 15 January 1998 13:01:45 UTC