Re: Recursive look up of base in outer headers

Roy T. Fielding (fielding@kiwi.ics.uci.edu)
Thu, 04 Sep 1997 00:14:51 -0700


To: mhtml@segate.sunet.se, uri@bunyip.com
Subject: Re: Recursive look up of base in outer headers 
In-reply-to: Your message of "Thu, 04 Sep 1997 00:08:46 CDT."
             <v04001308b033ef744b53@resnick2.qualcomm.com> 
Date: Thu, 04 Sep 1997 00:14:51 -0700
From: "Roy T. Fielding" <fielding@kiwi.ics.uci.edu>
Message-ID:  <9709040021.aa08127@paris.ics.uci.edu>

I must say that this whole discussion is a bit weird to me.
Content-Base takes precedence over Content-Location within the same
header field set because that is the only reason why Content-Base
exists --- to provide a way to say that the embedded links are
relative to something other than the document location.

Unfortunately, I can't see the confusion because I wrote the words.
Given the choice of removing the recursive definition or removing
Content-Base, I would remove Content-Base without a second thought.

The recursive definition enables efficient handling of encapsulated
content without any "searching" whatsoever; it is simply a matter of
peeling through layers of context, and that occurs during the handling
of any message.  If that is not how your software works now, then I
guarantee you that making it work that way will improve the extensibility
and robustness of your software.

The original Base header field was invented before Content-Location,
which is probably why the new wording is confusing.  Since it is
reasonable to expect the base URL to be different from the location
only within the innermost layer (the embedded content), it would be
reasonable to eliminate the Content-Base header field from MHTML and
HTTP and simply stick with the less confusing Content-Location.

>I think this demonstrates, in part, why there was so much worry in the WG
>about allowing recursion of these things: Base can be specified, but if
>it's not specified, it's taken from the location, and if that's not
>specified you take it from the base of the parent. Which, BTW, brings up an
>interesting question: Let's say I have the following:
>
>Content-Type: multipart/related
>Content-Base: foo://bar/biff/
>
>    Content-Type: multipart/mixed
>    Content-Location: blah://blee/blue.bar
>
>        Content-Type: text/html
>
>What is the base for the text/html, which has neither Content-Location nor
>Content-Base? Is it <blah://blee/> (the base we use for its parent since it
>has no Content-Base) or is it <foo://bar/biff/> (the specific base of its
>parent's parent)?

It is <blah://blee/blue.bar>.  I'm sorry I can't think of a better way
of explaining it, but it really is a simple definition.  In order for any
software to read a message, it must start from the outermost layer and
work its way in, just like any encapsulated data type.  At each layer
you have a current base URL, and at each layer that base URL may be
set to something different.  That resetting could be done by a
Content-Base or a Content-Location, but if both are present at that
level, only the Content-Base is used.
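
As a rough illustration of the outside-in rule described above, here is a
minimal Python sketch applied to the quoted three-layer example.  The Part
class and the walk function are hypothetical names invented for this sketch,
not part of any MIME library, and the sketch assumes every header value is
already an absolute URL.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Part:
    """Hypothetical stand-in for one MIME body part and its header fields."""
    content_type: str
    content_base: Optional[str] = None
    content_location: Optional[str] = None
    children: List["Part"] = field(default_factory=list)


def walk(part, inherited_base=None, depth=0):
    """Yield (depth, content_type, base) for each layer, outermost first."""
    # Content-Base wins over Content-Location at the same level; if neither
    # is present, the base of the enclosing layer carries through unchanged.
    base = part.content_base or part.content_location or inherited_base
    yield depth, part.content_type, base
    for child in part.children:
        yield from walk(child, base, depth + 1)


# The structure from the quoted question.
message = Part(
    "multipart/related",
    content_base="foo://bar/biff/",
    children=[
        Part(
            "multipart/mixed",
            content_location="blah://blee/blue.bar",
            children=[Part("text/html")],
        )
    ],
)

for depth, ctype, base in walk(message):
    print("    " * depth + f"{ctype}: base = {base}")
```

The innermost text/html part comes out with blah://blee/blue.bar as its
base, and nothing is ever searched for: each layer simply hands its current
base to the layer inside it.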

Please note that there is no way to implement a MIME content-type handler
without parsing message and multipart types from the outside-in.  Likewise,
a valid handler for text/html must be passed a single URL to be used as
the base for relative URL parsing.  My specification simply matches
the most reliable implementation of those handlers within user agents,
and does so in a way that is independent of the innermost media type.
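
To make the "single URL" point concrete, here is a small Python sketch of
what a text/html handler might do once it has been handed its base.  The
resolve_links function and the example.com URLs are assumptions made up for
illustration; only urllib.parse.urljoin is a real standard-library call.
The sketch uses http URLs rather than the invented schemes from the quoted
example, since urljoin only resolves relative references for schemes it
recognizes.

```python
from urllib.parse import urljoin


def resolve_links(base_url, hrefs):
    """Resolve each link found in the document against the one base URL."""
    # The handler does not need to know where the base came from
    # (Content-Base, Content-Location, or an enclosing layer).
    return [urljoin(base_url, href) for href in hrefs]


resolved = resolve_links(
    "http://example.com/docs/page.html",
    ["img/logo.gif", "../index.html", "http://other.example/x"],
)
for url in resolved:
    print(url)
```

The handler stays independent of the surrounding message structure: whatever
computed the base, the innermost media type only ever sees one URL.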

....Roy