- From: Roy T. Fielding <fielding@avron.ICS.UCI.EDU>
- Date: Mon, 23 Jan 1995 19:47:50 -0800
- To: Larry Masinter <masinter@parc.xerox.com>
- Cc: uri@bunyip.com
Larry writes:

> The reason for putting 'establishing a base' in an appendix is that it
> isn't exhaustive. It doesn't define how to establish the base URL for
> all situations, nor can it. What's there now is a set of examples;
> some of them correspond to well-established practice (like the base
> for things transported by http), while others are things that you've
> made up while writing this document (e.g., establishing a base for
> multipart mail messages).

I disagree with all three sentences.

First, a standard does not need to be exhaustive -- it needs to be
complete.

Second, what this standard defines is the order of precedence for
ascertaining the base URL from:

   1) inside the content
   2) part of the message
   3) the retrieval context
   4) the default

Section 3.5 was just a special case of (3). This order of precedence
applies equally well to all Internet protocols, and is thus complete
from the standpoint of an Internet standard.

Third, there are known gaps in well-established practice that are
fixed by this specification, and at the same time it does not conflict
with any established practice. If you have a problem with that, we
might as well shut down the WG.

In particular, the question of "what is the base URL of a component in
a multipart message?" is one that was discovered while defining the
handling of multipart documents for HTTP. The solution I "made up" is
both simple and technically sound given the possible presence of
multiple levels of "base" headers (and thus multiple levels of
retrieval context). I added that section to resolve any ambiguity
regarding that situation. If anyone has a better solution, please let
me know. Ignoring the ambiguity is not acceptable.

In an attempt to clarify it, I have changed section 3 as follows:

======================================================================

3.  Establishing a Base URL

   The term "relative URL" implies that there exists some absolute
   "base URL" against which the relative reference is applied.
   Indeed, the base URL is necessary to define the semantics of any
   embedded relative URLs; without it, a relative reference is
   meaningless. In order for relative URLs to be usable within a
   document, the base URL of that document must be known to the parser.

   The base URL of a document can be established in one of four ways,
   listed below in order of precedence. The order of precedence can be
   thought of in terms of layers, where the innermost defined base URL
   has the highest precedence. This can be visualized graphically as:

   .---------------------------------------------------------.
   |  .---------------------------------------------------.  |
   |  |  .---------------------------------------------.  |  |
   |  |  |  .---------------------------------------.  |  |  |
   |  |  |  |   (3.1) Base URL embedded in the      |  |  |  |
   |  |  |  |         document's content            |  |  |  |
   |  |  |  `---------------------------------------'  |  |  |
   |  |  |   (3.2) URL defined by a "Base" message     |  |  |
   |  |  |         header (or equivalent)              |  |  |
   |  |  `---------------------------------------------'  |  |
   |  |   (3.3) URL of the document's retrieval context   |  |
   |  `---------------------------------------------------'  |
   |   (3.4) Base URL = "" (undefined)                       |
   `---------------------------------------------------------'

3.1.  Base URL within Document Content

   Within certain document media types, the base URL of the document
   can be embedded within the content itself such that it can be
   readily obtained by a parser. This can be useful for descriptive
   documents, such as tables of content, which may be transmitted to
   others through protocols other than their usual retrieval context
   (e.g. E-Mail or USENET news).

   It is beyond the scope of this document to specify how, for each
   media type, the base URL can be embedded. However, an example of
   how this is done for the Hypertext Markup Language (HTML) [3] is
   provided in an Appendix (Section 10).

3.2.  Base URL within Message Headers

   A second method for identifying the base URL of a document is to
   specify it within the message headers (or equivalent tagged
   metainformation) of the message enclosing the document. For
   protocols that make use of message headers like those described in
   RFC 822 [5], it is recommended that the format of this header be:

      base-header = "Base" ":" "<URL:" absoluteURL ">"

   where "Base" is case-insensitive. For example, the header

      Base: <URL:http://www.ics.uci.edu/Test/a/b/c>

   would indicate that any relative URLs found within the document
   should be parsed relative to <URL:http://www.ics.uci.edu/Test/a/b/c>.
   Any whitespace (including that used for line folding) inside the
   angle brackets should be ignored.

   Protocols which do not use the RFC 822 message header syntax, but
   which do allow some form of tagged metainformation to be included
   within messages, may define their own syntax for passing the base
   URL as part of a message. Describing the syntax for all possible
   protocols is beyond the scope of this document. It is assumed that
   user agents using such a protocol will be able to obtain the
   appropriate syntax from that protocol's specification.

   In situations where both an embedded base URL (as described in
   Section 3.1) and a base-header are present, the embedded base URL
   takes precedence.

3.3.  Base URL from the Retrieval Context

   If neither an embedded base URL nor a base-header is present, then,
   if a URL was used to retrieve the base document, that URL shall be
   considered the base URL. Note that if the retrieval was the result
   of a redirected request, the last URL used (i.e., that which
   resulted in the actual retrieval of the document) is the base URL.

   Composite media types, such as the "multipart/*" and "message/*"
   media types defined by MIME (RFC 1521, [4]), require special
   processing in order to determine the retrieval context of an
   enclosed document.
   For these types, the base URL of the composite entity must be
   determined first; this base is then considered the retrieval
   context for its component parts, and thus the base URL for any part
   that does not define its own base via one of the methods described
   in Sections 3.1 and 3.2. This logic is applied recursively for
   component parts that are themselves composite entities.

   In other words, the retrieval context (Section 3.3) of a component
   part is the base URL of the composite entity of which it is a part.
   Thus, a composite entity can redefine the retrieval context of its
   component parts via inclusion of a base-header, and this
   redefinition applies recursively for a hierarchy of composite
   parts. Note that this is not necessarily the same as defining the
   base URL of the components, since each component may include an
   embedded base URL or base-header that takes precedence over the
   retrieval context.

3.4.  Default Base URL

   If none of the conditions described in Sections 3.1 -- 3.3 apply,
   then the base URL is considered to be the empty string and all
   embedded URLs within that document are assumed to be absolute URLs.

   It is the responsibility of the distributor(s) of a document
   containing relative URLs to ensure that the base URL for that
   document can be established. It must be emphasized that relative
   URLs cannot be used reliably in situations where the object's base
   URL is not well-defined.

======================================================================

> I don't believe that your assertion "The method of establishing a base
> must be part of the standard" holds up. You assert it, but you don't
> justify it. In any case, even if it must be part of 'a' standard, it
> isn't clear that it must be part of *this* standard, which defines the
> syntax and semantics of relative URLs.

I'm sorry, I thought that was clear. The base URL defines the
semantics of all embedded relative URLs. Without the base, all
embedded relative references are meaningless.
Obviously, it must be part of *this* standard.

> Two things: most importantly, the defined syntax for news and nntp
> URLs doesn't include any semantics for "/". At best, you're left saying
> that a raw "<message-id>" is a relative URL to a "news:<message-id>"
> URL. The syntax for available groups doesn't allow you to say that
> applying ".." as a relative URL to "news:alt.binaries.parsers" would
> get you "news:alt.binaries".

Ooops, sorry. I keep forgetting that news: URLs do not include the
article numbers found in libwww. I'll move news to the paragraph
above it.

> And using "../3" in
> "nntp://news.org:119/alt.binaries/12" doesn't seem particularly useful.

nntp URLs do use "/" as hierarchy, follow the generic-RL syntax, and
I know of several examples where relative URLs could be useful in such
circumstances. In fact, it should be in the bottom group with http.
That still doesn't mean that they *must* be used -- only that there
are no inherent restrictions on their use.

> I hadn't really gone over your BNF, but I'm puzzled how:
>
> !   absoluteURL = generic-RL | ( scheme ":" *( uchar | reserved ) )
>
> +   generic-RL  = scheme ":" [ relativeURL ]
> +
>
> leads one to allow a relative URL as a kind of absoluteURL.

Eh? No, it just defines the production rules for the generic-RL
syntax. Actually, it should be

    generic-RL  = scheme ":" relativeURL

but that's a technicality. This is the same as saying:

    generic-RL  = ( scheme ":" "//" net_loc [ abs_path ] )
                | ( scheme ":" "/" rel_path )
                | ( scheme ":" [ path ] [ ";" params ] [ "?" query ] )

The dual use of production names is meaningful and corresponds to the
way existing parsers handle relative URLs as part of the generic-RL
parsing process.


 ......Roy Fielding   ICS Grad Student, University of California, Irvine  USA
                      <fielding@ics.uci.edu>
                      <URL:http://www.ics.uci.edu/dir/grad/Software/fielding>
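
Editorial illustration, not part of the original message: the order of
precedence in the revised Section 3 (embedded base, then base-header,
then retrieval context, then the empty default) and its recursive
application to composite entities might be sketched in Python roughly
as follows. The Entity class, the function names, and the example URLs
other than the one from Section 3.2 are all hypothetical; the message
defines no API, so this is only one possible reading of the text.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Entity:
    """Hypothetical model of a document or a composite message part."""
    embedded_base: Optional[str] = None   # 3.1: base found in the content
    base_header: Optional[str] = None     # 3.2: raw "Base" header value
    parts: List["Entity"] = field(default_factory=list)


def parse_base_header(value: str) -> str:
    """Extract absoluteURL from '<URL:' absoluteURL '>'.

    Whitespace inside the angle brackets (including folded lines)
    is ignored, as Section 3.2 asks.
    """
    value = value.strip()
    if not (value.startswith("<") and value.endswith(">")):
        raise ValueError("base-header value must be enclosed in <...>")
    inner = "".join(value[1:-1].split())  # drop all inner whitespace
    if not inner.startswith("URL:"):
        raise ValueError("base-header value must start with <URL:")
    return inner[len("URL:"):]


def base_url(entity: Entity, retrieval_context: str = "") -> str:
    """Apply the precedence 3.1 > 3.2 > 3.3 > 3.4 to a single entity."""
    if entity.embedded_base:
        return entity.embedded_base
    if entity.base_header:
        return parse_base_header(entity.base_header)
    return retrieval_context  # "" is the undefined default of 3.4


def walk(entity: Entity, retrieval_context: str = "",
         path: Tuple[int, ...] = ()) -> Iterator[Tuple[Tuple[int, ...], str]]:
    """Yield (path, base URL) for an entity and, recursively, its parts.

    The base URL of a composite entity becomes the retrieval context
    of each of its component parts.
    """
    base = base_url(entity, retrieval_context)
    yield path, base
    for index, part in enumerate(entity.parts):
        yield from walk(part, base, path + (index,))


# Assumed example: a multipart message carrying the header from 3.2.
message = Entity(
    base_header="<URL:http://www.ics.uci.edu/Test/a/b/c>",
    parts=[
        Entity(),                                        # inherits the header
        Entity(embedded_base="http://example.org/x/"),   # overrides it
    ],
)
bases = dict(walk(message, "http://retrieved.example/doc"))
```

Here the first part has no base of its own and so takes the composite's
base-header URL as its retrieval context, while the second part's
embedded base takes precedence over it.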
Received on Monday, 23 January 1995 22:49:48 UTC