Re: Byte ranges -- formal spec proposal from Brian Behlendorf on 1995-05-18 (www-talk@w3.org from May to June 1995)

From: Brian Behlendorf <brian@organic.com>
Date: Thu, 18 May 1995 00:01:15 -0700 (PDT)
To: www-talk@w3.org
Message-Id: <Pine.3.89.9505172203.h15253-0100000@eat.organic.com>

On Wed, 17 May 1995, Larry Masinter wrote:
> I'm getting this discussion 3 times, http-wg, www-talk, and now on
> uri.  I suggest keeping the discussion on www-talk for now.

Done.  Are John Franks and Ari aboard?  

> The proposal is to add byte ranges to URLs (in general, it seems). I
> don't think it belongs there; at best, byte ranges make sense as an
> addon to the HTTP protocol.

Then how does one build a URL to point to minutes 6 through 8 of a 1-hour
60-megabyte DJ set?  Or to message 2064 (byte range 2254322-2257934) in a
4 megabyte mailbox archive?  Sure, if I'm using an HTTP-aware mailbox
reader or audio viewer that's a possibility... but then I can only launch
a range request from that type of viewer.  Ick. 

I do see what you're getting at, though.  There is *not* necessarily a 
direct mapping between a URL and the representation of the object that 
URL refers to as returned by the server.  Key definitions (and correct me 
if I'm wrong, but at least this is how I think most developers are 
conceiving the web):

1) A URL is a pointer to an *object* somewhere. (well, okay, object=resource)

2) An object can be anything - only *representations* of an object are 
directly viewable (flat files or program output, for example, are 
both representations).  

3) A *representation* is what we get back when we perform an action on a URL
(GET, POST, etc) in the context of an HTTP request. You do not "GET" the 
object itself.

4) *Representations* can be influenced by any part of the request, not 
just the URI.  For example, headers which are known to make a difference: 
Accept (content negotiation), If-Modified-Since, Authorization, hell, 
even User-Agent in certain situations.

(quick question - isn't it inherently more scalable to distribute and 
cache the objects themselves rather than their (possibly 
numerous) representations?  Hmm..)

So, does a "byte range" constitute a variation of the object, or a new
object itself, which deserves a unique URL?  Compelling cases could be
made on either side, but I think in this situation it truly is a
variation of the object.  But now we have a problem - the WWW Link Model
(hi roy!) only lets me link to *objects* (i.e., URL's), not particular
variations/representations of objects, if I understand things correctly. 
For example, if I have an object that represents my home page, and my home
page object returns both HTML 2.0 and HTML 3.0 representations of itself,
there's no way for me to *force* an HTML 2.0 browser to see the HTML 3.0
representation without giving the HTML 3.0 representation its own, 
un-content-negotiated URL.  Feh. 

Okay, so here's the problem.  A URL must be able, not required, but able,
to *completely* describe the request for an object.  In other words, URL's
must be able to point to particular representations of webbable objects,
including the protocol "method" used and the additional headers.  In fact,
in most situations today URL's are used to point to representations instead
of objects - content providers are simply creating unique URL's for every
representation.  So, we're not breaking anything fundamental here, it
seems.  Furthermore: 

1) There must be a clear distinction between the part of the URL that 
describes the *object*, and the part of the URL that describes its 
representation.

2) User-agents must be able to deal with the part of the URL that 
describes the representation at a higher level - for example, when a user 
goes to "bookmark" the object, they are asked to choose whether they want 
to bookmark the object in general or the particular representation of 
that object.  

3) Responses need to indicate which parts of that representation request 
influenced the output, so that caches know what to key on (and don't 
needlessly key on everything in the request.)  I think there's a "vary" 
header proposed somewhere....

4) There must be a defined list of "sacrosanct" headers in HTTP, ones 
which are always part of the request and are *not* modifiable by the 
representation-part of the URL.  For example, User-Agent:, or From:.  
Likewise, content providers should not vary content based on these headers.
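
A rough Python sketch of point 3, under assumptions: the response somehow
names the request headers that influenced it (the "vary"-style list above),
and a cache keys only on those.  The function name `cache_key` and the
tuple layout are hypothetical, just for illustration.

```python
def cache_key(url, request_headers, varied_names):
    """Build a cache key from the URL plus only the headers the response varied on."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in request_headers.items()}
    varied = tuple(
        (name.lower(), lowered.get(name.lower(), ""))
        for name in sorted(varied_names, key=str.lower)
    )
    return (url, varied)


# Two requests that differ only in User-Agent share one cache entry
# when the response says it varied on Accept alone.
a = cache_key("http://host/home", {"Accept": "text/html", "User-Agent": "Godzilla"}, ["Accept"])
b = cache_key("http://host/home", {"accept": "text/html", "User-Agent": "Other"}, ["Accept"])
print(a == b)
```

With this shape a cache does not needlessly key on everything in the
request, only on what the response declared relevant.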

Phew.

(btw, the CD I'm listening to now seems highly conducive to this kind of 
thought process - Air, by Pete Namlook, on FAX)

So, here's how I think things should look.  The format:

  http://host/path/to/object?object_arguments;request_headers

  object_arguments: a url-encoded list of name-value pairs,
        e.g. name=brian&age=22

  request_headers: a url-encoded list of request headers, which only
        make sense in the context of the protocol used (in this case HTTP).
        This generality is so that URL's aren't hindered by HTTP-only
        specifications.

So that the browser's request looks something like

  (connect to host port 80)
  GET /path/to/object?object_arguments HTTP/1.0
  User-Agent: Godzilla
  request_header.name1: request_header.value1
  request_header.name2: request_header.value2

For the purposes of this exposition, the HTTP header referring to 
byteranges would be something like "ByteRange:".  Something more general 
is needed for other segments of course.
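
To make the mapping from URL to request concrete, here is a minimal Python
sketch.  It assumes the first ";" in the URL starts the request_headers
part, that header pairs are separated by "&", and that each decoded pair is
sent as an ordinary "Name: value" header line.  The function names are
made up for illustration; none of this is a settled syntax.

```python
from urllib.parse import urlsplit, parse_qsl


def split_extended_url(url):
    """Split an extended URL into (host, port, request_target, header_pairs)."""
    # Assumption: everything after the first ';' is the request_headers part.
    object_part, _, header_part = url.partition(";")
    parts = urlsplit(object_part)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    headers = parse_qsl(header_part, keep_blank_values=True)
    return parts.hostname, parts.port or 80, target, headers


def build_request(url, user_agent="Godzilla"):
    """Return (host, port, request_text) for a GET on the extended URL."""
    host, port, target, headers = split_extended_url(url)
    lines = ["GET %s HTTP/1.0" % target, "User-Agent: %s" % user_agent]
    lines += ["%s: %s" % (name, value) for name, value in headers]
    return host, port, "\r\n".join(lines) + "\r\n\r\n"


host, port, request = build_request(
    "http://host/path/to/object?name=brian&age=22;ByteRange=0-99"
)
print(host, port)
print(request)
```

Note that the object_arguments stay in the request line while the
request_headers part never reaches the server as part of the path.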

Some sample URL's:

   a pointer to a sound file of Clinton's weekly radio address:
        http://www.npr.org/clinton/week23
   a pointer to an MPEG audio version of Clinton's weekly radio address:
        http://www.npr.org/clinton/week23;Accept=audio/x-mpeg
   a pointer to byte range 10234234-13244212 of Clinton's weekly radio address:
        http://www.npr.org/clinton/week23;Accept=audio/x-mpeg&Byterange=10234234-13244212
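
Going the other way, a user-agent or authoring tool could assemble such
URL's from an object URL plus a set of headers.  A small Python sketch,
assuming "&"-separated url-encoded pairs after a single ";" (the helper
name `make_extended_url` is hypothetical):

```python
from urllib.parse import urlencode


def make_extended_url(object_url, request_headers):
    """Append url-encoded request headers to object_url after a ';'."""
    if not request_headers:
        return object_url
    # safe="/" keeps media types such as audio/x-mpeg readable.
    return object_url + ";" + urlencode(request_headers, safe="/")


base = "http://www.npr.org/clinton/week23"
print(make_extended_url(base, {"Accept": "audio/x-mpeg"}))
print(make_extended_url(base, {"Accept": "audio/x-mpeg",
                               "Byterange": "10234234-13244212"}))
```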
   
I can already sense some problems.  Here's an interesting URL:

http://whitehouse.gov:25/;MAIL+FROM=madmad@bomber.org&RCPT+TO=president&DATA\nFrontLawn,2pm,May16th\n.\n

Though I suppose some catches could be put in place for this situation, 
can we protect against that for every protocol?  At what point does a 
sufficiently obfuscated (to the human eye) extended URL become a malicious
virus-ish mechanism for mayhem?
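
One possible set of catches for the HTTP case, sketched in Python.  All of
the policy here is an assumption for illustration: only port 80 is allowed,
header names must look like plain tokens, values may not contain control
characters (so no smuggled line breaks), and the protected headers from
point 4 cannot be set from the URL.  It does not answer the question for
every other protocol.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Hypothetical policy values, not taken from any specification.
PROTECTED_HEADERS = {"user-agent", "from"}
ALLOWED_PORTS = {80}
HEADER_NAME = re.compile(r"[A-Za-z0-9-]+")


def vet_extended_url(url):
    """Return the decoded header pairs, or raise ValueError if the URL looks hostile."""
    object_part, _, header_part = url.partition(";")
    port = urlsplit(object_part).port or 80
    if port not in ALLOWED_PORTS:
        raise ValueError("refusing non-HTTP port %d" % port)
    pairs = parse_qsl(header_part, keep_blank_values=True)
    for name, value in pairs:
        # "MAIL+FROM" decodes to "MAIL FROM", which is not a valid header name.
        if not HEADER_NAME.fullmatch(name):
            raise ValueError("bad header name %r" % name)
        if name.lower() in PROTECTED_HEADERS:
            raise ValueError("header %r may not be set from a URL" % name)
        # Reject CR, LF and other control characters hidden by url-encoding.
        if any(ord(ch) < 32 or ord(ch) == 127 for ch in value):
            raise ValueError("control character in value of %r" % name)
    return pairs


print(vet_extended_url(
    "http://www.npr.org/clinton/week23;Accept=audio/x-mpeg&Byterange=10234234-13244212"
))
```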

Food for thought, hopefully I'm not too far off base on some of these.
Dan, Roy, let me have it.  :)

	Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com  brian@hyperreal.com  http://www.[hyperreal,organic].com/
Received on Thursday, 18 May 1995 03:01:22 UTC