Re: Byte ranges -- formal spec proposal from Brian Behlendorf on 1995-05-21 (www-talk@w3.org from May to June 1995)

From: Brian Behlendorf <brian@organic.com>
Date: Sat, 20 May 1995 20:49:20 -0700 (PDT)
To: www-talk@www10.w3.org
Message-Id: <Pine.3.89.9505202036.T15253-0100000@eat.organic.com>
One last thing I'd like to throw into the conversation here, and then I 
feel all has been said that needs to be said, and we should try and find 
rough consensus. (BTW, are there any "tools" for finding what 
can be considered rough consensus?  A straight WWW-based poll is 
probably out of line...)

************************************************************************
SUMMARY: all we really need, standards-wise, is a new 300-level HTTP 
response, "Contains".
************************************************************************

Several good examples have been brought up of files that can be composed of
segments, where each of those segments is a valid file of the same data-type,
as an argument for this proposal.  However, in almost all of the examples,
there were only *specific* byte ranges which would work, in which the
requested object would really be usable.  Thus, for most of these examples,
you could just ask for "parts 0-3" or "2-5" or "3-end", and the right thing
would happen.  In only one of the examples was *true* random access
necessary, and that was to resume downloading of a file if it was interrupted
part of the way through.  Keep this example off to the side for the next few
paragraphs. 

Instead of thinking about one URL that represents a collection of objects,
why not give each object its own unique URL, and devise a way of addressing
a collection of URLs?  This is similar to byterange, but more general.
Let's say somewhere a mapping takes place that translates URL1 into a
container for URL2, URL3, URL4, etc.  I have a hunch this is URC/URI
territory, but I don't know enough yet about the specific URC proposals
floating around to know if this is already being considered.

So, it works like this:

Client asks for URL1. URL1 gets mapped at a server somewhere into a composite
body whose parts are URL2, URL3, and URL4. * If the browser doesn't find a
place to either inline or link URL3, URL4, etc., it's up to the browser to
figure out how to represent that "auxiliary" file.  Maybe it just keeps the
file around until it can be represented later.

Caches work just as they always have.  If they can cache that container 
mapping, so much the better.  The important thing is that URL2, URL3, 
URL4, etc., can be ANYTHING THEY WANT TO BE - there's no need to give 
them some sort of formal syntax, caches know from the mapping from URL1 
how they assemble together.  If the server prefers knowing them as 
byteranges, it doesn't matter.  I.e., we can have

  http://host/path/file 
    is-a-container-for
      http://host/path/file;byterange=0-30
      http://host/path/file;byterange=31-60

or

  http://host/path/file
    is-a-container-for
      http://host/path/file?part1
      http://host/path/file?part2

or even

  http://host/path/file
    is-a-container-for
      http://host/path/file2
      http://host2/path/script
      ftp://host3/path/file3

and in every case the client or proxy will know whether it has the whole 
object, or just some of its parts.
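
To make the bookkeeping concrete, here is a minimal sketch in Python of how
a client or proxy might hold such a mapping and check whether it has the
whole object.  The dict layout and the function names (missing_parts,
has_whole_object) are assumptions made up for illustration; nothing here is
part of any existing server or browser.

```python
# Hypothetical in-memory form of the "is-a-container-for" mapping.
# The URLs are the illustrative ones from the third example above.
CONTAINERS = {
    "http://host/path/file": [
        "http://host/path/file2",
        "http://host2/path/script",
        "ftp://host3/path/file3",
    ],
}


def missing_parts(container_url, cached_urls, containers=CONTAINERS):
    """Return the parts of container_url that are not yet in the cache."""
    return [part for part in containers[container_url]
            if part not in cached_urls]


def has_whole_object(container_url, cached_urls, containers=CONTAINERS):
    """True when every part named by the mapping is already cached."""
    return not missing_parts(container_url, cached_urls, containers)
```

Note that the part URLs are opaque keys here: the check works the same way
whether they are byteranges, query strings, or files on other hosts.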

Finally, this also allows "parts" to be members of more than one 
container, something none of the byterange proposals had considered.  I 
think this is a good thing; can anyone think of a situation where it 
isn't?  In fact, the parts can even be on completely separate servers.


Yes, THIS REQUIRES CHANGES TO BROWSERS AND SERVERS.  Minimally.  Why 
are we so afraid of that?

There are a couple really good side effects now that I think about it.
For example, right now Netscape's progressive-rendering algorithm has to 
wait until it recognizes a reference to an inlined image before it can 
start grabbing it.  If it could be told that "URL1 contains this HTML 
page and these inlined images" then it could possibly be more efficient 
in what it does.  Additionally, a content provider could "bundle" icons 
with one page that weren't necessarily inlined on that page, but which 
are used by subsequent pages, so that when visitors go to that subsequent 
page, the icons are already loaded.

I can give plenty of examples of how this could work for just about 
every application discussed so far.  It would seem to be pretty 
straightforward for servers to generate these mappings for a large PDF 
file, presuming there's some way for the server to query the PDF file to 
know where it can be segmented.

So, now, back to the resume-downloading-at-point-x.  This is 
semantically a much different operation than "give me part x",
so let's just give it its own request header:

Startbyte: 204567

....would mean start the post-response-header transmission at byte 204567 into
the response, counting from the end of the response headers (\r\n\r\n, or
\n\n).  Who cares if this is a CGI script or actual file, eh?  :)
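
A rough sketch of the server side, in Python, assuming the response body is
already available as bytes (from a file or from CGI output, it makes no
difference).  The function name apply_startbyte, the dict of request
headers, and the handling of negative offsets are my assumptions for
illustration; the proposal itself only defines the header and its meaning.

```python
def apply_startbyte(body: bytes, request_headers: dict) -> bytes:
    """Return the response body starting at the offset given by Startbyte.

    The offset counts from the first byte after the response headers,
    so the same logic applies to a static file and to script output.
    """
    raw = request_headers.get("Startbyte")
    if raw is None:
        return body  # no Startbyte header: send the whole body
    offset = int(raw)
    if offset < 0:
        # Assumption: the proposal does not say what a negative value means.
        raise ValueError("Startbyte must be non-negative")
    # Assumption: an offset past the end simply yields an empty body.
    return body[offset:]
```

A client resuming an interrupted transfer would send the number of body
bytes it already holds and append whatever comes back.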


********************************************************************

So, I suppose in the end I'm proposing a new 300-level HTTP response, 
something like

  305 Contains Mapping

     o Following: anything
  
     o Required Headers: none

  The server returns an HTTP object consisting of a newline-delimited 
  list of URIs which this URL is said to "contain".  The client is expected
  to fetch these URLs and plug them together, representing this 
  requested URL as the canonical URL for this collection.  The other HTTP 
  headers on this object apply *only* to this object, and this response 
  should be cached where possible.

*******************************************************************
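
For the client side, a minimal Python sketch of handling such a response
body.  It assumes the simplest way of "plugging together", plain
concatenation in listed order, which fits the byterange case but is only
one option, since browsers are free to combine parts however they wish.
The names parse_contains_body and assemble, and the caller-supplied fetch
function, are hypothetical.

```python
def parse_contains_body(body: str) -> list:
    """Split a newline-delimited list of URIs, ignoring blank lines."""
    return [line.strip() for line in body.splitlines() if line.strip()]


def assemble(mapping_body: str, fetch) -> bytes:
    """Fetch each contained URI and join the parts in listed order.

    fetch is a hypothetical caller-supplied function mapping a URI to
    the bytes of that part (from the network or from a cache).
    Concatenation is an assumption; it suits byterange-style parts only.
    """
    return b"".join(fetch(uri) for uri in parse_contains_body(mapping_body))
```

Because fetch is passed in, a proxy could satisfy some parts from its cache
and only go to the network for the rest.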

*Feedback*, please.  I hate having all these ideas and no time to 
implement them in a browser (though I'd be happy to implement this on the 
server side in Apache).

Roy?  Dan?  Henrik?

	Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com  brian@hyperreal.com  http://www.[hyperreal,organic].com/


* - Order is insignificant - a browser first starts rendering URL2 and looks
for where to start plugging in URL3, etc., but that should just be an
optimization; browsers can plug things together however they wish.  Some
network-aware file formats like VRML already have the concept of nesting
inlines, which HTML doesn't have (yet), so that order could be
created by a depth- or breadth-first traversal of the scene to aid 
rendering, but in a real directed graph that's not necessary.
Received on Saturday, 20 May 1995 23:49:24 UTC