Re: Byte ranges -- formal spec proposal from Chuck Shotton on 1995-05-18 (ietf-http-wg@w3.org from April to June 1995)

From: Chuck Shotton <cshotton@biap.com>
Date: Thu, 18 May 1995 15:17:42 -0500
To: David - Morris <dwm@shell.portal.com>, John Franks <john@math.nwu.edu>
Cc: luotonen@netscape.com, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <v02110111abe15858a039@[129.106.201.2]>
At 12:12 PM 5/18/95, David - Morris wrote:
>On Thu, 18 May 1995, John Franks wrote:
>
>[...]
>> are required and it works fine with all current browsers.  Indeed,
>> essentially this proposal has been implemented and is widely used in
>> both the GN and the WN servers (in the case of GN for over two years).
>> It works fine with all browsers.
>
>In that case, I would like to see some background on how the capability
>has been used and the motivation. I have a real hard time
>understanding why a user would want to type byte ranges, either as
>a browser URL modifier or in an anchor.  Keeping such data in sync
>would seem like a horrendous problem. So more on usage experience.

It is typically NOT the case that a user would EVER type a byte range. It
is more likely that a client helper app would request that a WWW browser
retrieve a selected range of bytes from a server on behalf of the viewer. A
perfect example of this is Adobe's work with Acrobat. Acrobat needs to be
able to retrieve a range of bytes without dragging down the entire
(potentially) multi-megabyte PDF file. Prototypes allow the Acrobat viewer
to request, through Netscape, a URL that refers to a CGI on the server.
Arguments in the URL specify the byte range that the CGI should retrieve
from the PDF file.

In their case, the solution would be much more portable if the CGI could be
eliminated and servers supported byte ranges directly.

Users will simply never enter these things. Byte ranges are useful for URLs
that are generated on the fly by CGIs and servers, not for URLs that are
typed in by humans. Gotta get out of that static HTML document mind set.

>> Some other clarifications:
>>
>> 2) If a server chooses not to support byteranges for one document or
>> all documents and for whatever reasons, it is quite appropriate to
>> send a "document not found" status.   The server should not parse the
>
>I don't agree ... this was one clear weakness in my mind in the
>original proposal.  "document not found" is a quite different state
>from "this document doesn't support byterange".

Not at all. The server that doesn't understand byte ranges views the entire
URL as a file name. A file name with a bunch of numbers after it that
wasn't found in the local file system. This is a perfect response to
support backward compatibility with servers that don't implement or
understand byte range syntax. Remember, the URL path is something that is
ONLY understood by the server. A URL path is supposed to be opaque to
EVERYTHING else, user, client, viewer, etc. It is strictly up to a server
to determine if it can service a given URL and respond accordingly.
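
As a rough sketch of that backward-compatible behaviour, here is a toy
lookup for a server that knows nothing about byte ranges. The file table,
the function name, and the ";byterange=2-4" suffix are all made up for
illustration; the suffix is not the syntax of any actual proposal.

```python
# Hypothetical file store for a server with no byte range support.
FILES = {"/docs/manual.pdf": b"%PDF-1.0 ..."}


def legacy_lookup(path, files=FILES):
    """Treat the whole URL path as an opaque file name."""
    if path in files:
        return 200, files[path]
    # The path, range suffix included, names no file on this server.
    return 404, b"document not found"


# The range suffix below is invented syntax, used only for illustration.
print(legacy_lookup("/docs/manual.pdf")[0])
print(legacy_lookup("/docs/manual.pdf;byterange=2-4")[0])
```

The second call fails only because no file by that full name exists. The
server needs no special code to reject the range.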

>> Should byteranges be 0 based or 1 based?  My initial view was that
>
>It really depends on who is providing the values. Only a subset of
>programmers think of the first object in an ordered set as being
>the 0th object. People think of '1' (one) as the first object. If
>there is any expectation that people will enter the values, one
>is the correct choice.

I agree. The first byte in a file is byte number 1. If I want the second
through fourth bytes of a file, I want to specify the range 2-4, not 1-3. The
latter is hardly intuitive.
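
To make the arithmetic concrete, a minimal sketch of what a 1-based,
inclusive range would mean for a server written in a language with 0-based
slicing. The function name and the sample data are assumptions for
illustration only.

```python
def one_based_range(data, first, last):
    """Return bytes first..last, counting the first byte as 1, inclusive."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    # Shift only the start: Python slices are 0-based and exclude the end.
    return data[first - 1:last]


print(one_based_range(b"ABCDEF", 2, 4))  # b'BCD'
```

The conversion is a single subtraction inside the server, so the
user-visible numbering can follow whichever convention reads most
naturally.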

>> The second issue is also about possible future extensions.  Dan Connolly
>> pointed out that an '&' for multiple parameters would have to be
>> escaped in an anchor in an HTML document.  This is indeed a problem,
>> on the other hand it would be nice to have the same syntax as HTML form
>> URLs.
>
>I commented on & already and believe it to be the better choice.
>Related is the choice of ';'.  Seems to me that the '#' is already
>designated as a fragment identifier and what is being proposed is
>a new form of fragment specification.

This is a good point. However, currently the # fragment identifier is
strictly for the benefit of clients. It hasn't been used as something that
makes sense to a server. It is clients that interpret this portion of URLs,
not servers. If servers suddenly start parsing info after "#" in a URL,
what will this mean for links within documents? How will clients know which
URLs to deal with locally and which to send to servers?
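
A small sketch of the client-side split described above, using Python's
standard urllib.parse module. The document path and the fragment name are
hypothetical, and the function name is invented for this example.

```python
from urllib.parse import urldefrag


def split_for_request(url):
    """Return (what the client sends to the server, what it keeps locally)."""
    result = urldefrag(url)
    return result.url, result.fragment


# The fragment after "#" never reaches the server in this model.
print(split_for_request("http://www.biap.com/doc.html#section2"))
```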

I still maintain that "?" is the appropriate separator for byte range
syntax. You are asking the server to search for a particular range of bytes
in a file, which is consistent with searching for keywords in a file, or
coordinates on a map. As I mentioned earlier, semicolon has a specific
meaning in many file systems that will conflict with its use as a separator
for byte range info. The use of ";" is a bad choice.
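
For illustration, a sketch of how a server or CGI might pull a range out
of the "?" portion of a URL. The parameter name "byterange", the "2-4"
value format, and the document path are assumptions made for this example,
not syntax taken from any proposal.

```python
from urllib.parse import urlsplit, parse_qs


def parse_byte_range(url):
    """Return (path, (first, last)), or (path, None) if no range is given."""
    parts = urlsplit(url)
    # "byterange" is a hypothetical parameter name.
    values = parse_qs(parts.query).get("byterange")
    if not values:
        return parts.path, None
    first, last = values[0].split("-")
    return parts.path, (int(first), int(last))


print(parse_byte_range("http://www.biap.com/docs/manual.pdf?byterange=2-4"))
```

Because the range travels in the query part, the path itself stays a
plain file name, and any ";" in that name is left alone.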

> If this proposal remains
>as a http: URL rather than a new http header (which I would favor
>depending on answers to the who&why question) then # would seem
>more logical.

Making a new HTTP header means that it will never gain support. Allowing it
to be part of the URL (where it belongs in my opinion) means that it can be
retrofitted into existing servers with the addition of a CGI. And as for
"#", "?" is a better choice than either "#" or ";".

-----------------------------------------------------------------------
Chuck Shotton
cshotton@biap.com                                  http://www.biap.com/
cshotton@oac.hsc.uth.tmc.edu                           "I am NOT here."
Received on Thursday, 18 May 1995 13:21:18 UTC