Re: Comments on Byte range draft

On Mon, 13 Nov 1995, Chuck Shotton wrote:

> At 8:53 AM 11/13/95, Benjamin Franz wrote:
> >On Sun, 12 Nov 1995, Gavin Nicol wrote:
> 
> >On consideration - the EOL conversion issue is just another red herring.
> 
> Look, can we cut out the "red herring" red herrings? I want to see some
> constructive discussion about why a CGI-based implementation of byte ranges
> is unacceptable. Nobody has presented a coherent argument against it. I
> would like to see some rationale behind why the URL standard needs to be
> changed to support an application-specific form of query for a specific
> subset of data types. I'd like to see a well-reasoned discussion on your
> part about the valid concerns that server implementors have regarding the
> need to re-render entire documents to serve you a byte range for low-value
> reasons like interrupted file transfers, etc. and why you think it is OK to
> ignore these concerns.

A) The server has to re-render the document regardless.
   What is being asked for here is that it send only *part* of that
   rendered document - something that should always take fewer
   resources than retransmitting the *entire* document, including
   the sections already received by the client. And since it only
   has to transmit part of the document, a server *could* in some
   cases optimize its rendering for partial documents as well
   (see the sketch after this list). Clear win.

B) CGI solutions are 'one-off' solutions that fail to address the
   *general* problem of restarting interrupted transfers.
   Clear loss.

C) CGI adds completely unnecessary overhead to server operation for what is
   fundamentally a *simple* request for bytes X - Y of the rendered byte
   stream. Clear loss.

D) It *ISN'T* application specific. Byte ranges are a function of
   the byte stream between the server and the client - not of the
   server's or the client's local representation of that byte stream.
   That byte stream is not dependent on the client/server *interpretation*
   of it. It is an entity in its own right.
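
To spell out A) in code: a throwaway Python sketch, with names entirely
of my own invention. Rendering happens either way - the only change is
how much of the rendered stream goes back over the wire.

    def serve(document, render, send, byte_range=None):
        stream = render(document)         # the byte stream the client would get anyway
        if byte_range is None:
            send(stream)                  # current behavior: send everything
        else:
            first, last = byte_range
            send(stream[first:last + 1])  # proposed: send only the part the client lacks

At worst the slice is the entire stream, so the server never does more
work than it does today.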

> >It simply doesn't matter to the question of byte ranges. Byte ranges
> >*clearly* should apply to the transmitted byte stream - not the server side
> >representation of said byte stream. And one way or another the byte
> >stream is *always* generated. And unless your conversion is
> >non-deterministic the byte-stream will always be the same from the same
> >source document for a given GET request. The byte is inherently the
> >atomic unit of information in HTTP.
> 
> This is a trivial academic point. Please stop fixating on it. 

Only when you comprehend it. You are locked on the *meaning* of the byte 
stream - which is utterly irrelevant to a transport level issue. This is 
NOT about the meaning of the information - it is about *how* it is 
transported. It is stupid to retransmit an entire document when you 
already have part of it.

> Of course all data flows across a HTTP connection as a linear stream of bytes.
> That's the whole legacy of a Von Neumann architecture. Now that we are 
> past Data Comm 101, let's talk about whether or not it makes sense to 
> force byte stream manipulations on applications that deal with pages, 
> images, database records, complicated relational objects, real-time 
> generated displays, etc.

Are you for real? You concede that what moves between the server and the 
client is a byte stream and that a GET request should always return the 
same byte stream for the same document - and then you dive right back 
into what the *meaning* of the byte stream is? It does not matter. Bytes 
200-400 of the byte stream are bytes 200-400 of the byte stream whether 
the byte stream is images, database search results or cookie recipes 
spoken aloud by Julia Child.

Obviously a client should not make a byte range request on a 
*dynamic* document - that is implicit in the 'deterministic' condition. 
The base request must give the same results every time or a byte range is 
meaningless. But this is a non-issue! A dynamic document should be giving 
a 'Pragma: no-cache' or Expires: <yesterday> header in the first place to 
indicate it is not safe to depend on the document's contents being 
unchanged between requests. As a last line of defense, a server should 
ignore the byte range request when it is clearly inappropriate (such as at 
the invocation of a CGI script). This is no different than a server's 
handling of a Conditional GET. This is a *trivial* issue.
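
A rough Python sketch of the fallback I mean - the header name and the
"bytes=X-Y" syntax here are illustrative only, not the draft's:

    def select_body(stream, range_header, is_static):
        """Pick the bytes to put on the wire for one response.

        stream       -- the fully rendered byte stream
        range_header -- e.g. "bytes=200-400", or None if no range was requested
        is_static    -- False for CGI/dynamic output, where a range is meaningless
        """
        if range_header is None or not is_static:
            return stream                        # ignore the range; send the full document
        spec = range_header.split("=", 1)[1]     # "200-400"
        first, last = (int(n) for n in spec.split("-"))
        return stream[first:last + 1]            # a slice of the *transmitted* stream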

> You seem to want to trivialize the function of the Web and associated
> applications to the lowest common denominator of moving the contents of a
> file from point A to point B, a byte at a time. This is an absurdly
> low-level of abstraction. It's like talking about building a house by
> aligning protein and sugar molecules so that they form 2x4s, lining up iron
> atoms into nails, etc.

But it is the level we are *discussing* when talking about byte ranges.

> There is a LOT more to the problem space than simply bytes, and a
> one-size-fits-all byte range proposal is unsuitable from an implementation
> perspective as well as a philosophical one. Clients have no business
> knowing about the internal representation of items stored at a given URL
> and allowing them to ask for specific chunks out of a file with something
> as coarse as a byte range violates the entire principle of a URL.

What are you talking about? One more time: BYTE RANGES 
SHOULD REFER TO POSITIONING WITHIN THE BYTE STREAM BETWEEN THE SERVER 
AND THE CLIENT, NOT WITHIN THE SERVER'S OR CLIENT'S LOCAL REPRESENTATION OF 
THAT STREAM. I have no idea how to make the statement any simpler. This 
is not about *what* the data is - it is about *how* the data is transported.
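
If a concrete example helps (Python, purely illustrative), take the EOL
conversion case that started this: a text file stored with bare LF but
transmitted with CRLF line ends.

    stored      = b"line one\nline two\nline three\n"  # server's local representation
    transmitted = stored.replace(b"\n", b"\r\n")        # the byte stream on the wire

    # "bytes 10-20" means bytes 10-20 of `transmitted` - full stop.
    print(transmitted[10:21])

The conversion is deterministic, so the offsets into the transmitted
stream are perfectly well defined - which is exactly why the EOL issue
is a red herring.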

> Specifically, a URL is supposed to be a client-opaque method for requesting
> information from a server. URLs are given to clients by servers or by
> humans. The path information contained in a URL is supposed to be the
> private domain of the server. Allowing a client to generate or manipulate
> this portion of a URL outside the bounds of the current hierarchy
> manipulations allowed in the URL standard is WRONG.

And that is not what is being discussed. I have no idea why you are so locked 
into the idea that byte range requests should refer to anything other 
than the byte stream.

> > What connects a server to a client
> >*is* a byte stream. End point representations of that byte stream
> >should be completely irrelevant to the issue of byte ranges.
> 
> Well, technically it's a bit stream. But even so, why are we continuing to
> twaddle about in the low level bits and bytes of this problem space when we
> could be building easier to use application level standards?

> If you are
> intent on forcing server implementors to pre-render all content so you can
> be served a "byte stream" of particular offset and length, why not ask for
> something that makes a little more sense in the context of the WWW, like
> pages, images, records, etc.? And what's more, why not allow servers to
> tell the client FIRST what representation is has chosen for partial
> document transfers (byte, page, record, etc.) and tell the client HOW to
> make the request for these document portions? This last point is, I think,
> the most important one to consider.

You are off in the wild blue yonder again:

   Questions of URL -> Dataspace representation are orthogonal to this
   issue. You are considering a *different* problem than what is being
   discussed here and one that the WG simply has no control over. That 
   issue is far beyond the scope of the HTTP group - which is *precisely* 
   concerned with the issues you dismiss as "twaddling about in the low 
   level bits and bytes of this problem space." HTTP is a *transport* 
   mechanism - this proposal regards a *transport* issue: the problem of 
   restarting incomplete document transfers without having to re-transfer
   the *entire* document.

As presently done: The server must re-render the entire document and 
re-transmit the *entire* document. The proposed change would allow (not 
require) a server to partially render and/or partially transmit a document 
that had previously been sent but did not finish transmission 
for whatever reason. This clearly cannot impose a higher workload on 
a server (which *at worst* re-renders and re-sends the entire document - 
as it does right now) but has the potential in many cases to 
*decrease* the load on both the server and the network, since information 
is not redundantly sent to the client.
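
The client side is just as simple - again a hypothetical Python sketch,
not a proposal for any particular syntax:

    def resume(get_range, url, already_have):
        """get_range(url, first) is assumed to return the transmitted byte
        stream from offset `first` to the end; `already_have` is the prefix
        received before the connection died."""
        rest = get_range(url, len(already_have))  # request only the missing tail
        return already_have + rest                # same document, less traffic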

This has zero, nada, zilch to do with what that information represents.
Those questions are simply irrelevant.

-- 
Benjamin Franz

Received on Monday, 13 November 1995 11:35:10 UTC