RE: Partial Puts from Yaron Goland on 1997-03-21 (w3c-dist-auth@w3.org from January to March 1997)

From: Yaron Goland <yarong@microsoft.com>
Date: Fri, 21 Mar 1997 00:49:09 -0800
To: "'masinter@parc.xerox.com'" <masinter@parc.xerox.com>
Cc: w3c-dist-auth@w3.org
Message-ID: <11352BDEEB92CF119F3F00805F14F485026B721C@RED-44-MSG.dns.microsoft.com>
As anyone running on a 28.8 modem or less will tell you, this isn't an
optimization; this feature determines whether the user can function. I sat
down with the Office people and they showed me some of their user
scenarios. They involve having files on servers edited remotely by users
sitting behind modems of all speeds and sizes. Office's latency problems
are especially bad because they sell somewhere around 40% of their
product outside the US. I believe you will find that anyone on this
group who ships a commercial distributed authoring system will tell you
that partial write isn't an optimization, it is a critical piece of
their functionality. I know that is certainly true for Microsoft.

But the need for partial PUTs isn't the issue here. With the STRUCTURE
method it is possible to do a de facto partial put, regardless of
anyone's feelings on the subject. The real question is - should a Range
header be used? I had originally thought to use a RANGE method where one
would submit a Range header and get back a URI. The problem with that is
the performance hit. After talking with the IIS and Office folks we
figured that the performance hit is so large that we had to use a Range
header rather than a Range method. Again, I strongly suspect that Office
and IIS's experience is representative. If not, I'm sure that the other
members of the group who ship distributed authoring server products will
speak up. I can only speak for Microsoft's experience in this area.
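
To make the single-request form concrete, here is a minimal Python sketch that assembles a partial PUT carrying the Range and Write-Type headers from the proposal quoted below. The function name, host, and path are hypothetical, and the PEP extension declaration the proposal calls for is left out for brevity.

```python
def build_partial_put(host: str, path: str, offset: int,
                      write_type: str, body: bytes) -> bytes:
    """Assemble a raw partial PUT request as bytes (illustrative only)."""
    # The proposal allows exactly one of INSERT or OVERWRITE.
    if write_type not in ("INSERT", "OVERWRITE"):
        raise ValueError("write_type must be INSERT or OVERWRITE")
    header_lines = [
        f"PUT {path} HTTP/1.1",
        f"Host: {host}",
        # The proposal has Range identify a single point, e.g. 30-30.
        f"Range: bytes={offset}-{offset}",
        f"Write-Type: {write_type}",
        f"Content-Length: {len(body)}",
    ]
    head = "\r\n".join(header_lines) + "\r\n\r\n"
    return head.encode("ascii") + body


request = build_partial_put("example.org", "/doc.txt", 30, "INSERT", b"hello")
```

The point of the header form is that the whole operation travels in this one request. A RANGE method would first need a separate exchange to obtain the URI for the range, and only then the PUT.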

The clean solution is the RANGE method. I think everyone can agree on that
much. The problem is, its performance is nowhere near good enough for anyone
who needs to sell a system to implement. So the question we need to ask
ourselves is - Are we willing to sacrifice some cleanliness for
interoperability? If everyone is going to implement something like a
range header and if they all do it separately and thus differently, have
we aided the cause of interoperability? I am not saying that we should
sacrifice good design on the altar of performance. I am saying that
sometimes there are close calls where the clean solution has not-so-great
performance, and the close-to-clean solution has excellent performance.
I believe this is one of those times.

As for your observation on range lengths: if I understand your point,
you would want to see the Range header specify the range to be
inserted into and then have the body placed in that range. If the range
to be inserted into is, say, 15-20 and the body is 10 bytes long, then
the first 5 bytes of the range will be overwritten and the remaining 5
bytes of the request entity will be inserted. This is a nice feature but
it complicates the implementation of the server. Keeping the operations
as insert or overwrite not only makes for simpler code, it also makes it
easier to leverage services already present in OS's.
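
The difference between the two models can be sketched on an in-memory byte string. This is a rough Python illustration, not server code. The function names are hypothetical, and the range 15-20 is treated as the half-open span [15, 20) so that it covers 5 bytes as in the arithmetic above (standard HTTP byte ranges are inclusive).

```python
def insert_at(resource: bytes, offset: int, body: bytes) -> bytes:
    """INSERT: place body at offset, shifting existing content forward."""
    # Assumption: an offset equal to len(resource) counts as "beyond the
    # end", since the proposal reserves that position for OVERWRITE appends.
    if not 0 <= offset < len(resource):
        raise ValueError("insertion point must lie inside the resource")
    return resource[:offset] + body + resource[offset:]


def overwrite_at(resource: bytes, offset: int, body: bytes) -> bytes:
    """OVERWRITE: replace len(body) bytes starting at offset.

    An offset exactly one past the last byte appends the body.
    """
    if not 0 <= offset <= len(resource):
        raise ValueError("overwrite may start at most one byte past the end")
    return resource[:offset] + body + resource[offset + len(body):]


def replace_range(resource: bytes, start: int, end: int, body: bytes) -> bytes:
    """Combined form: replace the half-open range [start, end) with body."""
    if not 0 <= start <= end <= len(resource):
        raise ValueError("range must lie inside the resource")
    return resource[:start] + body + resource[end:]


doc = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
body = b"0123456789"

# Combined form: 5 bytes of the range are replaced by a 10-byte body.
mixed = replace_range(doc, 15, 20, body)

# The same result expressed as one OVERWRITE followed by one INSERT.
two_step = insert_at(overwrite_at(doc, 15, body[:5]), 20, body[5:])
```

Both `mixed` and `two_step` come out identical, so a client can already get the combined behavior from the two simple operations at the cost of a second request. On a real file, the overwrite maps onto a plain seek-and-write, while anything that changes the length forces the tail of the file to be moved.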

As for the issue of entity tags, there are times when one intentionally
wants to overwrite a byte range where the resource has been altered
since one last used it. However, if someone wants to make sure such a
change has not occurred, then an entity tag is a great choice. It is
certainly cheaper and easier than a lock.
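
A minimal Python sketch of that check, assuming the client sends the entity tag it last saw in an If-Match header. Deriving the tag from a SHA-1 hash of the content is only an illustrative choice, and the function names are hypothetical.

```python
import hashlib
from typing import Optional, Tuple


def entity_tag(resource: bytes) -> str:
    """Derive a strong entity tag from the content (illustrative scheme)."""
    return '"' + hashlib.sha1(resource).hexdigest() + '"'


def conditional_overwrite(resource: bytes, offset: int, body: bytes,
                          if_match: Optional[str] = None) -> Tuple[int, bytes]:
    """Apply an overwrite unless the supplied entity tag is stale.

    Returns (status code, resulting resource content).
    """
    # With no If-Match value the client accepts overwriting a changed
    # resource, which is sometimes exactly what is wanted.
    if if_match is not None and if_match != entity_tag(resource):
        return 412, resource  # Precondition Failed: nothing is changed
    updated = resource[:offset] + body + resource[offset + len(body):]
    return 204, updated


doc = b"hello world"
tag = entity_tag(doc)

status_ok, doc2 = conditional_overwrite(doc, 6, b"WORLD", if_match=tag)
status_stale, doc3 = conditional_overwrite(doc2, 0, b"J", if_match=tag)
status_blind, doc4 = conditional_overwrite(doc2, 0, b"J")
```

The second call reuses the old tag after the resource has changed, so it is refused and the content is left alone. The third call sends no tag and goes through. No server-side lock state is needed in either case.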

		Yaron


> -----Original Message-----
> From:	Larry Masinter [SMTP:masinter@parc.xerox.com]
> Sent:	Thursday, March 20, 1997 11:23 PM
> To:	Yaron Goland
> Cc:	w3c-dist-auth@w3.org
> Subject:	Re: Partial Puts
> 
> Yaron Goland wrote:
> > 
> > 1       Problem Description
> > 
> > Clients who make small changes to resources do not wish to have to
> > upload an entire entity. As such, some sort of partial write
> > capability is needed.
> 
> The nice thing about the separate "problem description" is that it
> lets you consider whether this is a real problem, serious, and worth
> making the protocol more complex. Is the "partial write capability"
> actually required ("needed") or just an optimization? Is it so
> important that interoperable clients cannot be written without it, or
> is it just a convenient optimization?
> 
> > There are two types of partial writes, insertion writes and over
> > writes.
> 
> Wait, there are writes that insert more, less, or exactly the same as
> the range that they're replacing. It's not clear that you would really
> need or want to distinguish these. An insert is just replacing a
> zero-length range with one that isn't zero length.
> 
> Also, there's something really troublesome about partial writes unless
> the partial write is against a particular entity (as identified by a
> strong entity tag) rather than against a resource.
> 
> > 2       Proposal
> > 
> > I propose that the range header be used with the write-type header
> > to specify a partial PUT.
> > 
> > WriteType = "Write-Type" ":" 1#("INSERT" | "OVERWRITE" | Token) CRLF
> > 
> > The INSERT and OVERWRITE values must not be used together.
> > 
> > An INSERT indicates that the included entity should be inserted into
> > the location identified in the Range header, causing content already
> > in the resource to be moved forward.
> > 
> > An OVERWRITE indicates that the request body should overwrite
> > whatever exists in the range specified by the range header.
> > 
> > In both cases the Range header must only identify a single point.
> > For example, to specify that the request body should be inserted at
> > byte 30 one would include "Range: bytes= 30-30".
> > 
> > An insertion at the beginning of a resource causes the entire
> > resource to be shifted forward to make room for the insertion.
> > However an insertion must not specify an entry point beyond the end
> > of the resource.
> > 
> > An over write may have a range that is just beyond the end of the
> > resource to indicate appending. In the case of bytes, the range
> > should specify exactly one byte beyond the end of the resource.
> > 
> > If the content-type of the request body is multipart/byte-ranges
> > then the previous behavior may be generalized across the multipart
> > entries. The server may ignore any entry that does not have both a
> > range and write-type header. The response should indicate that a
> > range was skipped due to the lack of either or both headers.
> > 
> > The Write-Type header may only be used in conjunction with the Range
> > header.
> > 
> > In addition there are any number of resources where the use of range
> > and write-type make no sense. In such a case the resource should
> > return a 412 Precondition Failed.
> > 
> > 3       Discussion
> > 
> > Clearly having the Write-Type header dropped would be a very bad
> > thing. As such it is necessary to use a PEP extension in order to
> > guarantee that the server will not process the method if it does not
> > understand the write-type or range headers.
> 
> Just as range locking might be considered a case of locking a resource
> which happens to be a range of another resource, perhaps range
> (over)writing might be instead characterized as PUT-ing to a resource
> which just happens to be a part of another resource.
> 
> That is, for the resource "MyDocument" you ask for the URL for the
> resource which corresponds to "Page1". The server gives you back a URL
> for Page1, which you PUT. The relationship between MyDocument and the
> Page1 resources is such that any update to Page1 of course updates the
> first page of MyDocument.
> 
> This will work when the resource is stored as a file, as streams in an
> SGML database, or with separate files per page without the client
> having to be aware of the server's internal representation of
> resources. Perhaps there's a round trip in URL discovery, but it keeps
> the semantics of part/whole independent of the internal representation.
>
Received on Friday, 21 March 1997 04:17:43 UTC