Re: [google-gears-eng] Re: Deploying new expectation-extensions from Mark Nottingham on 2008-09-15 (ietf-http-wg@w3.org from July to September 2008)

From: Mark Nottingham <mnot@yahoo-inc.com>
Date: Tue, 16 Sep 2008 09:25:45 +1000
To: Julian Reschke <julian.reschke@gmx.de>
Cc: Charles Fry <fry@google.com>, <gears-eng@googlegroups.com>, Alex Rousskov <rousskov@measurement-factory.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <66B5EA2A-6330-4E89-842C-1F2EAB470B07@yahoo-inc.com>
On 15/09/2008, at 11:13 PM, Julian Reschke wrote:

> Mark Nottingham wrote:
>> On 12/09/2008, at 5:25 PM, Julian Reschke wrote:
>>>>> As far as I can tell, if you use the ETag based approach, and  
>>>>> multiple clients try to post to the same collection (POST URI),  
>>>>> then you'll have to disambiguate the requests. That problem  
>>>>> would go away if each of them would use a different URL.
>>>> I read the doc as saying that the server would provide unique  
>>>> ETags somehow...
>>>
>>> Disambiguating by ETag probably would work, but that doesn't feel  
>>> right to me. If multiple resumable transfers can be in progress at  
>>> the same point of time, then this really sounds like multiple  
>>> resources (thus multiple URIs), not multiple variants of the same  
>>> resource to me.
>> Huh. That's very revealing, I think (if unintentional :) POST can  
>> already create a new resource with a new, server-selected URI, and  
>> the pattern for doing so is already described with POST, 201 and  
>> Location.
>
> Of course POST can do that. Did anybody argue something else?

No, just laying the groundwork...


>> Question: If I want to make this sort of request resumeable, do I  
>> do this?
>> REQ: POST /a
>> REQ: Content-Range: bytes */100
>> RES: 308 Resume Incomplete
>> RES: Location: /b
>> REQ: POST /b
>> REQ: Content-Range: bytes 0-99/100
>> REQ: [bytes]
>> RES: 200 OK
>> or this?
>> REQ: POST /a
>> REQ: Content-Range: bytes */100
>> RES: 308 Resume Incomplete
>> RES: Location: /z
>> REQ: POST /z
>> REQ: Content-Range: bytes 0-99/100
>> REQ: [bytes]
>> RES: 201 Created
>> RES: Location: /b
>> ?
>> The important part here is: is this protocol defining a "temporary"  
>> resource (with a very specific interface) that the Location in a 308  
>> refers to, or is the Location in a 308 referring to a "regular"  
>> resource that's used for more than that?
>
> If I understand correctly, in the first example the server  
> immediately assigns the URI for the resource-to-be-created, lets  
> the client know it, and lets it transfer the remaining bytes to that  
> resource. In this case, the server would need to make sure that this  
> resource is only available to the client until the transfer is  
> completed.

... or use ETags on the requests to differentiate them.

> In the second case, the server assigns a temporary URI which is just  
> used to complete the transfer. Once that's done, the "final"  
> resource is being created. This looks similar to what Roy proposed  
> in <http://lists.w3.org/Archives/Public/ietf-http-wg/2008AprJun/0082.html>.
>
> I think the second approach is more versatile, because it also  
> covers cases where the server wouldn't create a new URI upon POST.

... as would using ETags to differentiate requests. :)   Understand  
I'm not pushing that solution strongly (yet); it's just that they're  
functionally equivalent so far.
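
To make the temporary-URI flow concrete, here is a minimal in-memory sketch in Python. It is only an illustration under assumptions: the class name, the /uploads/ and /items/ paths, and the 400 for a chunk that starts at the wrong offset are all hypothetical and not taken from the proposal. It models the second exchange quoted above: the initial request is answered with 308 and a temporary Location, and the final chunk yields 201 with the Location of the created resource.

```python
import itertools


class TempUriUploads:
    """Toy model: each resumable upload gets its own temporary URI."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.pending = {}  # temporary URI -> (expected total, bytes so far)
        self.created = {}  # final URI -> complete entity body

    def start(self, total):
        # Answers the initial "Content-Range: bytes */<total>" request.
        temp_uri = f"/uploads/{next(self._ids)}"  # hypothetical path
        self.pending[temp_uri] = (total, b"")
        return 308, {"Location": temp_uri}

    def append(self, temp_uri, first, chunk):
        if temp_uri not in self.pending:
            return 404, {}
        total, received = self.pending[temp_uri]
        if first != len(received):
            # Assumption: a chunk at the wrong offset is simply rejected.
            return 400, {}
        received += chunk
        if len(received) < total:
            self.pending[temp_uri] = (total, received)
            return 308, {"Location": temp_uri}
        # Transfer complete: drop the temporary resource, mint the final one.
        del self.pending[temp_uri]
        final_uri = f"/items/{len(self.created) + 1}"  # hypothetical path
        self.created[final_uri] = received
        return 201, {"Location": final_uri}
```

Two clients that call start() concurrently receive different temporary URIs, so no further disambiguation is needed; the cost is that the server has to define what the temporary resource is and clean it up.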


>> It's interesting to note that the second approach (with the temp  
>> resource) preserves the 201 status code in the interchange, while  
>> in the former approach, it's not there (308 usurps it).
>> Now look at it with PUT (to a not-yet-existent resource);
>> REQ: PUT /a
>> REQ: Content-Range: bytes */100
>> RES: 308 Resume Incomplete
>> RES: Location: /b
>> REQ: PUT /b
>> REQ: Content-Range: bytes 0-99/100
>> REQ: [bytes]
>> RES: 201 Created
>> RES: Location: /a
>> Here, if we use URIs, /b *has* to be a "temporary" resource with a  
>> very specifically defined behaviour; it accepts PUTs and has a side  
>> effect of having its bytes copied to /a (presumably when the final  
>> 201 is sent).
>
> Yes.
>
> BTW: it wouldn't necessarily be PUT; it could be anything that  
> allows "appending", such as POST or PATCH.

Good point.


>> My point here is that there are actually some pretty deep  
>> differences between the URI approach and the ETag approach; the URI  
>> approach is much more intrusive and needs to be specified in a  
>> different way (e.g., talking about what methods to use, the nature  
>> of the resource created, etc.).
>
> Well, not entirely.
>
> Let's say you've got three concurrent resumable transfers started  
> to /a, and the server has assigned the etags "C1-1", "C2-1" and  
> "C3-1" to the three clients.
>
> Client 1 starts its upload:
>
> REQ: POST /a
> REQ: If-Match: "C1-1"
> REQ: Content-Range: bytes 0-50/100
> REQ: [bytes]
> RES: 200 OK
> RES: ETag: "C1-2"
>
> ...what I'm concerned about is that we're essentially introducing  
> ETag-based variant selection here.
>
> So what do these ETags represent in requests *other* than resumable  
> uploads, such as in:
>
> REQ: GET /a
> REQ: If-Match: "C1-1"
>
> ?
>
> Will it have an effect?
>
> So just avoiding new URIs may look simpler at first, but it also  
> requires additional specification work.

It will require additional spec work if this use of ETags is  
incompatible with existing ones. If it is, they shouldn't be used for  
this purpose at all, but I don't *think* they are (happy to be proven  
wrong, as always).

I.e., yes, if-match will work as specified; what else needs to be said?
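
As an illustration of If-Match "working as specified" here, the following Python sketch keeps all uploads on one URI and tells them apart by ETag alone. It is a toy under assumptions: the class and method names are hypothetical, the ETag scheme copies the "C1-1" / "C1-2" pattern from the quoted example, and the 201 without a Location on completion and the 400 for a bad offset are choices made for the sketch. The one behaviour taken from HTTP itself is that an If-Match value matching no current entity tag gives 412 Precondition Failed.

```python
class ETagUploads:
    """Toy model: concurrent uploads to one URI, distinguished by ETag."""

    def __init__(self):
        # current ETag -> (client label, step, expected total, bytes so far)
        self.pending = {}
        self.completed = []

    def start(self, client, total):
        etag = f'"{client}-1"'
        self.pending[etag] = (client, 1, total, b"")
        return etag

    def post(self, if_match, first, chunk):
        state = self.pending.get(if_match)
        if state is None:
            return 412, {}  # If-Match matched no current entity tag
        client, step, total, received = state
        if first != len(received):
            return 400, {}  # assumption: wrong offset is simply rejected
        received += chunk
        del self.pending[if_match]  # the old ETag is no longer current
        if len(received) >= total:
            self.completed.append(received)
            return 201, {}  # assumption: no identity assigned on completion
        new_etag = f'"{client}-{step + 1}"'
        self.pending[new_etag] = (client, step + 1, total, received)
        return 200, {"ETag": new_etag}
```

A replayed request carrying a stale ETag fails the precondition rather than corrupting another client's transfer, which is the property the ETag approach relies on.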


>> Back to your comment;
>>> Disambiguating by ETag probably would work, but that doesn't feel  
>>> right to me. If multiple resumable transfers can be in progress at  
>>> the same point of time, then this really sounds like multiple  
>>> resources (thus multiple URIs), not multiple variants of the same  
>>> resource to me.
>> I don't know that I agree; with PUT, it's very natural to use ETags  
>> (you avoid creating the temporary resource, and have the option of  
>> 409'ing any concurrent PUTs after the first), whereas with POST,  
>> you're just
>
> That assumes that PUT with Content-Range can be used today, which  
> really isn't the case, unless the client can be confident that the  
> server actually understands PUT with ranges.

That seems to be a problem with all the approaches on the table,  
according to the flows in the current document. By the letter of the  
law, if the server doesn't understand a Content-* header on a PUT  
request, it should refuse it, but we already have an open issue or two  
(#79, #102) on that...
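
For reference, the "letter of the law" is RFC 2616 section 9.6: the recipient of a PUT must not ignore Content-* headers it does not understand and must answer 501 (Not Implemented). A minimal Python sketch of that check follows; the function name and the set of understood headers are hypothetical.

```python
# Hypothetical set of Content-* headers this toy server implements for PUT.
UNDERSTOOD = frozenset({"content-type", "content-length", "content-encoding"})


def check_put_headers(headers, understood=UNDERSTOOD):
    """Return 501 if the PUT carries a Content-* header we do not implement,
    otherwise None to indicate that processing may continue."""
    for name in headers:
        lowered = name.lower()  # header field names are case-insensitive
        if lowered.startswith("content-") and lowered not in understood:
            return 501
    return None
```

A server following this rule would refuse a ranged PUT outright instead of storing the partial body as if it were the whole entity, which is what would let a client probe for support safely.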


>> pushing the assignment of a final identifier for the created  
>> resource until the entire request entity is received (which is the  
>> case with the URI-based approach anyway, unless you're arguing that  
>> POST is a special case and *doesn't* create a temporary resource,  
>> unlike PUT), and you still have the option of not assigning it any  
>> identity (just as many POST processors do today).
>> So, I'm firmly leaning in the direction of the ETags-only approach  
>> now; I think the selection of a URI for created resources is  
>> separable, and should be separate.
>
> I agree with that part; the URI assigned for the upload really  
> should be temporary.
>
> With that, the approaches are almost identical; in both cases unique  
> identifiers are minted (ETags or URIs), the server needs to deal  
> with housekeeping, and the impact of other methods must be  
> understood and specified.
>
> BR, Julian
>



BTW, we should really be talking about an Internet-Draft at this  
point, rather than a wiki page. Google guys, when will that happen?


--
Mark Nottingham       mnot@yahoo-inc.com
Received on Monday, 15 September 2008 23:27:12 UTC