Comments on draft-ietf-httpbis-p2-semantics-07

Hi all,

There's a common use case HTTP has not addressed well: asynchronous
file uploads. Many web browsers have "download managers", but none
have "upload managers". Large file uploads are left to application
developers, because browsers have poor guidance from the spec on how
they should interact with servers and intermediaries during an
asynchronous upload.

Example problem: My web form allows the user to upload a file. My
application will create a resource -- say, a PDF document -- from it.
It might take my server 2 seconds or 20 minutes to do so. If my server
can process the request quickly, it returns 201 Created, and sets the
Location header to the URI of the new PDF. (Lots of web apps return
303 in this case.) It might also return the PDF as the response
entity. If my server must process the request asynchronously,
HTTPbis semantics say I should return 202 Accepted. In this case the
server can return an entity that gives the current status and links
to a status monitor.
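
To make the two cases concrete, here is a minimal sketch in Python of
how a server might choose between them. The function name, the
(status, headers, body) tuple shape, and the example URIs are all
hypothetical, chosen only for illustration; nothing here comes from
the draft or from any real framework.

```python
from typing import Dict, Optional, Tuple

# (status code, response headers, entity body) -- an assumed shape
Response = Tuple[int, Dict[str, str], str]


def respond_to_upload(pdf_uri: Optional[str], status_uri: str) -> Response:
    """Pick a response to the upload POST.

    pdf_uri is the URI of the finished PDF, or None if the server is
    still working on it. status_uri points at a status monitor.
    """
    if pdf_uri is not None:
        # Fast path: the resource exists, so answer 201 Created and
        # name the new resource in the Location header.
        return 201, {"Location": pdf_uri}, "Created " + pdf_uri
    # Slow path: answer 202 Accepted with an entity that reports the
    # current status and links to the status monitor.
    body = "Status: processing. Monitor: " + status_uri
    return 202, {}, body
```

Note that the 202 branch carries no Location header; the link to the
status monitor lives only in the entity.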

First, notice that in the asynchronous case HTTP suggests the
entity should both give the current status and link to a status
monitor. That's duplicative, and makes the user click again to go to a
real status monitor. But it's the best we can do -- the UA POSTed to
this URL. It cannot POST again just to get an updated status.

Second, HTTP has not suggested sending a Location header as part of
the 202 response. Although that might have been a good idea, it's now
moot. Browsers have never followed Location headers in 2xx responses
-- they just display the returned entity. Changing that semantic now
could be harmful -- that understanding is baked into millions of web
applications and browsers.

In fact, the spec (HTTPbis 8.2.2 and 8.2.3) is out of sync with what
web apps actually do. They do not return 202 and display an entity
showing a status, which requires the user to click again. A typical
webapp will return 303, redirecting the user to a page that has the
status monitor and saving the user a useless click.

So, my first suggestion is that the first paragraph of 8.2.2 (201
Created) ought to be amended to suggest both 202 and 303 as
appropriate responses.

Now, in the case where the server responds 303 to the POST, no browser
or other agent is harmed. 303 is a common response to POST. The
browser now does a GET on the redirect URI. Ordinary web applications
might return 200 and a status monitor entity. Other applications might
return 202. After all, 202 is the "real" response to the original
request -- and that is the meaning of the 303. Furthermore, returning
202 in response to a GET suggests that if the UA does a GET on this
URL later, the status could change to indicate the outcome of the
original POST. In the example problem, a UA redirected by the response
to the POST to a status page that returns 202 could GET that URL
until the status changes -- in the case of a successful PDF
generation, to either a 201 Created response, or another 303 or 302;
in any case, the Location URI finally will contain the URL of the
newly created resource.
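
The polling behaviour described above could look roughly like this on
the client side. This is only a sketch under stated assumptions: the
fetch callable is a hypothetical stand-in for an HTTP GET that does
not follow redirects itself and returns (status, headers), and the
poll limit and delay are arbitrary values of my own, not anything the
spec suggests.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# (status code, response headers) -- an assumed shape
Reply = Tuple[int, Dict[str, str]]


def await_resource(
    fetch: Callable[[str], Reply],
    status_url: str,
    max_polls: int = 10,
    delay: float = 0.0,
) -> Optional[str]:
    """GET status_url until it stops answering 202.

    Returns the Location of the new resource if the final answer is
    201, 302 or 303, and None otherwise (including running out of
    polls).
    """
    for _ in range(max_polls):
        status, headers = fetch(status_url)
        if status == 202:
            # Still being processed: wait, then ask again.
            time.sleep(delay)
            continue
        if status in (201, 302, 303):
            # Finished: Location names the newly created resource.
            return headers.get("Location")
        # Any other status is treated here as "no resource".
        return None
    return None
```

A browser "upload manager" or a client library could run a loop of
this kind on the user's behalf instead of leaving it to application
code.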

My second suggestion is that sec 8.2.3 (202 Accepted) should
elaborate on the meaning of 202 in response to GET. Up to now, 202 has
seen limited use, and it's generally thought of as a response to POST,
PUT, or DELETE. But as a response to GET, it should have the added
meaning that the returned status code is likely to change in the
future, so please try again.
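
On the server side, a status resource with this meaning might behave
as in the sketch below. The job dictionary and its "result_uri" key
are hypothetical names used only for illustration, and answering 303
on completion is just one of the outcomes discussed in this message.

```python
from typing import Dict, Optional, Tuple

# (status code, response headers, entity body) -- an assumed shape
Response = Tuple[int, Dict[str, str], str]


def status_response(job: Dict[str, Optional[str]]) -> Response:
    """Answer a GET on the status monitor for one upload job."""
    result_uri = job.get("result_uri")
    if result_uri is None:
        # Not finished: 202 tells the agent that the status code is
        # likely to change, so it should try again later.
        return 202, {}, "Still processing; please try again later."
    # Finished: redirect the agent to the newly created resource.
    return 303, {"Location": result_uri}, ""
```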

Clarifying these semantics could allow agents to take over a lot of
the automation of asynchronous file uploads. Web browsers could have
an "Uploads" window where you check the status of your uploads. Web
client libraries could also take over automating this function from
application code.


Hugh

Received on Saturday, 18 July 2009 22:37:21 UTC