Re: DataCache API - editor's draft available from Nikunj R. Mehta on 2009-07-20 (public-webapps@w3.org from July to September 2009)

From: Nikunj R. Mehta <nikunj.mehta@oracle.com>
Date: Mon, 20 Jul 2009 09:22:30 -0700
To: Mark Nottingham <mnot@yahoo-inc.com>
Cc: public-webapps WG <public-webapps@w3.org>
Message-Id: <B7121644-CAD8-4119-97AD-578F895D2C3D@oracle.com>
Hi Mark,

I am happy to see your feedback on DataCache. Forgive me for the delay  
in responding.

On Jul 17, 2009, at 4:50 PM, Mark Nottingham wrote:

> I think this work is in an interesting space but, unfortunately,  
> it's doing it without reference to the existing HTTP caching model,  
> resulting in a lot of duplicated work, potential conflicts and  
> ambiguities, as well as opportunity cost.

I don't understand this fully; can you please explain? From what I  
know, the Gears implementation can be easily extended to support  
DataCache. Of course, one doesn't need all of Gears - only LocalServer  
and browser integration are required. I don't see that as a lot of  
duplicated work.

>
> Furthermore, it's specifying an API as the primary method of  
> controlling caches. While that's understandable if you look at the  
> world as a collection of APIs, it's also quite limiting; it  
> precludes reuse of information, unintended uses, and caching by  
> anything except the browser.

FWIW, DataCache is not the first attempt at obtaining an API to  
control a browser's HTTP cache. That was already the case with  
ApplicationCache in HTML5.

I don't quite understand what problems you foresee with DataCache's  
approach. It does not ask the implementor to violate any HTTP caching  
semantics. If anything, it suggests that the implementation can offer  
an off-line response should an on-line response be infeasible.

This is based on my reading of the following pieces of text from  
RFC2616.

 From §13,
[[
Requirements for performance, availability, and disconnected operation  
require us to be able to relax the goal of semantic transparency.
...
Protocol features that allow a cache to attach warnings to responses  
that do not preserve the requested approximation of semantic  
transparency.
]]

 From §13.1.6
[[
A client MAY also specify that it will accept stale responses, up to  
some maximum amount of staleness. This loosens the constraints on the  
caches, and so might violate the origin server's specified constraints  
on semantic transparency, but might be necessary to support  
disconnected operation, or high availability in the face of poor  
connectivity.
]]
Can you please correct me if I have misinterpreted or misapplied these  
provisions of HTTP? Alternatively, can you point me to a valid  
interpretation of these portions in the context of an open  
implementation/application?
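
To make the max-stale provision concrete, here is a minimal Python
sketch of the decision a cache could make under RFC 2616 §14.9.3
(max-stale) and §14.46 (Warning 110). It is an illustration only, not
text from the DataCache draft. The names parse_max_stale and
may_serve_cached are made up for this example, and a real cache would
also have to honour origin directives such as must-revalidate.

```python
import math
from typing import List, Optional, Tuple


def parse_max_stale(cache_control: str) -> Optional[float]:
    """Return the client's max-stale allowance in seconds.

    math.inf means "max-stale" was sent without a value (any staleness
    is acceptable); None means the directive is absent.
    """
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-stale":
            return float(value) if value else math.inf
    return None


def may_serve_cached(age: float, freshness_lifetime: float,
                     cache_control: str) -> Tuple[bool, List[int]]:
    """Decide whether a cached response may be returned.

    Returns (allowed, warning_codes). A stale response that is served
    carries warn-code 110 ("Response is stale").
    """
    staleness = age - freshness_lifetime
    if staleness <= 0:
        return True, []          # still fresh, no warning needed
    allowance = parse_max_stale(cache_control)
    if allowance is not None and staleness <= allowance:
        return True, [110]       # stale, but within what the client accepts
    return False, []             # too stale: must revalidate or fail
```

For instance, a response that is 100 seconds past its freshness
lifetime is served with warning 110 when the request carries
"Cache-Control: max-stale=3600", and is not served from cache when the
request carries no max-stale directive at all.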
>
> A much better solution would be to declaratively define what URIs  
> delineate an application (e.g., in response headers and/or a  
> separate representation), and then allow clients to request an  
> entire application to be fetched before they go offline (for  
> example). I'm aware that there are other use cases and capabilities  
> here, this is just one example.

Am I correct in understanding that you find pre-fetching the entire  
application to be better than pre-fetching parts of it? In any case,  
are you also suggesting a data format for specifying a collection of  
such URIs that the user agent should pin down in cache? How does a  
data format form a better solution as opposed to an API?

Additionally, it is not always possible to statically define the  
collection of URIs that are of interest to an application. Let me take  
an example -

*Sales force automation*
My sales reps work in parts of the world where a reliable network  
connection cannot be assumed. Still I would  
like to deploy order entry applications that work reliably in the face  
of poor network connection on a small mobile computer with a Web  
browser. Today I am going on a round of my customers in Fallujah and I  
need to have information about customers in that area, including their  
names, addresses, and order history (and status). This information  
changes regularly and my sales reps benefit from up-to-the-minute  
order history information if I can connect to the server at the time I  
am at the customer's office. If I don't have network access, I at  
least have up-to-the-date information. Finally, I want to enable the  
sales rep to take orders when they are out in the field and, provided  
they don't lose the device, I want to assure them that their orders  
will make it to the company's servers. If connectivity is available at  
that instant, then the order will be confirmed immediately and  
processing would begin. If not, it would be kept pending.
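
The order-taking behaviour described above (confirm immediately when
connected, otherwise keep the order pending and deliver it later) can
be sketched roughly as follows. This is a plain Python illustration of
the idea, not the DataCache API: OrderOutbox and the send callable are
hypothetical names, and the queue lives in memory only, whereas a real
implementation would have to persist pending orders on the device.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class OrderOutbox:
    # Hypothetical transport: returns True once the server has confirmed
    # the order, and raises OSError (e.g. ConnectionError) when offline.
    send: Callable[[Dict], bool]
    pending: List[Dict] = field(default_factory=list)

    def submit(self, order: Dict) -> str:
        """Queue the order, then try to deliver everything queued so far."""
        self.pending.append(order)
        self.flush()
        # flush() stops at the first failure, so the new (last) order was
        # delivered only if the queue is now empty.
        return "confirmed" if not self.pending else "pending"

    def flush(self) -> int:
        """Deliver pending orders in submission order; return how many went out."""
        delivered = 0
        while self.pending and self._try_send(self.pending[0]):
            self.pending.pop(0)
            delivered += 1
        return delivered

    def _try_send(self, order: Dict) -> bool:
        try:
            return self.send(order)
        except OSError:
            return False  # no connectivity right now; keep the order queued
```

Calling flush() whenever connectivity returns is what lets the
application take opportunistic advantage of the network while still
working when there is none.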

Developers until now have developed and deployed such off-line  
applications outside the context of the Web architecture - i.e., no  
URIs, no uniform methods, etc. They will continue to do the same with  
SQL databases inside Web browsers - still no URIs, a single method -  
POST - and an off-line only solution (meaning it cannot take  
opportunistic advantage of available networks). Is this a more  
desirable approach than to provide an API to a subset of the browser's  
HTTP cache?

>
> Doubtless there's still a need for some new APIs here, but I think  
> they should be minimal (e.g., about querying the state of the cache,  
> in terms of offline/online, etc.), not re-defining the cache itself.

Can you elaborate a little more? What do you mean by re-defining the  
cache? Can you provide specific reasons why the DataCache API seems  
like redefining the cache?

>
> FWIW, I'd be very interested in helping develop protocols and APIs  
> along what's outlined above.

Sorry, but I didn't see any outline. Maybe I missed something; I  
would appreciate it if you could provide a specific outline.

In any case, I welcome you to offer your counsel on better addressing  
the requirements of DataCache that I have previously stated [1]. It  
would be best if these requirements can be addressed through the  
correct use of HTTP as opposed to API magic.

>
> Cheers,
>
> P.S. This draft alludes to automatic prefetching without user  
> intervention. However, there is a long history of experimentation  
> with pre-fetching on the Web, and the general consensus is that it's  
> of doubtful utility at best, and dangerous at worst (particularly in  
> bandwidth-limited deployments, where bandwidth is charged for, as  
> well as when servers are taken down because of storms of prefetch  
> requests).

There is now also a fairly large amount of experience with prefetching  
outside of the regular HTTP ambit. Siebel CRM (one of the most popular  
enterprise non-productivity off-line applications), MySpace, and GMail  
all pre-fetch thousands, if not more, pieces of data and store them  
locally. Have you considered this experience as relevant?

I may not be wrong in saying that the general observation you are  
making is not relevant in DataCache's case.

While pre-fetching is not a good idea in the general case, why kill  
the messenger? Let programmers make the right choice for their  
applications and learn from their own experience. IMHO, not doing  
DataCache-like things turns people away from using (and I mean not  
abusing) the Web, toward more brittle, less widely deployable, and  
far more laboriously crafted architectures.

[1] http://lists.w3.org/Archives/Public/public-webapps/2008OctDec/0104.html
Received on Monday, 20 July 2009 16:24:49 UTC