[whatwg] Asynchronous database API feedback from Maciej Stachowiak on 2007-12-09 (public-whatwg-archive@w3.org from December 2007)

From: Maciej Stachowiak <mjs@apple.com>
Date: Sun, 9 Dec 2007 03:52:23 -0800
Message-ID: <AEE06643-4E79-4112-BA0C-008E19BA21DD@apple.com>
On Dec 9, 2007, at 2:34 AM, Aaron Boodman wrote:

> On Dec 9, 2007 1:59 AM, Maciej Stachowiak <mjs at apple.com> wrote:
>>> a) Disk access is typically going to be a lot faster than network
>>> access
>>
>> I think this assumption is, if not exactly incorrect, somewhat
>> misleading.
>>
>> For users on network home directories, disk access /is/ network
>> access. This is a common setup at large corporations and educational
>> institutions. We have specific experience with WebKit that doing
>> sqlite database access from the UI thread resulted in frequent long
>> UI stalls for Apple's users on network home directories.(*)
>>
>> Now, you might argue that network home directories are not "typical"
>> and this is true, but the web application has no way to know when it
>> might hit the atypical case and thereby block the user's UI, just as
>> with synchronous XMLHttpRequest you have no way to know when the user
>> is on a slow network connection.
>>
>> So I continue to believe that it's not safe to do synchronous I/O
>> from the UI thread.
>
> In the case of Firefox and IE on Windows and Mac at least (I don't
> recall the situation with Safari), Gears' sqlite database is stored in
> the Caches or "Local Settings" folder which, as I understand it, is
> meant to be on the local drive.

I'm not sure what you mean specifically in the Mac case, but
~/Library/Caches isn't guaranteed to be on a local drive on the Mac. I'm  
also not sure that's quite the right place semantically. It is  
intended that you should be able to delete all of ~/Library/Caches  
without any behavior changes besides possibly performance, so for  
instance http cookies are not stored there.

> Since part of the purpose of this API is to allow offline access, it
> doesn't seem to make sense to put the data on a network drive, at
> least for devices that are mobile.

On a mobile device, it doesn't make sense for your home directory to  
be on a network drive either. However, if you use one of several  
shared workstations, you probably want your local data to be there  
when the rest of your homedir is.

>> Another important consideration: even ignoring distributed
>> filesystems, how do web application developers decide when the writes
>> they are doing are definitely small enough that it's safe to use the
>> sync API?
>>
>> Your test shows 3KB written in a tenth of a second, but datastores
>> could easily be much larger than that. If the time scales linearly,
>> then even a modest 300KB of data could take 10 seconds to write,
>> clearly an unacceptable amount of time to stall the UI (I hope it
>> doesn't scale linearly because that would be alarmingly slow, but
>> clearly at some size it gets slow).
>
> There are many different use cases for the local database and a
> developer can make reasonable assumptions about how large the queries
> are going to be in many cases. For example, pulling up all the data
> required to render the first view of Reader is a totally different
> kind of query than updating the read or starred state of an individual
> item.

This doesn't really convince me that web developers have the tools at  
hand to make the right choice. Even an expert on the topic like you  
may not test in a very wide variety of hardware and software  
configurations, and so may assume a particular request is safe because  
it seems to work.

I'll be more convinced if there's a better answer to this than "make  
reasonable assumptions".

Thinking about it now, I can imagine a way to make this more concrete:  
give synchronous transactions a time limit, and if they exceed it they  
report an error and fail. We can be generous and say the limit is 5  
seconds, although that's awfully close to unacceptable UI lockup  
territory. Possible drawbacks:

- I'm not sure it will be any easier to handle timeout errors than it  
would be to use the async API (since, if your request is too big to  
complete in reasonable time on this device, you probably have to use  
the async API as your fallback).

- In practice web developers probably won't handle the timeout error  
correctly, if it doesn't happen to them on their test setup, so web  
apps are likely to fail mysteriously when it does occur. But arguably  
this is better than a long UI hang, since that risks the user's whole  
browsing session, not just a single web app.
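
To make the time-limit idea concrete, here is a minimal sketch in Python
against the standard sqlite3 module. It is not the proposed browser API;
the names run_sync_with_limit and SyncTimeout are hypothetical, and the
5 second default is just the figure floated above. It relies on SQLite's
progress handler, which is only polled between virtual-machine
instructions, so it bounds long-running query work but would not rescue
a single read or write that is blocked inside the filesystem (the
network home directory case).

```python
import sqlite3
import time


class SyncTimeout(Exception):
    """Hypothetical error: a synchronous transaction ran past its time limit."""


def run_sync_with_limit(conn, statements, limit_seconds=5.0):
    """Run (sql, params) pairs as one transaction, or fail once the limit passes."""
    deadline = time.monotonic() + limit_seconds

    def over_budget():
        # A nonzero return value asks SQLite to abort the running statement.
        return 1 if time.monotonic() > deadline else 0

    try:
        # Poll the clock roughly every 1000 SQLite virtual-machine instructions.
        conn.set_progress_handler(over_budget, 1000)
        try:
            for sql, params in statements:
                if over_budget():
                    raise SyncTimeout("time limit exceeded before statement ran")
                conn.execute(sql, params)
        finally:
            # Stop polling so that commit or rollback is never interrupted.
            conn.set_progress_handler(None, 0)
        conn.commit()
    except sqlite3.OperationalError as exc:
        conn.rollback()
        if time.monotonic() > deadline:
            raise SyncTimeout("transaction exceeded its time limit") from exc
        raise
    except Exception:
        conn.rollback()
        raise
```

A caller would have to catch SyncTimeout and retry the same work some
other way, which is the first drawback listed above: the fallback path
ends up looking a lot like the async API anyway.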

Thoughts? Would you be willing to use a synchronous API with a  
timeout? Do you think it's reasonable for other web developers? (I'm  
honestly not sure, I haven't thought it through in detail.)

>> Given how wildly hardware varies, I'm not sure how web developers can
>> safely make this choice. It seems likely that they'd choose whatever
>> seems to work for them in simple cases, and not test at all on slower
>> filesystems. If the queries they do vary in size, they may not test
>> at extreme sizes. These are the same kinds of cases where synchronous
>> XHR creates surprising problems - it seems ok on the developer's fast
>> local network, so why expect that end users will see a problem?
>
> It's a similar situation to XHR, but I think the parameters are
> different:
>
> a) Synchronous network access is almost never a good idea, but in
> our experience synchronous local disk access via SQLite is frequently
> fine.

I don't think "in our experience ... is frequently fine" translates to  
"is a good idea", when dealing with something as variable as filesystems.  
It's like saying "multithreaded programs with complex locking are  
frequently fine", having only tested on single-CPU machines. (I've  
seen programs that go from "never deadlocks" to "deadlocks within  
seconds" when going from single-core to quad-core machines). And I  
hope most of us would agree that concurrency with shared read-write  
state is probably not a good API to offer in the browser, even though  
it's possible a few developers can sometimes get it right.

Getting back to storage, consider devices with a Flash drive as the  
primary disk. Most web developers won't test on these (did you?), but  
they have very different performance characteristics than hard drives.  
While there are no seek latencies to contend with, and reads can be  
pretty fast, the write throughput can be quite a bit worse, especially  
for scattered small writes. In many cases, such devices have special  
filesystems that try to spread writes over the entire device, to  
increase Flash lifetime. But the result can be that write latencies  
get much worse than usual at unpredictable times.
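
One rough way to see this on a given device is to time many small
committed writes and compare the typical case with the worst one. The
sketch below uses Python's standard sqlite3 module; the function name
commit_latencies, the table name, and the default sizes are all made up
for illustration, and the numbers it prints will depend entirely on the
hardware and filesystem it runs on.

```python
import os
import sqlite3
import statistics
import tempfile
import time


def commit_latencies(db_path, writes=200, payload_size=64):
    """Return the wall-clock seconds taken by each small committed write."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS samples (id INTEGER PRIMARY KEY, payload BLOB)"
        )
        conn.commit()
        payload = b"x" * payload_size
        latencies = []
        for _ in range(writes):
            start = time.perf_counter()
            conn.execute("INSERT INTO samples (payload) VALUES (?)", (payload,))
            conn.commit()  # one commit per row, so every write pays the full cost
            latencies.append(time.perf_counter() - start)
        return latencies
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        samples = commit_latencies(os.path.join(tmp, "probe.db"))
    print(f"median {statistics.median(samples) * 1000:.2f} ms, "
          f"worst {max(samples) * 1000:.2f} ms")
```

A large gap between the median and the worst sample is the kind of
unpredictable stall that a developer testing on one fast local disk
would be unlikely to notice.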

I do think a sync API with timeout would adequately handle the variety  
in hardware, but it would have the significant drawbacks mentioned  
above.

Regards,
Maciej
Received on Sunday, 9 December 2007 03:52:23 UTC