Re: [WebSQLDatabase] Adding a vacuum() call from Dumitru Daniliuc on 2010-03-12 (public-webapps@w3.org from January to March 2010)

From: Dumitru Daniliuc <dumi@google.com>
Date: Fri, 12 Mar 2010 13:15:59 -0800
To: shane@sqlite.org
Cc: public-webapps <public-webapps@w3.org>
Message-ID: <e753f47f1003121315l184f4454q9a200068d9cde952@mail.gmail.com>
thanks for your comments, shane. a few questions:


> (1) VACUUM is never "necessary", though can sometimes reduce the size of
> the database file and provide a performance improvement if the database is
> large and fragmented.
>

large and fragmented databases is exactly the use case that made us think
about a vacuum() call.


> (2) Use PRAGMA freelist_count to estimate how much VACUUM will shrink a
> database.
>

if i understand the docs correctly, PRAGMA freelist_count will only return
the number of empty pages. so if we had many pages with very little data on
each one, vacuuming the database would considerably reduce its size, but
PRAGMA freelist_node would return 0. is this correct? is there a better way
to estimate the amount of space we'd get back if we vacuumed the database?
in particular, is there a way to figure out how fragmented each page is?


> (3) VACUUM requires O(N) time where N is the size of the database file, and
> can require up to 2N temporary space while processing.
>

i'm not so worried about the space: hard drives are much much bigger than
any DB a web app will have. the time it takes to vacuum a DB though is
something that UAs would probably want to take into account. do you have any
data on how long it takes to vacuum a fragmented database of 1MB? 10MB?
100MB? 1GB? i'm not looking for a benchmark, just some approximate data
(even though it would be great if you had a benchmark that you could share
with us).


> Our recommendation is that the use of VACUUM be discouraged.   The
> performance improvement only comes up with very large databases that are
> highly fragmented.  Where space reclamation is an issue, AUTO_VACUUM would
> be more appropriate.
>

the spec i proposed in my previous email allows UAs to ignore vacuum() calls
whenever they feel like it, so the UAs could just decide that it's not worth
vacuuming small databases. in cases of apps like offline gmail though, the
databases will be large, and i would guess that after a while they'd be
pretty fragmented, due to constantly deleting older emails and inserting
newer ones, so some kind of vacuuming would be needed.

AUTO_VACUUM: if i understand the docs correctly, auto-vacuum does not go
inside pages. it only automatically removes the pages that are entirely
free. is that correct? if so, then it seems to me that auto-vacuuming
doesn't really solve the fragmentation problem. also, what's the performance
overhead of having auto-vacuum on? and do we have any control over when it
kicks in? going back to my offline gmail example, i think it would be much
better to have the app call vacuum() when the user writes an email and the
app is otherwise idle, for example, then to have auto-vacuuming cause
user-visible delays when the app is syncing with the server or doing some
other intensive work.

thanks,
dumi



> -Shane
>
>
>
> On Thu, Mar 11, 2010 at 7:33 PM, Dimitri Glazkov <dglazkov@chromium.org>wrote:
>
>> I like the completion callback idea.
>>
>> Also like the notion of some sort of protection from "over-eager
>> vacuum-calling syndrome".
>>
>> :DG<
>>
>> On Thu, Mar 11, 2010 at 4:20 PM, Michael Nordman <michaeln@google.com>
>> wrote:
>> > Instead of calling back on success only, maybe call back on completion
>> > regardless of success or failure. This way the caller would know when
>> the
>> > potentially lengthy operation was done, regardless of the outcome.
>> >
>> > 2010/3/11 Dumitru Daniliuc <dumi@chromium.org>
>> >>
>> >> joao,
>> >>
>> >> it looks like we mostly agree on this feature, so i was wondering if we
>> >> could formally agree on a spec. here's what i propose:
>> >>
>> >> 1. name: vacuum. to the best of my knowledge, all current
>> WebSQLDatabases
>> >> implementations use SQLite, and in SQLite the command is called VACUUM.
>> so
>> >> it seems to me that we might as well call the new function vacuum().
>> what do
>> >> you think?
>> >>
>> >> 2. spec: no need for an error callback.
>> >>
>> >> interface Database {
>> >>   // the methods and properties currently in the spec
>> >>   void vacuum(in optional SQLVoidCallback successCallback);
>> >> };
>> >>
>> >> 3. what the call should do: the purpose of this call is to allow apps
>> to
>> >> vacuum/compact/defragment/clean up their databases whenever they see
>> fit.. a
>> >> call to vacuum() could take a considerable amount of time (especially
>> on
>> >> larger or highly fragmented databases); therefore, it is not
>> recommended for
>> >> web apps to call this method during periods of high activity.
>> >>
>> >> how to process a vacuum() call:
>> >>
>> >> if the UA does not support this call (mobile browsers?), jump to step
>> 3.
>> >> queue up a task to vacuum/compact/defragment/clean up the database..
>> >> if the task succeeds, and a success callback is provided, queue up a
>> task
>> >> to invoke the success callback; in all other cases (the task failed, or
>> no
>> >> success callback was provided), do nothing: proceed to the next task in
>> the
>> >> queue.
>> >>
>> >> does this seem acceptable? we (google engineers interested in this)
>> feel
>> >> that UAs should either not implement the vacuum() call, or they should
>> >> respect it (rather than taking it as a hint). it is ok for UAs to keep
>> track
>> >> of things like system idleness or databases closing to do more
>> vacuuming
>> >> that the apps asked for, if they want to. however, we feel that a
>> vacuum()
>> >> request by an app should not be postponed, simply because sometimes
>> apps
>> >> know better than UAs when the best time to vacuum is (it might be nice
>> to
>> >> give apps more information on how fragmented their databases are, but
>> that's
>> >> a separate discussion).
>> >>
>> >> thanks,
>> >> dumi
>> >>
>> >>
>> >> 2010/3/9 Jeremy Orlow <jorlow@google.com>
>> >>>
>> >>> On Mon, Mar 8, 2010 at 8:47 PM, Dumitru Daniliuc <dumi@google.com>
>> wrote:
>> >>>>
>> >>>>
>> >>>> On Mon, Mar 8, 2010 at 3:39 AM, João Eiras <joaoe@opera.com> wrote:
>> >>>>>
>> >>>>>>> I don't see how the callbacks are useful though. Vacuum works
>> >>>>>>> transparently, its effects are not visible, and what should the
>> page
>> >>>>>>> do in
>> >>>>>>> case of error ?
>> >>>>>>>
>> >>>>>>
>> >>>>>> i was thinking of something like:
>> >>>>>>
>> >>>>>> db.defragment(errorCallback, successCallback);
>> >>>>>> showAPrettyImageAndAskTheUserToWait();
>> >>>>>>
>> >>>>>> function errorCallback(error) {}
>> >>>>>> function successCallback() {
>> >>>>>>  getRidOfThePrettyImageAndRestartTheApp();
>> >>>>>> }
>> >>>>>>
>> >>>>>> just like you, i'm not sure if the error callback is useful at all,
>> >>>>>> but i
>> >>>>>> thought i'd add it to make the defragment() call look more like a
>> >>>>>> transaction. maybe we don't need it.
>> >>>>>>
>> >>>>>
>> >>>>> True, but this is a kind of operation that could very well just run
>> on
>> >>>>> the background, with a single optional callback when it's done (the
>> webpage
>> >>>>> can't do anything if an error is detected anyway).
>> >>>>
>> >>>> ok, so let's drop the errorCallback: vacuum([optional]
>> successCallback);
>> >>>>
>> >>>>>
>> >>>>> The user agent would need to queue any subsequent transactions if a
>> >>>>> vacuum is running. I would consider it as an hint, and after all
>> webpages
>> >>>>> that own references to the underlying data files are closed, would
>> do a
>> >>>>> vacuum. So, if you have many tabs on gmail, and that a single gmail
>> instance
>> >>>>> tries to do multiple vacuums, it would equiv to one single vacuum
>> operation.
>> >>>>
>> >>>> what do we do if some databases are opened for the entire life of the
>> >>>> browser? for example, i open my browser which has myfavoriteapp.comset as
>> >>>> its homepage. myfavoriteapp.com immediately opens a DB, and i only
>> close
>> >>>> that app when i close the browser. when would the browser vacuum
>> >>>> myfavoriteapp's DBs in this case?
>> >>>>
>> >>>> i think it's ok for the UA to vacuum some DBs automatically when it
>> >>>> thinks it's a good time to do so; however, if a platform supports the
>> >>>> vacuum/defrag call (i.e. if it doesn't treat it is a no-op), then i
>> think a
>> >>>> vacuum call coming from the app should be immediately scheduled (yes,
>> the
>> >>>> subsequent transactions would have to wait for the vacuuming to
>> finish
>> >>>> running). in some cases, the apps know better than the UA when to
>> vacuum
>> >>>> their DBs.
>> >>>>
>> >>>> by the way, we should probably agree on a name for this call. which
>> one
>> >>>> do you prefer? vacuum, defragment, defrag, something else? i don't
>> have a
>> >>>> strong opinion.
>> >>>
>> >>> I think vacuum is fine since the spec is already tied to the SQLite
>> SQL
>> >>> dialect.
>> >>> collectGarbage() is another possibility
>> >>> Go with whatever you think is most clear and accurate though.
>> >>> J
>> >
>> >
>>
>>
>>
>
Received on Friday, 12 March 2010 21:16:32 UTC