- From: Shane Harrelson <shane@sqlite.org>
- Date: Fri, 12 Mar 2010 23:25:55 -0500
- To: public-webapps <public-webapps@w3.org>
- Message-ID: <969b6a981003122025v6f075267u87ae9946dda53ed@mail.gmail.com>
On Fri, Mar 12, 2010 at 4:15 PM, Dumitru Daniliuc <dumi@google.com> wrote: > thanks for your comments, shane. a few questions: > > >> (1) VACUUM is never "necessary", though can sometimes reduce the size of >> the database file and provide a performance improvement if the database is >> large and fragmented. >> > > large and fragmented databases is exactly the use case that made us think > about a vacuum() call. > > >> (2) Use PRAGMA freelist_count to estimate how much VACUUM will shrink a >> database. >> > > if i understand the docs correctly, PRAGMA freelist_count will only return > the number of empty pages. so if we had many pages with very little data on > each one, vacuuming the database would considerably reduce its size, but > PRAGMA freelist_node would return 0. is this correct? is there a better way > to estimate the amount of space we'd get back if we vacuumed the database? > in particular, is there a way to figure out how fragmented each page is? > > Without looking at the specific page structure on the database, there's not a good way to figure this out. However, the B-Tree balancing algorithm used by SQLite will attempt to merge pages with neighbors when there space utilization drops below certain thresholds. Minimum average fill for intkey leaves is 50%. For other pages I think it's 66%. > (3) VACUUM requires O(N) time where N is the size of the database file, and >> can require up to 2N temporary space while processing. >> > > i'm not so worried about the space: hard drives are much much bigger than > any DB a web app will have. the time it takes to vacuum a DB though is > something that UAs would probably want to take into account. do you have any > data on how long it takes to vacuum a fragmented database of 1MB? 10MB? > 100MB? 1GB? i'm not looking for a benchmark, just some approximate data > (even though it would be great if you had a benchmark that you could share > with us). > > VACUUM is largely io-bound and so the speed greatly depends on speed of the underlying filesystem. We generally use the rule of thumb of 1 second per MB on basic systems. > Our recommendation is that the use of VACUUM be discouraged. The >> performance improvement only comes up with very large databases that are >> highly fragmented. Where space reclamation is an issue, AUTO_VACUUM would >> be more appropriate. >> > > the spec i proposed in my previous email allows UAs to ignore vacuum() > calls whenever they feel like it, so the UAs could just decide that it's not > worth vacuuming small databases. in cases of apps like offline gmail though, > the databases will be large, and i would guess that after a while they'd be > pretty fragmented, due to constantly deleting older emails and inserting > newer ones, so some kind of vacuuming would be needed. > > AUTO_VACUUM: if i understand the docs correctly, auto-vacuum does not go > inside pages. it only automatically removes the pages that are entirely > free. is that correct? if so, then it seems to me that auto-vacuuming > doesn't really solve the fragmentation problem. also, what's the performance > overhead of having auto-vacuum on? and do we have any control over when it > kicks in? going back to my offline gmail example, i think it would be much > better to have the app call vacuum() when the user writes an email and the > app is otherwise idle, for example, then to have auto-vacuuming cause > user-visible delays when the app is syncing with the server or doing some > other intensive work. > > When combined with the basic page merging mentioned above, AUTO_VACUUM does a very good job, and the performance overhead is minimal. There's no control over when it kicks in. > thanks, > dumi > > > >> -Shane >> >> >> >> On Thu, Mar 11, 2010 at 7:33 PM, Dimitri Glazkov <dglazkov@chromium.org>wrote: >> >>> I like the completion callback idea. >>> >>> Also like the notion of some sort of protection from "over-eager >>> vacuum-calling syndrome". >>> >>> :DG< >>> >>> On Thu, Mar 11, 2010 at 4:20 PM, Michael Nordman <michaeln@google.com> >>> wrote: >>> > Instead of calling back on success only, maybe call back on completion >>> > regardless of success or failure. This way the caller would know when >>> the >>> > potentially lengthy operation was done, regardless of the outcome. >>> > >>> > 2010/3/11 Dumitru Daniliuc <dumi@chromium.org> >>> >> >>> >> joao, >>> >> >>> >> it looks like we mostly agree on this feature, so i was wondering if >>> we >>> >> could formally agree on a spec. here's what i propose: >>> >> >>> >> 1. name: vacuum. to the best of my knowledge, all current >>> WebSQLDatabases >>> >> implementations use SQLite, and in SQLite the command is called >>> VACUUM. so >>> >> it seems to me that we might as well call the new function vacuum(). >>> what do >>> >> you think? >>> >> >>> >> 2. spec: no need for an error callback. >>> >> >>> >> interface Database { >>> >> // the methods and properties currently in the spec >>> >> void vacuum(in optional SQLVoidCallback successCallback); >>> >> }; >>> >> >>> >> 3. what the call should do: the purpose of this call is to allow apps >>> to >>> >> vacuum/compact/defragment/clean up their databases whenever they see >>> fit.. a >>> >> call to vacuum() could take a considerable amount of time (especially >>> on >>> >> larger or highly fragmented databases); therefore, it is not >>> recommended for >>> >> web apps to call this method during periods of high activity. >>> >> >>> >> how to process a vacuum() call: >>> >> >>> >> if the UA does not support this call (mobile browsers?), jump to step >>> 3. >>> >> queue up a task to vacuum/compact/defragment/clean up the database.. >>> >> if the task succeeds, and a success callback is provided, queue up a >>> task >>> >> to invoke the success callback; in all other cases (the task failed, >>> or no >>> >> success callback was provided), do nothing: proceed to the next task >>> in the >>> >> queue. >>> >> >>> >> does this seem acceptable? we (google engineers interested in this) >>> feel >>> >> that UAs should either not implement the vacuum() call, or they should >>> >> respect it (rather than taking it as a hint). it is ok for UAs to keep >>> track >>> >> of things like system idleness or databases closing to do more >>> vacuuming >>> >> that the apps asked for, if they want to. however, we feel that a >>> vacuum() >>> >> request by an app should not be postponed, simply because sometimes >>> apps >>> >> know better than UAs when the best time to vacuum is (it might be nice >>> to >>> >> give apps more information on how fragmented their databases are, but >>> that's >>> >> a separate discussion). >>> >> >>> >> thanks, >>> >> dumi >>> >> >>> >> >>> >> 2010/3/9 Jeremy Orlow <jorlow@google.com> >>> >>> >>> >>> On Mon, Mar 8, 2010 at 8:47 PM, Dumitru Daniliuc <dumi@google.com> >>> wrote: >>> >>>> >>> >>>> >>> >>>> On Mon, Mar 8, 2010 at 3:39 AM, Joćo Eiras <joaoe@opera.com> wrote: >>> >>>>> >>> >>>>>>> I don't see how the callbacks are useful though. Vacuum works >>> >>>>>>> transparently, its effects are not visible, and what should the >>> page >>> >>>>>>> do in >>> >>>>>>> case of error ? >>> >>>>>>> >>> >>>>>> >>> >>>>>> i was thinking of something like: >>> >>>>>> >>> >>>>>> db.defragment(errorCallback, successCallback); >>> >>>>>> showAPrettyImageAndAskTheUserToWait(); >>> >>>>>> >>> >>>>>> function errorCallback(error) {} >>> >>>>>> function successCallback() { >>> >>>>>> getRidOfThePrettyImageAndRestartTheApp(); >>> >>>>>> } >>> >>>>>> >>> >>>>>> just like you, i'm not sure if the error callback is useful at >>> all, >>> >>>>>> but i >>> >>>>>> thought i'd add it to make the defragment() call look more like a >>> >>>>>> transaction. maybe we don't need it. >>> >>>>>> >>> >>>>> >>> >>>>> True, but this is a kind of operation that could very well just run >>> on >>> >>>>> the background, with a single optional callback when it's done (the >>> webpage >>> >>>>> can't do anything if an error is detected anyway). >>> >>>> >>> >>>> ok, so let's drop the errorCallback: vacuum([optional] >>> successCallback); >>> >>>> >>> >>>>> >>> >>>>> The user agent would need to queue any subsequent transactions if a >>> >>>>> vacuum is running. I would consider it as an hint, and after all >>> webpages >>> >>>>> that own references to the underlying data files are closed, would >>> do a >>> >>>>> vacuum. So, if you have many tabs on gmail, and that a single gmail >>> instance >>> >>>>> tries to do multiple vacuums, it would equiv to one single vacuum >>> operation. >>> >>>> >>> >>>> what do we do if some databases are opened for the entire life of >>> the >>> >>>> browser? for example, i open my browser which has myfavoriteapp.comset as >>> >>>> its homepage. myfavoriteapp.com immediately opens a DB, and i only >>> close >>> >>>> that app when i close the browser. when would the browser vacuum >>> >>>> myfavoriteapp's DBs in this case? >>> >>>> >>> >>>> i think it's ok for the UA to vacuum some DBs automatically when it >>> >>>> thinks it's a good time to do so; however, if a platform supports >>> the >>> >>>> vacuum/defrag call (i.e. if it doesn't treat it is a no-op), then i >>> think a >>> >>>> vacuum call coming from the app should be immediately scheduled >>> (yes, the >>> >>>> subsequent transactions would have to wait for the vacuuming to >>> finish >>> >>>> running). in some cases, the apps know better than the UA when to >>> vacuum >>> >>>> their DBs. >>> >>>> >>> >>>> by the way, we should probably agree on a name for this call. which >>> one >>> >>>> do you prefer? vacuum, defragment, defrag, something else? i don't >>> have a >>> >>>> strong opinion. >>> >>> >>> >>> I think vacuum is fine since the spec is already tied to the SQLite >>> SQL >>> >>> dialect. >>> >>> collectGarbage() is another possibility >>> >>> Go with whatever you think is most clear and accurate though. >>> >>> J >>> > >>> > >>> >>> >>> >> >
Received on Saturday, 13 March 2010 04:26:28 UTC