[whatwg] SQL API error handling from Ian Hickson on 2007-10-18 (public-whatwg-archive@w3.org from October 2007)

From: Ian Hickson <ian@hixie.ch>
Date: Thu, 18 Oct 2007 01:58:21 +0000 (UTC)
Message-ID: <Pine.LNX.4.62.0710180131110.19595@hixie.dreamhostps.com>
On Mon, 15 Oct 2007, Brady Eidson wrote:
>
> In some embedded (and client-server) database implementations - 
> including SQLite - continuing to operate on a database that is known to 
> be corrupt can lead to the process crashing.  Unlike the "CPU core just 
> overheated" case, it is a dangerous state software can help avoid.

Ok... but why can't the software simply avoid corrupting the database in 
the first place?

We don't have error codes for "the DOM has become corrupted" or for "the 
Window object's properties have become corrupted", why do we need one for 
"the Database's contents are corrupted"? Why can't the database contents 
simply not be corrupted in the first place?

As I see it there are two ways the database could get corrupted -- 
software error (bugs, spec-wise illegal behavious) and uncontrollable 
external factors (cosmic rays, hardware failure, overheating, the user 
hexediting the database file...).

The uncontrollable external factors can affect _anything_. The DOM could 
get corrupted in RAM by cosmic rays while the browser is running. A chip 
could overheat and return bogus data randomly for a canvas getPixelData() 
call. The drive could intermittently fail, preventing the cookies from 
being read completely. We don't have any sort of handling for _any_ of 
these problems right now, and I don't see why the database would be a 
special case. If we want to handle these problems, we should find a 
generic solution, IMHO (though it might involve errorCode values for the 
Database object as well).

So this leaves the software error case. Specifying what should happen in 
the case of a software error seems to be about as pointless as regulating 
how outlaws should behave. By definition, the rules aren't being followed 
-- how can we provide rules to decide how things work?


> How the user agent handles the problem long term (ask the user what to 
> do, silently delete and recreate it, let the database continue to be 
> corrupt, etc) may not need to be specified, but perhaps it would be 
> prudent to change the spec to at least suggest that if a database 
> becomes "known to be corrupt," operations on all open handles to that 
> database should start throwing INVALID_STATE_ERR exceptions.

I think it would be reasonable for the UA to start raising UA-specific 
exceptions in this case, but I don't see why it's any different than the 
UA detecting that the network stack is corrupted and raising exceptions 
on calls to XMLHttpRequest.


On Tue, 16 Oct 2007, Brady Eidson wrote:
> 
> Say corruption is detected and we present this error to the script 
> somehow - an exception for example.  Then we nuke the database and 
> recreate it from scratch.  Unless the error we presented to the script 
> was explicit about the condition, the script might not know that we just 
> deleted the entire database and it is now empty.  They might execute 
> some query that will succeed on the new, fresh database and they didn't 
> even realize we reset the database behind their back.  They might think 
> that some cached data they have in memory is still persistent in the 
> database, instead of knowing they could recover a little by writing it 
> back out.
>
> It would be nice to have a way to indicate to the script "There was a 
> catastrophic event and we reset your database, assume you're starting 
> over from scratch."

But what if the catastrophic failure was that the Window object's 
properties got wiped? Should we fire an event for that case and let the 
script repopulate the Window object from the database?

I guess I just don't understand the expected scenarios where failure is to 
occur in such a localised case as to make this useful.


On Tue, 16 Oct 2007, Brady Eidson wrote:
> 
> You are (rightly) reading very specifically into what I am saying, 
> whereas what I'm trying to get at is still vague and general.

I'm also trying to read very specifically into what you're saying, since I 
have to write very specifically into the spec. :-)


> Let me take a step back and try to frame it at a higher level
> 
> - A page opens a valid database handle.
> - Some script uses that database handle - successfully
> - Some external event happens on the client machine - database corruption, the
> user deletes the database from the user agent's "database management mode",
> gamma rays corrupt a single bit on the disk, or whatever.  This event renders
> the database unusable.
> - Some action is taken to reset the database so that it is usable - lets
> pretend the resolution is always recreating an empty database from scratch
> - The script knows something wrong happened - it has a completely 100% generic
> error on its query.  But it is unaware of the nature of this event and its
> resolution.  It decides to execute a new sql statement, and the value of this
> statement (from the script's perspective) is reliant on previously established
> values in the database.  The statement coincidentally succeeds even with the
> new empty database.
> 
> For further argument, lets pretend the script is in an onunload handler and
> its writing some small piece of data out to the database before the user
> "quits".  It has a lot of other data in memory client-side that it *thinks* is
> in the database, but really isn't anymore.  It *could* write this data out,
> preserving a lot of important user state.  But it doesn't know to do so.

Are we seriously expecting authors to check for the extremely unlikely 
case of the database getting corrupted during their onunload, just so that 
they can try the long shot of writing everything back to the database 
again and hope it works?


> One can certainly make the argument that if this were a native application
> saving data to disk, it would be prudent to verify its data on disk.

Would it? I would imagine an application doing this would really just be 
checking the contents of the disk cache (and probably at the OS level, 
not even the disk's own internal cache).

I would imagine the same applies here, actually.


> But I think "raw filesystem" and "database" are different paradigms with 
> different usage expectations.
>
> An error code along the lines of "your database was just reset" might 
> fit the bill.  This could be because of corruption, because the user 
> agent database management was invoked and the database cleared, or a 
> number of other reasons.
> 
> This is a requested split from code 1 - "The statement failed for 
> reasons not covered by any other code."

I think this would be an event, not an error code.


On Wed, 17 Oct 2007, timeless wrote:
> >
> > Let me take a step back and try to frame it at a higher level
> >
> > - A page opens a valid database handle.
> > - Some script uses that database handle - successfully
> > - Some external event happens on the client machine - database
> > corruption, the user deletes the database from the user agent's
> > "database management mode", gamma rays corrupt a single bit on the
> > disk, or whatever.  This event renders the database unusable.
> 
> I'd like to somehow request that this point be stressed in the specification.

Which point?


> I haven't spoken often, but I'm working on a browser for the Nokia 770 
> and Nokia N800. The 770 has very little local storage, but most likely 
> there will be a version of the browser we're producing that will include 
> this SQL API. I haven't done the math, but I think it amounts to 
> something like room for not more than 5 website databases with 5mb each 
> and a chance of failing to commit a changeset fairly often.
> 
> A couple of questions along these lines:
> 1. user launches browser
> 2. user visits Gmail
> 3. Gmail sets up a database
> 4. network goes out of range (transition to offline)
> 5. user composes and saves a message as a draft
> 6. user quits browser
> 7. browser saves database to disk
> 8. user launches browser
> 9. user visits Gmail
> 10. user sends the message
> 11. user connects to network (transition to online)
> 12. Gmail sends message
> 13. Gmail performs SQL operation to remove unsent message from in
> memory database
> 14. SQL API reports that this transaction has successfully completed -
> against an in memory database.
> 15. network goes out of range (transition to offline)
> 16. Device reboots (buggy wireless driver, effectively, the Browser
> has crashed without saving updated database to disk, last save was at
> point 7).
> 17. User launches browser
> 18. User visits Gmail
> 19. User opens the draft
> 20. User changes the draft
> 21. User saves the message
> 22. user connects to network (transition to online)
> 
> At which points in these steps should something interesting happen that 
> I haven't documented?

What do you mean by "interesting"?


> version two.
> 
> A couple of questions along these lines:
> 1. user launches browser
> 2. user visits Gmail
> 3. Gmail sets up a database
> 4. network goes out of range (transition to offline)
> 5. user composes and sends a message
> 6. user quits browser
> 7. browser saves database to disk
> 
> 8. user launches browser
> 
> user visits 4 other web based applications that successfully create databases.
> 
> 9. user quits browser
> 10. browser saves databases (5) to disk
> 
> 11. user launches browser
> 12. user visits some other web application
> 13. web application tries to create a database.
> 14. browser panics (defined below)
> 
> User is given a dialog that lists what?
> At best, what information is available to the user?
> A. The name of each web application that has a database
> B. The name of the database
> C. The date/time at which it was last used
> D. The date/time the application was last used with a network connection
>  ** The browser is probably unlikely to be able to distinguish a
> walled garden (technically term, definable on demand) network from a
> proper network (with internet/intranet access to the application)
> unless someone defines an API for a web application to be able to tell
> the web browser "Sorry, this network doesn't work for me".
> E. Size of the database (which will make virtually no sense to the user)
> F. Number of write transactions made against the database since the
> last online state (this will be useless because an application could
> either write often in small increments or rarely in a large chunk)
> 
> What options does the user have? Here are the ones I can think of:
> A. Delete a database
> B. Refuse to allow this new application database access
> Bi. Remember this decision
> Bii. Perhaps my browser should always ask the user about database
> access, as it might be better for an application not to offer the
> feature if the user is likely to need it for other things, even before
> the user reaches full state, otherwise they might accidentally delete
> something important.
> C. Try to backup a database to the external RS-MMC
> Ci. Is this backup or atomic export. I believe it should be atomic export.
> D. Import a database from a file (i.e., RS-MMC) - currently disabled,
> limit reached.
> 
> For fun, MMCs can easily be corrupted. So, in case people need a
> reminder about this....
> 
> E. Create database on current external RS-MMC (again, use case is 770
> which only supports RS, the n800 supports more flavors). Corrupting a
> removal RS-MMC is trivial, just remove it during a write :) -- note,
> this is corrupting the file system, I understand that a database
> engine can claim that its writes will never create a corrupt database.
> F. Let the user try to open one of these web applications and give the
> application an event "database is being deleted, please generate a
> serialized export". The web application could then generate a query,
> construct the actual useful part and trigger a file download that the
> user can then save. Or the application could inform the user that it's
> safe to delete the database, or give the user a list of relevant items
> that will be lost.

I can't really advise you from a UI point of view. I'm not sure what your 
questions imply regarding the spec, though.


On Tue, 16 Oct 2007, Scott Hess wrote:
> 
> I think that if the user agent did detect corruption and nuke the 
> database from orbit, then it would be reasonable for the user agent to 
> invalidate all outstanding database handles.  But that kind of thing 
> would seem to be something really beyond the spec to deal with.  It 
> seems like at that point the most appropriate action to take would be to 
> refresh the page and start from scratch, rather than expecting the app 
> to somehow handle the problem.

I agree.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 17 October 2007 18:58:21 UTC