Re: [IndexedDB] Should .add/.put/.update throw when called in read-only transaction? from Jonas Sicking on 2010-07-02 (public-webapps@w3.org from July to September 2010)

From: Jonas Sicking <jonas@sicking.cc>
Date: Fri, 2 Jul 2010 02:43:16 -0700
To: Jeremy Orlow <jorlow@chromium.org>
Cc: ben turner <bent@mozilla.com>, Webapps WG <public-webapps@w3.org>
Message-ID: <AANLkTiltzREuQvTeFySsDUJeiUl0X0L8CDuAsSH5HWVT@mail.gmail.com>
Filed http://www.w3.org/Bugs/Public/show_bug.cgi?id=10064 if you want
help with the editing, or this is unclear, let me know.

/ Jonas

On Fri, Jul 2, 2010 at 12:56 AM, Jeremy Orlow <jorlow@chromium.org> wrote:
> Ok.
> Want to write up a proposal for how to implement this (presumably in a bug)?
> J
> On Fri, Jul 2, 2010 at 6:38 AM, ben turner <bent@mozilla.com> wrote:
>>
>> I would also point out that throwing exceptions at the call site makes
>> debugging much easier in my opinion. Our error events currently don't
>> include information like filename and line number where the failing
>> request was generated (though I think we should add that eventually).
>> Exceptions are much easier to track down, diagnose, and fix.
>>
>> -Ben
>>
>> On Thu, Jul 1, 2010 at 2:03 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>> > On Wed, Jun 30, 2010 at 8:16 PM, Jeremy Orlow <jorlow@chromium.org>
>> > wrote:
>> >> I've thought about this more and have some additional doubts inline.
>> >> On Thu, Jul 1, 2010 at 11:55 AM, Jonas Sicking <jonas@sicking.cc>
>> >> wrote:
>> >>>
>> >>> On Wed, Jun 30, 2010 at 6:42 PM, Jeremy Orlow <jorlow@chromium.org>
>> >>> wrote:
>> >>> > On Thu, Jul 1, 2010 at 11:17 AM, Jonas Sicking <jonas@sicking.cc>
>> >>> > wrote:
>> >>> >>
>> >>> >> Hi All,
>> >>> >>
>> >>> >> Currently the IndexedDB specification is silent on what should
>> >>> >> happen
>> >>> >> if IDBObjectStore.add, IDBObjectStore.put, IDBObjectStore.remove,
>> >>> >> IDBCursor.update or IDBCursor.remove() is called from a READ_ONLY
>> >>> >> transaction. There are two possible ways we can handle this:
>> >>> >>
>> >>> >> 1. We can throw an exception.
>> >>> >> 2. We can return a IDBRequest object and asynchronously fire a
>> >>> >> 'error'
>> >>> >> event on this object.
>> >>> >>
>> >>> >> The advantage of 1 is that we pretty much know that this was an
>> >>> >> error
>> >>> >> due to a bug in the web page,
>> >>
>> >> I don't see why this is compelling.  Many of the other errors that you
>> >> still
>> >> propose we fire via the callback are due to bugs in the page.
>> >>
>> >>>
>> >>> >> and we can always know this
>> >>> >> synchronously without having to consult the database.
>> >>
>> >> So?  Just because we can doesn't mean we should.
>> >>
>> >>>
>> >>> >> Throwing an
>> >>> >> error means that all the existing infrastructure for error handling
>> >>> >> with automatically kick in. For example any higher-level try/catch
>> >>> >> constructs will have an opportunity to catch the error.
>> >>> >> Implementations generally report uncaught exceptions to an error
>> >>> >> log.
>> >>> >> The browser will fire an 'error' event on the window which the page
>> >>> >> can use for further logging. Firing an error event on the other
>> >>> >> hand
>> >>> >> does not allow the browser to automatically log the error in a
>> >>> >> console
>> >>> >> as the page hasn't yet gotten a chance to handle it.
>> >>
>> >> Sure, but this doesn't help the majority of error conditions in
>> >> IndexedDB.
>> >>  It also ignores the cost of handling errors in 2 different ways.
>> >
>> > I'm arguing that people generally won't want to check for all types of
>> > errors, such as writing in a READ_ONLY transaction. Compare to the
>> > Node.appendChild function which throws for a number of conditions, yet
>> > I've never seen anyone put a "try" around it and check for
>> > HIERARCHY_REQUEST_ERR.
>> >
>> > Additionally, anything you do can likely in theory throw exceptions.
>> > For example if you specify the wrong number of arguments or call a
>> > function that doesn't exist, this will result in an exception. However
>> > you rarely see people wrap "try" around calls to try to handle this
>> > type of bugs.
>> >
>> > Another example is the line:
>> >
>> > db.objectStore("myObjectStore").get(53)
>> >
>> > can result in errors both being thrown and being reported as an error
>> > event. If the objectStore "myObjectStore" doesn't exist, then the
>> > .objectStore() call will return null and the js engine will throw due
>> > to trying to call a function on a null value.
>> >
>> > Error checking is mostly useful when there are ways you can actually
>> > handle the error. I think generally if someone calls .put during a
>> > READ_ONLY transaction, they simply have a bug in their program and
>> > there is little they can do in terms of "handling" that, short of
>> > simply logging it.
>> >
>> >>> >> The advantage of 2 is that this is consistent with other error
>> >>> >> conditions, such as writing duplicate keys, disk errors during
>> >>> >> writing
>> >>> >> the database to disk, internal errors in the database, etc.
>> >>
>> >> The other problem is that users then need 2 sets of error handling
>> >> routines
>> >> for each call.  Given how difficult it is to get web developers to do
>> >> any
>> >> error checking, requiring 2 types of checks seems like a big downside.
>> >
>> > Actually, this is somewhat faulty math. With exceptions you can put a
>> > try/catch high up in a call stack to catch errors from multiple
>> > different locations. This will catch all errors thrown from all
>> > function calls inside the try. Compare that to error events, which has
>> > to be manually installed at every single request site. From this point
>> > of view, the more errors we move to being reported as exceptions, the
>> > more easily we're making it for developers to catch more errors.
>> >
>> > Javascript has a error reporting mechanism, exceptions. We should try
>> > to make good use of it.
>> >
>> >>> >> While consistency, and only needing to check for errors one way, is
>> >>> >> certainly good arguments, I would argue that people won't need to
>> >>> >> check for calling-add-on-read-only-transactions. For properly
>> >>> >> written
>> >>> >> code it's not an error that will occur, and thus there is no need
>> >>> >> to
>> >>> >> check for it. In fact, you probably are generally better off
>> >>> >> letting
>> >>> >> the exception bubble all the way up and get logged or caught by
>> >>> >> generic error handlers.
>> >>
>> >> These are awfully bold assumptions.  Simply not catching the error is
>> >> not an
>> >> option for many web applications or libraries.  At very least, they'll
>> >> need
>> >> to add finally statements to handle such cases.
>> >
>> > How would one handle the case of .put being called on a READ_ONLY
>> > transaction?
>> >
>> >>> >> Additionally, the structured clone algorithm, which defines that an
>> >>> >> exception should synchronously be thrown if the object is
>> >>> >> malformed,
>> >>> >> for example if it consists of a cyclic graph. So .add/.put/.update
>> >>> >> can
>> >>> >> already throw under certain circumstances.
>> >>> >>
>> >>> >> Also compare to if we were using a different API strategy of making
>> >>> >> objectStores and cursors returned from READ_ONLY transactions not
>> >>> >> have
>> >>> >> mutating functions. In this case if someone tried to call .put(),
>> >>> >> that
>> >>> >> also would result in a exception from the JS interpreter stating
>> >>> >> that
>> >>> >> you're calling a function that doesn't exist.
>> >>> >>
>> >>> >> So I would argue that we should throw for at least all transaction
>> >>> >> violations. I.e. whenever you try to perform an action not allowed
>> >>> >> by
>> >>> >> the current transaction. This would also cover the case of calling
>> >>> >> createObjectStore/removeObjectStore/createIndex/removeIndex during
>> >>> >> a
>> >>> >> non-setVersion-transaction.
>> >>> >>
>> >>> >>
>> >>> >> There is also another case where synchronously know that an error
>> >>> >> will
>> >>> >> be reported. We could throw when IDBCursor.update() is called when
>> >>> >> the
>> >>> >> underlying object store uses in-line keys and the property at the
>> >>> >> key
>> >>> >> path does not match the key in this cursor's position. In this case
>> >>> >> we
>> >>> >> similarly immediately know that there is an error without having to
>> >>> >> consult the database. We also generally can be sure that there is a
>> >>> >> bug in the web page which would benefit from being reported like
>> >>> >> other
>> >>> >> bugs are.
>> >>> >>
>> >>> >> And like stated above, IDBCursor.update() can already throw if the
>> >>> >> passed in object can't be structurally cloned.
>> >>> >>
>> >>> >>
>> >>> >> Jeremy previously asked if there was a test we could use to
>> >>> >> clearly/intuitively break error conditions into two groups. Ones
>> >>> >> that
>> >>> >> cause exceptions to be thrown, and ones that cause error events to
>> >>> >> be
>> >>> >> fired. I would say that errors that do not depend on what data is
>> >>> >> in
>> >>> >> the database, but rather are clearly due to errors at the call site
>> >>> >> should throw an exception.
>> >>> >
>> >>> > This would limit us in the future in terms of schema changes.  The
>> >>> > current
>> >>> > async interface differs starting the transaction until the first
>> >>> > call
>> >>> > that
>> >>> > accesses/modifies data (which are all async).  If we ever allow a
>> >>> > schema
>> >>> > change to happen without disconnecting all clients, it'd be possible
>> >>> > that
>> >>> > the objectStore could be deleted between when the call is made and
>> >>> > when
>> >>> > the
>> >>> > transaction is actually allowed to start.
>> >>>
>> >>> I'm not quite following here. Even if we in the future allow
>> >>> objectStores to be deleted while there are transactions open against
>> >>> it, then .add/.put would still know if we're inside a READ_ONLY or
>> >>> READ_WRITE transaction, no? And so could still throw an error if we're
>> >>> in a READ_ONLY transaction.
>> >>>
>> >>> By the test defined above, .put would in that situation have to fire
>> >>> an error event, rather than throw, if properly called on an READ_WRITE
>> >>> transaction, but where the objectStore had been deleted. This because
>> >>> we would have to check with the database if the objectStore was
>> >>> deleted and thus would fail the "do not depend on what data is in the
>> >>> database" check.
>> >>
>> >> I don't see how the existence of an objectStore can be considered
>> >> "data".
>> >>  Sure an objectStore _contains_ data and that data would no longer be
>> >> accessible, but I don't see how the existence can be considered
>> >> anything
>> >> other than meta data.
>> >
>> > It's a matter of definition I guess. So change the rule to be:
>> >
>> > I would say that errors that do not depend on what information is in
>> > the database, but rather are clearly due to errors at the call site
>> > should throw an exception.
>> >
>> > If we're changing the APIs to allow for objectStores to be removed at
>> > any time by another process, then the information of which
>> > objectStores actually exist would only live in the database. (Which is
>> > why I think that such an API would be a bad idea, but I realize of
>> > course that that isn't your point and that this is just a hypothetical
>> > example).
>> >
>> >>> > This also will limit what can be done on a background thread.  For
>> >>> > example,
>> >>> > an implementation couldn't do serialization of the object on a
>> >>> > background
>> >>> > thread (yes, if you did this, you'd need to make sure the main
>> >>> > thread
>> >>> > didn't
>> >>> > modify it until it finished serializing).
>> >>>
>> >>> For what it's worth, HTML5 already defines that the structured clone
>> >>> algorithm throws an exception, so that's not really something
>> >>> introduced by me in this thread.
>> >>
>> >> Is there any technical reason this has to throw rather than raising the
>> >> error through the callback?  I don't see one.  We already made the
>> >> change to
>> >> WebKit so that we can opt out of throwing on a serialization error.
>> >
>> > Both Worker.postMessage and History.pushState are defined to throw
>> > exceptions when the structured clone algorithm fails. Despite both
>> > likely communicating with other threads or processes and thus could
>> > have benefitted from a theoretical off-main-thread serialization
>> > implementation. I have not heard of anyone complaining that this is a
>> > problem so far.
>> >
>> > Another reason to throw for an invalid object is that storing an
>> > invalid value is something I could actually imagine being possible to
>> > handle. For example code like:
>> >
>> > try {
>> >  trans.objectStore("myStore").put(object);
>> > }
>> > catch (ex) {
>> >  var backup = object.extraInfo;
>> >  object.extraInfo = null;
>> >  try {
>> >    trans.objectStore("myStore").put(object);
>> >  }
>> >  finally {
>> >    object.extraInfo = backup;
>> >  }
>> > }
>> >
>> > This would be much harder to implement using error events, while
>> > keeping semantics exactly the same. For example you'd have to hope
>> > that 'object' doesn't change between the initial call to put and the
>> > error event firing.
>> >
>> >>> I also think that the problem you
>> >>> describe in parenthesis effectively prevents you from doing background
>> >>> serialization, at least without first copying so much data that you
>> >>> could also perform the constraints checks at the same time.
>> >>
>> >> Some implementations might be able to copy on write efficiently.  You
>> >> could
>> >> also put a lock around the data, lock it before main thread execution
>> >> begins
>> >> again, and have the main thread take the lock before modifying the
>> >> data.
>> >> This was also just one example of processing that could be done in the
>> >> background.  And even if none of them are very compelling, keep in mind
>> >> that
>> >> we're backing ourselves into a corner in terms of what we can add in
>> >> the
>> >> future (without breaking this heuristic).
>> >
>> > Both these sound highly theoretical. I have not heard of any efficient
>> > scripting engine implementing either copy-on-write or
>> > grab-lock-on-write. It seems like a lot of overhead to for each write
>> > operation check if you need to execute copy or lock semantics. I'd
>> > like to hear from a JS engine implementation that this is something
>> > that is practically feasible before worrying too much about it.
>> >
>> > Also note that you'll still have to traverse the whole graph reachable
>> > from the passed in object in order to mark that these objects should
>> > have copy-on-write or grab-lock-on-write semantics. So while doing
>> > that, it's fairly little overhead to also verify that the object can
>> > be structurally cloned without spending the time actually serializing
>> > it.
>> >
>> > Yes, we're limiting ourselves a little bit. However it's a limitation
>> > we've accepted in other APIs (Worker.postMessage, History.pushState),
>> > and so far this hasn't seemed to be a problem.
>> >
>> > Additionally, for each API we create, we're always adding limitations
>> > for what we can do in the future. For example by defining that all
>> > objectStores can be iterated and have a defined order, we're
>> > preventing them from being implemented using hash tables. Of course we
>> > should be careful, but so far lazy cloning seems pretty theoretical.
>> >
>> >>> > Because of these reasons, I'm not too excited about this particular
>> >>> > heuristic for when to throw vs fire an error callback.  I've thought
>> >>> > about
>> >>> > it a bit and can't think of anything better though, unfortunately.
>> >>> > I think I'm still slightly in favor of routing all errors through
>> >>> > onerror
>> >>> > callbacks and never throwing from a function that returns an
>> >>> > IDBResult,
>> >>> > but
>> >>> > I think there were some good points brought up by Jonas for why
>> >>> > throwing
>> >>> > on
>> >>> > some errors would make sense.
>> >>>
>> >>> I think HTML5 already forces us to make .put/.add/.update throw in
>> >>> certain circumstances. And I think the benefits that come with
>> >>> exception handling outweigh the theoretical possibility of performing
>> >>> the structured clone in a background thread.
>> >>
>> >> As I mentioned earlier in the thread, these were examples of small
>> >> problems
>> >> in the current spec that I expect will really come back to bite us in
>> >> the
>> >> future as we try to add more complex features.
>> >
>> > Other than the structured clone example, can you point to any other
>> > way that this could really bite us?
>> >
>> >> And I'm not convinced HTML5 forces us to do anything.
>> >
>> > I agree that that's true.
>> >
>> > / Jonas
>> >
>> >
>
>
Received on Friday, 2 July 2010 09:44:26 UTC