Re: [IndexedDB] Should .add/.put/.update throw when called in read-only transaction? from Jeremy Orlow on 2010-07-02 (public-webapps@w3.org from July to September 2010)

From: Jeremy Orlow <jorlow@chromium.org>
Date: Fri, 2 Jul 2010 17:56:43 +1000
To: ben turner <bent@mozilla.com>
Cc: Jonas Sicking <jonas@sicking.cc>, Webapps WG <public-webapps@w3.org>
Message-ID: <AANLkTinxq4qX6q7uAzAlPmgaV6ILXLYu8hWlbXz7bR2t@mail.gmail.com>
Ok.

Want to write up a proposal for how to implement this (presumably in a bug)?

J

On Fri, Jul 2, 2010 at 6:38 AM, ben turner <bent@mozilla.com> wrote:

> I would also point out that throwing exceptions at the call site makes
> debugging much easier in my opinion. Our error events currently don't
> include information like filename and line number where the failing
> request was generated (though I think we should add that eventually).
> Exceptions are much easier to track down, diagnose, and fix.
>
> -Ben
>
> On Thu, Jul 1, 2010 at 2:03 AM, Jonas Sicking <jonas@sicking.cc> wrote:
> > On Wed, Jun 30, 2010 at 8:16 PM, Jeremy Orlow <jorlow@chromium.org>
> wrote:
> >> I've thought about this more and have some additional doubts inline.
> >> On Thu, Jul 1, 2010 at 11:55 AM, Jonas Sicking <jonas@sicking.cc>
> wrote:
> >>>
> >>> On Wed, Jun 30, 2010 at 6:42 PM, Jeremy Orlow <jorlow@chromium.org>
> wrote:
> >>> > On Thu, Jul 1, 2010 at 11:17 AM, Jonas Sicking <jonas@sicking.cc>
> wrote:
> >>> >>
> >>> >> Hi All,
> >>> >>
> >>> >> Currently the IndexedDB specification is silent on what should
> happen
> >>> >> if IDBObjectStore.add, IDBObjectStore.put, IDBObjectStore.remove,
> >>> >> IDBCursor.update or IDBCursor.remove() is called from a READ_ONLY
> >>> >> transaction. There are two possible ways we can handle this:
> >>> >>
> >>> >> 1. We can throw an exception.
> >>> >> 2. We can return a IDBRequest object and asynchronously fire a
> 'error'
> >>> >> event on this object.
> >>> >>
> >>> >> The advantage of 1 is that we pretty much know that this was an
> error
> >>> >> due to a bug in the web page,
> >>
> >> I don't see why this is compelling.  Many of the other errors that you
> still
> >> propose we fire via the callback are due to bugs in the page.
> >>
> >>>
> >>> >> and we can always know this
> >>> >> synchronously without having to consult the database.
> >>
> >> So?  Just because we can doesn't mean we should.
> >>
> >>>
> >>> >> Throwing an
> >>> >> error means that all the existing infrastructure for error handling
> >>> >> with automatically kick in. For example any higher-level try/catch
> >>> >> constructs will have an opportunity to catch the error.
> >>> >> Implementations generally report uncaught exceptions to an error
> log.
> >>> >> The browser will fire an 'error' event on the window which the page
> >>> >> can use for further logging. Firing an error event on the other hand
> >>> >> does not allow the browser to automatically log the error in a
> console
> >>> >> as the page hasn't yet gotten a chance to handle it.
> >>
> >> Sure, but this doesn't help the majority of error conditions in
> IndexedDB.
> >>  It also ignores the cost of handling errors in 2 different ways.
> >
> > I'm arguing that people generally won't want to check for all types of
> > errors, such as writing in a READ_ONLY transaction. Compare to the
> > Node.appendChild function which throws for a number of conditions, yet
> > I've never seen anyone put a "try" around it and check for
> > HIERARCHY_REQUEST_ERR.
> >
> > Additionally, anything you do can likely in theory throw exceptions.
> > For example if you specify the wrong number of arguments or call a
> > function that doesn't exist, this will result in an exception. However
> > you rarely see people wrap "try" around calls to try to handle this
> > type of bugs.
> >
> > Another example is the line:
> >
> > db.objectStore("myObjectStore").get(53)
> >
> > can result in errors both being thrown and being reported as an error
> > event. If the objectStore "myObjectStore" doesn't exist, then the
> > .objectStore() call will return null and the js engine will throw due
> > to trying to call a function on a null value.
> >
> > Error checking is mostly useful when there are ways you can actually
> > handle the error. I think generally if someone calls .put during a
> > READ_ONLY transaction, they simply have a bug in their program and
> > there is little they can do in terms of "handling" that, short of
> > simply logging it.
> >
> >>> >> The advantage of 2 is that this is consistent with other error
> >>> >> conditions, such as writing duplicate keys, disk errors during
> writing
> >>> >> the database to disk, internal errors in the database, etc.
> >>
> >> The other problem is that users then need 2 sets of error handling
> routines
> >> for each call.  Given how difficult it is to get web developers to do
> any
> >> error checking, requiring 2 types of checks seems like a big downside.
> >
> > Actually, this is somewhat faulty math. With exceptions you can put a
> > try/catch high up in a call stack to catch errors from multiple
> > different locations. This will catch all errors thrown from all
> > function calls inside the try. Compare that to error events, which has
> > to be manually installed at every single request site. From this point
> > of view, the more errors we move to being reported as exceptions, the
> > more easily we're making it for developers to catch more errors.
> >
> > Javascript has a error reporting mechanism, exceptions. We should try
> > to make good use of it.
> >
> >>> >> While consistency, and only needing to check for errors one way, is
> >>> >> certainly good arguments, I would argue that people won't need to
> >>> >> check for calling-add-on-read-only-transactions. For properly
> written
> >>> >> code it's not an error that will occur, and thus there is no need to
> >>> >> check for it. In fact, you probably are generally better off letting
> >>> >> the exception bubble all the way up and get logged or caught by
> >>> >> generic error handlers.
> >>
> >> These are awfully bold assumptions.  Simply not catching the error is
> not an
> >> option for many web applications or libraries.  At very least, they'll
> need
> >> to add finally statements to handle such cases.
> >
> > How would one handle the case of .put being called on a READ_ONLY
> transaction?
> >
> >>> >> Additionally, the structured clone algorithm, which defines that an
> >>> >> exception should synchronously be thrown if the object is malformed,
> >>> >> for example if it consists of a cyclic graph. So .add/.put/.update
> can
> >>> >> already throw under certain circumstances.
> >>> >>
> >>> >> Also compare to if we were using a different API strategy of making
> >>> >> objectStores and cursors returned from READ_ONLY transactions not
> have
> >>> >> mutating functions. In this case if someone tried to call .put(),
> that
> >>> >> also would result in a exception from the JS interpreter stating
> that
> >>> >> you're calling a function that doesn't exist.
> >>> >>
> >>> >> So I would argue that we should throw for at least all transaction
> >>> >> violations. I.e. whenever you try to perform an action not allowed
> by
> >>> >> the current transaction. This would also cover the case of calling
> >>> >> createObjectStore/removeObjectStore/createIndex/removeIndex during a
> >>> >> non-setVersion-transaction.
> >>> >>
> >>> >>
> >>> >> There is also another case where synchronously know that an error
> will
> >>> >> be reported. We could throw when IDBCursor.update() is called when
> the
> >>> >> underlying object store uses in-line keys and the property at the
> key
> >>> >> path does not match the key in this cursor's position. In this case
> we
> >>> >> similarly immediately know that there is an error without having to
> >>> >> consult the database. We also generally can be sure that there is a
> >>> >> bug in the web page which would benefit from being reported like
> other
> >>> >> bugs are.
> >>> >>
> >>> >> And like stated above, IDBCursor.update() can already throw if the
> >>> >> passed in object can't be structurally cloned.
> >>> >>
> >>> >>
> >>> >> Jeremy previously asked if there was a test we could use to
> >>> >> clearly/intuitively break error conditions into two groups. Ones
> that
> >>> >> cause exceptions to be thrown, and ones that cause error events to
> be
> >>> >> fired. I would say that errors that do not depend on what data is in
> >>> >> the database, but rather are clearly due to errors at the call site
> >>> >> should throw an exception.
> >>> >
> >>> > This would limit us in the future in terms of schema changes.  The
> >>> > current
> >>> > async interface differs starting the transaction until the first call
> >>> > that
> >>> > accesses/modifies data (which are all async).  If we ever allow a
> schema
> >>> > change to happen without disconnecting all clients, it'd be possible
> >>> > that
> >>> > the objectStore could be deleted between when the call is made and
> when
> >>> > the
> >>> > transaction is actually allowed to start.
> >>>
> >>> I'm not quite following here. Even if we in the future allow
> >>> objectStores to be deleted while there are transactions open against
> >>> it, then .add/.put would still know if we're inside a READ_ONLY or
> >>> READ_WRITE transaction, no? And so could still throw an error if we're
> >>> in a READ_ONLY transaction.
> >>>
> >>> By the test defined above, .put would in that situation have to fire
> >>> an error event, rather than throw, if properly called on an READ_WRITE
> >>> transaction, but where the objectStore had been deleted. This because
> >>> we would have to check with the database if the objectStore was
> >>> deleted and thus would fail the "do not depend on what data is in the
> >>> database" check.
> >>
> >> I don't see how the existence of an objectStore can be considered
> "data".
> >>  Sure an objectStore _contains_ data and that data would no longer be
> >> accessible, but I don't see how the existence can be considered anything
> >> other than meta data.
> >
> > It's a matter of definition I guess. So change the rule to be:
> >
> > I would say that errors that do not depend on what information is in
> > the database, but rather are clearly due to errors at the call site
> > should throw an exception.
> >
> > If we're changing the APIs to allow for objectStores to be removed at
> > any time by another process, then the information of which
> > objectStores actually exist would only live in the database. (Which is
> > why I think that such an API would be a bad idea, but I realize of
> > course that that isn't your point and that this is just a hypothetical
> > example).
> >
> >>> > This also will limit what can be done on a background thread.  For
> >>> > example,
> >>> > an implementation couldn't do serialization of the object on a
> >>> > background
> >>> > thread (yes, if you did this, you'd need to make sure the main thread
> >>> > didn't
> >>> > modify it until it finished serializing).
> >>>
> >>> For what it's worth, HTML5 already defines that the structured clone
> >>> algorithm throws an exception, so that's not really something
> >>> introduced by me in this thread.
> >>
> >> Is there any technical reason this has to throw rather than raising the
> >> error through the callback?  I don't see one.  We already made the
> change to
> >> WebKit so that we can opt out of throwing on a serialization error.
> >
> > Both Worker.postMessage and History.pushState are defined to throw
> > exceptions when the structured clone algorithm fails. Despite both
> > likely communicating with other threads or processes and thus could
> > have benefitted from a theoretical off-main-thread serialization
> > implementation. I have not heard of anyone complaining that this is a
> > problem so far.
> >
> > Another reason to throw for an invalid object is that storing an
> > invalid value is something I could actually imagine being possible to
> > handle. For example code like:
> >
> > try {
> >  trans.objectStore("myStore").put(object);
> > }
> > catch (ex) {
> >  var backup = object.extraInfo;
> >  object.extraInfo = null;
> >  try {
> >    trans.objectStore("myStore").put(object);
> >  }
> >  finally {
> >    object.extraInfo = backup;
> >  }
> > }
> >
> > This would be much harder to implement using error events, while
> > keeping semantics exactly the same. For example you'd have to hope
> > that 'object' doesn't change between the initial call to put and the
> > error event firing.
> >
> >>> I also think that the problem you
> >>> describe in parenthesis effectively prevents you from doing background
> >>> serialization, at least without first copying so much data that you
> >>> could also perform the constraints checks at the same time.
> >>
> >> Some implementations might be able to copy on write efficiently.  You
> could
> >> also put a lock around the data, lock it before main thread execution
> begins
> >> again, and have the main thread take the lock before modifying the data.
> >> This was also just one example of processing that could be done in the
> >> background.  And even if none of them are very compelling, keep in mind
> that
> >> we're backing ourselves into a corner in terms of what we can add in the
> >> future (without breaking this heuristic).
> >
> > Both these sound highly theoretical. I have not heard of any efficient
> > scripting engine implementing either copy-on-write or
> > grab-lock-on-write. It seems like a lot of overhead to for each write
> > operation check if you need to execute copy or lock semantics. I'd
> > like to hear from a JS engine implementation that this is something
> > that is practically feasible before worrying too much about it.
> >
> > Also note that you'll still have to traverse the whole graph reachable
> > from the passed in object in order to mark that these objects should
> > have copy-on-write or grab-lock-on-write semantics. So while doing
> > that, it's fairly little overhead to also verify that the object can
> > be structurally cloned without spending the time actually serializing
> > it.
> >
> > Yes, we're limiting ourselves a little bit. However it's a limitation
> > we've accepted in other APIs (Worker.postMessage, History.pushState),
> > and so far this hasn't seemed to be a problem.
> >
> > Additionally, for each API we create, we're always adding limitations
> > for what we can do in the future. For example by defining that all
> > objectStores can be iterated and have a defined order, we're
> > preventing them from being implemented using hash tables. Of course we
> > should be careful, but so far lazy cloning seems pretty theoretical.
> >
> >>> > Because of these reasons, I'm not too excited about this particular
> >>> > heuristic for when to throw vs fire an error callback.  I've thought
> >>> > about
> >>> > it a bit and can't think of anything better though, unfortunately.
> >>> > I think I'm still slightly in favor of routing all errors through
> >>> > onerror
> >>> > callbacks and never throwing from a function that returns an
> IDBResult,
> >>> > but
> >>> > I think there were some good points brought up by Jonas for why
> throwing
> >>> > on
> >>> > some errors would make sense.
> >>>
> >>> I think HTML5 already forces us to make .put/.add/.update throw in
> >>> certain circumstances. And I think the benefits that come with
> >>> exception handling outweigh the theoretical possibility of performing
> >>> the structured clone in a background thread.
> >>
> >> As I mentioned earlier in the thread, these were examples of small
> problems
> >> in the current spec that I expect will really come back to bite us in
> the
> >> future as we try to add more complex features.
> >
> > Other than the structured clone example, can you point to any other
> > way that this could really bite us?
> >
> >> And I'm not convinced HTML5 forces us to do anything.
> >
> > I agree that that's true.
> >
> > / Jonas
> >
> >
>
Received on Friday, 2 July 2010 07:57:35 UTC