Re: [IndexedDB] Should .add/.put/.update throw when called in read-only transaction? from ben turner on 2010-07-01 (public-webapps@w3.org from July to September 2010)

From: ben turner <bent@mozilla.com>
Date: Thu, 1 Jul 2010 13:38:09 -0700
To: Jonas Sicking <jonas@sicking.cc>
Cc: Jeremy Orlow <jorlow@chromium.org>, Webapps WG <public-webapps@w3.org>
Message-ID: <AANLkTil8CLiWhyS6WIF1uR6i4VrYulWxr-_fVQ8VWbdH@mail.gmail.com>
I would also point out that throwing exceptions at the call site makes
debugging much easier in my opinion. Our error events currently don't
include information like filename and line number where the failing
request was generated (though I think we should add that eventually).
Exceptions are much easier to track down, diagnose, and fix.

-Ben

On Thu, Jul 1, 2010 at 2:03 AM, Jonas Sicking <jonas@sicking.cc> wrote:
> On Wed, Jun 30, 2010 at 8:16 PM, Jeremy Orlow <jorlow@chromium.org> wrote:
>> I've thought about this more and have some additional doubts inline.
>> On Thu, Jul 1, 2010 at 11:55 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>>>
>>> On Wed, Jun 30, 2010 at 6:42 PM, Jeremy Orlow <jorlow@chromium.org> wrote:
>>> > On Thu, Jul 1, 2010 at 11:17 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>>> >>
>>> >> Hi All,
>>> >>
>>> >> Currently the IndexedDB specification is silent on what should happen
>>> >> if IDBObjectStore.add, IDBObjectStore.put, IDBObjectStore.remove,
>>> >> IDBCursor.update or IDBCursor.remove() is called from a READ_ONLY
>>> >> transaction. There are two possible ways we can handle this:
>>> >>
>>> >> 1. We can throw an exception.
>>> >> 2. We can return a IDBRequest object and asynchronously fire a 'error'
>>> >> event on this object.
>>> >>
>>> >> The advantage of 1 is that we pretty much know that this was an error
>>> >> due to a bug in the web page,
>>
>> I don't see why this is compelling.  Many of the other errors that you still
>> propose we fire via the callback are due to bugs in the page.
>>
>>>
>>> >> and we can always know this
>>> >> synchronously without having to consult the database.
>>
>> So?  Just because we can doesn't mean we should.
>>
>>>
>>> >> Throwing an
>>> >> error means that all the existing infrastructure for error handling
>>> >> with automatically kick in. For example any higher-level try/catch
>>> >> constructs will have an opportunity to catch the error.
>>> >> Implementations generally report uncaught exceptions to an error log..
>>> >> The browser will fire an 'error' event on the window which the page
>>> >> can use for further logging. Firing an error event on the other hand
>>> >> does not allow the browser to automatically log the error in a console
>>> >> as the page hasn't yet gotten a chance to handle it.
>>
>> Sure, but this doesn't help the majority of error conditions in IndexedDB.
>>  It also ignores the cost of handling errors in 2 different ways.
>
> I'm arguing that people generally won't want to check for all types of
> errors, such as writing in a READ_ONLY transaction. Compare to the
> Node.appendChild function which throws for a number of conditions, yet
> I've never seen anyone put a "try" around it and check for
> HIERARCHY_REQUEST_ERR.
>
> Additionally, anything you do can likely in theory throw exceptions.
> For example if you specify the wrong number of arguments or call a
> function that doesn't exist, this will result in an exception. However
> you rarely see people wrap "try" around calls to try to handle this
> type of bugs.
>
> Another example is the line:
>
> db.objectStore("myObjectStore").get(53)
>
> can result in errors both being thrown and being reported as an error
> event. If the objectStore "myObjectStore" doesn't exist, then the
> .objectStore() call will return null and the js engine will throw due
> to trying to call a function on a null value.
>
> Error checking is mostly useful when there are ways you can actually
> handle the error. I think generally if someone calls .put during a
> READ_ONLY transaction, they simply have a bug in their program and
> there is little they can do in terms of "handling" that, short of
> simply logging it.
>
>>> >> The advantage of 2 is that this is consistent with other error
>>> >> conditions, such as writing duplicate keys, disk errors during writing
>>> >> the database to disk, internal errors in the database, etc.
>>
>> The other problem is that users then need 2 sets of error handling routines
>> for each call.  Given how difficult it is to get web developers to do any
>> error checking, requiring 2 types of checks seems like a big downside.
>
> Actually, this is somewhat faulty math. With exceptions you can put a
> try/catch high up in a call stack to catch errors from multiple
> different locations. This will catch all errors thrown from all
> function calls inside the try. Compare that to error events, which has
> to be manually installed at every single request site. From this point
> of view, the more errors we move to being reported as exceptions, the
> more easily we're making it for developers to catch more errors.
>
> Javascript has a error reporting mechanism, exceptions. We should try
> to make good use of it.
>
>>> >> While consistency, and only needing to check for errors one way, is
>>> >> certainly good arguments, I would argue that people won't need to
>>> >> check for calling-add-on-read-only-transactions. For properly written
>>> >> code it's not an error that will occur, and thus there is no need to
>>> >> check for it. In fact, you probably are generally better off letting
>>> >> the exception bubble all the way up and get logged or caught by
>>> >> generic error handlers.
>>
>> These are awfully bold assumptions.  Simply not catching the error is not an
>> option for many web applications or libraries.  At very least, they'll need
>> to add finally statements to handle such cases.
>
> How would one handle the case of .put being called on a READ_ONLY transaction?
>
>>> >> Additionally, the structured clone algorithm, which defines that an
>>> >> exception should synchronously be thrown if the object is malformed,
>>> >> for example if it consists of a cyclic graph. So .add/.put/.update can
>>> >> already throw under certain circumstances.
>>> >>
>>> >> Also compare to if we were using a different API strategy of making
>>> >> objectStores and cursors returned from READ_ONLY transactions not have
>>> >> mutating functions. In this case if someone tried to call .put(), that
>>> >> also would result in a exception from the JS interpreter stating that
>>> >> you're calling a function that doesn't exist.
>>> >>
>>> >> So I would argue that we should throw for at least all transaction
>>> >> violations. I.e. whenever you try to perform an action not allowed by
>>> >> the current transaction. This would also cover the case of calling
>>> >> createObjectStore/removeObjectStore/createIndex/removeIndex during a
>>> >> non-setVersion-transaction.
>>> >>
>>> >>
>>> >> There is also another case where synchronously know that an error will
>>> >> be reported. We could throw when IDBCursor.update() is called when the
>>> >> underlying object store uses in-line keys and the property at the key
>>> >> path does not match the key in this cursor's position. In this case we
>>> >> similarly immediately know that there is an error without having to
>>> >> consult the database. We also generally can be sure that there is a
>>> >> bug in the web page which would benefit from being reported like other
>>> >> bugs are.
>>> >>
>>> >> And like stated above, IDBCursor.update() can already throw if the
>>> >> passed in object can't be structurally cloned.
>>> >>
>>> >>
>>> >> Jeremy previously asked if there was a test we could use to
>>> >> clearly/intuitively break error conditions into two groups. Ones that
>>> >> cause exceptions to be thrown, and ones that cause error events to be
>>> >> fired. I would say that errors that do not depend on what data is in
>>> >> the database, but rather are clearly due to errors at the call site
>>> >> should throw an exception.
>>> >
>>> > This would limit us in the future in terms of schema changes.  The
>>> > current
>>> > async interface differs starting the transaction until the first call
>>> > that
>>> > accesses/modifies data (which are all async).  If we ever allow a schema
>>> > change to happen without disconnecting all clients, it'd be possible
>>> > that
>>> > the objectStore could be deleted between when the call is made and when
>>> > the
>>> > transaction is actually allowed to start.
>>>
>>> I'm not quite following here. Even if we in the future allow
>>> objectStores to be deleted while there are transactions open against
>>> it, then .add/.put would still know if we're inside a READ_ONLY or
>>> READ_WRITE transaction, no? And so could still throw an error if we're
>>> in a READ_ONLY transaction.
>>>
>>> By the test defined above, .put would in that situation have to fire
>>> an error event, rather than throw, if properly called on an READ_WRITE
>>> transaction, but where the objectStore had been deleted. This because
>>> we would have to check with the database if the objectStore was
>>> deleted and thus would fail the "do not depend on what data is in the
>>> database" check.
>>
>> I don't see how the existence of an objectStore can be considered "data".
>>  Sure an objectStore _contains_ data and that data would no longer be
>> accessible, but I don't see how the existence can be considered anything
>> other than meta data.
>
> It's a matter of definition I guess. So change the rule to be:
>
> I would say that errors that do not depend on what information is in
> the database, but rather are clearly due to errors at the call site
> should throw an exception.
>
> If we're changing the APIs to allow for objectStores to be removed at
> any time by another process, then the information of which
> objectStores actually exist would only live in the database. (Which is
> why I think that such an API would be a bad idea, but I realize of
> course that that isn't your point and that this is just a hypothetical
> example).
>
>>> > This also will limit what can be done on a background thread.  For
>>> > example,
>>> > an implementation couldn't do serialization of the object on a
>>> > background
>>> > thread (yes, if you did this, you'd need to make sure the main thread
>>> > didn't
>>> > modify it until it finished serializing).
>>>
>>> For what it's worth, HTML5 already defines that the structured clone
>>> algorithm throws an exception, so that's not really something
>>> introduced by me in this thread.
>>
>> Is there any technical reason this has to throw rather than raising the
>> error through the callback?  I don't see one.  We already made the change to
>> WebKit so that we can opt out of throwing on a serialization error.
>
> Both Worker.postMessage and History.pushState are defined to throw
> exceptions when the structured clone algorithm fails. Despite both
> likely communicating with other threads or processes and thus could
> have benefitted from a theoretical off-main-thread serialization
> implementation. I have not heard of anyone complaining that this is a
> problem so far.
>
> Another reason to throw for an invalid object is that storing an
> invalid value is something I could actually imagine being possible to
> handle. For example code like:
>
> try {
>  trans.objectStore("myStore").put(object);
> }
> catch (ex) {
>  var backup = object.extraInfo;
>  object.extraInfo = null;
>  try {
>    trans.objectStore("myStore").put(object);
>  }
>  finally {
>    object.extraInfo = backup;
>  }
> }
>
> This would be much harder to implement using error events, while
> keeping semantics exactly the same. For example you'd have to hope
> that 'object' doesn't change between the initial call to put and the
> error event firing.
>
>>> I also think that the problem you
>>> describe in parenthesis effectively prevents you from doing background
>>> serialization, at least without first copying so much data that you
>>> could also perform the constraints checks at the same time.
>>
>> Some implementations might be able to copy on write efficiently.  You could
>> also put a lock around the data, lock it before main thread execution begins
>> again, and have the main thread take the lock before modifying the data.
>> This was also just one example of processing that could be done in the
>> background.  And even if none of them are very compelling, keep in mind that
>> we're backing ourselves into a corner in terms of what we can add in the
>> future (without breaking this heuristic).
>
> Both these sound highly theoretical. I have not heard of any efficient
> scripting engine implementing either copy-on-write or
> grab-lock-on-write. It seems like a lot of overhead to for each write
> operation check if you need to execute copy or lock semantics. I'd
> like to hear from a JS engine implementation that this is something
> that is practically feasible before worrying too much about it.
>
> Also note that you'll still have to traverse the whole graph reachable
> from the passed in object in order to mark that these objects should
> have copy-on-write or grab-lock-on-write semantics. So while doing
> that, it's fairly little overhead to also verify that the object can
> be structurally cloned without spending the time actually serializing
> it.
>
> Yes, we're limiting ourselves a little bit. However it's a limitation
> we've accepted in other APIs (Worker.postMessage, History.pushState),
> and so far this hasn't seemed to be a problem.
>
> Additionally, for each API we create, we're always adding limitations
> for what we can do in the future. For example by defining that all
> objectStores can be iterated and have a defined order, we're
> preventing them from being implemented using hash tables. Of course we
> should be careful, but so far lazy cloning seems pretty theoretical.
>
>>> > Because of these reasons, I'm not too excited about this particular
>>> > heuristic for when to throw vs fire an error callback.  I've thought
>>> > about
>>> > it a bit and can't think of anything better though, unfortunately.
>>> > I think I'm still slightly in favor of routing all errors through
>>> > onerror
>>> > callbacks and never throwing from a function that returns an IDBResult,
>>> > but
>>> > I think there were some good points brought up by Jonas for why throwing
>>> > on
>>> > some errors would make sense.
>>>
>>> I think HTML5 already forces us to make .put/.add/.update throw in
>>> certain circumstances. And I think the benefits that come with
>>> exception handling outweigh the theoretical possibility of performing
>>> the structured clone in a background thread.
>>
>> As I mentioned earlier in the thread, these were examples of small problems
>> in the current spec that I expect will really come back to bite us in the
>> future as we try to add more complex features.
>
> Other than the structured clone example, can you point to any other
> way that this could really bite us?
>
>> And I'm not convinced HTML5 forces us to do anything.
>
> I agree that that's true.
>
> / Jonas
>
>
Received on Friday, 2 July 2010 09:12:45 UTC