Re: IndexedDB, what were the issues? How do we stop it from happening again? from Alec Flett on 2013-03-06 (public-webapps@w3.org from January to March 2013)

From: Alec Flett <alecflett@chromium.org>
Date: Wed, 6 Mar 2013 10:14:10 -0800
To: Alex Russell <slightlyoff@google.com>
Cc: Miko Nieminen <miko.nieminen@iki.fi>, Marcos Caceres <marcosscaceres@gmail.com>, Jeni Tennison <jeni@jenitennison.com>, Shwetank Dixit <shwetankd@opera.com>, "www-tag@w3.org" <www-tag@w3.org>, Webapps WG <public-webapps@w3.org>, Joshua Bell <jsbell@google.com>, Jonas Sicking <sicking@mozilla.com>, mlamouri@mozilla.com, Tab Atkins <tabatkins@google.com>, Yehuda Katz <wycats@gmail.com>, Andrei Popescu <andreip@google.com>
Message-ID: <CAHWpXebgTQzYqnhAni8c2pd-n5opTMM2UizbWFLi=1X-Vtc+qg@mail.gmail.com>
My primary takeaway from both working on IDB and working with IDB for some
demo apps is that IDB has just the right amount of complexity for really
large, robust database use, but for a "welcome to noSQL in the browser" it
is way too complicated.

Specifically:

   1. *versioning* - The reason this exists in IDB is to guarantee a schema
   (read: a fixed set of objectStores + indexes) for a given set of
   operations.  Versioning should be optional. And if versioning is optional,
   so should *opening* be - the only reason you need to "open" a database is
   so that you have a handle to a versioned database. You can *almost*
   implement versioning in JS if you really care about it (either keep an
   explicit key, or auto-detect the state of the schema). It's one of those
   cases where 80% of versioning is dirt simple and the complicated stuff is
   really about maintaining version changes across multiply-opened windows
   (i.e. one window opens an idb, the next window opens it and changes the
   schema, and the first window *may* need to know that and be able to adapt
   without breaking any in-flight transactions).
   2. *transactions* - Also should be optional. Vital to complex apps, but
   totally not necessary for many. There should be a default transaction,
   like db.objectStore("foo").get("bar")
   3. *transaction scoping* - even when you do want transactions, the api
   is just too verbose and repetitive for "get one key from one object store"
   - db.transaction("foo").objectStore("foo").get("bar") - there should be
   implicit (lightweight) transactions like db.objectStore("foo").get("bar")
   4. *forced versioning* - when versioning is optional, it should be then
   possible to change the schema during a regular transaction. Yes, this is a
   lot of rope but this is actually for much more complex apps, rather than
   simple ones. In particular, it's not uncommon for more complex database
   systems to dynamically create indexes based on observed behavior of the
   API, or observed data (i.e. when data with a particular key becomes
   prevalent, generate an index for it) and then dynamically use them if
   present. At the moment you have to do a manual close/open/version change to
   dynamically bump up the version - effectively rendering fixed-value
   versions moot (i.e. the schema for version 23 in my browser may look
   totally different than the schema for version 23 in your browser) and
   drastically complicating all your code (because if you try to close/open
   while transactions are in flight, they will be aborted - so you have to
   temporarily pause all new transactions, wait for all in-flight transactions
   to finish, do a close/open, then start running all pending/paused
   transactions.) This last case MIGHT be as simple as adding
   db.reopen(newVersion) to the existing spec.
   5. *named object stores* - frankly, for *many* use cases, a single
   objectStore is all you need. A simple db.get("foo") would be sufficient.
   Simply naming a "default" isn't bad - what's bad is all the onupgradeneeded
   scaffolding required to create the objectStore in the first place.
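
The pause / drain / reopen sequence described in item 4 can be sketched
roughly as follows. This is only a minimal sketch and not part of IndexedDB:
ReopenGate, run and reopen are hypothetical names, each task is assumed to be
a function returning a promise that settles when its transaction has
finished, and only one reopen is assumed to be in progress at a time.

```javascript
// Hypothetical helper (not an IndexedDB API): holds back new work while a
// close/open version bump is in progress.
class ReopenGate {
  constructor() {
    this.inFlight = 0;     // tasks started but not yet settled
    this.paused = false;   // true while a reopen is pending or running
    this.queued = [];      // starters for tasks submitted while paused
    this.idleWaiters = []; // callbacks to run once inFlight reaches 0
  }

  // Run task() now, or queue it until the reopen has finished.
  run(task) {
    if (!this.paused) return this._start(task);
    return new Promise((resolve, reject) => {
      this.queued.push(() => this._start(task).then(resolve, reject));
    });
  }

  _start(task) {
    this.inFlight++;
    let result;
    try {
      result = Promise.resolve(task());
    } catch (err) {
      result = Promise.reject(err);
    }
    return result.finally(() => {
      this.inFlight--;
      if (this.inFlight === 0) {
        this.idleWaiters.splice(0).forEach((wake) => wake());
      }
    });
  }

  // 1. pause new tasks, 2. wait for in-flight ones, 3. close/open,
  // 4. release whatever was queued in the meantime.
  async reopen(doCloseAndOpen) {
    this.paused = true;
    if (this.inFlight > 0) {
      await new Promise((resolve) => this.idleWaiters.push(resolve));
    }
    try {
      await doCloseAndOpen();
    } finally {
      this.paused = false;
      this.queued.splice(0).forEach((start) => start());
    }
  }
}
```

With a real database, doCloseAndOpen would presumably call db.close() and
then indexedDB.open(name, newVersion); every transaction in the app would
have to go through run() for the gate to be of any use, which is exactly the
kind of bookkeeping the item complains about.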

I do think that the IDBRequest model needs tweaking, and Futures seem like
the obvious direction to head in.
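
As a rough illustration of that direction: Futures were the proposal that
later became JavaScript Promises, so a request can be adapted along these
lines. This is a sketch only; requestToPromise is a hypothetical helper name,
and it assumes nothing about the request beyond the onsuccess / onerror
handlers and result / error properties that IDBRequest already has.

```javascript
// Hypothetical helper: adapts an IDBRequest-style object (onsuccess/onerror
// handlers plus result/error properties) to a Promise, so that exactly one
// of "resolved with a value" or "rejected with an error" can happen.
function requestToPromise(request) {
  return new Promise((resolve, reject) => {
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Possible use in a browser (not executed here):
//   const store = db.transaction("foo").objectStore("foo");
//   requestToPromise(store.get("bar")).then((value) => console.log(value));
```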

FWIW, the "sync" version of the API is more or less dead - nobody has
actually implemented it.

I think there is a very specialized set of applications that absolutely
need the features that IDB has right now. Google Docs is a perfect example
- a long-lived, complicated application that needs to keep absolute integrity
of schema across multiple tabs over a long period of time. But for 99% of
use cases out there, I think those features are unnecessary.
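
For that simpler majority, the "keep an explicit key" flavour of versioning
from item 1 might look something like the sketch below. The assumptions are
loud ones: the store is modelled as a plain Map rather than a real object
store, SCHEMA_KEY and migrate are hypothetical names, and the hard part
(coordinating a schema change across several open windows) is deliberately
ignored.

```javascript
const SCHEMA_KEY = "__schemaVersion"; // hypothetical reserved key

// Apply, in order, every migration newer than the stored version.
// migrations[i] upgrades the data from version i to version i + 1.
function migrate(store, migrations) {
  let version = store.get(SCHEMA_KEY) || 0;
  while (version < migrations.length) {
    migrations[version](store);
    version += 1;
    store.set(SCHEMA_KEY, version); // record progress after each step
  }
  return version;
}

// Illustrative migrations only; the keys and values are made up.
const migrations = [
  (store) => store.set("settings", { theme: "light" }),
  (store) => store.set("settings", { ...store.get("settings"), fontSize: 12 }),
];

const store = new Map();
migrate(store, migrations); // fresh store: runs both steps
migrate(store, migrations); // already current: runs nothing
```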

I think ultimately, a simplified IDB would allow progressive use of the api
as your application grows.

// basic interaction - some objectStore named 'default' gets created under
// the hood.
indexedDB.get("mykey");
// named database, auto-create the 'first' objectStore named 'default', no
// need to 'close' anything
indexedDB.database("mydb").get("mykey")
// now we need multiple objectstores:
indexedDB.database("mydb").objectStore("default").get("mykey")
// time for versioning, but using 'default'
indexedDB.open("mydb", 12).onupgradeneeded(function (db) {...}).get("bar")

etc...
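
For comparison, here is roughly what the first two shorthand forms above
would have to do on top of the API as it exists today. A sketch only:
simpleGet is a hypothetical helper, the 'default' database and store names
and the fixed version 1 are assumptions, and the factory is passed in as a
parameter (in a browser it would be the global indexedDB).

```javascript
// Hypothetical helper, NOT part of IndexedDB: read one key from a store
// named "default", creating that store on first use.
function simpleGet(factory, key, dbName = "default", storeName = "default") {
  return new Promise((resolve, reject) => {
    const open = factory.open(dbName, 1);
    // The scaffolding the shorthand would hide: create the store up front.
    open.onupgradeneeded = () => {
      const db = open.result;
      if (!db.objectStoreNames.contains(storeName)) {
        db.createObjectStore(storeName);
      }
    };
    open.onerror = () => reject(open.error);
    open.onsuccess = () => {
      const db = open.result;
      // transaction() defaults to a read-only transaction.
      const get = db.transaction(storeName).objectStore(storeName).get(key);
      get.onerror = () => reject(get.error);
      get.onsuccess = () => {
        db.close(); // nothing for the caller to close
        resolve(get.result);
      };
    };
  });
}

// Possible use in a browser (not executed here):
//   simpleGet(indexedDB, "mykey").then((value) => console.log(value));
//   simpleGet(indexedDB, "mykey", "mydb").then((value) => console.log(value));
```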


Alec



On Wed, Mar 6, 2013 at 6:01 AM, Alex Russell <slightlyoff@google.com> wrote:

> Comments inline. Adding some folks from the IDB team at Google to the
> thread as well as public-webapps.
>
> On Sunday, February 17, 2013, Miko Nieminen wrote:
>
>>
>>
>> 2013/2/15 Shwetank Dixit <shwetankd@opera.com>
>>
>>>  Why did you feel it was necessary to write a layer on top of IndexedDB?
>>>>
>>>
>>> I think this is the main issue here.
>>>
>>> As it stands, IDB is great in terms of features and power it offers, but
>>> the feedback I received from other devs was that writing raw IndexedDB
>>> requires an uncomfortable amount of verbosity even for some simple tasks
>>> (This can be disputed, but those are the views I got from some of the
>>> developers I interacted with). Adding that much code (once again,
>>> I'm talking of raw IndexedDB) makes it less readable and understandable. For
>>> beginners, this all seemed very intimidating, and for some people more
>>> experienced, it was a bit frustrating.
>>>
>>>
>> After my experiments with IDB, I don't feel that it is particularly
>> verbose. I have to admit that I often prefer a slightly verbose syntax over
>> a shorter one when it makes reading the code easier. In IDB's case, I think
>> this is the case.
>>
>>
>>
>>>  For the latter bit, I reckon it would be a good practice for groups
>>>> working on low-level APIs to more or less systematically produce a library
>>>> that operates at a higher level. This would not only help developers in
>>>> that they could pick that up instead of the lower-level stuff, but more
>>>> importantly (at least in terms of goals) it would serve to validate that
>>>> the lower-level design is indeed appropriate for librarification.
>>>>
>>>
>>> I think that would be a good idea. Also, people making those low level
>>> APIs should still keep in mind that the resulting code should not be too
>>> verbose or complex. Librarification should be an advantage, but not a de
>>> facto requirement for developers when it comes to such APIs. It should
>>> still be feasible for them to write code in the raw low level API without
>>> writing uncomfortably verbose or complex code for simple tasks. Spec
>>> designers of low level APIs should not take this as a license to make
>>> things so complex that only they and a few others understand it, and then
>>> hope that some others will go ahead and make it simple for the 'common
>>> folk' through an abstraction library.
>>
>>
>> I don't quite see how to simplify IDB syntax much more.
>>
>
> I've avoided weighing in on this thread until I had more IDB experience.
> I've been wrestling with it on two fronts of late:
>
>
>    - A re-interpretation of the API based on Futures:
>
>    https://github.com/slightlyoff/DOMFuture/tree/master/reworked_APIs/IndexedDB
>    - A new async LocalStorage design + p(r)olyfill that's bootstrapped on
>    IDB:
>    https://github.com/slightlyoff/async-local-storage
>
> While you might be right that it's unlikely that the API can be
> "simplified", I think it's trivial to extend it in ways that make it easier
> to reason about and use.
>
> This thread started out with a discussion of what might be done to keep
> IDB's perceived mistakes from reoccurring. Here's a quick stab at both an
> outline of the mistakes and what can be done to avoid them:
>
>
>    - *Abuse of events*
>    The current IDB design models one-time operations using events. This
>    *can* make sense insofar as events can occur zero or more times in the
>    future, but it's not a natural fit. What does it mean for oncomplete to
>    happen more than once? Is that an error? Are onsuccess and onerror
>    exclusive? Can they both be dispatched for an operation? The API isn't
>    clear. Events don't lead to good design here as they don't encapsulate
>    these concerns. Similarly, event handlers don't chain. This is natural, as
>    they could be invoked multiple times (conceptually), but it's not a good
>    fit for data access. It's great that IDB is async, and events are the
>    existing DOM model for this, but IDB's IDBRequest object is calling out for
>    a different kind of abstraction. I'll submit Futures for the job, but
>    others might work (explicit callback, whatever) so long as they maintain
>    chainability + async.
>
>    - *Implicitness*
>    IDB is implicit in a number of places that cause confusion for folks
>    not intimately familiar with the contract(s) that IDB expects you to enter
>    into. First, the use of events for delivery of notifications means that
>    sequential-looking code that you might expect to have timing issues
>    doesn't. Why not? Because IDB operates in some vaguely async way; you can't
>    reason at all about events that have occurred in the past (they're not
>    values, they're points in time). I can't find anywhere in the spec where
>    explicit guarantees about delivery timing are noted (
>    http://www.w3.org/TR/IndexedDB/#async-api), so one could read IDB code
>    that registers two callbacks as having a temporal dead-zone: a space in
>    code where something might have happened but which your code might not have
>    a chance to hear about. I realize that in practice this isn't the case;
>    event delivery for these is asynchronous, but the soonest timing isn't
>    defined: end of turn? next turn? end-of-microtask? This means that it's
>    possible to have implementations that differ on delivery timing, astonishing
>    those who register event handlers at the wrong time. This is partly DOM-ish
>    use of events for things they're not suited to and partly a lack of
>    specificity in the spec. Both can be fixed.
>
>    A related bit of implicitness is the transaction object. Auto-open and
>    auto-close might be virtues, but they come with costs. *When* does a
>    transaction auto-close? It's not clear from the spec; 4.2 says that a
>    transaction must be inactive when control returns to the event loop, but
>    gives no indication of what the nearest timing for that is. It's also not
>    clear how to keep a transaction "alive" across turns (a basic need), create
>    sub-transactions (a key feature of many transaction-oriented DBs), and
>    detect that a transaction object is in something other than the "active"
>    state. The last bit is particularly galling: you can have a handle to the
>    object, but users can't ask for state they might want, despite the spec
>    spending a great deal of time telling implementers that they must do this
>    and that with this bit. If there's a principle at issue, it's the idea that
>    specs -- particularly low-level APIs -- should not reserve to themselves
>    state and information that they need but for which they don't immediately
>    spot a user need. There's an obvious exception in the case of security
>    boundaries, but that's a different thing entirely. Generally speaking, if
>    you need it when writing down how your API operates, your users will too.
>    It's particularly punitive to be throwing exceptions for violations of
>    state you can't inspect but could manually cobble together from a large set
>    of events.
>
>    - *Confused collection interfaces*
>    IDB has a factory for databases and object stores and
>    allows retrieval of them by name (asynchronously, which is good)...but
>    doesn't provide a coherent Map interface onto them. By being DOM-ish and
>    not JS-ish, IDB once again creates oddball JS objects that could pun with
>    built-ins and therefore ease the learning curve, but don't. No, these
>    aren't (synchronous) maps, but punning the API with ES6's Map type would go
>    a long way.
>
>    - *Doubled API surface for sync version*
>    I assume I just don't understand why this choice was made, but the
>    explosion of API surface area combined with the conditional availability of
>    this version of the API make it an odd beast (to be charitable).
>
>    - *The idea that this is all going to be wrapped up by libraries anyway*
>    This is aesthetic, and therefore subjective, but IDB is not a beautiful
>    API; nor does it seem evident that beauty and clarity were explicit goals.
>    I wasn't involved and don't know all the motivations (nor do I have time to
>    read all the minutes now), but there seems to be some apology happening now
>    for the lack of beauty and usability of the API based on the idea that
>    it'll just be wrapped up by libraries. This is failure for an API designer;
>    we should recognize it as such and try to belay it as long as possible.
>    Yes, all APIs are eventually wrapped as our general level of abstraction
>    goes up the stack, but it's possible to provide solid, usable APIs that
>    stand the test of time. Certainly no *new* API should plan on being as
>    painful to use as DOM has been historically.
>
> I'll close by saying that all of this is tractable. We can retrofit
> IDBRequest to be a Future subclass, create a Map-alike interface for the
> list of DBs and object stores, and move away from events where they're not
> natural; all without breaking the API. I'm hopeful we can do it quickly.
>
>
>> I think its request object based API is very nice and transactions are
>> much appreciated. Possible simplification could be achieved by introducing
>> some kind of auto transaction mechanism so that users could get and change
>> objects without creating transactions. There are some challenges to enabling
>> this and it would complicate the engine, especially if transactions are
>> still supported when users want to use those. And I hope transactions are
>> not dropped completely. When using CouchDB, I often find myself writing
>> some fairly painful code to handle the lack of transactions.
>>
>> Since IDB is aiming for its first standardised version of the API, I
>> wouldn't be too worried about people writing Javascript libraries that
>> simplify its use. As long as all low level capabilities are in place for
>> writing these abstractions, we should be in good order for the first
>> version of the standard. Later, in following versions of the API, we will
>> have more experience with the painful parts of the IDB API and we can improve
>> it and simplify its use. Extending an API by creating additional abstractions
>> to simplify its use is often easier than going in the other direction, at
>> least in my experience.
>>
>> --
>> Miko Nieminen
>> miko.nieminen@iki.fi
>> miko.nieminen@gmail.com
>>
>>
Received on Wednesday, 6 March 2013 18:15:02 UTC