Re: IndexedDB, what were the issues? How do we stop it from happening again?

On Wed, Mar 6, 2013 at 11:02 AM, Ian Fette (イアンフェッティ) <ifette@google.com> wrote:

> I seem to recall we contemplated people writing libraries on top of IDB
> from the beginning. I'm not sure why this is a bad thing. We originally
> shipped "web sql" / sqlite, which was a familiar interface for many and
> relatively easy to use, but had a sufficiently large API surface area that
> no one felt they wanted to document the whole thing such that we could have
> an inter-operable standard. (Yes, I'm simplifying a bit.) As a result, we
> came up with an approach of "What are the fundamental primitives that we
> need?", spec'd that out, and shipped it. We had discussions at the time
> that we expected library authors to produce abstraction layers that made
> IDB easier to use, as the "fundamental primitives" approach was not
> necessarily intended to produce an API that was as straightforward and easy
> to use as what we were trying to replace. If that's now what is happening,
> that seems like a good thing, not a failure.
>
>
That's fine for building up, but I guess what I'm saying is that the
primitives are too complicated to allow you to get started. There is an
excellent HTML5 Rocks page on IndexedDB describing a very simple use case.
But if I were a web developer, I'd say "screw that, back to localStorage".

Most of the HTML5 APIs seem almost too simple to be useful, and then people
chain primitives together into useful APIs with libraries. For example, the
DOM APIs are woefully verbose and primitive, but people have built much
better APIs around them. But fundamentally they are primitive.

IndexedDB seems to have been built the other way: we looked at what would be
hard for developers to implement themselves (transactions, versions) and
built *primitives* that required them. At the same time, I'd argue that
versioning and transactions *could* be built upon a transactionless,
versionless keystore in JavaScript, using a library. Even the notion of
multiple objectStores is an abstraction that could be implemented on a
single keystore. To be fair, implementing transactions against a
transactionless keystore would not be *performant*, but that's a separate
issue. Now that we have transactions and versions, I wouldn't eliminate
them from IDB by any means, but they're not required.
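The claim that multiple objectStores, and even a crude form of versioning, can be layered on a single versionless, transactionless keystore can be illustrated in a few lines. This is a minimal sketch under stated assumptions: the flat keystore is modeled as an in-memory `Map` (standing in for something like localStorage), keys are strings, and `FlatStore`, `objectStore`, and `migrate` are hypothetical names, not part of IndexedDB or any spec. Nothing here is atomic, which is exactly the gap real transactions close.

```javascript
// Hypothetical flat, versionless, transactionless key/value store.
// An in-memory Map stands in for something like localStorage.
class FlatStore {
  constructor() {
    this.map = new Map();
  }
  get(key) { return this.map.get(key); }
  set(key, value) { this.map.set(key, value); }
  delete(key) { this.map.delete(key); }
  keys() { return [...this.map.keys()]; }
}

// Emulate a named object store by prefixing every key with the store name.
// Assumes store names never contain the "\u0000" separator.
function objectStore(flat, name) {
  const prefix = name + "\u0000";
  return {
    get: (key) => flat.get(prefix + key),
    put: (key, value) => flat.set(prefix + key, value),
    delete: (key) => flat.delete(prefix + key),
    keys: () =>
      flat.keys()
        .filter((k) => k.startsWith(prefix))
        .map((k) => k.slice(prefix.length)),
  };
}

// "Versioning" via an explicit key: migrations[i] upgrades version i to i + 1.
// The version key has no separator, so it cannot collide with store keys.
// Not atomic: a crash mid-migration leaves a partially upgraded store.
function migrate(flat, migrations) {
  let version = flat.get("__version__") || 0;
  while (version < migrations.length) {
    migrations[version](flat);
    version += 1;
    flat.set("__version__", version);
  }
  return version;
}

// Usage: two logical stores sharing one flat keystore.
const flat = new FlatStore();
migrate(flat, [(f) => objectStore(f, "users").put("alice", { age: 30 })]);
objectStore(flat, "notes").put("n1", "hello");
console.log(objectStore(flat, "users").keys()); // ["alice"]
```

The prefix trick is what any single-keyspace backend ends up doing; what it cannot give you without extra machinery is a consistent view when two tabs migrate at once.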

Alec


> -Ian
>
>
> On Wed, Mar 6, 2013 at 10:14 AM, Alec Flett <alecflett@chromium.org> wrote:
>
>> My primary takeaway from both working on IDB and working with IDB for
>> some demo apps is that IDB has just the right amount of complexity for
>> really large, robust database use... but for a "welcome to NoSQL in the
>> browser" it is way too complicated.
>>
>> Specifically:
>>
>>    1. *versioning* - The reason this exists in IDB is to guarantee a
>>    schema (read: a fixed set of objectStores + indexes) for a given set of
>>    operations. Versioning should be optional. And if versioning is optional,
>>    so should *opening* be - the only reason you need to "open" a database
>>    is so that you have a handle to a versioned database. You can *almost*
>>    implement versioning in JS if you really care about it (either keep an
>>    explicit key, or auto-detect the state of the schema); it's one of those
>>    cases where 80% of versioning is dirt simple and the complicated stuff is
>>    really about maintaining version changes across multiply-opened windows
>>    (i.e. one window opens an IDB, the next window opens it and changes the
>>    schema, and the first window *may* need to know that and be able to adapt
>>    without breaking any in-flight transactions).
>>    2. *transactions* - Also should be optional. Vital to complex apps,
>>    but totally unnecessary for many. There should be a default transaction,
>>    like db.objectStore("foo").get("bar")
>>    3. *transaction scoping* - even when you do want transactions, the
>>    API is just too verbose and repetitive for "get one key from one object
>>    store": db.transaction("foo").objectStore("foo").get("bar"). There should
>>    be implicit (lightweight) transactions like db.objectStore("foo").get("bar")
>>    4. *forced versioning* - when versioning is optional, it should be
>>    then possible to change the schema during a regular transaction. Yes, this
>>    is a lot of rope but this is actually for much more complex apps, rather
>>    than simple ones. In particular, it's not uncommon for more complex
>>    database systems to dynamically create indexes based on observed behavior
>>    of the API, or observed data (i.e. when data with a particular key becomes
>>    prevalent, generate an index for it) and then dynamically use them if
>>    present. At the moment you have to do a manual close/open/version change to
>>    dynamically bump up the version - effectively rendering fixed-value
>>    versions moot (i.e. the schema for version 23 in my browser may look
>>    totally different than the schema for version 23 in your browser) and
>>    drastically complicating all your code (because if you try to close/open
>>    while transactions are in flight, they will be aborted - so you have to
>>    temporarily pause all new transactions, wait for all in-flight transactions
>>    to finish, do a close/open, then start running all pending/paused
>>    transactions.) This last case MIGHT be as simple as adding
>>    db.reopen(newVersion) to the existing spec.
>>    5. *named object stores* - frankly, for *many* use cases, a single
>>    objectStore is all you need; a simple db.get("foo") would be sufficient.
>>    Simply naming a "default" isn't bad - what's bad is all the onupgradeneeded
>>    scaffolding required to create the objectStore in the first place.
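The workaround described in item 4 (pause new transactions, let in-flight ones finish, close/open at a higher version, then resume the paused work) is bookkeeping a library currently has to carry itself. A minimal sketch, assuming work is expressed as async functions that receive the current connection; `ReopenableConnection`, `run`, `reopen`, and the `openFn(newVersion, oldConn)` callback are hypothetical names. In a browser, `openFn` would close the old `IDBDatabase` and call `indexedDB.open(name, version)`; a real library would also have to handle the `versionchange` and `blocked` events, which this sketch ignores. `db.reopen(newVersion)` itself is the proposal in item 4, not an existing method.

```javascript
// Hypothetical wrapper that serializes "reopen at a new version" against other work.
class ReopenableConnection {
  constructor(conn, version, openFn) {
    this.conn = conn;          // current connection (e.g. an IDBDatabase)
    this.version = version;    // version the connection was opened at
    this.openFn = openFn;      // async (newVersion, oldConn) => newConn
    this.inFlight = new Set(); // promises for work that has already started
    this.paused = null;        // non-null while a reopen is in progress
  }

  // Run work(conn). New work waits while a reopen is pending.
  async run(work) {
    while (this.paused) await this.paused;
    const conn = this.conn;
    // Async IIFE: starts work now and turns synchronous throws into rejections.
    const p = (async () => work(conn))();
    this.inFlight.add(p);
    try {
      return await p;
    } finally {
      this.inFlight.delete(p);
    }
  }

  // Pause new work, drain in-flight work, reopen, then resume queued work.
  async reopen(newVersion) {
    while (this.paused) await this.paused; // one reopen at a time
    let release;
    this.paused = new Promise((resolve) => { release = resolve; });
    try {
      // Failures of in-flight work belong to their own callers; just wait.
      await Promise.allSettled([...this.inFlight]);
      this.conn = await this.openFn(newVersion, this.conn);
      this.version = newVersion;
    } finally {
      this.paused = null;
      release();
    }
  }
}
```

The key design choice is that `paused` is checked and set synchronously, so no new work can slip in between "decide to reopen" and "wait for in-flight work".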
>>
>> I do think that the IDBRequest model needs tweaking, and Futures seem
>> like the obvious direction to head in.
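A minimal sketch of that direction, using ES Promises (the standardized successor to the DOM Futures proposal) in place of Futures. `promisifyRequest` and `getOne` are hypothetical helpers, not part of IndexedDB; the calls they make (`db.transaction(name, mode)`, `objectStore(name)`, `get(key)`, and the request's `onsuccess`/`onerror`/`result`/`error`) are the standard IDB API. `getOne` approximates the implicit transaction proposed above by opening a short-lived read-only transaction scoped to one store.

```javascript
// Wrap one IDBRequest in a Promise: it settles exactly once, with
// request.result on success or request.error on failure.
function promisifyRequest(request) {
  return new Promise((resolve, reject) => {
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Hypothetical helper approximating db.objectStore("foo").get("bar"):
// an implicit, read-only transaction scoped to a single store and a single get.
function getOne(db, storeName, key) {
  const tx = db.transaction(storeName, "readonly");
  return promisifyRequest(tx.objectStore(storeName).get(key));
}

// Usage in a browser, given an open IDBDatabase `db` with a "foo" store:
// getOne(db, "foo", "bar").then((value) => console.log(value));
```

Because a Promise can only settle once, the "can onsuccess and onerror both fire?" question disappears at the type level.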
>>
>> FWIW, the "sync" version of the API is more or less dead - nobody has
>> actually implemented it.
>>
>> I think there is a very specialized set of applications that absolutely
>> need the features that IDB has right now. Google Docs is a perfect example:
>> a long-lived, complicated application that needs to keep absolute integrity
>> of schema across multiple tabs over a long period of time... but for 99% of
>> use cases out there, I think they're unnecessary.
>>
>> I think ultimately, a simplified IDB would allow progressive use of the
>> API as your application grows.
>>
>> // basic interaction - some objectStore named 'default' gets created under the hood.
>> indexedDB.get("mykey");
>> // named database, auto-create the 'first' objectStore named 'default', no need to 'close' anything
>> indexedDB.database("mydb").get("mykey")
>> // now we need multiple objectstores:
>> indexedDB.database("mydb").objectStore("default").get("mykey")
>> // time for versioning, but using 'default'
>> indexedDB.open("mydb", 12).onupgradeneeded(function (db) {...}).get("bar")
>>
>> etc...
>>
>>
>> Alec
>>
>>
>>
>> On Wed, Mar 6, 2013 at 6:01 AM, Alex Russell <slightlyoff@google.com> wrote:
>>
>>> Comments inline. Adding some folks from the IDB team at Google to the
>>> thread as well as public-webapps.
>>>
>>> On Sunday, February 17, 2013, Miko Nieminen wrote:
>>>
>>>>
>>>>
>>>> 2013/2/15 Shwetank Dixit <shwetankd@opera.com>
>>>>
>>>>>> Why did you feel it was necessary to write a layer on top of
>>>>>> IndexedDB?
>>>>>>
>>>>>
>>>>> I think this is the main issue here.
>>>>>
>>>>> As it stands, IDB is great in terms of the features and power it offers,
>>>>> but the feedback I received from other devs was that writing raw IndexedDB
>>>>> requires an uncomfortable amount of verbosity even for some simple tasks
>>>>> (this can be disputed, but those are the views I got from some of the
>>>>> developers I interacted with). Adding that much code (once again,
>>>>> I'm talking about raw IndexedDB) makes it less readable and understandable.
>>>>> For beginners, this all seemed very intimidating, and for some more
>>>>> experienced people, it was a bit frustrating.
>>>>>
>>>>>
>>>> After my experiments with IDB, I don't feel that it is particularly
>>>> verbose. I have to admit that often I prefer slightly verbose syntax over
>>>> shorter one when it makes reading the code easier. In IDB's case, I think
>>>> this is the case.
>>>>
>>>>
>>>>
>>>>>> For the latter bit, I reckon it would be a good practice for groups
>>>>>> working on low-level APIs to more or less systematically produce a library
>>>>>> that operates at a higher level. This would not only help developers in
>>>>>> that they could pick that up instead of the lower-level stuff, but more
>>>>>> importantly (at least in terms of goals) it would serve to validate that
>>>>>> the lower-level design is indeed appropriate for librarification.
>>>>>>
>>>>>
>>>>> I think that would be a good idea. Also, people making those low level
>>>>> APIs should still keep in mind that the resulting code should not be too
>>>>> verbose or complex. Librarification should be an advantage, but not a de
>>>>> facto requirement for developers when it comes to such APIs. It should
>>>>> still be feasible for them to write code in the raw low level API without
>>>>> writing uncomfortably verbose or complex code for simple tasks. Spec
>>>>> designers of low level APIs should not take this as a license to make
>>>>> things so complex that only they and a few others understand it, and then
>>>>> hope that some others will go ahead and make it simple for the 'common
>>>>> folk' through an abstraction library.
>>>>
>>>>
>>>> I don't quite see how to simplify IDB syntax much further.
>>>>
>>>
>>> I've avoided weighing in on this thread until I had more IDB experience.
>>> I've been wrestling with it on two fronts of late:
>>>
>>>
>>>    - A re-interpretation of the API based on Futures:
>>>
>>>    https://github.com/slightlyoff/DOMFuture/tree/master/reworked_APIs/IndexedDB
>>>    - A new async LocalStorage design + p(r)olyfill that's bootstrapped
>>>    on IDB:
>>>    https://github.com/slightlyoff/async-local-storage
>>>
>>> While you might be right that it's unlikely that the API can be
>>> "simplified", I think it's trivial to extend it in ways that make it easier
>>> to reason about and use.
>>>
>>> This thread started out with a discussion of what might be done to keep
>>> IDB's perceived mistakes from recurring. Here's a quick stab at both an
>>> outline of the mistakes and what can be done to avoid them:
>>>
>>>
>>>    - *Abuse of events*
>>>    The current IDB design models one-time operations using events. This
>>>    *can* make sense insofar as events can occur zero or more times in
>>>    the future, but it's not a natural fit. What does it mean for oncomplete to
>>>    happen more than once? Is that an error? Are onsuccess and onerror
>>>    exclusive? Can they both be dispatched for an operation? The API isn't
>>>    clear. Events don't lead to good design here as they don't encapsulate
>>>    these concerns. Similarly, event handlers don't chain. This is natural, as
>>>    they could be invoked multiple times (conceptually), but it's not a good
>>>    fit for data access. It's great that IDB is async, and events are the
>>>    existing DOM model for this, but IDB's IDBRequest object is calling out for
>>>    a different kind of abstraction. I'll submit Futures for the job, but
>>>    others might work (explicit callback, whatever) so long as they maintain
>>>    chainability + async.
>>>
>>>    - *Implicitness*
>>>    IDB is implicit in a number of places that cause confusion for folks
>>>    not intimately familiar with the contract(s) that IDB expects you to enter
>>>    into. First, the use of events for delivery of notifications means that
>>>    sequential-looking code that you might expect to have timing issues
>>>    doesn't. Why not? Because IDB operates in some vaguely async way; you can't
>>>    reason at all about events that have occurred in the past (they're not
>>>    values, they're points in time). I can't find anywhere in the spec where
>>>    explicit guarantees about delivery timing are noted (
>>>    http://www.w3.org/TR/IndexedDB/#async-api), so one could read IDB
>>>    code that registers two callbacks as having a temporal dead-zone: a space
>>>    in code where something might have happened but which your code might not
>>>    have a chance to hear about. I realize that in practice this isn't the
>>>    case; event delivery for these is asynchronous, but the soonest timing
>>>    isn't defined: end of turn? next turn? end-of-microtask? This means that
>>>    it's possible to have implementations that differ on delivery timing,
>>>    astonishing those who register event handlers at the wrong time. This is
>>>    partly the DOM-ish use of events for things they're not suited to and
>>>    partly a lack of specificity in the spec. Both can be fixed.
>>>
>>>    A related bit of implicitness is the transaction object. Auto-open
>>>    and auto-close might be virtues, but they come with costs. *When* does
>>>    a transaction auto-close? It's not clear from the spec; 4.2 says that a
>>>    transaction must be inactive when control returns to the event loop, but
>>>    gives no indication of what the nearest timing for that is. It's also not
>>>    clear how to keep a transaction "alive" across turns (a basic need), create
>>>    sub-transactions (a key feature of many transaction-oriented DBs), and
>>>    detect that a transaction object is in something other than the "active"
>>>    state. The last bit is particularly galling: you can have a handle to the
>>>    object, but users can't ask for state they might want, despite the spec
>>>    spending a great deal of time telling implementers that they must do this
>>>    and that with this bit. If there's a principle at issue, it's the idea that
>>>    specs -- particularly low-level APIs -- should not reserve to themselves
>>>    state and information that they need but for which they don't immediately
>>>    spot a user need. There's an obvious exception in the case of security
>>>    boundaries, but that's a different thing entirely. Generally speaking, if
>>>    you need it when writing down how your API operates, your users will too.
>>>    It's particularly punitive to be throwing exceptions for violations of
>>>    state you can't inspect but could manually cobble together from a large set
>>>    of events.
>>>
>>>    - *Confused collection interfaces*
>>>    IDB has a factory for databases and object stores and
>>>    allows retrieval of them by name (asynchronously, which is good)...but
>>>    doesn't provide a coherent Map interface onto them. By being DOM-ish and
>>>    not JS-ish, IDB once again creates oddball JS objects that could pun with
>>>    built-ins and therefore ease the learning curve, but doesn't. No, these
>>>    aren't (synchronous) maps, but punning the API with ES6's Map type would go
>>>    a long way.
>>>
>>>    - *Doubled API surface for sync version*
>>>    I assume I just don't understand why this choice was made, but the
>>>    explosion of API surface area combined with the conditional availability of
>>>    this version of the API make it an odd beast (to be charitable).
>>>
>>>    - *The idea that this is all going to be wrapped up by libraries
>>>    anyway*
>>>    This is aesthetic, and therefore subjective, but IDB is not a
>>>    beautiful API; nor does it seem evident that beauty and clarity were
>>>    explicit goals. I wasn't involved and don't know all the motivations (nor
>>>    do I have time to read all the minutes now), but there seems to be some
>>>    apology happening now for the lack of beauty and usability of the API
>>>    based on the idea that it'll just be wrapped up by libraries. This is a
>>>    failure for an API designer; we should recognize it as such and try to
>>>    delay it as long as possible. Yes, all APIs are eventually wrapped as our
>>>    general level of abstraction goes up the stack, but it's possible to
>>>    provide solid, usable APIs that stand the test of time. Certainly no
>>>    *new* API should plan on being as painful to use as DOM has been
>>>    historically.
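The "points in time" versus "values" distinction raised under *Abuse of events* and *Implicitness* can be shown in isolation. This is a minimal sketch: it uses a plain `EventTarget` with synchronous dispatch, which is *not* how IDB delivers its events (those are dispatched asynchronously, as noted above), so it exaggerates the timing hazard in order to isolate the conceptual difference. A listener registered after an event has fired never hears about it; a handler attached to an already-settled Promise still runs, and Promises chain. The function names are illustrative only.

```javascript
// Events are points in time: a listener added after dispatch misses it.
// (EventTarget and Event are standard DOM classes, also global in Node 15+.)
function lateListenerHears() {
  const target = new EventTarget();
  let heard = false;
  target.dispatchEvent(new Event("success")); // dispatched synchronously, no listeners yet
  target.addEventListener("success", () => { heard = true; });
  return heard; // false: the event is already gone
}

// Promises are values: a handler attached after settlement still runs.
async function lateHandlerHears() {
  const settled = Promise.resolve("result");  // settles immediately
  await new Promise((r) => setTimeout(r, 0)); // let a full turn pass
  return settled.then((value) => value);      // still delivers "result"
}

// Promises also chain: each .then returns a new promise, so dependent
// steps compose without nesting handlers.
function chained() {
  return Promise.resolve(2)
    .then((x) => x * 10)
    .then((x) => x + 1); // resolves to 21
}
```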
>>>
>>> I'll close by saying that all of this is tractable. We can retrofit
>>> IDBRequest to be a Future subclass, create a Map-alike interface for the
>>> list of DBs and object stores, and move away from events where they're not
>>> natural; all without breaking the API. I'm hopeful we can do it quickly.
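The Map-alike interface suggested here can be sketched as an async wrapper whose method names mirror ES6 `Map` (`get`, `set`, `has`, `delete`, `keys`) but which return Promises. A minimal sketch under stated assumptions: `AsyncStoreMap` is a hypothetical class, the wrapped store uses out-of-line keys (no `keyPath`), and `keys()` relies on `IDBObjectStore.getAllKeys()`, which was only added in Indexed Database API 2.0, well after this thread. Each call uses its own short-lived transaction, and `set` resolves when its request succeeds, which is not the same as the transaction having committed.

```javascript
// Wrap one IDBRequest in a Promise that settles exactly once.
function promisifyRequest(request) {
  return new Promise((resolve, reject) => {
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Hypothetical Map-alike facade over one object store with out-of-line keys.
class AsyncStoreMap {
  constructor(db, storeName) {
    this.db = db;
    this.storeName = storeName;
  }

  // A fresh, single-store transaction per call.
  store(mode) {
    return this.db.transaction(this.storeName, mode).objectStore(this.storeName);
  }

  get(key) {
    return promisifyRequest(this.store("readonly").get(key));
  }

  // Like Map.prototype.set, resolves to the map itself.
  set(key, value) {
    return promisifyRequest(this.store("readwrite").put(value, key)).then(() => this);
  }

  has(key) {
    return promisifyRequest(this.store("readonly").count(key)).then((n) => n > 0);
  }

  // Like Map.prototype.delete, resolves to whether the key existed. Both
  // requests are issued up front in one transaction; IDB runs requests in
  // order, so count() sees the state before delete().
  delete(key) {
    const store = this.store("readwrite");
    const existed = promisifyRequest(store.count(key)).then((n) => n > 0);
    const deleted = promisifyRequest(store.delete(key));
    return Promise.all([existed, deleted]).then(([e]) => e);
  }

  // Unlike Map.prototype.keys, resolves to an array rather than an iterator.
  keys() {
    return promisifyRequest(this.store("readonly").getAllKeys());
  }
}
```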
>>>
>>>
>>>> I think its request-object-based API is very nice and transactions are
>>>> much appreciated. Possible simplification could be achieved by introducing
>>>> some kind of auto-transaction mechanism so that users could get and change
>>>> objects without creating transactions. There are some challenges to enabling
>>>> this and it would complicate the engine, especially if transactions are
>>>> still supported when users want to use them. And I hope transactions are
>>>> not dropped completely. When using CouchDB, I often find myself writing
>>>> some fairly painful code to handle the lack of transactions.
>>>>
>>>> Since IDB is aiming for its first standardised version of the API, I
>>>> wouldn't be too worried about people writing JavaScript libraries that
>>>> simplify its use. As long as all low-level capabilities are in place for
>>>> writing these abstractions, we should be in good order for the first
>>>> version of the standard. In later versions of the API we will have more
>>>> experience with the painful parts of the IDB API, and we can improve it and
>>>> simplify its use. Extending an API by creating additional abstractions to
>>>> simplify its use is often easier than going in the other direction, at
>>>> least in my experience.
>>>>
>>>> --
>>>> Miko Nieminen
>>>> miko.nieminen@iki.fi
>>>> miko.nieminen@gmail.com
>>>>
>>>>
>>
>

Received on Wednesday, 6 March 2013 19:22:18 UTC