Re: IndexedDB: Thoughts on implementing IndexedDB from Austin William Wright on 2013-08-02 (public-webapps@w3.org from July to September 2013)

From: Austin William Wright <aaa@bzfx.net>
Date: Fri, 2 Aug 2013 15:18:49 -0700
To: Joshua Bell <jsbell@google.com>
Cc: "public-webapps@w3.org" <public-webapps@w3.org>
Message-ID: <CANkuk-VQTP7keaTymb0+tJnRgp-bukRzBF1a1Kr035291xv83Q@mail.gmail.com>
On Tue, Jul 30, 2013 at 3:13 PM, Joshua Bell <jsbell@google.com> wrote:

> And now replying to the non-nits:
>
>
> On Tue, Jul 30, 2013 at 1:30 AM, Austin William Wright <aaa@bzfx.net> wrote:
>
>> I've been meaning to implement IndexedDB in some fashion for a while.
>> Earlier this month, shortly after the call for implementations, I realized
>> I should be getting on that. I've been working on an in-memory ECMAScript
>> implementation with fast data structures and the like. I also intend to
>> experiment with new features like new types of indexes (hash tables that
>> can't be iterated, and index values calculated by expression/function,
>> which appears to have been discussed elsewhere).
>>
>> I've had a few thoughts, mostly about language:
>>
>> (1) Is there no way to specify an arbitrary nested path? I want to do
>> something like ['menus', x] where `x` is some token which may be anything,
>> like an empty string or a string with a period in it. This is especially
>> important if there are structures like {"http://example.com/URI":
>> "value"} in documents, which is especially common in JSON-LD. From what I
>> can tell, IndexedDB essentially makes it impossible to index JSON-LD
>> documents.
>>
>> It appears the current behavior instead allows you to index by multiple
>> keys, but it's not immediately obvious this is the rationale.
>>
>> How *would* one include a property whose key includes a period? This
>> seems to be asking for security problems, if authors need to implement an
>> escaping scheme for their keys, either when constructing a key path or when
>> constructing objects. Database names can be anything, why not key names?
>>
>>
> The key path mechanism (and by definition, the index mechanism) definitely
> doesn't support every use case. It is focused on the "simple" case where
> the structure of the data being stored is under the control of the
> developer authoring code against the IDB API. Slap a library in the middle
> that's exposing a radically different storage API to authors and that
> library is going to need to compute index keys on its own and produce
> wrapper objects, or some such.
>
> One of the ideas that's been talked about for "v2" is extensible indexing,
> allowing the index key to be computed by a script function.
>

Computing index keys would be a fantastic step for any database, I think.
If one could also define a custom comparison operator, this would perhaps be
one of the more powerful features in a database that I've seen.
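
To make the idea concrete, here is a minimal in-memory sketch, assuming an
index whose key is computed by a function and ordered by a caller-supplied
comparator. Nothing here is part of the IndexedDB API: `ComputedIndex`,
`KeyFn`, `Compare`, and `byUri` are hypothetical names used only for
illustration. It also shows how a computed key sidesteps the problem with
property names that contain a period.

```typescript
// Hypothetical in-memory index (not an IndexedDB API): keys are computed by a
// function and ordered by a caller-supplied comparator.
type KeyFn<V, K> = (value: V) => K | undefined;
type Compare<K> = (a: K, b: K) => number;

class ComputedIndex<V, K> {
  private entries: Array<{ key: K; value: V }> = [];

  constructor(private keyFn: KeyFn<V, K>, private compare: Compare<K>) {}

  // Insert a record, keeping entries sorted according to the comparator.
  put(value: V): void {
    const key = this.keyFn(value);
    if (key === undefined) return; // no key computed: record is not indexed
    let lo = 0;
    let hi = this.entries.length;
    // Binary search for the first position whose key sorts after `key`.
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.compare(this.entries[mid].key, key) <= 0) lo = mid + 1;
      else hi = mid;
    }
    this.entries.splice(lo, 0, { key, value });
  }

  // Return all records whose computed key compares equal to `key`.
  getAll(key: K): V[] {
    return this.entries
      .filter((e) => this.compare(e.key, key) === 0)
      .map((e) => e.value);
  }

  // Return the computed keys in index order.
  keys(): K[] {
    return this.entries.map((e) => e.key);
  }
}

type Doc = Record<string, unknown>;

const byUri = new ComputedIndex<Doc, string>(
  // The property name contains periods, so it is read directly by a function
  // rather than through a dotted key path string.
  (doc) => {
    const v = doc["http://example.com/URI"];
    return typeof v === "string" ? v : undefined;
  },
  // Custom comparison: case-insensitive ordering.
  (a, b) => {
    const x = a.toLowerCase();
    const y = b.toLowerCase();
    return x < y ? -1 : x > y ? 1 : 0;
  }
);

byUri.put({ "http://example.com/URI": "banana" });
byUri.put({ "http://example.com/URI": "Apple" });
byUri.put({ other: 1 }); // no matching property, so this record is skipped
```

A real implementation would want a tree rather than a sorted array, and
would need to decide what happens when the key function throws or is not
deterministic; the sketch ignores both questions.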


>
>
>> (3) I had trouble deciphering the exact behavior of multiple open
>> transactions on one another. I eventually realized the definition of
>> IDBTransactionMode describes the behavior.
>>
>> Still, however, this document appears to talk in terms of what is
>> "written to the database". But this isn't well defined. If something is
>> written to the database, wouldn't it affect what is read in a readonly
>> transaction? (No.)
>>
>> And the language seems inconsistent. The language for `abort` says that
>> changes to the database must be "rolled back" (as if every operation writes
>> to storage), but the language for `Steps for committing a transaction`
>> specifies it is at that time the data is written (as if all write
>> operations up to this point are kept in memory). There's not strictly a
>> contradiction here, but perhaps more neutral language could be used.
>>
>>
> Agreed, this could be improved. (Practically speaking, I expect that would
> happen if we end up with implementation differences that require refining
> the language in a future iteration.)
>
>
>> (5) I found the language for iterating and creating a Cursor hard to
>> understand being nested in multiple layers of algorithms. Specifically,
>> where an IDBCursor instance was actually exposed to the user. But now it
>> makes sense, and I don't really see how it might be improved. An
>> (informative) example on iterating a cursor may be helpful.
>>
>>
> I recently added one towards the start of the spec ("The following example
> looks up all books in the database by author using an index and a cursor")
> - is that what you were thinking? Is it just a matter of spec organization?
> I think at some point in the spec history the examples were more integrated
> into the text.
>

I recall eventually finding that example; I think that works.


>
>
>> (6) The document refers to the HTML5 Structured Clone Algorithm. It's a
>> bit concerning that it has to refer to ECMAScript algorithms defined in a
>> specification that defines a markup language. I don't think referring to a
>> markup language should be necessary (I don't intend on using my
>> implementation in an (X)HTML environment, just straight XML if anything at
>> all), though perhaps this is just a modularity problem with the HTML5 draft
>> (or rather, lack thereof).
>>
>
> Agreed that it seems like an odd place for it in the abstract, but the
> HTML spec defines much of the behavior of the browser environment beyond
> the markup language. Hixie and Anne are doing some spec refactoring work;
> perhaps some day it will be more modular. Indexed DB is very much designed
> to be an API for scripts running in Web browsers, though.
>

One of the directions I'd like to take IndexedDB is making it
cross-platform, so developers can easily write applications that
synchronize between multiple devices including a central server.

Though perhaps not the focus of Web applications, I think there's a ton of
functionality to be realized by considering that Web Browsers aren't the
only type of user agent, or indeed even the only implementer of the APIs.

Web browsers are largely the only type of user agent that implements
scripting. This is part of the code-on-demand component of the Web's
architecture: generic user agents that don't understand how to consume
a document can be told how to do so, but it's not /necessary/ in a RESTful
network application. This impacts IndexedDB when, in an application, a
client has the option of reproducing the server functionality on the client
side, like storing, editing, and locally caching emails or documents. If
it doesn't, such as when the client is a robot or a highly coupled user
agent (i.e. a client to a single HTTP API), it still has the option of
interacting with the same data over the network.

In order to do this, the server needs to be able to understand the
semantics of IndexedDB so it can consistently handle the data when it's
coming back from the client. The most straightforward way is just to use
IndexedDB as the database server.

For this reason I consider small, modular specifications and "full"
specifications critical. For instance, IndexedDB also depends on DOM
Events. I love DOM Events, but implementing it requires traversing a
network of many "delta specifications". It implies that anywhere I want to
implement IndexedDB, I also need to be able to parse all of HTML *and* XML,
and have a notion of what is "clickable" and such. Perhaps I can use
IndexedDB with only a small subset of DOM, but software libraries won't be
written this way, making programs big and bloated. Obviously, this is not
ideal or necessary for most types of user agents (including
not-really-user-agent servers), and as such I think it's a major hindrance
to Web applications -- without the modularity of having e.g. "DOM Events"
be a completely independent specification from "HTML Semantics" or "CSS
Stylesheet for HTML" (to graphically render HTML), it becomes impossible or
costly to implement Web technologies in many kinds of user agents without
willfully violating the specifications.

Perhaps most people are aware of this, but I really want to stress its
importance.


>
>>
>>  Finally, is there a good test suite? I can't seem to find anything in
>> the way of regression tests. I'll perhaps publish my own, if not.
>>
>>
>> Austin Wright.
>>
>
>
> More tests welcome! The w3c has a test repo that Art has linked to in a
> fork of this thread. Blink's tests are here:
> http://src.chromium.org/viewvc/blink/trunk/LayoutTests/storage/indexeddb/
>
>
I'm going to have to figure out how to implement these programmatically.
None of these tests seem to have a README.
Received on Friday, 2 August 2013 22:19:17 UTC