[IndexedDB] Detailed comments for the current draft

These are notes we collected both from reviewing the spec (editor's draft as of Jan 24th) and from a prototype implementation we are working on. I didn't realize we had accumulated this many notes, otherwise I would have sent intermediate notes earlier; I'll do so next round.


1. Keys and sorting

a. 3.1.1: supporting date/time values as keys seems important; they are a common sorting criterion (e.g. as part of a composite primary key, or as an index key in general).
b. 3.1.1: similarly, sorting on numbers in general (not just integers/longs) is important (e.g. price lists, scores, etc.).
c. 3.1.1: cross-type sorting and sorting of long values are clear. Sorting of strings, however, needs more elaboration. In particular, which collation do we use? Does the user or developer get to choose one? If we pick up a collation from the environment (e.g. the OS) and that collation later changes, we'd have to re-index all the databases.
d. 3.1.3: the spec reads "…key path must be the name of an enumerated property…"; what about composite keys (which would make the related APIs take a DOMString or a DOMStringList)?
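To make the collation question concrete, here is a quick sketch (plain JavaScript, nothing IndexedDB-specific): the same three string keys sort differently under UTF-16 code-unit order than under a locale-aware collation, which is exactly why the spec needs to pin this down.

```javascript
// Sketch: code-unit order vs. locale-aware collation for string keys.
const keys = ["Zebra", "apple", "Émile"];

// Default sort compares UTF-16 code units: "Z" (U+005A) < "a" (U+0061) < "É" (U+00C9).
const codeUnitOrder = [...keys].sort();

// An English collator orders by base letter, case- and accent-aware.
const collatedOrder = [...keys].sort(new Intl.Collator("en").compare);

console.log(codeUnitOrder);  // [ 'Zebra', 'apple', 'Émile' ]
console.log(collatedOrder);  // [ 'apple', 'Émile', 'Zebra' ]
```

An implementation that silently picked one of these from the OS would produce indexes that become invalid when the environment's collation changes.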


2. Values

a. 3.1.2: isn't the requirement for "structured clones" too strong? It would mean implementations have to be able to store and retrieve File objects and the like. Would it be more appropriate to say values are just graphs of JavaScript primitive objects/values (object, string, number, date, array, null)?
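A hypothetical validator for that narrower value model might look like the following (the name isStorableValue and the exact set of accepted types are our assumptions, not anything in the draft):

```javascript
// Hypothetical check for the narrower value model suggested above: only plain
// objects, arrays, strings, numbers, booleans, Dates and null are accepted.
function isStorableValue(v, seen = new Set()) {
  if (v === null) return true;
  const t = typeof v;
  if (t === "string" || t === "number" || t === "boolean") return true;
  if (v instanceof Date) return true;
  if (t !== "object") return false;           // functions, symbols, undefined…
  if (seen.has(v)) return true;               // cycles are allowed in the graph
  seen.add(v);
  const proto = Object.getPrototypeOf(v);
  if (!Array.isArray(v) && proto !== Object.prototype && proto !== null) {
    return false;                             // File, Blob, class instances, etc.
  }
  return Object.values(v).every((x) => isStorableValue(x, seen));
}

console.log(isStorableValue({ name: "a", when: new Date(), tags: ["x"] })); // true
console.log(isStorableValue({ fn: () => {} }));                             // false
```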


3. Object store

a. 3.1.3: do we really need both in-line and out-of-line keys? Besides the concept-count increase, we wonder whether out-of-line keys would cause trouble for generic libraries, as the key values wouldn't be part of the values iterated when doing a "foreach" over the store.
b. Query processing libraries will need temporary stores, which need temporary names. Should we introduce an API for creating temporary stores with transaction lifetime and no name?
c. It would be nice to have an estimated row count on each store. This comes at an implementation and runtime cost. Strong opinions? Lacking anything else, this would be the only statistic a query processor could base decisions on.
d. The draft does not touch on how applications would do optimistic concurrency. A common approach is a timestamp value that the system automatically updates every time someone touches the row. While we don't feel it's a must-have, it certainly supports common scenarios.
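The "foreach" concern in (a) can be illustrated with plain JavaScript stand-ins (an array and a Map, not the actual API): with in-line keys the key lives inside the value, so a generic loop over the values sees it; with out-of-line keys it does not.

```javascript
// In-line keys (keyPath "id"): the key is part of each stored value.
const inlineStore = [{ id: 1, name: "a" }];

// Out-of-line keys: the key is kept outside the value (Map used as a stand-in).
const outOfLineStore = new Map([[1, { name: "a" }]]);

const seenInline = inlineStore.map((v) => v.id);                      // [ 1 ]
const seenOutOfLine = [...outOfLineStore.values()].map((v) => v.id);  // [ undefined ]

console.log(seenInline, seenOutOfLine);
```

A generic library iterating only values would have no way to recover the out-of-line key without extra API surface.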


4. Indexes

a. 3.1.4 mentions "auto-populated" indexes, but no other types are ever mentioned. We suggest removing the term and, in the algorithms section, describing side-effecting operations as always updating the indexes as well.
b. If during insert/update the value of the key is not present (i.e. undefined, as opposed to null or a value), is that a failure, does the row not get indexed, or is it indexed as null? Failure would probably cause a lot of trouble for users; the other two have correctness problems. An option is to index the row as undefined, but then we have both undefined and null as indexable keys. We lean toward this last option.
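The three options in (b) can be sketched as policies in a hypothetical indexer (the function name, policy names and return shape are all ours, purely for illustration):

```javascript
// Hypothetical treatment of a missing index key (value at the key path is
// undefined): fail the operation, skip the row, or index it under undefined.
function indexEntry(value, keyPath, policy) {
  const key = value[keyPath];
  if (key !== undefined) return { indexed: true, key };
  switch (policy) {
    case "fail":
      throw new Error(`no value at key path "${keyPath}"`);
    case "skip":
      return { indexed: false };              // row is left out of the index
    case "as-undefined":
      return { indexed: true, key: undefined }; // undefined becomes an index key
  }
}

console.log(indexEntry({ a: 1 }, "b", "skip"));         // { indexed: false }
console.log(indexEntry({ a: 1 }, "b", "as-undefined")); // { indexed: true, key: undefined }
```

The "skip" policy makes index scans silently incomplete, while "as-undefined" forces undefined into the key ordering, which is the trade-off noted above.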


5. Databases

a. Not being able to enumerate databases gets in the way of creating good tools and frameworks such as database explorers. What was the motivation for this? Is it security-related?
b. Clarification on transactions: all database operations that affect the schema (create/remove store/index, setVersion, etc.), as well as data modification operations, are assumed to auto-commit by default, correct? Furthermore, all of those operations (both schema and data) can happen within a transaction, including mixing schema and data changes. Does that line up with others' expectations? If so, we should find a spot to articulate this explicitly.
c. No way to delete a database? It would be reasonable for applications to want to do that and let go of the user's data (e.g. a "forget me" feature on a web site).


6. Transactions

a. While we understand the goal of simplifying developers' lives with an error-free transactional model, we're not sure whether we'd be doing more harm than good by introducing more concepts into this space. Wouldn't it be better to use regular transactions with a well-known failure mode (e.g. either deadlocks or an optimistic concurrency failure on commit)?
b. In auto-commit mode, if two cursors are opened at the same time (e.g. to scan them in an interleaved way), are they in two independent transactions that are simultaneously active on the same connection?
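The "optimistic concurrency failure on commit" failure mode mentioned in (a) can be sketched with a toy version-checked store (all names here are hypothetical; this is a model of the failure mode, not proposed API):

```javascript
// Toy optimistic-concurrency store: each row carries a version; a commit fails
// if the row changed since it was read, and the caller retries the transaction.
class OptimisticStore {
  constructor() { this.rows = new Map(); this.versions = new Map(); }
  read(key) {
    return { value: this.rows.get(key), version: this.versions.get(key) ?? 0 };
  }
  commit(key, value, readVersion) {
    const current = this.versions.get(key) ?? 0;
    if (current !== readVersion) throw new Error("conflict: retry transaction");
    this.rows.set(key, value);
    this.versions.set(key, current + 1);
  }
}

const s = new OptimisticStore();
const snap = s.read("k");
s.commit("k", "v1", snap.version);           // succeeds, bumps the version
try {
  s.commit("k", "v2", snap.version);         // stale read version
} catch (e) {
  console.log(e.message);                    // conflict: retry transaction
}
```

This gives developers a single well-known error to handle, instead of a new transactional model.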


7. Algorithms

a. 3.2.2: steps 4 and 5 appear in inverted order.
b. 3.2.2: when there is a key generator and the store uses in-line keys, should the generated key value be propagated to the original object (in addition to the clone), such that both are in sync after the put operation?
c. 3.2.3: step 2, probably an editorial mistake? Wouldn't all indexes have a key path?
d. 3.2.4.2: in our experiments writing application code, the fact that this method throws an exception when an item is not found is quite inconvenient. It would be much more natural to just return undefined, as not finding something can be a primary code path rather than an exceptional situation. The same applies to 3.2.5 step 2 and 3.2.6 step 2.
e. The algorithm for putting a new object into a store currently indicates that the key of the object should be returned. What about other values that may be generated by the store? For example, if the store generates timestamps (not currently in the draft, but possibly needed for optimistic concurrency control), how would we return them? Should we update the actual object that was passed as a parameter with keys and other store-generated values?
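The questions in (b) and (e) can be sketched as a hypothetical put (this is our illustration of the proposed behavior, not the spec's algorithm; the propagation line is exactly the open question):

```javascript
// Hypothetical put with a key generator and in-line keys: the generated key is
// written into the stored clone; the open question is whether it should also be
// written back into the caller's object so both stay in sync.
let nextKey = 1;
function put(store, obj, keyPath) {
  const clone = structuredClone(obj);   // the spec stores a clone, not the object
  if (clone[keyPath] === undefined) {
    clone[keyPath] = nextKey++;         // key generator fires
    obj[keyPath] = clone[keyPath];      // proposed: propagate to the original
  }
  store.set(clone[keyPath], clone);
  return clone[keyPath];                // the algorithm returns the key
}

const store = new Map();
const item = { name: "a" };
const key = put(store, item, "id");
console.log(key, item.id);              // 1 1, caller's object is in sync
```

Without the propagation line, the caller's object and the stored row silently disagree about the key after the put.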


8. Performance and API style

a. The async nature of the API makes regular scans very heavy on callbacks (one per row, plus completion/error callbacks). This slows scans down considerably, so when doing multiple scans (e.g. for a reasonably complicated query with joins, sorts and filters), performance will be bound by callback overhead even if everything else is fast. It would be interesting to support a block-fetch mode where the callback is invoked for a number of buffered rows (indicated when the scan is initiated) instead of a single row. This could be either a configuration option on openCursor or a new method on the cursor for fetching multiple rows at once.
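The block-fetch idea can be sketched as follows (the function shape and names are hypothetical; the point is only the reduced callback count):

```javascript
// Sketch of a block-fetch scan: the row callback receives an array of up to
// blockSize buffered rows, so a scan of N rows costs ceil(N / blockSize)
// callbacks instead of N.
function scanInBlocks(rows, blockSize, onBlock, onComplete) {
  for (let i = 0; i < rows.length; i += blockSize) {
    onBlock(rows.slice(i, i + blockSize));  // one callback per buffered block
  }
  onComplete();
}

let callbacks = 0;
scanInBlocks([1, 2, 3, 4, 5], 2, () => { callbacks++; }, () => {});
console.log(callbacks);                     // 3 callbacks instead of 5
```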

      
9. API

a. DatabaseSync.createIndex: what's the default for the unique argument?
b. DatabaseSync.createObjectStore: what's the default for autoIncrement?
c. DatabaseSync.openObjectStore: what's the default for mode?
d. DatabaseSync.transaction: what are the units for the timeout value? Seconds? Is there a value that means "infinite"?
e. ObjectStoreSync.get: see 7.d (return undefined instead of throwing an exception).
f. ObjectStoreSync: what happens to the reference if the underlying store is deleted through another connection? We propose that it's ok to alter underlying objects in general, and that "visible" objects should be prepared to start failing when the objects they surface go away or are altered.
g. CursorSync.openCursor: does the cursor start on the first record or before the first record? It should probably start before the first record, so that the first call to continue() can return false for empty stores, moving straight from BOF to EOF.
h. CursorSync.count: what scenario does this enable? Also, the name is misleading; it should be sameKeyCount or something else that indicates it counts only the rows that share the current key.
i. CursorSync.value: when the cursor is over an index, shouldn't the value be read-only, as changing it would make it inconsistent with the object store the index is for?
j. CursorSync.continue(): does it return false when it reaches the last record, or when it's called *on* the last record and moves to EOF (i.e. effectively moves past the last record)? If it's sitting at EOF, does it "see" new inserts? (We assume not.)
k. CursorSync.delete(): "delete" causes trouble as it's a JavaScript keyword; it should be "remove".
l. CursorSync.delete(): what happens to the cursor position after this function returns? One option would be to leave the cursor on the deleted row and fail all access attempts, so that only continue() can be called.
m. IndexSync: the put/delete methods seem to let users modify the index independently of the store, making the two inconsistent. Given that the only kind of index described is auto-populated, it doesn't seem appropriate to have these methods.
n. Should we consider introducing an API that, given an object and a store, returns the key for that object? That would avoid the need to know the exact algorithm used to derive the key from an object plus a key path.
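The cursor semantics we'd prefer in (g) and (j) can be shown with a toy model (ToyCursor is entirely ours, not the draft's API): the cursor starts before the first record (BOF), and continue() returns false once it has moved past the last record (EOF).

```javascript
// Toy cursor: position -1 is BOF; continue() advances and reports whether the
// cursor landed on a record. An empty store goes straight from BOF to EOF.
class ToyCursor {
  constructor(rows) { this.rows = rows; this.pos = -1; }  // -1 = BOF
  continue() { this.pos += 1; return this.pos < this.rows.length; }
  get value() { return this.rows[this.pos]; }
}

const empty = new ToyCursor([]);
console.log(empty.continue());            // false: first call already hits EOF

const one = new ToyCursor(["row"]);
console.log(one.continue(), one.value);   // true 'row'
console.log(one.continue());              // false: moved past the last record
```

Starting before the first record keeps the empty-store case uniform: callers always loop `while (cursor.continue()) { … }` with no special casing.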


10. API (async specifics)

a. Currently the async API is only available on the window object and not in workers. Libraries are likely to target only one mode, async in particular, so that they work across all scenarios; it would therefore be important to have the async API in workers as well.
b. DBRequest.abort(): it may not be possible to guarantee abort in all phases of execution, so this should be described as a "best effort" method; onsuccess would be called if the system decided to proceed and complete the operation, and onerror if the abort succeeded in stopping the operation (with a proper code indicating the error is due to an explicit abort request). In any case the ready state should go to "done".
c. The pattern where there is a single request object (e.g. indexedDB.request) prevents user code from having multiple outstanding requests against the same object (e.g. multiple "open" or multiple "openCursor" requests). An alternate pattern without this problem would be to return the request object from the method itself (e.g. from "open").
d. CursorRequest.continue(): this seems to break the pattern where request.result holds the result of the operation; for continue, the result of the operation (in the sync version) is true/false depending on whether the cursor reached EOF. So in the async version request.result should be that true/false value, the row itself would be available in the cursor's "value" property, and the success callback would be called in both cases, rather than the error callback.
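The per-call request pattern proposed in (c) can be sketched like this (all names hypothetical; completion runs synchronously here only to keep the sketch self-contained, where a real implementation would complete asynchronously):

```javascript
// Each call mints its own request object, so several requests can be
// outstanding at once; a single shared indexedDB.request slot cannot do this,
// because the second call would clobber the first.
function openRequest(work) {
  const request = { readyState: "pending", result: undefined };
  request.result = work();     // real API: queued and completed asynchronously
  request.readyState = "done";
  return request;
}

const r1 = openRequest(() => "store-a");
const r2 = openRequest(() => "store-b");  // r1 is untouched by the second call
console.log(r1.result, r2.result);        // store-a store-b
```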


11. API Names

a. "transaction" is really non-intuitive (particularly given the existence of currentTransaction in the same class). "beginTransaction" would capture the semantics more accurately.
b. ObjectStoreSync.delete: delete is a JavaScript keyword; can we use "remove" instead?


12. Object names in general

a. For database, store, index and other names in general, the current text says "case sensitive" in various places. It would be good to be more specific and require an "exact match" of all constructs (e.g. accents, kana width). A binary match would be very restrictive but a safe target. Alternatively, we could leave this up to each implementation and indicate non-normatively which patterns of strings are safe to use.
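To illustrate why "case sensitive" alone underspecifies name matching: the two strings below render identically but differ at the code-point level, so a binary match treats them as different names while a normalized match does not.

```javascript
// Same rendered name, different code points: precomposed é vs. e + combining
// acute accent. "Case sensitive" says nothing about which of these two
// comparisons an implementation should use.
const precomposed = "caf\u00E9";   // é as a single code point
const decomposed = "cafe\u0301";   // e followed by U+0301 combining accent

console.log(precomposed === decomposed);                                   // false (binary match)
console.log(precomposed.normalize("NFC") === decomposed.normalize("NFC")); // true (normalized match)
```

Whichever rule is chosen, stating it normatively avoids databases that open on one implementation and not another.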


13. Editorial notes

a. Ranges: left/right versus start/end terminology; "bound" versus "closed" for intervals.
b. Ranges: in bound, "Create a new right-bound key range." should read right- and left-bound.
c. 3.2.7: obejct -> object
d. The current draft fails to format in IE; the script that comes with the page fails with an error.


Thanks
-pablo

Received on Tuesday, 26 January 2010 20:47:59 UTC