Re: [IndexedDB] Detailed comments for the current draft

Hi Pablo,

Great work and excellent feedback. I will take a little bit of time to  
digest and respond.

On Jan 26, 2010, at 12:47 PM, Pablo Castro wrote:

> These are notes that we collected both from reviewing the spec  
> (editor's draft up to Jan 24th) and from a prototype implementation  
> that we are working on. I didn't realize we had this many notes,
> otherwise I would have sent intermediate notes earlier. Will do so
> next round.
> 1. Keys and sorting
> a.       3.1.1: it would seem important to also allow date/time
> values as keys; they are a common sorting criterion (e.g. as part
> of a composite primary key or as an index key in general).
> b.      3.1.1: similarly, sorting on numbers in general (not just
> integers/longs) would be important (e.g. price lists, scores, etc.)
> c.       3.1.1: cross-type sorting and sorting of long values are
> clear. Sorting of strings, however, needs more elaboration. In
> particular, which collation do we use? Does the user or developer
> get to choose a collation? If we pick up a collation from the
> environment (e.g. the OS) and the collation changes, we'd have to
> re-index all the databases.
> d.      3.1.3: the spec reads "…key path must be the name of an
> enumerated property…"; how about composite keys? (That would make
> the related APIs take a DOMString or DOMStringList.)
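The collation question in 1.c can be made concrete with a small JavaScript sketch; `localeCompare` here stands in for "a collation picked up from the environment" and is purely illustrative, not something the draft specifies:

```javascript
// String keys sort differently under binary (code-unit) comparison
// than under a locale-aware collation.
const keys = ["resume", "réservé", "RESUME", "apple"];

// Default sort compares UTF-16 code units: uppercase letters sort
// before all lowercase ones, and accented letters sort last.
const binaryOrder = [...keys].sort();

// A locale-aware sort delegates to a collation (here English); if the
// environment's collation ever changed, indexes built on this order
// would have to be rebuilt.
const localeOrder = [...keys].sort((a, b) => a.localeCompare(b, "en"));
```

If the spec pinned string-key order to binary comparison, the re-indexing concern would go away, at the cost of less natural orderings for end users.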
> 2. Values
> a.       3.1.2: isn't the requirement for "structured clones" too
> strong? It would mean implementations have to be able to store and
> retrieve File objects and the like. Would it be more appropriate to
> say values are just graphs of JavaScript primitive objects/values
> (object, string, number, date, array, null)?
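The narrower "graphs of primitives" contract suggested in 2.a could look like the following sketch — a hypothetical validity check, with booleans added for completeness and cycle detection omitted for brevity:

```javascript
// Accept only graphs of plain objects, arrays, strings, numbers,
// booleans, Dates, and null — rejecting File objects, functions,
// class instances, etc. (Cycle detection omitted for brevity.)
function isStorableValue(v) {
  if (v === null) return true;
  const t = typeof v;
  if (t === "string" || t === "number" || t === "boolean") return true;
  if (v instanceof Date) return true;
  if (Array.isArray(v)) return v.every(isStorableValue);
  if (t === "object" && Object.getPrototypeOf(v) === Object.prototype) {
    return Object.values(v).every(isStorableValue);
  }
  return false; // File, Blob, RegExp, functions, class instances...
}
```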
> 3. Object store
> a.       3.1.3: do we really need both in-line and out-of-line
> keys? Besides the increase in concept count, we wonder whether
> out-of-line keys would cause trouble for generic libraries, as the
> key values wouldn't be part of the values iterated when doing a
> "foreach" over the table.
> b.      Query processing libraries will need temporary stores, which  
> need temporary names. Should we introduce an API for the creation of  
> temporary stores with transaction lifetime and no name?
> c.      It would be nice to have an estimated row count on each
> store. This comes at an implementation and runtime cost. Strong
> opinions? Lacking anything else, this would be the only statistic a
> query processor could base decisions on.
> d.      The draft does not touch on how applications would do
> optimistic concurrency. A common way of doing this is to use a
> timestamp value that's automatically updated by the system every
> time someone touches the row. While we don't feel it's a must-have,
> it certainly supports common scenarios.
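The timestamp-based optimistic concurrency mentioned in 3.d can be sketched in a few lines of JavaScript; `VersionedStore` and its method shapes are hypothetical, purely to show the failure mode on stale writes:

```javascript
// A put succeeds only if the caller last read the row at its current
// version; the store bumps the version on every successful write.
class VersionedStore {
  constructor() { this.rows = new Map(); this.clock = 0; }
  get(key) { return this.rows.get(key); }
  put(key, value, expectedVersion) {
    const row = this.rows.get(key);
    const current = row ? row.version : 0;
    if (expectedVersion !== undefined && expectedVersion !== current) {
      throw new Error("optimistic concurrency failure"); // stale write
    }
    const next = { value, version: ++this.clock };
    this.rows.set(key, next);
    return next.version; // caller keeps this for its next update
  }
}
```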
> 4. Indexes
> a.       3.1.4 mentions "auto-populated" indexes, but then there is  
> no mention of other types. We suggest that we remove this and in the  
> algorithms section describe side-effecting operations as always  
> updating the indexes as well.
> b.      If during insert/update the value of the key is not present
> (i.e. undefined, as opposed to null or a value), is that a failure,
> does the row not get indexed, or is it indexed as null? Failure
> would probably cause a lot of trouble for users; the other two have
> correctness problems. An option is to index such rows as undefined,
> but then we have both undefined and null as indexable keys. We lean
> toward this last option.
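The three policies weighed in 4.b can be compared side by side; this sketch simplifies key path evaluation to a single property name, and the policy labels are invented for illustration:

```javascript
// What an index does with a row whose key path evaluates to
// undefined: fail the operation, skip the row, or index it anyway.
function indexEntry(obj, keyPath, policy) {
  const key = obj[keyPath]; // undefined when the property is absent
  if (key !== undefined) return { key, indexed: true };
  switch (policy) {
    case "fail":            throw new Error("missing index key");
    case "skip":            return { indexed: false };  // row invisible to index
    case "index-undefined": return { key: undefined, indexed: true };
  }
}
```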
> 5.       Databases
> a.       Not being able to enumerate databases gets in the way of
> creating good tools and frameworks such as database explorers. What
> was the motivation for this? Is it security related?
> b.      Clarification on transactions: all database operations that
> affect the schema (create/remove store/index, setVersion, etc.) as
> well as data modification operations are assumed to auto-commit by
> default, correct? Furthermore, all those operations (both schema
> and data) can happen within a transaction, including mixing schema
> and data changes. Does that line up with others' expectations? If
> so, we should find a spot to articulate this explicitly.
> c.       No way to delete a database? It would be reasonable for
> applications to want to do that and let go of the user's data (e.g.
> a "forget me" feature on a web site).
> 6.       Transactions
> a.       While we understand the goal of simplifying developers'
> lives with an error-free transactional model, we're not sure
> whether we're doing more harm than good by introducing more
> concepts into this space. Wouldn't it be better to use regular
> transactions with a well-known failure mode (e.g. either deadlocks
> or optimistic concurrency failure on commit)?
> b.    In auto-commit mode, if two cursors are opened at the same
> time (e.g. to scan them in an interleaved way), are they two
> independent transactions simultaneously active on the same
> connection?
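The interleaved-scan pattern that question 6.b asks about looks like this when sketched with generators over plain arrays (no transactions involved here; the open question is which transaction each underlying cursor would belong to):

```javascript
// Two cursors advanced alternately — the access pattern a query
// processor uses for a merge join, for example.
function* cursor(rows) { yield* rows; }

function interleave(a, b) {
  const out = [];
  const ca = cursor(a), cb = cursor(b);
  let ra = ca.next(), rb = cb.next();
  while (!ra.done || !rb.done) {
    if (!ra.done) { out.push(ra.value); ra = ca.next(); }
    if (!rb.done) { out.push(rb.value); rb = cb.next(); }
  }
  return out;
}
```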
> 7. Algorithms
> a.       3.2.2: steps 4 and 5 are inverted in order.
> b.      3.2.2: when there is a key generator and the store uses in- 
> line keys, should the generated key value be propagated to the  
> original object (in addition to the clone), such that both are in  
> sync after the put operation?
> c.       3.2.3: step 2, probably editorial mistake? Wouldn't all  
> indexes have a key path?
> d. In our experiments writing application code, the fact that this
> method throws an exception when an item is not found is quite
> inconvenient. It would be much more natural to just return
> undefined, as not finding something can be a primary code path
> rather than an exceptional situation. Same for 3.2.5, step 2 and
> 3.2.6, step 2.
> e.      The algorithm to put a new object into a store currently
> indicates that the key of the object should be returned. How about
> other values that may be generated by the store? For example, if
> the store generates timestamps (not currently in the draft, but
> possibly needed for optimistic concurrency control), how would we
> return them? Should we update the actual object that was passed as
> a parameter with keys and other server-generated values?
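Points 7.b and 7.e can be illustrated together with an in-memory sketch; `makeStore` and the `writeBack` option are hypothetical names for illustration, not proposals from the draft:

```javascript
// A put with a key generator: the clone gets the generated in-line
// key, the key is returned, and writeBack optionally propagates it
// (and any other generated values) to the caller's original object.
function makeStore(keyPath) {
  let nextKey = 1;
  const rows = new Map();
  return {
    put(obj, { writeBack = false } = {}) {
      const clone = { ...obj };             // stand-in for a structured clone
      if (clone[keyPath] === undefined) {
        clone[keyPath] = nextKey++;         // key generator fills the key
        if (writeBack) obj[keyPath] = clone[keyPath]; // keep original in sync
      }
      rows.set(clone[keyPath], clone);
      return clone[keyPath];                // the algorithm returns the key
    },
    get(key) { return rows.get(key); },
  };
}
```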
> 8. Performance and API style
> a.       The async nature of the API makes regular scans very heavy
> on callbacks (one per row plus completion/error callbacks). This
> slows down scans a lot, so when doing multiple scans (e.g. a
> reasonably complicated query that has joins, sorts and filters)
> performance will be bound by this even if everything else happens
> really fast. It would be interesting to support a block-fetch mode
> where the callback gets called for a number of buffered rows
> (indicated when the scan is initiated) instead of being called for
> a single row. This could be either a configuration option on
> openCursor or a new method on the cursor.
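The block-fetch idea in 8.a amounts to batching callback invocations; `scanInBlocks` below is a hypothetical API shape, not something in the draft:

```javascript
// Instead of one callback per row, invoke the callback once per block
// of up to batchSize rows, cutting callback overhead on large scans.
function scanInBlocks(rows, batchSize, onBlock) {
  for (let i = 0; i < rows.length; i += batchSize) {
    onBlock(rows.slice(i, i + batchSize));
  }
}
```

A real cursor would buffer rows from the backing store rather than take an array, but the callback contract is the interesting part.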
> 9. API
> a.       DatabaseSync.createIndex: what's the default for the unique  
> argument?
> b.      DatabaseSync.createObjectStore: what's the default for  
> autoIncrement?
> c.       DatabaseSync.openObjectStore: what's the default for mode?
> d.      DatabaseSync.transaction: what are the units for the
> timeout value? Seconds? Is there a value that means "infinite"?
> e.      ObjectStoreSync.get: see 7.d (return undefined instead of  
> throwing an exception)
> f.        ObjectStoreSync: what happens to the reference if the
> underlying store is deleted through another connection? We propose
> that it's ok to alter underlying objects in general, and "visible"
> objects should be prepared to start failing when the objects they
> surface go away or are altered.
> g.       CursorSync.openCursor: does the cursor start on the first  
> record or before the first record? Should probably be before the  
> first record so the first call to continue() can return false for  
> empty stores, moving straight from BOF to EOF.
> h.      CursorSync.count: what scenario does this enable? Also, name  
> is misleading; should be sameKeyCount or something that indicates  
> it's the count only of the rows that share the current key.
> i.         CursorSync.value: when the cursor is over an index,  
> shouldn't the value be read-only as changing it would make it  
> inconsistent with the object store this index is for?
> j.        CursorSync.continue(): does it return false when it
> reaches the last record, or when it's called *on* the last record
> and moves to EOF (effectively moving past the last record)? If it's
> sitting at EOF, does it "see" new inserts? (We assume not.)
> k.       CursorSync.delete(): "delete" causes trouble; it should be
> "remove".
> l.         CursorSync.delete(): what happens to the cursor position  
> after this function returns? One option would be to leave the cursor  
> on the deleted row, and fail all access attempts so only continue()  
> can be called.
> m.    IndexSync: the put/delete methods seem to enable users to  
> modify the index independently of the store, making them  
> inconsistent. Given that the only kind of index described is auto- 
> populated, it doesn't seem appropriate to have these.
> n.    Should we consider introducing an API that, given an object
> and a store, returns the key for that object? That would avoid the
> need to know the exact algorithm used to obtain the key from an
> object + path.
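Item (n) above — deriving a key from an object plus a store's key path — could be as small as this sketch (`keyForObject` is a hypothetical name, and dotted paths are an assumption; the draft only specifies single property names):

```javascript
// Walk a dotted key path through an object, returning undefined if
// any step along the path is missing.
function keyForObject(obj, keyPath) {
  return keyPath.split(".").reduce(
    (value, part) => (value == null ? undefined : value[part]),
    obj
  );
}
```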
> 10.       API (async specifics)
> a.       Currently the async API is only available on the window
> object and not to workers. Libraries are likely to target only one
> mode, in particular async, to work across all scenarios, so it
> would be important for async to be available in workers as well.
> b.      DBRequest.abort(): it may not be possible to guarantee
> abort in all phases of execution, so this should be described as a
> "best effort" method; onsuccess would be called if the system
> decided to proceed and complete the operation, and onerror if the
> abort succeeded in stopping the operation (with a proper code
> indicating the error is due to an explicit abort request). In
> either case the ready state should go to done.
> c.       The pattern where there is a single request object (e.g.
> indexedDB.request) prevents user code from having multiple
> outstanding requests against the same object (e.g. multiple 'open'
> or multiple 'openCursor' requests). An alternate pattern that does
> not have this problem would be to return the request object from
> the method (e.g. from 'open').
> d.      CursorRequest.continue(): this seems to break the pattern
> where request.result holds the result of the operation; for
> continue, the result of the operation (in the sync version) is
> true/false depending on whether the cursor reached EOF. So in the
> async version request.result should be that true/false value, the
> row itself would be available through the cursor's "value"
> property, and the success callback would be called instead of the
> error one.
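The request-per-call pattern proposed in 10.c can be mocked up as follows (a fake async operation over in-memory data; all names are illustrative):

```javascript
// Each call returns its own request object, so several operations
// can be outstanding at once — unlike a single shared
// indexedDB.request that every call would overwrite.
function openCursorAsync(rows) {
  const request = { onsuccess: null, result: undefined, readyState: "loading" };
  queueMicrotask(() => {
    request.result = rows;          // the "cursor" payload
    request.readyState = "done";
    if (request.onsuccess) request.onsuccess({ target: request });
  });
  return request;                   // a fresh object per call
}
```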
> 11. API Names
> a.       "transaction" is really non-intuitive (particularly given  
> the existence of currentTransaction in the same class).  
> "beginTransaction" would capture semantics more accurately.
> b.      ObjectStoreSync.delete: delete is a JavaScript keyword; can
> we use "remove" instead?
> 12. Object names in general
> a.       For database, store, index and other names in general, the
> current description in various places says "case sensitive". It
> would be good to be more specific and indicate "exact match" for
> all constructs (e.g. accents, kana width). Binary match would be
> very restrictive but a safe target. Alternatively we could leave
> this up to each implementation and indicate non-normatively what
> would be a safe pattern of strings to use.
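The "exact match" concern in 12.a is easy to demonstrate: two names that render identically can differ at the code-point level, so the spec needs to say whether matching is binary or normalization-aware:

```javascript
// "café" written two ways: precomposed é vs. e + combining acute.
const composed   = "caf\u00E9";
const decomposed = "cafe\u0301";

const binaryEqual     = composed === decomposed;             // differ code-point-wise
const normalizedEqual =
  composed.normalize("NFC") === decomposed.normalize("NFC"); // same after NFC
```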
> 13. Editorial notes
> a.      Ranges: left-right versus start-end. "bound" versus "closed"  
> for intervals.
> b.      Ranges: bound, "Create a new right-bound key range." ->  
> right & left bound
> c.       3.2.7 obejct -> object
> d.      The current draft fails to format in IE; the script that
> comes with the page fails with an error.
> Thanks
> -pablo

Received on Tuesday, 26 January 2010 23:15:02 UTC