- From: Nikunj Mehta <nikunj@o-micron.com>
- Date: Tue, 26 Jan 2010 15:13:43 -0800
- To: Pablo Castro <Pablo.Castro@microsoft.com>
- Cc: public-webapps WG <public-webapps@w3.org>
Hi Pablo,

Great work and excellent feedback. I will take a little bit of time to digest and respond.

Nikunj

On Jan 26, 2010, at 12:47 PM, Pablo Castro wrote:

> These are notes that we collected both from reviewing the spec
> (editor's draft up to Jan 24th) and from a prototype implementation
> that we are working on. I didn't realize we had this many notes,
> otherwise I would have sent intermediate notes earlier. Will
> do so next round.
>
>
> 1. Keys and sorting
>
> a. 3.1.1: it would seem that having date/time values as keys
>    would also be important, since they are a common sorting criterion
>    (e.g. as part of a composite primary key or in general as an index key).
> b. 3.1.1: similarly, sorting on numbers in general (not just
>    integers/longs) would be important (e.g. price lists, scores, etc.)
> c. 3.1.1: cross-type sorting and sorting of long values are
>    clear. Sorting of strings, however, needs more elaboration. In
>    particular, which collation do we use? Does the user or developer
>    get to choose a collation? If we pick up a collation from the
>    environment (e.g. the OS), if the collation changes we'd have to
>    re-index all the databases.
> d. 3.1.3: spec reads "…key path must be the name of an
>    enumerated property…"; how about composite keys (would make the
>    related APIs take a DOMString or DOMStringList)?
>
>
> 2. Values
>
> a. 3.1.2: isn't the requirement for "structured clones" too
>    much? It would mean implementations would have to be able to store
>    and retrieve File objects and such. Would it be more appropriate to
>    say it's just graphs of JavaScript primitive objects/values (object,
>    string, number, date, arrays, null)?
>
>
> 3. Object store
>
> a. 3.1.3: do we really need in-line + out-of-line keys?
>    Besides the concept-count increase, we wonder whether out-of-line
>    keys would cause trouble to generic libraries, as the values for the
>    keys wouldn't be part of the values iterated when doing a "foreach"
>    over the table.
> b.
>    Query processing libraries will need temporary stores, which
>    need temporary names. Should we introduce an API for the creation of
>    temporary stores with transaction lifetime and no name?
> c. It would be nice to have an estimated row count on each
>    store. This comes at an implementation and runtime cost. Strong
>    opinions? Lacking anything else, this would be the only statistic
>    for a query processor to base decisions on.
> d. The draft does not touch on how applications would do
>    optimistic concurrency. A common way of doing this is to use a
>    timestamp value that's automatically updated by the system every
>    time someone touches the row. While we don't feel it's a must-have,
>    it certainly supports common scenarios.
>
>
> 4. Indexes
>
> a. 3.1.4 mentions "auto-populated" indexes, but then there is
>    no mention of other types. We suggest that we remove this and in the
>    algorithms section describe side-effecting operations as always
>    updating the indexes as well.
> b. If during insert/update the value of the key is not present
>    (i.e. undefined as opposed to null or a value), is that a failure,
>    does the row not get indexed, or is it indexed as null? Failure
>    would probably cause a lot of trouble to users; the other two have
>    correctness problems. An option is to index them as undefined, but
>    then we have both undefined and null as indexable keys. We lean toward
>    this last option.
>
>
> 5. Databases
>
> a. Not being able to enumerate databases gets in the way of
>    creating good tools and frameworks such as database explorers. What
>    was the motivation for this? Is it security related?
> b. Clarification on transactions: all database operations that
>    affect the schema (create/remove store/index, setVersion, etc.) as
>    well as data modification operations are assumed to be auto-commit
>    by default, correct? Furthermore, all those operations (both schema
>    and data) can happen within a transaction, including mixing schema
>    and data changes.
>    Does that line up with others' expectations? If so,
>    we should find a spot to articulate this explicitly.
> c. No way to delete a database? It would be reasonable for
>    applications to want to do that and let go of the user data (e.g. a
>    "forget me" feature in a web site).
>
>
> 6. Transactions
>
> a. While we understand the goal of simplifying developers'
>    lives with an error-free transactional model, we're not sure if we're
>    doing more harm by introducing more concepts into this space.
>    Wouldn't it be better to use regular transactions with a well-known
>    failure mode (e.g. either deadlocks or optimistic concurrency
>    failure on commit)?
> b. In auto-commit mode, if two cursors are opened at the same
>    time (e.g. to scan them in an interleaved way), are they in
>    independent transactions simultaneously active in the same connection?
>
>
> 7. Algorithms
>
> a. 3.2.2: steps 4 and 5 are inverted in order.
> b. 3.2.2: when there is a key generator and the store uses in-line
>    keys, should the generated key value be propagated to the
>    original object (in addition to the clone), such that both are in
>    sync after the put operation?
> c. 3.2.3: step 2, probably an editorial mistake? Wouldn't all
>    indexes have a key path?
> d. 3.2.4.2: in our experiments writing application code, the
>    fact that this method throws an exception when an item is not found
>    is quite inconvenient. It would be much more natural to just return
>    undefined, as this can be a primary code path (to not find
>    something) and not an exceptional situation. Same for 3.2.5, step 2
>    and 3.2.6, step 2.
> e. The algorithm to put a new object into a store currently
>    indicates that the key of the object should be returned. How about
>    other values that may be generated by the store? For example, if the
>    store generates timestamps (not currently in the draft, but may be
>    needed for optimistic concurrency control), how would we return
>    them?
>    Should we update the actual object that was passed as a
>    parameter with keys and other server-generated values?
>
>
> 8. Performance and API style
>
> a. The async nature of the API makes regular scans very heavy
>    on callbacks (one per row plus completion/error callbacks). This
>    slows down scans a lot, so when doing multiple scans (e.g. a
>    reasonably complicated query that has joins, sorts and filters)
>    performance will be bound by this even if everything else happens
>    really fast. It would be interesting to support a block-fetch mode
>    where the callback gets called for a number of buffered rows
>    (indicated when the scan is initiated) instead of being called for a
>    single row. This would be either a configuration option on
>    openCursor or a new method on the cursor for
>
>
> 9. API
>
> a. DatabaseSync.createIndex: what's the default for the unique
>    argument?
> b. DatabaseSync.createObjectStore: what's the default for
>    autoIncrement?
> c. DatabaseSync.openObjectStore: what's the default for mode?
> d. DatabaseSync.transaction: what are the units for the timeout
>    value? Seconds? Is there a value that means "infinite"?
> e. ObjectStoreSync.get: see 7.d (return undefined instead of
>    throwing an exception)
> f. ObjectStoreSync: what happens to the reference if the
>    underlying store is deleted through another connection? We propose
>    it's ok to alter underlying objects in general, and "visible" objects
>    should be ready to start failing when the objects they surface go
>    away or are altered.
> g. CursorSync.openCursor: does the cursor start on the first
>    record or before the first record? It should probably start before the
>    first record so the first call to continue() can return false for
>    empty stores, moving straight from BOF to EOF.
> h. CursorSync.count: what scenario does this enable? Also, the name
>    is misleading; it should be sameKeyCount or something that indicates
>    it's the count only of the rows that share the current key.
> i.
>    CursorSync.value: when the cursor is over an index,
>    shouldn't the value be read-only, as changing it would make it
>    inconsistent with the object store this index is for?
> j. CursorSync.continue(): does it return false when it
>    reaches the last record or when it's called *on* the last record and
>    moves to EOF (effectively moving past the last record)? If it's
>    sitting at EOF, does it "see" new inserts? (we assume not)
> k. CursorSync.delete(): "delete" causes trouble, should be
>    "remove"
> l. CursorSync.delete(): what happens to the cursor position
>    after this function returns? One option would be to leave the cursor
>    on the deleted row, and fail all access attempts so only continue()
>    can be called.
> m. IndexSync: the put/delete methods seem to enable users to
>    modify the index independently of the store, making them
>    inconsistent. Given that the only kind of index described is
>    auto-populated, it doesn't seem appropriate to have these.
> n. Should we consider introducing an API that, given an object and
>    a store, returns the key to that object? That would avoid the need
>    for knowing the exact algorithm used to obtain the key from an
>    object + path.
>
>
> 10. API (async specifics)
>
> a. Currently the async API is only available on the window
>    object and not in workers. Libraries are likely to target only one
>    mode, in particular async, to work across all scenarios. So it would
>    be important to have async also in workers.
> b. DBRequest.abort(): it may not be possible to guarantee abort
>    in all phases of execution, so this should be described as a "best
>    effort" method; onsuccess would be called if the system decided to
>    proceed and complete the operation, and onerror if abort succeeded
>    at stopping the operation (with proper code indicating the error is
>    due to an explicit abort request). In any case the ready state should
>    go to done.
> c. The pattern where there is a single request object (e.g.
>    indexedDB.request) prevents user code from having multiple
>    outstanding requests against the same object (e.g. multiple 'open'
>    or multiple 'openCursor' requests). An alternate pattern that does
>    not have this problem would be to return the request object from the
>    method (e.g. from 'open').
> d. CursorRequest.continue(): this seems to break the pattern
>    where request.result has the result of the operation; for continue
>    the result (in the sync version) is true/false depending on
>    whether the cursor reached EOF. So in async, request.result should be
>    the true/false value, the value itself would be available in the
>    cursor's "value" property, and the success callback would be called
>    instead of the error one.
>
>
> 11. API Names
>
> a. "transaction" is really non-intuitive (particularly given
>    the existence of currentTransaction in the same class).
>    "beginTransaction" would capture semantics more accurately.
> b. ObjectStoreSync.delete: delete is a JavaScript keyword; can
>    we use "remove" instead?
>
>
> 12. Object names in general
>
> a. For database, store, index and other names in general, the
>    current description in various places says "case sensitive". It
>    would be good to be more specific and indicate "exact match" of all
>    constructs (e.g. accents, kana width). Binary match would be very
>    restrictive but a safe target. Alternatively we could just leave
>    this up to each implementation, and indicate non-normatively what
>    would be a safe pattern of strings to use.
>
>
> 13. Editorial notes
>
> a. Ranges: left-right versus start-end. "bound" versus "closed"
>    for intervals.
> b. Ranges: bound, "Create a new right-bound key range." ->
>    right & left bound
> c. 3.2.7 obejct -> object
> d. The current draft fails to format in IE; the script that
>    comes with the page fails with an error.
>
>
> Thanks
> -pablo
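Notes 1.a to 1.d, together with the binary-match idea in 12.a, ask how dates, non-integer numbers, strings and composite keys should sort. The TypeScript sketch below shows one way a comparator could answer them. Several parts are assumptions made only for illustration: the cross-type order (number < Date < string < array), the use of plain UTF-16 code-unit comparison for strings instead of a locale collation, and the names `Key` and `compareKeys`. The draft under review does not define any of them.

```typescript
// Hypothetical key type: any finite number (not just integers), dates,
// strings, and arrays of keys standing in for composite keys.
type Key = number | Date | string | Key[];

// Assumed cross-type rank; the draft under review does not fix this order.
function typeRank(k: Key): number {
  if (typeof k === "number") return 0;
  if (k instanceof Date) return 1;
  if (typeof k === "string") return 2;
  return 3; // array, i.e. a composite key
}

// Returns a negative, zero or positive number, like an Array.prototype.sort comparator.
function compareKeys(a: Key, b: Key): number {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb;

  if (typeof a === "number" && typeof b === "number") {
    return a < b ? -1 : a > b ? 1 : 0;
  }
  if (a instanceof Date && b instanceof Date) {
    const ta = a.getTime();
    const tb = b.getTime();
    return ta < tb ? -1 : ta > tb ? 1 : 0;
  }
  if (typeof a === "string" && typeof b === "string") {
    // "<" on strings compares UTF-16 code units: no locale collation involved,
    // so a change in OS collation settings never invalidates an index.
    return a < b ? -1 : a > b ? 1 : 0;
  }
  // Both are arrays: compare element by element, then shorter array first.
  const xa = a as Key[];
  const xb = b as Key[];
  const n = Math.min(xa.length, xb.length);
  for (let i = 0; i < n; i++) {
    const c = compareKeys(xa[i], xb[i]);
    if (c !== 0) return c;
  }
  return xa.length - xb.length;
}
```

Because this string order does not depend on an OS collation, the re-indexing concern in 1.c does not arise. The trade-off is orderings such as "B" before "b" that users may find surprising.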
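Notes 7.d, 9.e, 9.g and 9.j ask three things: that `get` return undefined rather than throw, that a cursor start before the first record, and that `continue()` report EOF. A minimal TypeScript sketch of those semantics follows. It uses hypothetical in-memory classes, `MemoryStore` and `MemoryCursor`, which are not the draft's `ObjectStoreSync`/`CursorSync`. It assumes numeric keys and adopts one reading of 9.j: `continue()` returns false only when it moves past the last record.

```typescript
// Hypothetical in-memory store, used only to illustrate the proposed semantics.
class MemoryStore<V> {
  private rows = new Map<number, V>();

  put(key: number, value: V): void {
    this.rows.set(key, value);
  }

  // A missing key is an ordinary outcome, so return undefined (notes 7.d, 9.e).
  get(key: number): V | undefined {
    return this.rows.get(key);
  }

  // Snapshots the rows in key order, so a cursor that has reached EOF
  // does not "see" later inserts (the assumption stated in note 9.j).
  openCursor(): MemoryCursor<V> {
    const entries = Array.from(this.rows.entries());
    entries.sort((a, b) => a[0] - b[0]);
    return new MemoryCursor<V>(entries);
  }
}

class MemoryCursor<V> {
  private readonly entries: Array<[number, V]>;
  private pos = -1; // -1 means "before the first record" (BOF)

  constructor(entries: Array<[number, V]>) {
    this.entries = entries;
  }

  // Moves to the next record. Returns true if the cursor is now on a record,
  // or false once it has moved past the last one (EOF). On an empty store the
  // first call goes straight from BOF to EOF and returns false.
  continue(): boolean {
    if (this.pos < this.entries.length) this.pos++;
    return this.pos < this.entries.length;
  }

  private current(): [number, V] {
    if (this.pos < 0 || this.pos >= this.entries.length) {
      throw new Error("cursor is not positioned on a record");
    }
    return this.entries[this.pos];
  }

  get key(): number {
    return this.current()[0];
  }

  get value(): V {
    return this.current()[1];
  }
}
```

With this shape, the usual scan loop is `while (cursor.continue()) { ... }`, and no special case is needed for an empty store.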
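Note 8.a proposes delivering scan results in blocks of buffered rows rather than with one callback per row. The sketch below uses a hypothetical `scanInBatches` function and `batchSize` option, neither of which exists in the draft. To keep it short, it invokes the callbacks synchronously; a real async API would dispatch each block as its own event. The function returns the number of block callbacks so the saving is easy to see: 10 rows with a block size of 4 need 3 callbacks instead of 10.

```typescript
// Hypothetical scan options; a block-fetch mode is only proposed in note 8.a.
interface BatchScanOptions {
  batchSize: number; // maximum rows delivered per callback
}

// Hands rows to onBatch in chunks of up to batchSize, then calls onComplete.
// Returns how many times onBatch was invoked.
function scanInBatches<T>(
  rows: readonly T[],
  options: BatchScanOptions,
  onBatch: (batch: T[]) => void,
  onComplete: () => void,
): number {
  if (!Number.isInteger(options.batchSize) || options.batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  let calls = 0;
  for (let i = 0; i < rows.length; i += options.batchSize) {
    onBatch(rows.slice(i, i + options.batchSize));
    calls++;
  }
  onComplete();
  return calls;
}
```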
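Note 10.c suggests returning a fresh request object from each method call instead of sharing a single `indexedDB.request`. The hypothetical `DemoRequest`/`DemoStore` classes below sketch that pattern. `runPending()` is an assumed stand-in for the browser event loop, so the example runs without timers.

```typescript
// Hypothetical request object; one is created per call, never shared.
class DemoRequest<T> {
  readyState: "pending" | "done" = "pending";
  result: T | undefined = undefined;
  onsuccess: ((req: DemoRequest<T>) => void) | null = null;
}

// Hypothetical store whose methods return their own request objects.
class DemoStore {
  private rows = new Map<string, string>();
  private queue: Array<() => void> = [];

  constructor(initial: Record<string, string>) {
    for (const k of Object.keys(initial)) this.rows.set(k, initial[k]);
  }

  // Every call creates its own request, so two get() calls can be
  // outstanding at once without overwriting each other's result.
  get(key: string): DemoRequest<string> {
    const req = new DemoRequest<string>();
    this.queue.push(() => {
      req.result = this.rows.get(key);
      req.readyState = "done";
      if (req.onsuccess) req.onsuccess(req);
    });
    return req;
  }

  // Stand-in for the event loop: completes queued requests in FIFO order.
  runPending(): void {
    const jobs = this.queue;
    this.queue = [];
    for (const job of jobs) job();
  }
}
```

Because each request carries its own `result` and `onsuccess`, callers can issue several operations before any of them completes.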
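Notes 3.d and 7.e describe optimistic concurrency through a system-maintained timestamp and ask how store-generated values reach the caller. The sketch below makes two substitutions. It uses a monotonically increasing version counter in place of a wall-clock timestamp. It also writes the new version back to the caller's object on success, which is one possible answer to 7.b and 7.e. `VersionedStore`, `tryPut` and the `version` field are hypothetical names.

```typescript
// Hypothetical row shape; the store owns the `version` field.
interface VersionedRow {
  id: string;
  version?: number; // stamped by the store on every successful write
  [field: string]: unknown;
}

class VersionedStore {
  private rows = new Map<string, VersionedRow>();
  private clock = 0; // monotonically increasing version source

  // Writes `row` only if its version matches the stored one (undefined for a
  // new row). On success the fresh version is stamped on both the stored
  // clone and the caller's object, keeping the two in sync.
  tryPut(row: VersionedRow): boolean {
    const current = this.rows.get(row.id);
    if ((current ? current.version : undefined) !== row.version) {
      return false; // another writer changed the row since it was read
    }
    row.version = ++this.clock;
    this.rows.set(row.id, { ...row });
    return true;
  }

  // Returns a copy, so callers cannot mutate stored rows directly.
  get(id: string): VersionedRow | undefined {
    const row = this.rows.get(id);
    return row ? { ...row } : undefined;
  }
}
```

A counter avoids the case where two writes in the same clock tick receive the same stamp, which a timestamp-based scheme would have to handle separately.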
Received on Tuesday, 26 January 2010 23:15:02 UTC