Re: WebSimpleDB Issues from Nikunj R. Mehta on 2009-12-11 (public-webapps@w3.org from October to December 2009)

From: Nikunj R. Mehta <nikunj.mehta@oracle.com>
Date: Fri, 11 Dec 2009 10:45:55 -0800
To: Kris Zyp <kris@sitepen.com>
Cc: "public-webapps@w3.org" <public-webapps@w3.org>
Message-Id: <E00B33BF-BB0C-434E-8DC1-97073EE0BD7B@oracle.com>
Hi Kris,

Sorry for taking so long to get back on these issues.

On Dec 1, 2009, at 10:33 PM, Kris Zyp wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I had a few thoughts/questions/issues with the WebSimpleDB proposal:
>
> * No O(log n) access to position/counts in index sequences - If you
> want to find all the entities that have a price less than 10, it is quite
> easy (assuming there is an index on that property) with the
> WebSimpleDB to access the price index, iterate through the index until
> you hit 10 (or vice versa). However, if this is a large set of data,
> the vast majority of applications I have built involve providing a
> subset or page of data at a time to keep constant or logarithmic time
> access to data, *and* an indication of how many rows/items/entities
> are available. Limiting a "query" to a certain number of items is easy
> enough with WebSimpleDB, you just only iterate so far, but determining
> how many items are less than 10 is no longer a logarithmic complexity
> problem, but is linear with the number of items that are less than 10,
> because you have to iterate over all of them to count them. If a
> cursor were to indicate the numeric position within an index (even if
> it was an estimate, without transactional strictness), one could
> easily determine the count for this type of query in O(log n) time.
> This would presumably entail B-tree nodes keeping track of their
> number of children.

There are definitely important implementation challenges with keeping
track of the number of children for each node. I have added a
mechanism to the API but am willing to be flexible based on
implementation feedback. This mechanism is a count attribute on the
CursorSync interface. It will allow you to find out the number of
records corresponding to the current position of a cursor. Please take
a look.
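
As a rough illustration of the counting idea discussed above (not the
CursorSync API itself), here is a minimal sketch in which each index node
keeps the size of its subtree, so the number of keys below a bound is found
by walking a single root-to-leaf path. It assumes a plain, unbalanced binary
tree as a stand-in for a B-tree; the names CountedNode, insert, and
countLessThan are hypothetical and do not appear in the draft.

```typescript
// Hypothetical illustration only: none of these names come from the WebSimpleDB draft.
interface CountedNode {
  key: number;
  size: number; // number of keys stored in the subtree rooted at this node
  left: CountedNode | null;
  right: CountedNode | null;
}

function sizeOf(node: CountedNode | null): number {
  return node === null ? 0 : node.size;
}

// Insert a key, incrementing the subtree size of every node on the path.
// Equal keys are placed in the right subtree.
function insert(node: CountedNode | null, key: number): CountedNode {
  if (node === null) {
    return { key, size: 1, left: null, right: null };
  }
  if (key < node.key) {
    node.left = insert(node.left, key);
  } else {
    node.right = insert(node.right, key);
  }
  node.size += 1;
  return node;
}

// Count keys strictly less than `bound` without visiting each of them.
function countLessThan(node: CountedNode | null, bound: number): number {
  let count = 0;
  while (node !== null) {
    if (node.key < bound) {
      // This node and its entire left subtree are below the bound.
      count += sizeOf(node.left) + 1;
      node = node.right;
    } else {
      node = node.left;
    }
  }
  return count;
}

// Build a small "price index" and ask how many prices are below 10.
let priceIndex: CountedNode | null = null;
for (const price of [12, 3, 25, 7, 9, 10, 18, 1]) {
  priceIndex = insert(priceIndex, price);
}
const cheapCount = countLessThan(priceIndex, 10);
```

The walk is proportional to the tree height, so it is logarithmic only if
the tree is kept balanced, which this sketch does not attempt. The cost it
does show is the one raised above: every insert has to update a counter on
each node along its path.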

>
> * Asynchronicity is not well-aligned with the expensive operations -
> The asynchronous actions are starting and committing transactions. It
> makes sense that committing transactions would be expensive, but why
> do we make the start of a transaction asynchronous? Is there an
> expectation that a global lock will be sought on the entire DB
> when the transaction is started? That certainly doesn't seem
> desirable. Normally a DB would create locks as data is accessed,
> correct? If anything a "get" operation would be more costly than
> starting a transaction.

There are three transaction behaviors that are possible - a  
dynamically scoped transaction (like what you are suggesting), a  
statically scoped transaction that includes the entire database and a  
statically scoped transaction that includes only a subset of the  
database objects. Obviously in the static scope case, we would like to  
reserve the objects prior to beginning the transaction. That is why  
the call should block (in the sync case). We are providing the  
statically scoped transaction to support the use case for deadlock-free
operation (even though it may cause timeouts).
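
To make the static-scope case concrete, here is a minimal sketch of
all-or-nothing reservation: a transaction names every object it needs up
front and either gets all of them or none. The class StoreReservations, its
methods, and the store names are hypothetical and are not part of the
WebSimpleDB API; a real implementation would block or time out rather than
return false.

```typescript
// Hypothetical names throughout; this is not the WebSimpleDB API.
class StoreReservations {
  private reserved = new Set<string>();

  // Reserve every named store or none of them. A caller never holds some
  // stores while waiting for others, so two callers cannot deadlock.
  tryReserve(storeNames: string[]): boolean {
    if (storeNames.some((name) => this.reserved.has(name))) {
      return false; // caller may retry later or give up after a timeout
    }
    storeNames.forEach((name) => this.reserved.add(name));
    return true;
  }

  // Release the stores when the transaction commits or aborts.
  release(storeNames: string[]): void {
    storeNames.forEach((name) => this.reserved.delete(name));
  }
}

const reservations = new StoreReservations();
// The first transaction reserves its whole scope before it begins.
const first = reservations.tryReserve(["orders", "customers"]);
// An overlapping scope is refused outright and reserves nothing.
const second = reservations.tryReserve(["customers", "invoices"]);
// Once the first transaction finishes, the second scope becomes available.
reservations.release(["orders", "customers"]);
const third = reservations.tryReserve(["customers", "invoices"]);
```

The trade-off is the one mentioned above: a refused caller has to wait and
retry, so it can time out, but it never waits while holding part of its
scope.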

>
> * Hanging EntityStores off of transactions creates unnecessary
> complexity in referencing stores - A typical pattern in applications
> is to provide a reference to a store to a widget that will use it.
> However, with the WebSimpleDB, you can't really just hand off a
> reference to an EntityStore, since each store object is
> transaction-specific. You would either need to pass the name of a store
> to a widget, and have it generate its own transaction to get a store
> (which seems like very bad form from an object capability perspective),
> or pass in a store for every action, which may not be viable in many
> situations.

This changed just prior to TPAC (early November). I encourage you
to take a look at the current API, which takes into account all the
feedback I have received since then.

>
> Would it be reasonable (based on the last two points) to have
> getEntityStore be a method on database objects, rather than
> transaction objects? Actions would just take place in the current
> transaction for the context. With the single-threaded nature of JS
> contexts, having a single active transaction at a time does not
> seem like a hardship, but rather makes things a lot simpler to work
> with. Also, if an impl wanted to support auto-commit, it would be very
> intuitive, devs just wouldn't be required to start a transaction prior to
> performing actions on a store.
> Thanks,
>

I greatly appreciate your interest and detailed feedback.

Nikunj
http://o-micron.blogspot.com
Received on Friday, 11 December 2009 18:47:41 UTC