Re: [IndexedDB] Posting lists/inverted indexes

Some quick comments.  I'll point some of our full-text search (FTS) experts at
this as well.

On Thu, Jun 17, 2010 at 7:56 AM, Nikunj Mehta <nikunj@o-micron.com> wrote:

> I would like to confirm the requirements for posting list and inverted
> index support in IndexedDB. To that extent, here is a short list ordered by
> importance. Please let me know if I have missed anything important.
>
> 1. Store sorted runs of terms and their occurrences in documents along with
> a payload.
>   a. Each occurrence is identified as some numeric value.
>   b. The payload is an opaque string value.
> 2. Look up a term to obtain its occurrences.
>   a. Look up produces a cursor, each value of which is the document ID
> where the term occurs and the corresponding payload
>   b. Full power of cursors as available in IndexedDB is present, i.e.,
> KeyRange and direction.
> 3. An inverted index could be linked to an object store, in which case, it
> is possible to look up objects using the inverted index.
> 4. When an object is removed from the object store linked to an inverted
> index, no automatic change management applies to inverted index. In other
> words, the inverted index is application managed.
>

I'm still not sure I agree with having application-managed indexes in the
spec at all (see other threads).
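
To make items 1 through 4 concrete, here is a minimal in-memory sketch in
Python. It is not the IndexedDB API and not a proposal for it: the class
InvertedIndex, its add/lookup methods, and the lower/upper/reverse parameters
are hypothetical stand-ins for a posting list, a cursor, KeyRange, and
direction. The last lines show what "application managed" means in practice:
deleting an object leaves its postings behind until the application removes
them.

```python
from bisect import bisect_left, bisect_right, insort


class InvertedIndex:
    """Toy in-memory model of a posting list store (hypothetical, not IndexedDB)."""

    def __init__(self):
        # term -> list of (doc_id, payload) pairs, kept sorted by doc_id
        self._postings = {}

    def add(self, term, doc_id, payload=""):
        # doc_id is numeric, payload is an opaque string (items 1a and 1b)
        insort(self._postings.setdefault(term, []), (doc_id, payload))

    def lookup(self, term, lower=None, upper=None, reverse=False):
        """Yield (doc_id, payload) for term with lower <= doc_id <= upper."""
        run = self._postings.get(term, [])
        ids = [doc_id for doc_id, _ in run]
        lo = 0 if lower is None else bisect_left(ids, lower)
        hi = len(run) if upper is None else bisect_right(ids, upper)
        selected = run[lo:hi]
        yield from (reversed(selected) if reverse else selected)


# A stand-in object store linked to the index (item 3).
store = {1: "the quick fox", 2: "the lazy dog"}
index = InvertedIndex()
for doc_id, text in store.items():
    for position, term in enumerate(text.split()):
        index.add(term, doc_id, payload=str(position))

# Item 4: removing an object does not touch the index.
del store[1]
stale = [doc_id for doc_id, _ in index.lookup("the") if doc_id not in store]
```

After the delete, stale holds the document IDs that the index still returns
but the store no longer has, which is the cleanup burden the application
would carry.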


> 5. Find co-occurrence of terms.
>   a. This would bring back the join feature that was present in earlier
> versions of the spec [1], although in a different API form than earlier.
>

Would it be practical to use inverted indexes without a join feature?  We
should probably try to be consistent.
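
For reference, co-occurrence of two terms amounts to intersecting their
posting lists. A rough sketch of the usual sorted-merge intersection, assuming
each list is already sorted by document ID (the function name intersect is
made up for illustration and is not taken from any version of the spec):

```python
def intersect(a, b):
    """Return the doc IDs present in both sorted lists a and b."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Same document in both lists: the terms co-occur here.
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # advance whichever side is behind
        else:
            j += 1
    return common


docs_with_quick = [1, 3, 5, 8]
docs_with_fox = [2, 3, 8, 9]
both = intersect(docs_with_quick, docs_with_fox)
```

Without some form of join, an application would have to run this merge itself
over two cursors.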


> 6. Store lexicon for IDF-type statistics
>   a. term-level statistics
>
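
As an illustration of what a lexicon for item 6 might hold, here is a small
sketch that keeps a per-term document frequency and derives IDF from it. It
assumes one common definition, idf = log(N / df); other variants exist, and
the names build_lexicon and idf are hypothetical.

```python
import math


def build_lexicon(postings):
    """Map each term to the number of distinct documents containing it."""
    return {term: len(set(doc_ids)) for term, doc_ids in postings.items()}


def idf(num_docs, doc_freq):
    """Inverse document frequency, using the plain log(N / df) form."""
    return math.log(num_docs / doc_freq)


# term -> document IDs where the term occurs (repeats allowed)
postings = {"the": [1, 2, 3, 4], "fox": [1, 1, 3], "dog": [2]}
lexicon = build_lexicon(postings)
scores = {term: idf(4, df) for term, df in lexicon.items()}
```

A term found in every document scores zero, and rarer terms score higher.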
> I am not sure if there is any point in specifying performance and
> efficiency goals in the spec.
>

Agreed.


>
> Nikunj
>
> [1] http://www.w3.org/TR/2009/WD-WebSimpleDB-20090929/#entity-join
>

Received on Thursday, 17 June 2010 16:50:28 UTC