- From: <ga11@cs.waikato.ac.nz>
- Date: Wed, 25 Feb 2004 18:24:02 +1300 (NZDT)
- To: www-webdav-dasl@w3.org
Hi, let me start with some background on what I'm interested in. I am hoping for comments on whether this idea might have some merit in the context of DASL. General comments are also much appreciated :)

I am considering research into a "version-aware inverted-file full-text indexing algorithm". The result would be an index optimized for searching a collection of versioned documents. By 'optimized' I mean (1) taking advantage of similarities across revisions to reduce index size, and (2) improving search speed for large collections.

For a document that exists in 10 revisions, a "traditional" full-text index (as, for example, in Lucene or Greenstone) would index 10 separate documents and attach some versioning metadata to each. A version-aware approach would be based on the premise that there is one 'base' document and a delta for each revision. If a revision becomes vastly different from its base, a new base can be established.

In terms of DASL, this may be useful to implementations that offer full-text content search. I would expect an inverted index to perform better than the database-driven approach discussed in the paper on Catacomb, if I understand their approach correctly. It would also support features such as proximity matching.

I would welcome feedback on the perceived merits or drawbacks of this idea for DASL implementations. Is version-aware searching a feature in demand? I haven't seen much of it in mainstream applications.

cheers
Gerret
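To make the base-plus-delta idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the proposed algorithm: it indexes whole words only (no positions, so no proximity matching), assumes the revisions of a document arrive in increasing order, and uses a hypothetical `rebase_threshold` (a new base is started once the delta grows beyond that fraction of the base's vocabulary). The class and method names are made up for this example.

```python
from collections import defaultdict


class VersionAwareIndex:
    """Hypothetical base-plus-delta inverted index (word level, no positions).

    Each revision is either a *base* (all its terms are posted) or a *delta*
    against the document's current base (only added terms are posted; removed
    terms are recorded per revision).
    """

    def __init__(self, rebase_threshold=0.5):
        # Assumed knob: start a new base once the delta (added + removed terms)
        # exceeds this fraction of the current base's vocabulary.
        self.rebase_threshold = rebase_threshold
        self.base_postings = defaultdict(set)   # term -> {(doc, base_rev)}
        self.added_postings = defaultdict(set)  # term -> {(doc, rev)}
        self.base_terms = {}                    # (doc, base_rev) -> frozenset of terms
        self.revs_of_base = defaultdict(list)   # (doc, base_rev) -> [rev, ...]
        self.removed = {}                       # (doc, rev) -> terms removed vs. base
        self.current_base = {}                  # doc -> rev of its latest base

    def add_revision(self, doc, rev, text):
        """Index revision `rev` of `doc`; revisions must arrive in order."""
        terms = frozenset(text.lower().split())
        base_rev = self.current_base.get(doc)
        if base_rev is not None:
            base = self.base_terms[(doc, base_rev)]
            added, removed = terms - base, base - terms
            if len(added) + len(removed) <= self.rebase_threshold * max(len(base), 1):
                # Similar enough: store only the delta against the base.
                self.revs_of_base[(doc, base_rev)].append(rev)
                self.removed[(doc, rev)] = removed
                for t in added:
                    self.added_postings[t].add((doc, rev))
                return
        # First revision, or too different from the current base: new base.
        self.current_base[doc] = rev
        self.base_terms[(doc, rev)] = terms
        self.revs_of_base[(doc, rev)].append(rev)
        self.removed[(doc, rev)] = frozenset()
        for t in terms:
            self.base_postings[t].add((doc, rev))

    def search(self, term):
        """Return sorted (doc, rev) pairs for revisions that contain `term`."""
        term = term.lower()
        hits = set(self.added_postings.get(term, ()))
        # A base posting covers every revision built on that base,
        # except those whose delta removed the term.
        for doc, base_rev in self.base_postings.get(term, ()):
            for rev in self.revs_of_base[(doc, base_rev)]:
                if term not in self.removed[(doc, rev)]:
                    hits.add((doc, rev))
        return sorted(hits)

    def entry_count(self):
        """Stored entries: base postings + added postings + removal records."""
        return (sum(map(len, self.base_postings.values()))
                + sum(map(len, self.added_postings.values()))
                + sum(map(len, self.removed.values())))


idx = VersionAwareIndex()
idx.add_revision("a.txt", 1, "the quick brown fox")
idx.add_revision("a.txt", 2, "the quick red fox")           # small delta
idx.add_revision("a.txt", 3, "lorem ipsum dolor sit amet")  # triggers a new base
print(idx.search("fox"))    # [('a.txt', 1), ('a.txt', 2)]
print(idx.search("brown"))  # [('a.txt', 1)]
print(idx.entry_count())    # 11
```

For these three revisions the sketch stores 11 entries, whereas indexing each revision separately would need 4 + 4 + 5 = 13 postings. Actual savings would depend on how similar revisions are, and positional postings (needed for proximity matching) would likely require remapping positions across deltas, which this sketch ignores.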
Received on Wednesday, 25 February 2004 00:24:04 UTC