- From: Edward C. Zimmermann <edz@elmyra.bsn.com>
- Date: Wed, 25 Feb 2004 09:19:20 +0100 (MET)
- To: ga11@cs.waikato.ac.nz
- Cc: www-webdav-dasl@w3.org
> Hi --
>
> let me start with some background on what I'm interested in. I am hoping
> for comments on whether my interest might have some merit in the context
> of DASL. General comments also much appreciated :)
>
> I am considering research into a "version-aware inverted-file full-text
> indexing algorithm". The result would be an index optimized for searching

A few points:

- Given how storage costs currently compare with I/O and processing costs, index size is no longer the issue it was a decade ago. Collections of many GBs and millions of records (beyond the task of crawling and replicating "Internet Web pages") are not pedestrian, but sufficient storage is available on nearly any personal computer sold in supermarkets today (these, I think, typically have no less than 80 GB this week). The limiting factor continues to be I/O and, especially, the limits of 32-bit kernel memory management in the operating systems that are still dominant.

- Small embedded systems might not have much storage, but I have trouble thinking of a search/retrieval (S/R) application on an embedded system that would demand more than these devices already seem to have. We might not be able to fit all the NIH human genome records or the USPTO's patents, but I don't see why one would need to (and for those we have more conventional computers).

- Inverted-file algorithms are not terribly good at handling large amounts of data and, more importantly, at handling fields and structure. If you want to index only context diffs but search the entire rendered document, you would need an additional kind of diff between that document and a fixed reference point, and you would have to search by de-referencing into a composite document containing both "fragments". At presentation time you would then have to reconstruct the document. This strikes me as more costly than simply indexing the documents and handling the versioning on another layer. Our "fulltext engine", in fact, contains a versioning facility.
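As an illustration of the simpler alternative contrasted above (index every version as a fully rendered document and keep the versioning on a separate layer), here is a minimal sketch in Python. It is only an assumption-laden toy: the names `VersionedIndex`, `add_version` and `search` are hypothetical, the tokenizer is deliberately naive, and none of it describes the actual versioning facility of our engine.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Naive lower-cased word tokenizer (an assumption for illustration)."""
    return re.findall(r"\w+", text.lower())


class VersionedIndex:
    """Hypothetical sketch: each version is indexed as a fully rendered
    document; versioning lives in a separate layer (self.versions)."""

    def __init__(self):
        # Inverted file: term -> set of (doc_id, version) postings.
        self.postings = defaultdict(set)
        # Version layer: doc_id -> ordered list of version numbers.
        self.versions = defaultdict(list)
        # Fully rendered text per (doc_id, version): presentation needs
        # no diff reconstruction.
        self.store = {}

    def add_version(self, doc_id, text):
        """Store and index a new full version; return its version number."""
        version = len(self.versions[doc_id]) + 1
        self.versions[doc_id].append(version)
        self.store[(doc_id, version)] = text
        for term in set(tokenize(text)):
            self.postings[term].add((doc_id, version))
        return version

    def search(self, term, latest_only=True):
        """Look up a term; version filtering is done by the version layer."""
        hits = self.postings.get(term.lower(), set())
        if latest_only:
            hits = {(d, v) for d, v in hits if v == self.versions[d][-1]}
        return sorted(hits)


idx = VersionedIndex()
idx.add_version("spec", "draft text about locking")
idx.add_version("spec", "final text about searching")
print(idx.search("locking"))                      # [] (only in version 1)
print(idx.search("locking", latest_only=False))   # [('spec', 1)]
print(idx.search("text"))                         # [('spec', 2)]
```

The trade-off is the one argued in the first point: storage grows with every version, but neither searching nor presentation ever has to de-reference or reassemble diffs.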
We tend to use very straightforward approaches, such as handling all the versions as fully rendered documents while the system remains aware of the versioning and of what to do with it; what exactly it does depends on what we need or want to do (it is application/project specific).

______________________
Edward C. Zimmermann, Basis Systeme netzwerk, Munich
Leopoldstrasse 53-55, D-80802 Munich, Federal Republic of Germany
Telephone: Voice:= +49 (89) 385-47074
Fax:= +49 (89) 692-8150
Cellular:= +49 (179) 205-0539
Received on Wednesday, 25 February 2004 03:19:39 UTC