- From: Edward C. Zimmermann <edz@elmyra.bsn.com>
- Date: Wed, 25 Feb 2004 09:19:20 +0100 (MET)
- To: ga11@cs.waikato.ac.nz
- Cc: www-webdav-dasl@w3.org
>
>Hi --
>
>let me start with some background on what I'm interested in. I am hoping
>for comments on whether my interest might have some merit in the context
>of DASL. General comments also much appreciated :)
>
>I am considering research into a "version-aware inverted-file full-text
>indexing algorithm". The result would be an index optimized for searching
A few points:
- Given the current ratio of storage cost to I/O and processing costs, index
size is no longer the issue it was a decade ago. While collections of many
GBs and millions of records (beyond the task of crawling and replicating
"Internet Web pages") are not pedestrian, sufficient storage is available on
nearly any personal computer sold at the supermarket today (I think these
typically come with no less than 80 GB this week).
The limiting factor continues to be I/O and, especially, the memory-management
limits of the 32-bit kernels in the operating systems that still dominate.
- Small embedded systems might not have much storage, but I have trouble
thinking of a search and retrieval (S/R) application on an embedded system
that would demand more than these devices already seem to have. We might not
be able to fit all the NIH human genome records or USPTO's patents, but I
don't see why one would need to (and for those we have more conventional
computers).
- Inverted-file algorithms are not terribly good at handling large amounts
of data and, more importantly, at handling fields and structure.
If you are interested in indexing only context diffs but in searching the
entire rendered document, one would need an additional kind of diff between
that document and a fixed reference point, and would then search by
dereferencing to a composite document containing both "fragments".
During presentation one would then have to reconstruct the document. This
strikes me as more costly than just indexing the documents and handling the
versioning on another layer.
Our "fulltext engine" in fact contains a versioning facility. We tend to
use very straightforward approaches, such as handling all the versions as
fully rendered documents while keeping the system aware of the versioning
and of what to do with it; this tends to depend upon what we need or want
to do (application/project specific).
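To make that approach concrete, here is a minimal sketch in Python under my own assumptions: every version is indexed as a complete document, and a separate version layer decides which hits to report. The names (VersionedIndex, add_version, search) are hypothetical and this is not how our engine is actually implemented.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Naive lowercase word tokenizer; a real engine would parse fields/structure.
    return re.findall(r"\w+", text.lower())


class VersionedIndex:
    """Hypothetical sketch: each version is indexed as a fully rendered
    document; versioning is handled by a separate layer (self.latest)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> {(doc_id, version)}
        self.latest = {}                  # doc_id -> current version number

    def add_version(self, doc_id, text):
        # Index the whole text again; no diffs against earlier versions.
        version = self.latest.get(doc_id, 0) + 1
        for term in set(tokenize(text)):
            self.postings[term].add((doc_id, version))
        self.latest[doc_id] = version
        return version

    def search(self, term, all_versions=False):
        hits = self.postings.get(term.lower(), set())
        if all_versions:
            return sorted(hits)
        # Version layer: report only hits on each document's current version.
        return sorted((d, v) for d, v in hits if self.latest[d] == v)
```

For example, after indexing "draft about indexing" and then "final text about search" as two versions of "memo", `search("about")` returns only `("memo", 2)`, while `search("indexing", all_versions=True)` still finds `("memo", 1)`. The price is that unchanged text is indexed once per version, which, given the storage argument above, is usually acceptable; in exchange, no document ever has to be reconstructed from fragments at query or presentation time.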
______________________
Edward C. Zimmermann, Basis Systeme netzwerk, Munich
Leopoldstrasse 53-55, D-80802 Munich, Federal Republic of Germany
<http://www.stadtplandienst.de/query;ORT=m;PLZ=80802;STR=Leopoldstr%2E;HNR=53;GR=2;PRINTER_FRIENDLY=TRUE>
Telephone: Voice:= +49 (89) 385-47074 Fax:= +49 (89) 692-8150
Cellular:= +49 (179) 205-0539
Received on Wednesday, 25 February 2004 03:19:39 UTC