- From: Jim Davis <jdavis@parc.xerox.com>
- Date: Fri, 24 Jul 1998 11:14:12 PDT
- To: www-webdav-dasl@w3.org
The DASL simplesearch grammar, while supporting SQL fairly well, does a poor job of supporting full text search engines such as Verity, WAIS, SMART, or MG. For some such engines, a query is a document, or at least a lengthy portion of text, rather than a set of expressions on fields joined by Boolean operators. For others the query is a small set of words, and the query may specify the maximum allowable distance between words in the target documents (e.g. within N words, in the same sentence, or in the same paragraph).
The result is a set of documents ordered according to similarity to the query. Typically there is a cutoff in the number of documents returned, but in principle the similarity is computed for every document in the corpus. Usually a numeric score is returned for each document.
Many of these systems also allow the client to specify the choice of token processing (e.g. stemming), the matching rules (soundex, left or right truncation), and/or to influence the ranking by providing weights on terms used in the search.
None of these are well supported in the DASL simplesearch grammar, and I don't think they should be. For one thing, there is no common practice to standardize on for queries that work on both Boolean and full text engines. (STARTS is the best attempt so far.) Even if we succeeded in defining one, the result would not be a *simple* search grammar, and I think the likely outcome would be that typical implementations of DASL simplesearch would support either the Boolean side well or the full text side well, but not both. So in practice, a client would do query schema discovery (QSD) to find out which kind worked, and once it does that, there's no real difference between doing QSD on one grammar and doing grammar discovery (via OPTIONS) on the arbiter itself. In other words, rather than make a complicated simplesearch that can express both kinds of search, leave simplesearch alone, and define a fulltextsearch.
This is not to say that there should be NO content search at all in DASL. To the contrary, there should be, but it should be quite limited. This message is really a call to begin thinking about defining a second grammar, which may or may not make it into the first DASL specification.
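To make the ranked-retrieval model described above concrete, here is a minimal sketch of how such an engine behaves: every document in the corpus gets a similarity score against the query text, the results are ordered by that score, and a cutoff limits how many are returned. Everything in it is an assumption for illustration only. The names `tokenize`, `cosine`, and `ranked_search` are hypothetical, and cosine similarity over raw term counts merely stands in for whatever scoring a real engine such as those named above actually uses.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counter objects)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def ranked_search(query_text, corpus, limit=10):
    """Score every document against the query and return the top `limit`.

    `corpus` maps a document name to its text. The result is a list of
    (name, score) pairs, highest score first; ties are broken by name.
    """
    query_vec = Counter(tokenize(query_text))
    scored = [
        (name, cosine(query_vec, Counter(tokenize(body))))
        for name, body in corpus.items()
    ]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:limit]
```

Note that the query here is just a piece of text, not an expression tree, and that the answer is an ordered, scored list rather than a set. Neither maps naturally onto a grammar built from field comparisons joined by Boolean operators.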
Received on Friday, 24 July 1998 14:14:27 UTC