- From: Babich, Alan <ABabich@filenet.com>
- Date: Thu, 7 May 1998 15:07:57 -0700
- To: "'Jim Davis'" <jdavis@parc.xerox.com>, www-webdav-dasl@w3.org
I wasn't really advocating that position as my preferred solution. My intention in that part of that particular e-mail was to illustrate some issues raised by full text retrieval, not to make specific proposals for a solution. My concern is that content based retrieval puts a lot on our plate, but we only have a limited amount of time, and there are no real standards to guide us. Deciding on the scope of the content based retrieval problem we take on in 1.0 is going to be important.

Obviously, we could have multiple, simple, specialized operators, each of which takes a string parameter and does something specific with it, e.g., case sensitive comparison for exact match with optional wild cards, case insensitive comparison for exact match with optional wild cards, contains the words in the string parameter with stemming applied, contains the words in the string near each other, etc. Then we wouldn't have to do anything special for grouping: the XML start and end tags do the grouping for us automatically. And for Boolean combinations of the full text operators, again we wouldn't have to do anything special; we would just use AND, OR, and NOT.

I didn't want to get into a specific proposal. I just wanted to illustrate the types of functionality we have to consider, so we could start the scoping effort. I thought that if I advocated the multiple simple operator approach, people would focus on that as a specific proposal instead of thinking about what functionality we want. Well, I guess at least some people focused on it as a specific proposal anyway, so it seems I can't win.

So, let me go on record as saying that if I make a specific proposal (as opposed to raising issues, discussing issues, asking questions, or whatever), I will use the word "propose", "proposal", or "proposing". If I don't, then I'm not making a specific proposal.
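As a rough illustration of the multiple simple operator approach, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The element names ("and", "not", "contains-words", "exact-match-ci") are hypothetical and not taken from any DASL draft; the point is only that each operator carries its own plain string parameter and that XML nesting supplies the grouping, so no second-level syntax is needed.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names for specialized full-text operators; these are
# illustrative only and are not taken from any DASL draft.


def op(name, text):
    """Build a leaf operator element whose content is its string parameter."""
    el = ET.Element(name)
    el.text = text
    return el


def combine(name, *children):
    """Build a Boolean combinator (and/or/not); element nesting provides grouping."""
    el = ET.Element(name)
    el.extend(children)
    return el


# "contains the words 'web distributed' AND NOT case-insensitive match 'draft*'"
query = combine(
    "and",
    op("contains-words", "web distributed"),
    combine("not", op("exact-match-ci", "draft*")),
)

print(ET.tostring(query, encoding="unicode"))
```

Because grouping lives in the element structure, a DTD (or any XML validator) can see every operator and its parameter directly, rather than treating the whole full-text condition as one opaque string.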
Now, to actually answer your question: I would say that an advantage of using a second level of syntax inside the string parameter to the "contains" operator is that it makes the query representation more compact. In contrast, an advantage of having multiple simple operators is that you can maintain the very powerful invariant I am proposing, i.e., that a DASL query is valid if and only if it is type safe. The syntax-within-the-string approach does not have this advantage. Another potential advantage of the multiple simple operators approach is that it may be possible to "prune" them out using three valued elimination when sending the query to a collection that doesn't support all the content based retrieval operators.

And to answer the next question you would probably ask me (i.e., "which method do you prefer?"): writing this e-mail has forced me to think about it a bit. Right now I am leaning toward multiple simple operators rather than an embedded string syntax. I'm not very concerned about the compactness of the query. Text compression is available, and in any case, on average, the size of the answer set will dwarf the size of the query. Furthermore, if content is retrieved, the content could dwarf the size of the answer set. So, if you can afford the size of the answer, you can afford the size of the query. The ability to use three valued elimination is also a powerful argument in favor of separate operators. And maintaining the powerful invariant "validity if and only if type safety" helps give us a rock solid base architecture to build on into the future, one that will mesh naturally with the XML data types proposal as well as with future additions of query operators and other query functionality. That would not be possible with an embedded string syntax: a second level of syntax inside a string parameter could not be described in the XML DTD, and that, I think, would be a mistake.

I hope this explanation answers your questions.
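One way to read "three valued elimination" is sketched below in Python, assuming Kleene three-valued logic: an operator the collection does not support evaluates to UNKNOWN (None here), and the surrounding AND/OR/NOT can often still decide the answer, so the unsupported operator effectively drops out. The tuple-based query tree, the operator names "contains-words" and "near", and the toy matching function are all hypothetical; this is not DASL's actual grammar or evaluation rule. It does show why separate operators help: each unsupported operator is its own node, whereas an embedded string syntax would make the whole "contains" string a single opaque leaf.

```python
from typing import Callable, Dict, Optional, Tuple

# Three-valued truth value (Kleene logic): True, False, or None for UNKNOWN.
Truth = Optional[bool]


def and3(a: Truth, b: Truth) -> Truth:
    # FALSE dominates AND, so an UNKNOWN operand is irrelevant if the other is FALSE.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a: Truth, b: Truth) -> Truth:
    # TRUE dominates OR, so an UNKNOWN operand is irrelevant if the other is TRUE.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def not3(a: Truth) -> Truth:
    # NOT of UNKNOWN stays UNKNOWN.
    return None if a is None else not a


def contains_words(doc: str, arg: str) -> bool:
    """Toy 'contains the words' operator: every word of arg occurs in doc."""
    words = set(doc.lower().split())
    return all(w in words for w in arg.lower().split())


# Operators this hypothetical collection supports; "near" is deliberately absent.
SUPPORTED: Dict[str, Callable[[str, str], bool]] = {"contains-words": contains_words}


def evaluate(node: Tuple, doc: str, supported=SUPPORTED) -> Truth:
    """Evaluate a query tree against one document using three-valued logic.

    Nodes are ("and", l, r), ("or", l, r), ("not", x), or (operator, string_arg).
    """
    kind = node[0]
    if kind == "and":
        return and3(evaluate(node[1], doc, supported), evaluate(node[2], doc, supported))
    if kind == "or":
        return or3(evaluate(node[1], doc, supported), evaluate(node[2], doc, supported))
    if kind == "not":
        return not3(evaluate(node[1], doc, supported))
    impl = supported.get(kind)
    if impl is None:
        return None  # unsupported operator: treated as UNKNOWN
    return impl(doc, node[1])


query = ("or", ("contains-words", "webdav"), ("near", "search protocol"))
print(evaluate(query, "WebDAV search draft"))    # True: decided without "near"
print(evaluate(query, "search protocol notes"))  # None: depends on "near"
```

What a collection should do with documents whose result is UNKNOWN (exclude them, return them, or report that the query could not be fully evaluated) is a policy question this sketch deliberately leaves open.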
Alan Babich

> -----Original Message-----
> From: Jim Davis [SMTP:jdavis@parc.xerox.com]
> Sent: May 06, 1998 7:19 PM
> To: www-webdav-dasl@w3.org
> Subject: Re: DASL Issues for Content Based Retrieval
>
> At 08:00 PM 5/4/98 PDT, Babich, Alan wrote:
> >One obvious approach to use is to have the "contains" operator take a
> >string parameter. The string parameter would use a syntax that specifies
> >the search functionality.
>
> I am puzzled by your advocacy for this position, since you've just finished
> explaining in eloquent language the advantages of using a parse tree for
> the representation of the boolean (or property based) portion of the query.
> What advantage do you see in using string encoding for the full text
> portion?
Received on Thursday, 7 May 1998 18:10:19 UTC