RE: DASL Issues for Content Based Retrieval

I wasn't really advocating that position as my preferred 
solution. My intention in that part of that particular e-mail was to
illustrate some issues raised by full text retrieval,
not to actually make specific proposals for the solution.
My concern is that content based retrieval puts a lot
on our plate, but we have only a limited amount of time,
and there are no real standards to guide us. Deciding upon
the scope of the content based retrieval problem that we
take on in 1.0 is going to be important.

Obviously, we could have multiple, simple, specialized 
operators, each of which took a string parameter, and did
something specific with it, e.g., case sensitive
comparison for exact match with optional wild cards,
case insensitive comparison for exact match
with optional wild cards, contains the words in the string
parameter and stemming is applied, contains the words in the
string near each other, etc. Then we wouldn't have to do
anything special for grouping -- the XML starting and ending 
tags do grouping for us automatically. And for Boolean
combinations of the full text operators, again we wouldn't
have to do anything special -- we would just use
AND, OR, and NOT. 
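
To illustrate (and only to illustrate), here is a rough Python sketch of
what such a query could look like. The element names "and", "not",
"contains-stemmed", and "match-case-insensitive" are made up for this
example and come from no draft; the helper op() is likewise hypothetical.
The only point is that nesting the elements is the grouping.

```python
import xml.etree.ElementTree as ET

def op(name, *children, text=None):
    """Build one operator element; child elements are its operands."""
    node = ET.Element(name)
    node.text = text          # string parameter, if the operator takes one
    node.extend(children)     # nesting of tags expresses the grouping
    return node

# (stemmed words "retrieve document") AND NOT (case insensitive match "draft*")
# All operator names below are hypothetical.
query = op("and",
           op("contains-stemmed", text="retrieve document"),
           op("not", op("match-case-insensitive", text="draft*")))

xml_text = ET.tostring(query, encoding="unicode")
print(xml_text)
```

No parentheses or precedence rules are needed anywhere: the start and end
tags of "and" and "not" already delimit their operands.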

I didn't want to get into a specific proposal -- I just wanted to
illustrate the types of functionality we have to consider, so
we could start the scoping effort.
I thought that if I advocated the multiple simple operator approach,
people would focus on that as a specific proposal instead of thinking
about the issues of what functionality we want. Well, I guess at 
least some people focused on that as a specific proposal anyway, 
so it seems I can't win.

So, let me go on record as saying that if I make
a specific proposal (as opposed to raising issues,
discussing issues, asking questions, or whatever),
I will use the word "propose", "proposal", or "proposing".
If I don't, then I'm not making a specific proposal.

Now, to actually answer your question, I would say that an 
advantage of using a second level of syntax inside the string
parameter to the "contains" operator is that it makes the query
representation more compact.

In contrast, an advantage of having multiple simple operators 
is that you can maintain the very powerful invariant condition 
I am proposing, i.e., that a DASL query is valid if and only if
it is type safe. The syntax-within-the-string approach does
not have this advantage. Another potential advantage of the
multiple simple operators approach is that it may be
possible to "prune" them out using three valued elimination
when sending the query to a collection that doesn't
support all the content based retrieval operators.
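
A rough Python sketch of one possible reading of that pruning follows.
It rests on assumptions of mine: queries are written as nested tuples,
an operator the collection does not support is replaced by the constant
UNKNOWN, and the result is simplified with the usual three valued
(Kleene) rules for AND, OR, and NOT. The operator names and the function
prune() are hypothetical.

```python
UNKNOWN = "UNKNOWN"  # third truth value, next to True and False

def prune(node, supported):
    """Replace unsupported operators by UNKNOWN, then fold constants."""
    if node is UNKNOWN or isinstance(node, bool):
        return node
    name, *args = node
    if name == "not":
        inner = prune(args[0], supported)
        if inner is UNKNOWN:
            return UNKNOWN                      # NOT UNKNOWN is UNKNOWN
        return (not inner) if isinstance(inner, bool) else ("not", inner)
    if name in ("and", "or"):
        absorbing = name == "or"                # True absorbs OR, False absorbs AND
        kids = [prune(a, supported) for a in args]
        if any(k is absorbing for k in kids):
            return absorbing
        kids = [k for k in kids if not isinstance(k, bool)]  # drop identity constants
        if not kids:
            return not absorbing
        if all(k is UNKNOWN for k in kids):
            return UNKNOWN
        return kids[0] if len(kids) == 1 else (name, *kids)
    # Leaf operator: keep it only if the collection supports it.
    return node if name in supported else UNKNOWN

query = ("and", ("match", "report*"), ("not", ("near", "budget forecast")))
print(prune(query, supported={"match"}))
```

With only "match" supported, the "near" subtree collapses to the constant
UNKNOWN, so the collection is never handed an operator it does not know.
Nothing comparable can be done to a condition buried inside a string.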

And to answer the next question you would probably
ask me (i.e., "which method do I prefer?"), as I write this e-mail,
I am forced to think on it a bit. Right now I am leaning
toward having multiple simple operators, instead of
an embedded string syntax. 

I'm not very concerned about the compactness of the 
query. Text compression is available,
and, no matter what, on the average, the size of the
answer set will dwarf the size of the query. Furthermore, if
content is retrieved, that could dwarf the size of the
answer set. So, if you can afford the size of the answer,
you can afford the size of the query.

The ability to use three valued elimination is also
a powerful argument in favor of separate operators.

And maintaining the powerful invariant
condition "validity if and only if type safety" helps give us a
rock solid base architecture on which to proceed into the future,
one that will mesh naturally with the XML data
types proposal as well as future additions of query operators
and other query functionality.
That would not be possible with an embedded string syntax:
a second level of syntax inside a string parameter would
not be describable in XML in the DTD, and that, I think,
would be a mistake.
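
As a sketch of what the invariant buys us, here is a toy type checker in
Python. Everything in it is an assumption made for illustration: the
tuple representation of queries, the operator names, and the signature
table are hypothetical and taken from no draft.

```python
# Hypothetical signatures: operator -> (parameter types, result type).
SIGNATURES = {
    "and":      (("bool", "bool"), "bool"),
    "or":       (("bool", "bool"), "bool"),
    "not":      (("bool",), "bool"),
    "stem":     (("string",), "bool"),
    "near":     (("string",), "bool"),
    "contains": (("string",), "bool"),  # embedded-syntax style operator
}

def type_of(node):
    """Return the type of a query node, or raise TypeError."""
    if isinstance(node, str):
        return "string"
    if isinstance(node, bool):          # checked before int: bool is an int subclass
        return "bool"
    if isinstance(node, int):
        return "int"
    name, *args = node
    if name not in SIGNATURES:
        raise TypeError(f"unknown operator {name!r}")
    params, result = SIGNATURES[name]
    actual = tuple(type_of(a) for a in args)
    if actual != params:
        raise TypeError(f"{name} expects {params}, got {actual}")
    return result

def is_valid(query):
    """In this sketch a query is valid exactly when it type checks to bool."""
    try:
        return type_of(query) == "bool"
    except TypeError:
        return False

print(is_valid(("and", ("stem", "retrieve"), ("not", ("near", "a b")))))
print(is_valid(("and", ("stem", "retrieve"), "oops")))
print(is_valid(("contains", "a AND (b NEAR")))
```

The last query type checks even though the text inside the string is
malformed. The checker sees only a string, so with an embedded syntax
type safety no longer implies validity.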

I hope this explanation answers your questions.

Alan Babich
> -----Original Message-----
> From:	Jim Davis [SMTP:jdavis@parc.xerox.com]
> Sent:	May 06, 1998 7:19 PM
> To:	www-webdav-dasl@w3.org
> Subject:	Re: DASL Issues for Content Based Retrieval
> 
> At 08:00 PM 5/4/98 PDT, Babich, Alan wrote:
> >One obvious approach to use is to have the "contains" operator take a
> >string parameter. The string parameter would use a syntax that specifies
> >the search functionality.
> 
> I am puzzled by your advocacy for this position, since you've just finished
> explaining in eloquent language the advantages of using a parse tree for
> the representation of the boolean (or property based) portion of the query.
> What advantage do you see in using string encoding for the full text
> portion?

Received on Thursday, 7 May 1998 18:10:19 UTC