RE: FTS comments from Michael Rys on 2003-03-24 (public-qt-comments@w3.org from March 2003)

From: Michael Rys <mrys@microsoft.com>
Date: Mon, 24 Mar 2003 13:57:18 -0800
To: Kai Großjohann <kai.grossjohann@uni-duisburg.de>, <public-qt-comments@w3.org>
Message-ID: <5C39F806F9939046B4B1AFE652500A3A04E5D8A7@RED-MSG-10.redmond.corp.microsoft.com>
Dear Kai, thanks for your comments.

As others noted, my comments are my own and not the ones of the working group.

1. As I mentioned in a separate mail, the use cases should encompass functionality that the consensus in the WG considers to be important for discussion for the first version more than giving a repository of all possible use cases in the area of information retrieval.

2. Your use case of application of SCORE to non-text conditions is captured in the requirements document mainly for allowing researchers, vendors and at some point in time the WG to add such functionality. I personally consider that to be an important use case in the future, but do not see it as an important feature for this release and thus having it as part of the use cases seems a bit premature. 

Thanks and best regards
Michael
 
> I have read the FTS requirements document and the use cases
> (http://www.w3.org/TR/xmlquery-full-text-requirements/ and
> http://www.w3.org/TR/xmlquery-full-text-use-cases/), and would like
> to make some comments.
> 
> First of all, I'm happy that work is proceeding in the general
> direction of providing more IR functionality in XQuery.  It is dear
> to my heart :-)
> 
> I have two comments:
> 
> * Application of SCORE to non-text conditions.
> 
>   I believe that vagueness and uncertainty, the central issues of
>   Information Retrieval, are vital features for systems even outside
>   the domain of full text.  Consider the infamous used-car database
>   example: say the user searches for a white Lincoln Continental from
>   2001 with a given mileage (is that the right word? number of miles
>   run by that car is what I mean) and price.  There are no full text
>   conditions in this example.  Yet, what happens if there are no cars
>   fulfilling the exact condition but only cars that are "close
>   matches"?  One approach would be to interpret the conditions
>   vaguely.  Another approach requires the user to specify another
>   query to find those "close matches".  However, the latter approach
>   requires the user to know which of the query conditions to relax to
>   find that close match, and thus requires knowledge of the contents
>   of the database.  Surely this is not desirable: if the user knew
>   what's in the database, why search?
> 
>   Therefore, the "best match" approach is important also for non-text
>   conditions.
> 
>   In the requirements document it says that the SCORE language should
>   be either equal to the FTS language or a superset thereof.  I
>   couldn't find a use case where a vague interpretation is given to a
>   non-text condition.
> 
>   I'm saying that it should be possible for the user to specify a
>   vague interpretation for *every* query condition: wherever XQuery
>   allows strict equality, allow vague equality, too.  Wherever XQuery
>   allows strict less-than, allow vague less-than, too.  Wherever
>   XQuery allows Boolean and, allow a vague version, too.  And so on.
> 
>   (XQuery does not seem to allow vague conditions on the XML
>   structure, either...)
> 
> 
> * Higher-level, semantic, search predicates.
> 
>   The use cases document talks a lot about proximity search and that
>   the user should be able to specify various special cases: word
>   order required or not required, number of stopwords or
>   non-stopwords allowed between the matching terms, whether or not
>   an element boundary is allowed, and other things.
> 
>   I think that the user really wishes to do phrase search.
> 
>   All the above specifications are just (poor) approximations on that
>   goal.  I don't think that the user wishes to think about the word
>   order or the number of intervening stopwords that are allowed.  The
>   user just wants to search for "information retrieval" and find
>   "... retrieval of information ..." but not "... retrieval.
>   Information about...".
> 
>   The situation is similar to stemming: in the old days the systems
>   had wildcards, and then it was up to the user to emulate stemming
>   with wildcards.  Now the FTS use cases talk about stemming,
>   carefully sidestepping the problem of actual implementation.
> 
>   In the same vein, I suggest to talk about phrase search, and leave
>   the implementation up to the, err, implementors.  (Actually, you
>   offer wildcards in addition to stemming, so I guess it's okay to
>   offer proximity search in addition to phrase search.  But phrase
>   search is more important than proxmity search IMHO.)
> 
>   (I think the use cases document uses "phrase" to describe a
>   sequence of words.  I use "phrase" in a linguistic sense of, say, a
>   noun phrase.)
> 
> 
> Kai
> --
> A preposition is not a good thing to end a sentence with.
Received on Monday, 24 March 2003 16:57:12 UTC