- From: Kai Großjohann <kai.grossjohann@uni-duisburg.de>
- Date: Sat, 22 Mar 2003 22:18:57 +0100
- To: public-qt-comments@w3.org
I have read the FTS requirements document and the use cases (http://www.w3.org/TR/xmlquery-full-text-requirements/ and http://www.w3.org/TR/xmlquery-full-text-use-cases/), and would like to make some comments.

First of all, I'm happy that work is proceeding in the general direction of providing more IR functionality in XQuery. It is dear to my heart :-)

I have two comments:

* Application of SCORE to non-text conditions.

I believe that vagueness and uncertainty, the central issues of Information Retrieval, are vital features for systems even outside the domain of full text.

Consider the infamous used-car database example: say the user searches for a white Lincoln Continental from 2001 with a given mileage (is that the right word? number of miles run by that car is what I mean) and price. There are no full text conditions in this example. Yet, what happens if there are no cars fulfilling the exact condition but only cars that are "close matches"?

One approach would be to interpret the conditions vaguely. Another approach requires the user to specify another query to find those "close matches". However, the latter approach requires the user to know which of the query conditions to relax to find that close match, and thus requires knowledge of the contents of the database. Surely this is not desirable: if the user knew what's in the database, why search?

Therefore, the "best match" approach is important also for non-text conditions.

In the requirements document it says that the SCORE language should be either equal to the FTS language or a superset thereof. I couldn't find a use case where a vague interpretation is given to a non-text condition.

I'm saying that it should be possible for the user to specify a vague interpretation for *every* query condition: wherever XQuery allows strict equality, allow vague equality, too. Wherever XQuery allows strict less-than, allow vague less-than, too. Wherever XQuery allows Boolean and, allow a vague version, too. And so on.
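To illustrate the difference between strict and vague conditions, here is a rough sketch in Python (not XQuery, and not a proposal for concrete syntax). Everything in it is an assumption made up for illustration: the sample cars, the tolerance values, the linear decay used for vague equality and vague less-than, and the choice of a product as the vague "and". Other scoring functions would serve equally well.

```python
# Hypothetical sample data: no car satisfies all strict conditions.
CARS = [
    {"id": "a", "year": 2000, "mileage": 32000, "price": 19500},
    {"id": "b", "year": 2001, "mileage": 45000, "price": 21000},
    {"id": "c", "year": 1995, "mileage": 90000, "price": 8000},
]


def vague_eq(actual, target, tolerance):
    """Score 1.0 on an exact match, falling linearly to 0.0 at `tolerance` away."""
    return max(0.0, 1.0 - abs(actual - target) / tolerance)


def vague_le(actual, limit, tolerance):
    """Score 1.0 when actual <= limit, falling linearly to 0.0 at limit + tolerance."""
    if actual <= limit:
        return 1.0
    return max(0.0, 1.0 - (actual - limit) / tolerance)


def vague_and(*scores):
    """Vague conjunction: here simply the product of the individual scores."""
    result = 1.0
    for score in scores:
        result *= score
    return result


def strict_matches(cars):
    """Strict interpretation: year = 2001, mileage <= 30000, price <= 20000."""
    return [
        car["id"]
        for car in cars
        if car["year"] == 2001 and car["mileage"] <= 30000 and car["price"] <= 20000
    ]


def ranked_matches(cars):
    """Vague interpretation of the same query: score every car, best first."""
    scored = []
    for car in cars:
        score = vague_and(
            vague_eq(car["year"], 2001, tolerance=3),
            vague_le(car["mileage"], 30000, tolerance=20000),
            vague_le(car["price"], 20000, tolerance=5000),
        )
        if score > 0.0:
            scored.append((score, car["id"]))
    scored.sort(reverse=True)
    return scored


if __name__ == "__main__":
    print("strict:", strict_matches(CARS))
    print("vague: ", ranked_matches(CARS))
```

With this sample data the strict query returns nothing, while the vague query still ranks the near misses, and the user never had to guess which condition to relax.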
(XQuery does not seem to allow vague conditions on the XML structure, either...)

* Higher-level, semantic, search predicates.

The use cases document talks a lot about proximity search and says that the user should be able to specify various special cases: word order required or not required, number of stopwords or non-stopwords allowed between the matching terms, whether or not an element boundary is allowed, and other things.

I think that the user really wishes to do phrase search. All the above specifications are just (poor) approximations of that goal. I don't think that the user wishes to think about the word order or the number of intervening stopwords that are allowed. The user just wants to search for "information retrieval" and find "... retrieval of information ..." but not "... retrieval. Information about...".

The situation is similar to stemming: in the old days the systems had wildcards, and then it was up to the user to emulate stemming with wildcards. Now the FTS use cases talk about stemming, carefully sidestepping the problem of actual implementation.

In the same vein, I suggest talking about phrase search, and leaving the implementation up to the, err, implementors.

(Actually, you offer wildcards in addition to stemming, so I guess it's okay to offer proximity search in addition to phrase search. But phrase search is more important than proximity search IMHO.)

(I think the use cases document uses "phrase" to describe a sequence of words. I use "phrase" in the linguistic sense of, say, a noun phrase.)

Kai
--
A preposition is not a good thing to end a sentence with.
Received on Saturday, 22 March 2003 16:21:59 UTC