- From: Pat Case <PCASE@crs.loc.gov>
- Date: Mon, 24 Mar 2003 09:27:14 -0500
- To: <kai.grossjohann@uni-duisburg.de>, <public-qt-comments@w3.org>
- Cc: <member-query-fttf@w3.org>
Kai,

These are all personal responses. I can't speak for the Working Group. See inline.

Pat Case, Librarian, LIS Interface Team
Congressional Research Service
Library of Congress
101 Independence Ave., SE, LM-223
Washington, DC 20540-7000
202-707-9104
202-252-3370 (Fax)
pcase@crs.loc.gov

>>> Kai Großjohann <kai.grossjohann@uni-duisburg.de> 03/22/03 04:18PM >>>

I have read the FTS requirements document and the use cases (http://www.w3.org/TR/xmlquery-full-text-requirements/ and http://www.w3.org/TR/xmlquery-full-text-use-cases/), and would like to make some comments.

First of all, I'm happy that work is proceeding in the general direction of providing more IR functionality in XQuery. It is dear to my heart :-)

I have two comments:

* Application of SCORE to non-text conditions.

I believe that vagueness and uncertainty, the central issues of Information Retrieval, are vital features for systems even outside the domain of full text. Consider the infamous used-car database example: say the user searches for a white Lincoln Continental from 2001 with a given mileage (is that the right word? number of miles run by that car is what I mean) and price. There are no full text conditions in this example. Yet, what happens if there are no cars fulfilling the exact condition but only cars that are "close matches"?

One approach would be to interpret the conditions vaguely. Another approach requires the user to specify another query to find those "close matches". However, the latter approach requires the user to know which of the query conditions to relax to find that close match, and thus requires knowledge of the contents of the database. Surely this is not desirable: if the user knew what's in the database, why search?

Therefore, the "best match" approach is important also for non-text conditions.

In the requirements document it says that the SCORE language should be either equal to the FTS language or a superset thereof. I couldn't find a use case where a vague interpretation is given to a non-text condition.

I'm saying that it should be possible for the user to specify a vague interpretation for *every* query condition: wherever XQuery allows strict equality, allow vague equality, too. Wherever XQuery allows strict less-than, allow vague less-than, too. Wherever XQuery allows Boolean and, allow a vague version, too. And so on.

(XQuery does not seem to allow vague conditions on the XML structure, either...)

[Pat Case: I too would like to see score applicable to all XQuery.]

* Higher-level, semantic, search predicates.

The use cases document talks a lot about proximity search and that the user should be able to specify various special cases: word order required or not required, number of stopwords or non-stopwords allowed between the matching terms, whether or not an element boundary is allowed, and other things.

I think that the user really wishes to do phrase search. All the above specifications are just (poor) approximations of that goal. I don't think that the user wishes to think about the word order or the number of intervening stopwords that are allowed. The user just wants to search for "information retrieval" and find "... retrieval of information ..." but not "... retrieval. Information about...".

The situation is similar to stemming: in the old days the systems had wildcards, and then it was up to the user to emulate stemming with wildcards. Now the FTS use cases talk about stemming, carefully sidestepping the problem of actual implementation.

In the same vein, I suggest talking about phrase search, and leaving the implementation up to the, err, implementors.

(Actually, you offer wildcards in addition to stemming, so I guess it's okay to offer proximity search in addition to phrase search. But phrase search is more important than proximity search IMHO.)

(I think the use cases document uses "phrase" to describe a sequence of words. I use "phrase" in a linguistic sense of, say, a noun phrase.)

[Pat Case: Please remember that a phrase query is a proximity query (ordered, allowing no intervening words). Also remember we are defining the functionalities which will be available for implementors. We don't expect most end users to define the parameters for a proximity query, but we do expect them to profit from proximity querying.

We expect implementors to build GUIs which utilize proximity queries. For example, a system may take any search terms in the Words search box and return them in any order within 9 words of each other, then offer a More button which might use an "and" operator. Or the implementors might build queries under buttons or links. The functionality has to be there so we can develop GUIs for end users. And yes, I do want the functionalities to surface for the small number of expert users who can use them.

I do not emphasize phrase querying because I think it is as dangerous as "or" querying is useless. I advise end users to do wider unordered proximity queries instead. In a system which supports phrase query I would build a More button that runs a wider unordered proximity query to pick up the missed results.

My favorite example is in the internal system I work on for congressional documents. Folks search on "elementary education" and find very little. It is a reasonable query but it fails because congressional bills almost exclusively carry the phrase "elementary and secondary education". Allow a few intervening characters and hundreds of bills are returned.]

Kai
--
A preposition is not a good thing to end a sentence with.
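
Pat Case's remark that a phrase query is just an ordered proximity query with no intervening words can be illustrated with a small sketch. This is not the XQuery Full-Text syntax or any Working Group proposal; the function names `tokenize` and `proximity_match` and the parameters `max_gap` and `ordered` are hypothetical, chosen only for this illustration, and the matcher is a brute-force one that assumes simple lowercase word tokens.

```python
import re
from itertools import product


def tokenize(text):
    """Split text into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def proximity_match(text, terms, max_gap=0, ordered=True):
    """Hypothetical proximity test (illustration only).

    True if every term occurs in `text` such that the span from the first
    to the last matched term contains at most `max_gap` other words.
    With ordered=True the terms must appear in the order given, so
    ordered=True with max_gap=0 behaves like a strict phrase query.
    """
    tokens = tokenize(text)
    wanted = [t.lower() for t in terms]
    # For each term, collect every position where it occurs.
    positions = [[i for i, tok in enumerate(tokens) if tok == t] for t in wanted]
    # Try every combination of one position per term (fine for short texts).
    for combo in product(*positions):
        if len(set(combo)) != len(combo):
            continue  # the same token cannot satisfy two terms
        if ordered and list(combo) != sorted(combo):
            continue  # terms found, but not in the requested order
        gap = max(combo) - min(combo) + 1 - len(combo)
        if gap <= max_gap:
            return True
    return False


bill = "A bill to improve elementary and secondary education"
strict_phrase = proximity_match(bill, ["elementary", "education"])
wider_query = proximity_match(bill, ["elementary", "education"], max_gap=2)
reordered = proximity_match(
    "retrieval of information", ["information", "retrieval"],
    max_gap=1, ordered=False,
)
```

Under these assumptions the strict phrase query misses the "elementary and secondary education" bill, while allowing two intervening words finds it, and the unordered variant finds "retrieval of information" for the query "information retrieval". The sketch ignores sentence and element boundaries, so it would also match "retrieval. Information about", which is the kind of false hit Kai's linguistic notion of a phrase is meant to exclude.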
Received on Monday, 24 March 2003 09:30:11 UTC