- From: Alan Kent <ajk@mds.rmit.edu.au>
- Date: Mon, 11 Aug 2003 09:35:43 +1000
- To: ZIG <www-zig@w3.org>
On Fri, Aug 08, 2003 at 02:08:43PM -0400, Ray Denenberg wrote:
> I propose we define a new "type", along the lines that Alan was originally
> suggesting.
>
> Let's start with two points we all agree upon:
> 1. allWords, anyWord, adjacentWords need to be changed from structure to
> comparison attributes.
> 2. We need to distinguish between word and string indexes.

I agree that allWords etc should not be in format/structure. I agree that one option is to make them comparison attributes, but I don't agree that this is the only option. If adding new types etc is an option, then I think it's worth standing back a little and working out the semantic model behind things, getting the model right, and then reapplying it to the AA.

In my later proposals I purposely tried to keep things close to what they are now rather than pushing a more generic solution. Now that the dust has settled a little, I might try to describe the more generic solution again, but am happy if people don't want to go there.

To me the Z39.50 model is such that you have records from which 'index terms' are extracted. You also have queries from which 'index terms' are extracted. A record matches if the index terms from the record match the index terms from the query. Scanning looks at index terms. (I am going to try to ignore the display terms concept if I can to simplify this discussion.)

What then are the semantics of the different attribute types? I think it's good to have orthogonal definitions for each attribute type. They should ideally work together with minimal or zero interdependence (otherwise they are not orthogonal).

Please note! The following is my personal interpretation!

Access point attributes define which subset of index terms from a record to check (title, author, etc).

Comparison attributes define how to compare index terms (equal, greater than, etc).

Expansion/interpretation defines tweaks to the comparisons. Eg: ignore case, stem, etc.
(Not all combinations of comparison and expansion make sense - greater than stem?)

Format/structure defines the structure of index terms. This is to ensure you compare apples to apples when doing a query.

I personally think format/structure makes sense for words vs strings. It is similar to the Bib-2 format/structure values in that it defines the structure of index terms that you would get back if you scan an index. So I actually think word & string make complete sense in format/structure.

I actually think that allWords, anyWords, adjacentWords are misnamed. I propose that these operators be able to deal with any repeating list of index terms that can be extracted from a record. That is, they should be called allTerms, anyTerms, adjacentTerms. They are semantically the same as AND, OR, and PROX operators. They make sense for use with a repeating complete author name field (but an implementation would have to define how different author names were separated in the query term - eg semicolons?). They make sense with a series of numbers, coordinates, or bounding boxes ("1,2 3,1 5,2"). They make sense with words.

So if a new type was introduced, I think it makes more sense for that new type to deal with the behaviour to take when a query term contains multiple index terms. If I am searching a field containing coordinates and the query term is "1,2 3,1 5,2" then three index terms would be extracted: "1,2", "3,1" and "5,2". Saying allTerms and anyTerms makes sense to me. Using PROX (adjacency) could be supported (or the server could return an error if it does not support that attribute). If I wanted to search as words, then I would say the format/structure of my query term was a series of words.

I think step 1 is to work out where string/word goes. Ray was saying that, looking at Bib-2, he thinks they should not be format/structure attributes. To me, looking at Bib-2, I think they should be :-)

Ray - feel like explaining in your own words the purpose of format/structure?
I think it defines the format/structure of values extracted from both records and queries to ensure you compare apples with apples. Bib-2 defines various forms for firstname/lastname, lastname/firstname, commas, etc. I would imagine that when scanning an index I should get values back in these formats. If it is a word or string index, I imagine getting terms back as words or complete values. But I may be missing something here.

Orthogonality can be a useful metric to work out if a different attribute type should be used. Can you use 'word' and 'string' in combination with the Bib-2 format/structure attributes? What would it mean? If there are sensible semantics using word & string in combination with more than one of the Bib-2 values, then that is a strong indication they should be separate attribute types.

Regarding all/any/adj - I can live with whatever is proposed. However, I feel that all (query) terms, comparison=contained-within, "1,2,3,4 4,3,2,1" makes sense. How useful? Debatable - that is why I am not stressed about all/any/adj being comparison operators. They would then mean all-equal, any-equal, adj-equal.

Alan
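
To make the allTerms/anyTerms/adjacentTerms idea above a little more concrete, here is a rough sketch in Python. It is only an illustration of the model, not anything defined by Z39.50 or the AA: the function names are made up, index terms are assumed to be separated by whitespace, and the comparison is assumed to be plain equality (so these are the all-equal, any-equal, adj-equal readings).

```python
def extract_terms(value):
    """Split a field value or query term into index terms.

    Assumption for this sketch: index terms are separated by whitespace.
    A real server would pick the separator per format/structure.
    """
    return value.split()


def all_terms(record_terms, query_terms):
    """allTerms (AND): every query index term equals some record index term."""
    return all(q in record_terms for q in query_terms)


def any_terms(record_terms, query_terms):
    """anyTerms (OR): at least one query index term equals a record index term."""
    return any(q in record_terms for q in query_terms)


def adjacent_terms(record_terms, query_terms):
    """adjacentTerms (PROX): the query index terms occur consecutively, in order."""
    n = len(query_terms)
    if n == 0:
        return False
    return any(
        record_terms[i:i + n] == query_terms
        for i in range(len(record_terms) - n + 1)
    )


# A coordinate field, using the example values from the discussion above.
record = extract_terms("1,2 3,1 5,2")
print(all_terms(record, extract_terms("5,2 1,2")))       # both terms are present
print(any_terms(record, extract_terms("9,9 3,1")))       # one of the terms is present
print(adjacent_terms(record, extract_terms("3,1 5,2")))  # the terms are consecutive
print(adjacent_terms(record, extract_terms("1,2 5,2")))  # present, but not adjacent
```

Nothing in the three operators depends on the terms being words: the same functions work unchanged if extract_terms is swapped for one that splits author names on semicolons.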
Received on Sunday, 10 August 2003 19:35:50 UTC