Re: Attribute Architecture -- new type? from Alan Kent on 2003-08-10 (www-zig@w3.org from August 2003)

From: Alan Kent <ajk@mds.rmit.edu.au>
Date: Mon, 11 Aug 2003 09:35:43 +1000
To: ZIG <www-zig@w3.org>
Message-ID: <20030810233542.GD28124@io.mds.rmit.edu.au>
On Fri, Aug 08, 2003 at 02:08:43PM -0400, Ray Denenberg wrote:
>  I  propose we define a new "type", along the lines that Alan was originally
> suggesting.
> 
> Let's start with two points we all agree upon:
> 1. allWords, anyWord, adjacentWords need to be changed from structure to
> comparison attributes.
> 2.  We need to distinguish between word and string indexes.

I agree that allWords etc should not be in format/structure.
I agree that one option is to make them comparison attributes, but
I don't agree that this is the only option.

If adding new types etc is an option, then I think it's worth standing
back a little and working out the semantic model behind things, getting
the model right, and then reapplying it to the AA. In my later proposals
I purposely tried to keep things close to what they are now rather than
pushing a more generic solution. Now that the dust has settled a little,
I might try to describe the more generic solution again, but am happy if
people don't want to go there.


To me the Z39.50 model is such that you have records from which 'index
terms' are extracted. You also have queries from which 'index terms'
are extracted. A record matches if the index terms from the record
match the index terms from the query.

Scanning looks at index terms. (I am going to try to ignore the
display terms concept if I can to simplify this discussion.)
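
A minimal sketch of this model, under stated assumptions: the function
names are hypothetical, index terms are simply whitespace-separated
words, and a record matches only if every query term is found (the
model itself does not say whether all or any terms must match).

```python
# Hypothetical sketch of the "index terms" model; not taken from Z39.50.

def extract_index_terms(text):
    """Extract index terms from a value (assumption: whitespace-separated words)."""
    return text.split()

def record_matches(record_text, query_text):
    """A record matches if each query index term equals some record index term."""
    record_terms = set(extract_index_terms(record_text))
    query_terms = extract_index_terms(query_text)
    return all(term in record_terms for term in query_terms)

def scan(records):
    """Scanning only looks at index terms: list them in sorted order."""
    terms = set()
    for text in records:
        terms.update(extract_index_terms(text))
    return sorted(terms)
```

The same extraction function is applied to records and to queries,
which is the point of the model: both sides are reduced to index terms
before anything is compared.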

What then are the semantics of the different attribute types?
I think it's good to have orthogonal definitions for each attribute
type. They should ideally work together with minimal or zero
interdependence (otherwise they are not orthogonal).

Please note! The following is my personal interpretation!

Access point attributes define which subset of index terms from a
record to check (title, author, etc).

Comparison attributes define how to compare index terms (equal,
greater than, etc).

Expansion/interpretation defines tweaks to the comparisons. Eg: ignore
case, stem, etc. (Not all combinations of comparison and expansion
make sense - greater than stem?)

Format/structure defines the structure of index terms. This is to
ensure you compare apples to apples when doing a query.
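
One way to picture the four attribute types as independent knobs is the
sketch below. Everything in it is an assumption made for illustration:
the record layout, the table names, the attribute value names, and the
rule that every query term must be matched by some record term.

```python
import operator

# Hypothetical record: access point -> raw field values.
RECORD = {
    "title": ["The Art of Indexing"],
    "author": ["Kent, Alan", "Denenberg, Ray"],
}

# Format/structure: how index terms are extracted from a raw value.
STRUCTURES = {
    "string": lambda value: [value],
    "word": lambda value: value.split(),
}

# Expansion/interpretation: tweaks applied to both sides before comparing.
EXPANSIONS = {
    "none": lambda term: term,
    "ignore-case": lambda term: term.lower(),
}

# Comparison: how a record index term is compared with a query index term.
COMPARISONS = {
    "equal": operator.eq,
    "greater-than": operator.gt,
}

def search(record, access_point, structure, expansion, comparison, query):
    extract = STRUCTURES[structure]
    tweak = EXPANSIONS[expansion]
    compare = COMPARISONS[comparison]
    # Access point: which subset of the record's values to look at.
    record_terms = [tweak(t) for value in record.get(access_point, [])
                    for t in extract(value)]
    query_terms = [tweak(t) for t in extract(query)]
    # Assumption: every query term must be matched by some record term.
    return all(any(compare(r, q) for r in record_terms) for q in query_terms)
```

For example, `search(RECORD, "title", "word", "ignore-case", "equal",
"art indexing")` is true, while the same query with structure "string"
is false, because the query is then one index term compared against the
complete title.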


I personally think format/structure makes sense for words vs strings.
It is similar to the Bib-2 format/structure values in that it defines
the structure of index terms that you would get back if you scan an
index. So I actually think word & string make complete sense in
format/structure.


I actually think that allWords, anyWord, adjacentWords are misnamed.
I propose that these operators be able to deal with any repeating list
of index terms that can be extracted from a record. That is, they should
be called allTerms, anyTerms, adjacentTerms. They are semantically the
same as AND, OR, and PROX operators. They make sense for use with
a repeating complete author name field (but an implementation would have
to define how different author names were separated in the query term -
eg semicolons?). They make sense with a series of numbers, coordinates,
or bounding boxes ("1,2 3,1 5,2"). They make sense with words.

So if a new type was introduced, I think it makes more sense for that
new type to deal with the behaviour to take when a query term contains
multiple index terms. If I am searching a field containing coordinates
and the query term is "1,2 3,1 5,2" then three index terms would be
extracted "1,2" "3,1" and "5,2". Saying allTerms and anyTerms makes
sense to me. Using PROX (adjacency) could be supported (or the server
could return an error if it does not support that attribute).
If I wanted to search as words, then I would say the format/structure of
my query term was a series of words.
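
The coordinate example can be sketched as follows. The function names
are hypothetical, terms are assumed to be separated by whitespace, and
adjacency is assumed to mean "consecutive and in query order", which is
only one possible reading of PROX.

```python
# Hypothetical sketch of allTerms / anyTerms / adjacentTerms over any
# repeating list of index terms (coordinates here, but equally words).

def extract_terms(value):
    """Split a value into index terms, e.g. "1,2 3,1" -> ["1,2", "3,1"]."""
    return value.split()

def all_terms(query, record):
    """Every query term occurs among the record terms (like AND)."""
    record_terms = set(extract_terms(record))
    return all(t in record_terms for t in extract_terms(query))

def any_terms(query, record):
    """At least one query term occurs among the record terms (like OR)."""
    record_terms = set(extract_terms(record))
    return any(t in record_terms for t in extract_terms(query))

def adjacent_terms(query, record):
    """The query terms occur consecutively, in order, in the record (like PROX)."""
    q, r = extract_terms(query), extract_terms(record)
    return any(r[i:i + len(q)] == q for i in range(len(r) - len(q) + 1))
```

With a record value of "0,0 1,2 3,1 5,2 9,9", the query "1,2 3,1 5,2"
satisfies all three operators, while "5,2 1,2" satisfies all_terms and
any_terms but not adjacent_terms.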


I think step 1 is to work out where string/word goes. Ray was saying
that, looking at Bib-2, he thinks they should not be format/structure
attributes. To me, looking at Bib-2, I think they should be :-)

Ray - feel like explaining in your own words the purpose of format/structure?
I think it defines the format/structure of values extracted from both
records and queries to ensure you compare apples with apples. Bib-2
defines various forms for firstname/lastname, lastname/firstname, commas,
etc. I would imagine that when scanning an index I should get values back
in these formats. For a word or string index, I imagine getting terms
back as words or complete values. But I may be missing something here.

Orthogonality can be a useful metric to work out if a different attribute
type should be used. Can you use 'word' and 'string' in combination with
the Bib-2 format/structure attributes? What would it mean? If there are
sensible semantics using word & string in combination with more than one
of the Bib-2 values, then that is a strong indication they should be 
separate attribute types.


Regarding all/any/adj - I can live with whatever is proposed. However,
I feel that

    all (query) terms, comparison=contained-within, "1,2,3,4 4,3,2,1"

makes sense. How useful? Debatable - that is why I am not stressed
about all/any/adj being comparison operators. They would then mean
all-equal, any-equal, adj-equal.
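
A sketch of how all/any could combine with an arbitrary comparison is
given below. It rests on several assumptions that are not settled in
this thread: each term is read as a bounding box "x1,y1,x2,y2" with
corners in either order, "contained-within" means the record box lies
inside the query box, and all function names are hypothetical.

```python
# Hypothetical sketch: all/any as a separate knob from the comparison.

def parse_box(term):
    """Parse "x1,y1,x2,y2" into (min_x, min_y, max_x, max_y)."""
    x1, y1, x2, y2 = (float(p) for p in term.split(","))
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))

def contained_within(record_term, query_term):
    """Assumed meaning: the record box lies inside the query box."""
    r, q = parse_box(record_term), parse_box(query_term)
    return q[0] <= r[0] and q[1] <= r[1] and r[2] <= q[2] and r[3] <= q[3]

def equal(record_term, query_term):
    return record_term == query_term

def match(mode, compare, query, record):
    """mode is the built-in all or any; compare is any term-level predicate."""
    record_terms = record.split()
    return mode(any(compare(r, q) for r in record_terms)
                for q in query.split())
```

Here `match(all, contained_within, "1,2,3,4 4,3,2,1", "2,2,3,3")` is
true because the record box fits inside both query boxes, and
`match(all, equal, ...)` is the all-equal special case.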

Alan
Received on Sunday, 10 August 2003 19:35:50 UTC