Re: more on attribute proposal from Alan Kent on 2003-07-20 (www-zig@w3.org from July 2003)

From: Alan Kent <ajk@mds.rmit.edu.au>
Date: Mon, 21 Jul 2003 08:57:49 +1000
To: www-zig@w3.org
Message-ID: <20030721085749.A10849@io.mds.rmit.edu.au>

On Sat, Jul 19, 2003 at 12:12:35PM -0400, Robert Waldstein wrote:
> Let me come out of lurking to say that "words" always confused me in Z39.50;
> my client still has code in it because some servers required (require?) 
> a different attribute depending on whether the term I sent had a <space> in
> it or not.  This was confusing and wrong (in my opinion).

I think I agree with where you are coming from, in that our engine does
not really know what a 'word' is either. It just indexes sequences of
bytes (terms) that were somehow extracted from a record.

The issue to me is that there is currently no way for a client to say
which rules to use. It is harder to explain with queries (the problem is
there, but it's harder to see clearly because there are so many attributes
that it's easy to hide the problem by saying 'oh, why not try attribute X
instead'), so I recommend thinking about it first purely from a SCAN
perspective. Attribute sets are designed to support SCAN, not just searches.

If you want to support scanning of values extracted with different term
extraction rules, you need to be able to specify the rule to use.

The most common example of this in general practice is things like
Bath profile title searches where you may want to search as words or
as a complete title. I have been using this example to make things
concrete to people based on what they are familiar with. I would like
to be able to express even more rules - but that is my problem.
Once format/structure has been identified as the correct way to
specify these term extraction rules, I can define my own personal
attribute set and define new attribute values in the format/structure
attribute type. This is what the attribute set architecture was designed
to do (be extensible without redesigning everything). Life is wonderful.

So I have been using the specific example of words vs string (complete
values). If you want scan to be able to return either all the words in a
title or the complete title, the client needs a different attribute
value to send through for each. Format/structure is the correct place
for this attribute. I believe it's the documented purpose of this
attribute.
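
To make the scan case concrete, here is a rough sketch in Python. It is
not how our engine or any Z39.50 server works; the rule names "word" and
"string", the sample titles and the simplistic word splitting are all
assumptions made up for illustration. It only shows that the same field
gives two different scan lists depending on the extraction rule, so the
client has to be able to name the rule.

```python
import re

# Hypothetical sample titles, not taken from any real database.
TITLES = ["Jaws", "Real-time systems", "Jaws of life"]

def extract_terms(title, rule):
    """Return the index terms for one title under the named rule."""
    if rule == "string":
        # The complete value becomes a single term.
        return [title.lower()]
    if rule == "word":
        # One simplistic notion of a 'word': runs of letters or digits.
        # A real server may split differently (e.g. keep "real-time").
        return re.findall(r"[a-z0-9]+", title.lower())
    raise ValueError(f"unknown extraction rule: {rule}")

def scan(titles, rule):
    """Return the sorted, de-duplicated term list a scan would walk."""
    terms = set()
    for title in titles:
        terms.update(extract_terms(title, rule))
    return sorted(terms)

print(scan(TITLES, "word"))
print(scan(TITLES, "string"))
```

With the "word" rule the list contains 'jaws', 'life', 'of' and so on;
with the "string" rule it contains the three complete titles.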

>   If I send a term like "jaws" or "real-time" and I tell the server it is
> a "complete title" I see no reason why I should say anything about "words";
> plus as part of this thread said, how do I even know if the server considers
> the string I sent as containing "words".

If you say you are trying to find things with that exact title, then
you should not say anything about words. If you are trying to find 
things that *contain* the word 'jaws', then you need to say that instead.
At present, you cannot specify these two queries separately.
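
The difference between the two queries can be sketched the same way. This
is again only an illustration in Python under assumed names (the
`search` function, the "string" and "word" rule labels and the sample
titles are hypothetical), with whitespace splitting standing in for
whatever a server regards as a word.

```python
# Hypothetical sample titles, not taken from any real database.
TITLES = ["Jaws", "Jaws of life", "Real-time systems"]

def search(titles, term, rule):
    """Return the titles matching `term` under the named rule."""
    term = term.lower()
    if rule == "string":
        # Exact match against the complete title.
        return [t for t in titles if t.lower() == term]
    if rule == "word":
        # Match if the term is one of the whitespace-separated words.
        return [t for t in titles if term in t.lower().split()]
    raise ValueError(f"unknown rule: {rule}")

print(search(TITLES, "jaws", "string"))  # titles that are exactly 'jaws'
print(search(TITLES, "jaws", "word"))    # titles containing the word 'jaws'
```

The same term 'jaws' gives a different result set under each rule, which
is why the client needs a way to say which one it means.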

Alan
Received on Sunday, 20 July 2003 18:57:57 UTC