Re: Attribute Architecture -- new type?

> Date: Tue, 19 Aug 2003 09:15:11 +1000
> From: Alan Kent <ajk@mds.rmit.edu.au>
> 
> > >     access=title
> > >     comparison=any
> > >     format=string
> > >     Term=child's book-case 
> > > Does the second mean the title must equal 'child's' or 'book-case'?
> > 
> > Exactly my point! :) We don't know if the client meant two strings
> > or one. Thus it needs to say what it meant somehow.
> 
> Yes, I thought that is the problem we were trying to come up with a
> precise spec to address. We know there is a problem - do you have a
> concrete proposed solution we can put into the spec?

It is none of the client's damned business whether the server treats
this as two, three or four words -- or indeed seven or twenty-nine.
The _sole_ purpose of the anyWords/allWords attributes is so that the
client can remain in this state of blissful ignorance -- so it can say
to the server, "Here's a bunch of words, pick them apart exactly as
you would do if they were part of a record contributing to the index
I'm searching".  That's a valuable thing to be able to do: it means
that the client can submit "child's book-case" without knowing or
caring what the server will do with it, beyond that it will Do The
Right Thing.

If the client already _knows_ how it wants the string split up, it can
do so itself and submit an AND or OR search.  Again, the ONLY reason
anyWords and allWords are useful is because the client can't, in
general, know how to do this splitting.
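
For what it's worth, here is a rough sketch of that client-side case
in Python.  The function name pqf_boolean is made up for illustration,
and I am assuming a PQF rendering with the BIB-1 title use attribute
(1=4) as the access point; quoting of awkward characters inside words
is not handled.  The point is only that the client has already chosen
the words, so nothing is left for the server to pick apart.

```python
def pqf_boolean(op, words, attrs="@attr 1=4"):
    """Join words the client has ALREADY split into one prefix-notation query.

    op is "and" or "or".  attrs defaults to the BIB-1 title use attribute,
    which is an assumption made for this example only.
    """
    terms = ['{} "{}"'.format(attrs, w) for w in words]
    # PQF operators are binary and prefix, so fold from the right:
    # @or a @or b c
    query = terms[-1]
    for term in reversed(terms[:-1]):
        query = "@{} {} {}".format(op, term, query)
    return query


# The client has decided, on its own authority, that there are two words.
print(pqf_boolean("or", ["child's", "book-case"]))
```

A client that cannot make that decision has nothing sensible to pass
as the word list, which is exactly the gap anyWords and allWords fill.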

(Rob, I don't remember whether your CQL compiler has a back-end that
renders out to a Type-1 equivalent format such as PQF, but if it does,
you must have run into exactly this problem when trying to generate
Type-1 queries using BIB-1, which doesn't have allWords/anyWords.  The
CQL parser can't know how to break up the multi-word search term -- it
needs to pass it to the server, which does know.)

> * If you specify multiple terms for format string, then the system
>   should "work out" what the terms are (Does this mean child's
>   book-case is 2 terms ("child's" and "book-case") if '2' is
>   specified and 3 terms ("child's" "book" "case") if '3' is
>   specified?)

Yeuch, no.  This is not only undesirable, it also doesn't work.  There
are at least two perfectly good ways to parse "child's book-case" into
three words: "child's", "book", "case" and "child", "s", "book-case".
(Yes, I have worked with servers configured to work the second way.)
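
To illustrate, here is a small Python sketch of two hypothetical
server-side word-breaking rules.  Neither is taken from any real
server's configuration; the function names and regular expressions
are assumptions of mine.  Both produce three words from the same
term, and they disagree on every word.

```python
import re


def split_on_hyphens(term):
    # Hypothetical server A: an apostrophe is part of a word,
    # while hyphens and whitespace separate words.
    return [w for w in re.split(r"[\s-]+", term) if w]


def split_on_apostrophes(term):
    # Hypothetical server B: a hyphen is part of a word,
    # while apostrophes and whitespace separate words.
    return [w for w in re.split(r"[\s']+", term) if w]


term = "child's book-case"
print(split_on_hyphens(term))      # ["child's", 'book', 'case']
print(split_on_apostrophes(term))  # ['child', 's', 'book-case']
```

So a client asking for "3 terms" would still have no idea which three
it was going to get.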

> I like trying to keep things orthogonal in the attribute types as
> much as possible.

Exactly!

> But I am strongly of the opinion that the rules for breaking the
> query string into multiple search terms should be clear in the spec.

Nope.  It's no-one's business but the server's how it does this.

 _/|_	 _______________________________________________________________
/o ) \/  Mike Taylor  <mike@indexdata.com>  http://www.miketaylor.org.uk
)_v__/\  "Football is a simple game complicated by fools" -- Kevin
	 Keegan, quoting Bill Shankly.

--
Listen to my wife's new CD of kids' music, _Child's Play_, at
	http://www.pipedreaming.org.uk/childsplay/

Received on Tuesday, 26 August 2003 06:23:19 UTC