Re: Attribute Architecture -- new type?

From: Mike Taylor <mike@indexdata.com>
Date: Tue, 26 Aug 2003 16:13:48 +0100
To: azaroth@liverpool.ac.uk
Cc: www-zig@w3.org
Message-Id: <E19rfW0-0004Ht-00@auntie.miketaylor.org.uk>

> Date: Tue, 26 Aug 2003 15:32:45 +0100 (BST)
> From: Robert Sanderson <azaroth@liverpool.ac.uk>
> 
> > The _sole_ purpose of the anyWords/allWords attributes is so that
> > the client [...]
> 
> Which is great for Words, but potentially less great for other
> formats, in particular 'string' or 'exact' or whatever you want to
> call it.

Are you referring to Alan's concept of completeTerm?  allWords and
anyWords (or allTerms and anyTerm if you prefer) are meaningless when
used against a term of this type.

> > (Rob, I don't remember whether your CQL compiler has a back-end
> > that renders out to a Type-1 equivalent format such as PQF, but if
> > it does, you must have run into exactly this problem when trying
> > to generate Type-1 queries using BIB-1, which doesn't have
> > allWords/anyWords.  The CQL parser can't know how to break up the
> > multi-word search term -- it needs to pass it to the server, which
> > does know.)
> 
> Yep.  It splits it up with a convenient space separation.

... which is wrong (although an acceptable hack in the current state
of things).  If the server indexes "yellow book-case" as three words
(because it treats hyphens as word breaks) and I search for allWords
"yellow book-case", then your CQL parser will submit an AND search for
"yellow" and "book-case" that CAN NOT succeed against that index: it
holds no single term "book-case".  That's why you need
allWords/anyWords attributes that leave the server to do the parsing.
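
A rough sketch of the mismatch, in Python.  Everything here is a
hypothetical stand-in, not any real server or CQL compiler: the
tokenizer rule (any non-alphanumeric run is a word break), the toy
inverted index, and the names server_tokenize, build_index and
and_search are all assumptions made for illustration.

```python
import re


def server_tokenize(text):
    # Assumed server rule: any run of non-alphanumeric characters
    # (spaces, hyphens, ...) is a word break.
    return [w for w in re.split(r"[^0-9a-z]+", text.lower()) if w]


def build_index(records):
    # Toy inverted index: word -> set of record ids.
    index = {}
    for rec_id, text in records.items():
        for word in server_tokenize(text):
            index.setdefault(word, set()).add(rec_id)
    return index


def and_search(index, terms):
    # Each term is looked up verbatim; all must match the same record.
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


records = {1: "A yellow book-case", 2: "Yellow pages"}
index = build_index(records)

# Client-side hack: split on spaces, then AND the pieces together.
client_terms = "yellow book-case".split(" ")
client_hits = and_search(index, client_terms)

# Server-side parsing: the server applies its own word-break rules.
server_terms = server_tokenize("yellow book-case")
server_hits = and_search(index, server_terms)

print(client_terms, client_hits)
print(server_terms, server_hits)
```

Under these assumptions the space-split query asks for the term
"book-case", which the index never contains, so it finds nothing,
while the server-parsed query finds record 1.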

> The current definition of 'string' is insufficient for use with any
> as there's no way to distinguish individual strings within the Term.

Quite!  Because only the server knows or _can_ know!

> I think it would be solved by one of the following:
> 
> * A numberOfTerms attribute with values of null/single/multiple/unknown.
>   Then at least the server will know that the term should be split up 
>   somehow or not, and do what it thinks is appropriate.  (Forget the exact 
>   number of terms suggestion)
> 
> * Change the definition of string to something that can be embedded within 
>   a Term.  Probably by saying the ""s are used to delineate a single 
>   string term within the query term and that non special " characters must 
>   be escaped with a \

These don't help at all.  They don't solve the problem, and the
"problem" doesn't need solving in the first place.

 _/|_	 _______________________________________________________________
/o ) \/  Mike Taylor  <mike@indexdata.com>  http://www.miketaylor.org.uk
)_v__/\  "White: a blank page or canvas.  The challenge: bring order
	 to the whole, through design, composition, tension, balance,
	 light and harmony" -- Stephen Sondheim, "Sunday in the Park
	 with George"

--
Listen to my wife's new CD of kids' music, _Child's Play_, at
	http://www.pipedreaming.org.uk/childsplay/
Received on Tuesday, 26 August 2003 11:14:48 GMT
