- From: Alan Kent <ajk@mds.rmit.edu.au>
- Date: Mon, 21 Jul 2003 08:57:49 +1000
- To: www-zig@w3.org
On Sat, Jul 19, 2003 at 12:12:35PM -0400, Robert Waldstein wrote:
> Let me come out of lurking to say that "words" always confused me in Z39.50;
> my client still has code in it because some servers required (require?)
> a different attribute depending on whether the term I sent had a <space> in
> it or not. This was confusing and wrong (in my opinion).

I think I agree with where you are coming from, in that our engine does not really know what a 'word' is either. It just indexes sequences of bytes (terms) that were somehow extracted from a record. The issue to me is that there is currently no way for a client to say which extraction rules to use.

It is harder to explain with queries. The problem is there, but it's harder to see clearly: there are so many attributes that it's easy to hide the problem by saying 'oh, why not try attribute X instead'. So I recommend thinking about it first purely from a SCAN perspective. Attribute sets are designed to support SCAN, not just searches. If you want to support scanning of values extracted with different term extraction rules, you need to be able to specify the rule to use.

The most common example of this in general practice is something like Bath profile title searches, where you may want to search as words or as a complete title. I have been using this example to make things concrete for people, based on what they are familiar with. I would like to be able to express even more rules, but that is my problem. Once format/structure has been identified as the correct way to specify these term extraction rules, I can define my own personal attribute set and define new attribute values in the format/structure attribute type. This is what the attribute set architecture was designed to do (be extensible without redesigning everything). Life is wonderful.

So I have been using the specific example of words vs string (complete values).
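
To make the words vs string distinction concrete from a SCAN point of view, here is a rough Python sketch. Everything in it is an assumption for illustration only: the rule names "word" and "string", the function names, and the sample titles are made up, and they are not real attribute values or any server's actual behaviour. It only shows that the same title field yields two different term lists depending on the extraction rule.

```python
# Hypothetical sketch: the rule names and functions below are illustrative
# only; they are not Z39.50 attribute values or any engine's real API.

TITLES = ["Jaws", "Jaws 2", "Real-time systems"]


def extract_words(value):
    """'word' rule: one index term per whitespace-separated token.

    Whether "real-time" is one word or two is exactly the kind of
    server-side decision a client cannot see; here it stays one term.
    """
    return [token.lower() for token in value.split()]


def extract_string(value):
    """'string' rule: the complete value is a single index term."""
    return [value.lower()]


EXTRACTION_RULES = {"word": extract_words, "string": extract_string}


def scan(values, rule):
    """Return the sorted, de-duplicated term list a scan would walk."""
    extract = EXTRACTION_RULES[rule]
    return sorted({term for value in values for term in extract(value)})


word_terms = scan(TITLES, "word")
string_terms = scan(TITLES, "string")

print(word_terms)    # ['2', 'jaws', 'real-time', 'systems']
print(string_terms)  # ['jaws', 'jaws 2', 'real-time systems']
```

The term 'jaws' appears in both lists, but it means something different in each: in the first it is any title containing that word, in the second it is the one title that is exactly 'Jaws'. Without something like the `rule` argument, the client has no way to say which list it wants.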

If you want to support scanning that returns all the words in a title as well as the complete title, you need a different attribute to send through. Format/structure is the correct place for this attribute. I believe it's the documented purpose of this attribute.

> If I send a term like "jaws" or "real-time" and I tell the server it is
> a "complete title" I see no reason why I should say anything about "words";
> plus as part of this thread said, how do I even know if the server considers
> the string I sent as containing "words".

If you say you are trying to find things with that exact title, then you should not say anything about words. If you are trying to find things that *contain* the word 'jaws', then you need to say that instead. At present, you cannot specify these two queries separately.

Alan
Received on Sunday, 20 July 2003 18:57:57 UTC