- From: Ray Denenberg, Library of Congress <rden@loc.gov>
- Date: Tue, 19 Aug 2003 13:53:43 -0400
- To: "ZIG" <www-zig@w3.org>
I'm having trouble following the discussion because I can't find the messages that Alan is quoting. In fact I can't find any messages posted by Rob in the past two months. (Not even in the archive.) Could one of you (Alan or Rob) post the applicable messages? Thanks.

--Ray

----- Original Message -----
From: "Alan Kent" <ajk@mds.rmit.edu.au>
To: "ZIG" <www-zig@w3.org>
Sent: Monday, August 18, 2003 7:15 PM
Subject: Re: Attribute Architecture -- new type?

>
> On Mon, Aug 18, 2003 at 12:16:14PM +0100, Robert Sanderson wrote:
> > > access=title
> > > comparison=any
> > > format=string
> > > Term=child's book-case
> > > Does the second mean the title must equal 'child's' or 'book-case'?
> >
> > Exactly my point! :) We don't know if the client meant two strings or
> > one. Thus it needs to say what it meant somehow.
>
> Yes, I thought that is the problem we were trying to come up with a
> precise spec to address. We know there is a problem - do you have a
> concrete proposed solution we can put into the spec?
>
> What I am trying to work out in my mind is what are the rules and
> when are they invoked. I believe this has to be clearly expressed
> in the spec.
>
> Being picky to highlight the point (not purposely trying to be obnoxious),
> examples I have seen in recent mail are:
>
> * If date/time then 1/2/3 12:34:56 2/3/4 12:34:56 is two values
> * If there are quotes around strings, treat each quoted string as
>   a separate value (implying you have to release quotes in strings)
> * If you specify multiple terms for format string, then the system
>   should "work out" what the terms are (Does this mean child's book-case
>   is 2 terms ("child's" and "book-case") if '2' is specified and 3 terms
>   ("child's" "book" "case") if '3' is specified?)
>
> (I will admit the last point I have purposely pushed what you said
> beyond what I suspect you intended.
> My point is simply that I would
> like to know the *precise* rules to use, whatever they are, otherwise
> people can bend and twist them in ways not intended.)
>
> The above seems to imply each 'format' value has different rules for
> extracting multiple terms from the Term=... value (which is fine).
> But further, the parsing rules change depending on what occurrence value
> is specified? (format=string + occurrence=single means grab whole
> string, but format=string + occurrence=multiple means split on white
> space, but don't split quoted strings and release quotes in strings?
> Or is it the presence of any/all/adj that indicates multiplicity?)
>
> I like trying to keep things orthogonal in the attribute types as
> much as possible. It seems like the concept of pulling out multiple
> terms from a single query string is a query-only concept - it is
> not relevant when scanning an index, for example. Is it therefore that
> the new attribute should specify not only that there are multiple
> terms, but how to pull them out of the supplied query string?
> (The default value for different formats would be different.)
>
>     access=title
>     format=string
>     parse-query-term=single-value (the default for format=string)
>     comparison=equal (actually, it's irrelevant)
>     Term=The Fall of the Roman Empire
>
>     access=title
>     format=string
>     parse-query-term=space-separated-quoted-strings
>     comparison=any
>     Term="The Fall of the Roman Empire" "Batman forever" Jaws
>
>     access=title
>     format=word
>     parse-query-term=word-boundaries (the default for format=word)
>     comparison=adj
>     Term=XML Schema
>
>     access=title
>     format=date/time
>     parse-query-term=space-separated-date/time-values (the default)
>     comparison=all
>     Term=1/2/3 12:34:56 2/3/4 12:34:56
>
> The idea is the parse-query-term attribute type alone identifies how to
> parse the supplied Term=... into multiple terms. If this attribute is
> not specified, its value defaults based on the format=... value.
>
> Further, comparisons of any/all/adj should only be used with parsing
> rules that can return more than one value. Comparisons of equal/greater/...
> should only be used with parsing rules that return exactly one value.
>
> Candidate values for the parse-query-term attribute type:
>
>     single-value
>         Takes input string verbatim.
>         Returns exactly one term.
>
>     space-separated-quoted-strings
>         Finds all the quoted strings. Any non-quoted text is separated
>         on whitespace boundaries.
>         Can return multiple terms.
>
>     word-boundaries
>         Parses as words using the same word parsing rules as the access
>         point uses. Can return multiple terms.
>
> I am not saying we should do the above, but it is an option. Previously
> I had been suggesting the format=... value alone should specify how
> to identify multiple terms from the query string. If that is the case,
> then I would not allow multiple quoted strings with format=string.
> format=string means grab the whole value. If people think there is
> an advantage in being able to have multiple string values in a single
> query term, then we could come up with a syntax - but I don't see
> clearly how this would fit in with CQL etc. Quoted strings in quoted
> strings?
>
>
> But I am strongly of the opinion that the rules for breaking the query
> string into multiple search terms should be clear in the spec. I don't
> mind the system working out the terms from an occurrence count, if the
> algorithm for doing so is included in the spec. If you still prefer
> a null/single/multiple style attribute type, did you have a specific
> algorithm in mind for extracting terms from query strings? Could you write
> it down? There is a danger of wandering around discussing options too long.
> My personal goal is to get an acceptable approach signed off on. Looking
> at different options can help find a better approach, but only if the
> proposal is pretty concrete (in my personal opinion).
>
>
> Thanks!
> Alan
>
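
For illustration only, the three candidate parse-query-term values quoted above could be sketched as follows. This is a rough sketch and not part of any agreed spec: the function names are hypothetical, quoted strings are assumed to contain no escaped quotes, and the `\w+` pattern is merely a stand-in for "the same word parsing rules as the access point uses", which the message leaves undefined. The date/time rule is omitted because its value syntax is not given.

```python
import re

# Hypothetical names throughout; nothing here comes from a published spec.
MULTI_TERM_COMPARISONS = {"any", "all", "adj"}


def parse_query_term(term, rule):
    """Split a Term=... value into search terms under one parsing rule."""
    if rule == "single-value":
        # Take the input string verbatim; always exactly one term.
        return [term]
    if rule == "space-separated-quoted-strings":
        # Each quoted string is one term; unquoted text splits on whitespace.
        # Assumption: no escaping of quotes inside quoted strings.
        pairs = re.findall(r'"([^"]*)"|(\S+)', term)
        return [quoted if quoted else bare for quoted, bare in pairs]
    if rule == "word-boundaries":
        # Stand-in for the access point's own word rules (assumed here).
        return re.findall(r"\w+", term)
    raise ValueError(f"unknown parse rule: {rule}")


def comparison_allowed(comparison, rule):
    """any/all/adj need a multi-term rule; other comparisons need single-value."""
    multi_term_rule = rule != "single-value"
    if comparison in MULTI_TERM_COMPARISONS:
        return multi_term_rule
    return not multi_term_rule


print(parse_query_term('"The Fall of the Roman Empire" "Batman forever" Jaws',
                       "space-separated-quoted-strings"))
print(parse_query_term("XML Schema", "word-boundaries"))
print(comparison_allowed("any", "single-value"))
```

Under these assumptions the second example from the message yields three terms, and comparison=any combined with single-value is rejected, which matches the constraint proposed in the message.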
Received on Saturday, 23 August 2003 08:07:02 UTC