- From: Alan Kent <ajk@mds.rmit.edu.au>
- Date: Tue, 19 Aug 2003 09:15:11 +1000
- To: ZIG <www-zig@w3.org>
On Mon, Aug 18, 2003 at 12:16:14PM +0100, Robert Sanderson wrote:
> > access=title
> > comparison=any
> > format=string
> > Term=child's book-case
> >
> > Does the second mean the title must equal 'child's' or 'book-case'?
>
> Exactly my point! :) We don't know if the client meant two strings or
> one. Thus it needs to say what it meant somehow.

Yes, I thought that was the problem we were trying to address by coming up with a precise spec. We know there is a problem - do you have a concrete proposed solution we can put into the spec?

What I am trying to work out in my mind is what the rules are and when they are invoked. I believe this has to be clearly expressed in the spec. Being picky to highlight the point (not purposely trying to be obnoxious), examples I have seen in recent mail are:

* If date/time then 1/2/3 12:34:56 2/3/4 12:34:56 is two values

* If there are quotes around strings, treat each quoted string as a separate value (implying you have to release, that is escape, quotes in strings)

* If you specify multiple terms for format string, then the system should "work out" what the terms are (Does this mean child's book-case is 2 terms ("child's" and "book-case") if '2' is specified and 3 terms ("child's" "book" "case") if '3' is specified?)

(I will admit that with the last point I have purposely pushed what you said beyond what I suspect you intended. My point is simply that I would like to know the *precise* rules to use, whatever they are, otherwise people can bend and twist them in ways not intended.)

The above seems to imply each 'format' value has different rules for extracting multiple terms from the Term=... value (which is fine). But further, do the parsing rules change depending on what occurrence value is specified? (format=string + occurrence=single means grab the whole string, but format=string + occurrence=multiple means split on white space, but don't split quoted strings, and release quotes in strings? Or is it the presence of any/all/adj that indicates multiplicity?)
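To make the ambiguity concrete, here is a minimal sketch in Python of two possible readings of format=string: take the whole Term value as one term, or split on white space while keeping quoted strings whole. The function names are hypothetical and come from no spec, and the sketch assumes double quotes only, with no mechanism for releasing a quote inside a quoted string.

```python
import re


def parse_single(term: str) -> list[str]:
    """Reading 1: the whole Term value is exactly one search term."""
    return [term]


# Either a double-quoted run (captured without its quotes)
# or a run of non-whitespace characters.
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def parse_quoted_or_whitespace(term: str) -> list[str]:
    """Reading 2: quoted strings stay whole; other text splits on white space.

    Assumption: only double quotes delimit strings, and a quote cannot
    be released (escaped) inside a quoted string.
    """
    terms = []
    for quoted, bare in _TOKEN.findall(term):
        # Exactly one of the two groups matched; the other is empty.
        terms.append(quoted if bare == "" else bare)
    return terms
```

Given Term=child's book-case, parse_single yields one term and parse_quoted_or_whitespace yields two, which is exactly the difference the spec would need to pin down.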
I like trying to keep things orthogonal in the attribute types as much as possible. It seems like the concept of pulling out multiple terms from a single query string is a query-only concept - it is not relevant when scanning an index, for example. Should the new attribute therefore specify not only that there are multiple terms, but also how to pull them out of the supplied query string? (The default value for different formats would be different.)

    access=title
    format=string
    parse-query-term=single-value (the default for format=string)
    comparison=equal (actually, it's irrelevant)
    Term=The Fall of the Roman Empire

    access=title
    format=string
    parse-query-term=space-separated-quoted-strings
    comparison=any
    Term="The Fall of the Roman Empire" "Batman forever" Jaws

    access=title
    format=word
    parse-query-term=word-boundaries (the default for format=word)
    comparison=adj
    Term=XML Schema

    access=title
    format=date/time
    parse-query-term=space-separated-date/time-values (the default)
    comparison=all
    Term=1/2/3 12:34:56 2/3/4 12:34:56

The idea is that the parse-query-term attribute type alone identifies how to parse the supplied Term=... into multiple terms. If this attribute is not specified, its default is determined by the format=... value. Further, comparisons of any/all/adj should only be used with parsing rules that can return more than one value. Comparisons of equal/greater/... should only be used with parsing rules that return exactly one value.

Candidate values for the parse-query-term attribute type:

    single-value
        Takes input string verbatim. Returns exactly one term.

    space-separated-quoted-strings
        Finds all the quoted strings. Any non-quoted text is
        separated on whitespace boundaries. Can return multiple
        terms.

    word-boundaries
        Parses as words using the same word parsing rules as the
        access point uses. Can return multiple terms.

I am not saying we should do the above, but it is an option.

Previously I had been suggesting the format=... value alone should specify how to identify multiple terms from the query string. If that is the case, then I would not allow multiple quoted strings with format=string. format=string means grab the whole value. If people think there is an advantage in being able to have multiple string values in a single query term, then we could come up with a syntax - but I don't see clearly how this would fit in with CQL etc. Quoted strings in quoted strings?

But I am strongly of the opinion that the rules for breaking the query string into multiple search terms should be clear in the spec. I don't mind the system working out the terms from an occurrence count, if the algorithm for doing so is included in the spec.

If you still prefer a null/single/multiple style attribute type, did you have a specific algorithm in mind for extracting terms from query strings? Could you write it down?

There is a danger of wandering around discussing options too long. My personal goal is to get an acceptable approach signed off on. Looking at different options can help find a better approach, but only if the proposal is pretty concrete (in my personal opinion).

Thanks!
Alan
Received on Monday, 18 August 2003 19:15:20 UTC