- From: Alan Kent <ajk@mds.rmit.edu.au>
- Date: Wed, 27 Aug 2003 09:47:43 +1000
- To: www-zig@w3.org
On Tue, Aug 26, 2003 at 11:22:47AM +0100, Mike Taylor wrote:
> > But I am strongly of the opinion that the rules for breaking the
> > query string into multiple search terms should be clear in the spec.
>
> Nope. It's no-one's business but the server's how it does this.

What I had meant was that if it's up to the server how to break it into words, then the spec should say so and identify the lack of guarantee. If it's not up to the server, the spec should say what the rules are. (See below for an example.) I think it needs to be up to the servers because that is where Z39.50 is at today.

Coming back to the original purpose of this whole discussion:

* I still think 'string' and 'words' are appropriate format/structure values.

* I think it makes sense to have any/all/adj as a separate attribute type (multipleTermBehaviour) indicating the behaviour to take when multiple terms are extracted from the supplied query string.

If there is only one term, the new attribute type can be ignored. There is no need for it to be an error, as a query string can have only one word in it - a single term in the supplied query string is catered for and is a valid situation. 'format/structure = string + multipleTermBehaviour = any' has well-defined semantics.

multipleTermBehaviour *could* have 4 values: any/all/adj/singleton, where 'singleton' causes an error if there is not exactly one search term. This is semantically cleaner with format/structure=string. 'singleton' does *not* say 'parse the query string as if there is one value in it' - it says 'parse the query string and report an error if there is not one value in it'.

I agree with Mike that it does not make sense for the client to say how many words are in the query string for format/structure=words. Only the server knows.

Coming back to the above text - I think each format/structure should specify how multiple terms are extracted from the query string.

* words - server choice on how words are extracted from the query string. Different systems have different word parsing rules.

* string - query string is the single, complete term.

* geographic coordinates - each pair of numbers is a separate term. (This is just an example - I am not a GEO expert so don't know what the rule should really be.)

Note that I think there are workable solutions for merging any/all/adj into existing attribute types. I think it makes more sense having it separate (in a new attribute type), but I can understand hesitance in introducing new attribute types.

Also note: I think the above also allows new format/structure values to be introduced later with *precise* word parsing rules - IF REQUIRED. E.g.: spaceSeparatedTerms, multipleAlphaNumericSequences, oxfordDictionaryDefinitionOfAWord, etc.

Alan
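
As a rough illustration of the scheme proposed above, here is a minimal Python sketch. It is not taken from any spec or implementation: the function names, the whitespace split used for 'words' (which the proposal leaves to the server), and the mapping of any/all/adj onto "or"/"and"/"adj" operators are all assumptions made for this example.

```python
# Hypothetical sketch of format/structure + multipleTermBehaviour.
# Every name here is illustrative, not part of Z39.50.

# Assumed mapping from multipleTermBehaviour to a combining operator.
OPERATORS = {"any": "or", "all": "and", "adj": "adj"}


def extract_terms(query, structure):
    """Extract terms from the query string according to format/structure."""
    if structure == "string":
        # The query string is the single, complete term.
        return [query]
    if structure == "words":
        # Server's choice in the proposal; whitespace splitting is
        # only this sketch's assumption.
        return query.split()
    raise ValueError("unsupported format/structure: " + structure)


def plan_search(query, structure, behaviour):
    """Return (operator, terms) for the supplied query string."""
    terms = extract_terms(query, structure)
    if behaviour == "singleton":
        # Parse first, then report an error unless exactly one term came out.
        if len(terms) != 1:
            raise ValueError("singleton requires exactly one search term")
        return ("term", terms)
    if behaviour not in OPERATORS:
        raise ValueError("unknown multipleTermBehaviour: " + behaviour)
    if len(terms) == 1:
        # With a single term, multipleTermBehaviour is simply ignored.
        return ("term", terms)
    return (OPERATORS[behaviour], terms)
```

For instance, plan_search("hello world", "string", "any") yields a single term because the whole string is one term, while plan_search("hello world", "words", "singleton") raises an error because two terms were extracted.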
Received on Tuesday, 26 August 2003 19:46:24 UTC