- From: Alan Kent <ajk@mds.rmit.edu.au>
- Date: Mon, 18 Aug 2003 13:58:33 +1000
- To: ZIG <www-zig@w3.org>
On Fri, Aug 15, 2003 at 12:58:57PM +0100, Robert Sanderson wrote:
> access=title
> comparison=any
> format=string
> Term=Utz Jaws Skyscraper
>
> If we allow any to be applied to a single term rather than being an error,
> then is this a single, whacked out, term or is it 3 terms?

I would have said 1 term, because 'format=string' to me would say to
treat the Term as a single string. If it was 'format=coordinate' then
'Term=12,41 54,41 52,54' would have been 3 terms.

> > Would knowing the
> > number of terms change how the parser did its job? If parsing words
>
> In the above case, yes, it would split 1 string in to 3 strings.

How to know to split into 3 strings? Does comparison=any + format=string
mean split terms on whitespace? Or does it mean at word boundaries?
(Words may be separated by punctuation without any whitespace.)

Just trying to clarify what you are suggesting in my mind. I think you
are suggesting (2), I am suggesting (3). But here are some alternative
interpretations.

(1) Use of any/all/adj implies the query term consists of words,
    irrespective of what format= says (ie: comparison=any +
    format=string means input string is words, not a single string).

(2) Use of any/all/adj as a comparison means split the term on
    white space (no connotations of words - just split on whitespace).
    Then treat each value separated by whitespace as input into the
    normal comparison process.

(3) Use of any/all/adj does not affect how to extract multiple terms
    from the query string. format=string alone does this. If multiple
    terms are extracted from the query string, then any/all/adj kicks
    in. (Otherwise all 3 are identical in their behaviour - the single
    term must match.)

(Let me know if I have captured the semantics you are proposing
correctly.)

I dislike (1) because any/all/adj overrides the format attribute.
I don't think you are proposing this. (2) is a valid option (more below).
(3) is what I had been pushing.
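
To make the difference between the three readings concrete, here is a
minimal Python sketch. Everything in it is an assumption made for
illustration: the function names are hypothetical, and the "word" rule
(letters, digits and apostrophes form a word) is just one possible
server-side parser, not something any spec defines.

```python
import re

# Comparisons that only make sense when more than one term is present.
MULTI_TERM_COMPARISONS = {"any", "all", "adj"}


def extract_terms(term, fmt):
    """Hypothetical format-driven term extraction (assumed rules)."""
    if fmt == "string":
        return [term]                      # whole value is one term
    if fmt == "coordinate":
        return term.split()                # one term per coordinate pair
    if fmt == "word":
        # Assumed word rule: runs of letters, digits and apostrophes.
        return re.findall(r"[A-Za-z0-9']+", term)
    raise ValueError("unknown format: " + fmt)


def interpretation_1(term, comparison, fmt):
    # (1) any/all/adj overrides format: the value is always parsed as words.
    if comparison in MULTI_TERM_COMPARISONS:
        return extract_terms(term, "word")
    return extract_terms(term, fmt)


def interpretation_2(term, comparison, fmt):
    # (2) any/all/adj splits on whitespace first; each piece then goes
    # through the normal format-driven processing.
    if comparison in MULTI_TERM_COMPARISONS:
        pieces = term.split()
    else:
        pieces = [term]
    return [t for piece in pieces for t in extract_terms(piece, fmt)]


def interpretation_3(term, comparison, fmt):
    # (3) the comparison never affects extraction; format alone decides.
    return extract_terms(term, fmt)


if __name__ == "__main__":
    for fn in (interpretation_1, interpretation_2, interpretation_3):
        print(fn.__name__, fn("Utz Jaws Skyscraper", "any", "string"))
```

Under these assumed rules, (1) and (2) both yield three terms for
'Utz Jaws Skyscraper' with format=string, while (3) yields one.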

The implication of (2) is that, before the query string is changed into
terms, the server has to check whether any/all/adj is in effect and, if
so, split the query string on whitespace and then do the processing as
if 3 query strings had been supplied, joined by AND/OR/PROX nodes.

But what if format=word is specified and a word parser splits on hyphens
so that 'book-case' has two terms extracted ('book' and 'case')? What
does 'any' (and 'all' and 'adj') mean with

    access=title
    comparison=any
    format=word
    Term=child's book-case

versus

    access=title
    comparison=any
    format=string
    Term=child's book-case

Does the first mean the title must contain 'child's' or 'book' or 'case'?
Does the second mean the title must equal 'child's' or 'book-case'?

Just trying to work out the exact semantics of what you are suggesting.
I do not agree or disagree yet - I am not sure what text you would put
into the spec to describe the behaviour.

> > from a string, how would the client know the exact parsing rules
> > the server is going to use? (eg: book-case - how many terms?)
>
> It could just say 'multiple'. But the client knows (in theory) what the
> Term means, at least to the point that it was a single term or multiple
> terms. Exact number of terms is probably too much, so
> null/single/multiple/unknown is a better division.

I agree that if this route is taken, removing 'exact number' is better.

Alan
Received on Sunday, 17 August 2003 23:58:52 UTC