Re: Attribute Architecture -- new type?

From: Alan Kent <ajk@mds.rmit.edu.au>
Date: Wed, 27 Aug 2003 09:47:43 +1000
To: www-zig@w3.org
Message-ID: <20030826234743.GC29835@io.mds.rmit.edu.au>

On Tue, Aug 26, 2003 at 11:22:47AM +0100, Mike Taylor wrote:
> > But I am strongly of the opinion that the rules for breaking the
> > query string into multiple search terms should be clear in the spec.
> Nope.  It's no-one's business but the server's how it does this.

What I had meant was that if it's up to the server how to break it into
words, then the spec should say so and identify the lack of guarantee.
If it's not up to the server, the spec should say what the rules are.
(See below for an example.)

I think it needs to be up to the servers because that is where Z39.50
is at today.

Coming back to the original purpose of this whole discussion:

* I still think 'string' and 'words' are appropriate format/structure values.

* I think it makes sense to have any/all/adj as a separate attribute type
  (multipleTermBehaviour) indicating the behaviour to take when multiple
  terms are extracted from the supplied query string.

If there is only one term, the new attribute type can be ignored. There is
no need for it to be an error, as a query string can have only one word
in it - a single term in the supplied query string is catered for and is
a valid situation. 'format/structure = string + multipleTermBehaviour = any'
has well-defined semantics.

multipleTermBehaviour *could* have 4 values: any/all/adj/singleton
where 'singleton' causes an error if there is not exactly one search term.
It is semantically cleaner with format/structure=string. 'singleton' does
*not* say 'parse the query string as if there is one value in it' - it says
'parse the query string and report an error if there is not one value in it'.
I agree with Mike that it does not make sense for the client to say how
many words are in the query string for format/structure=words. Only
the server knows.
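
The multipleTermBehaviour semantics above can be sketched roughly as
follows. This is only an illustration in Python, assuming the server has
already extracted a list of terms. The names combine_terms and QueryError
and the tuple-based query representation are hypothetical and not taken
from any Z39.50 document. Treating zero extracted terms as an error is
also an assumption of the sketch, since the discussion only covers one or
more terms.

```python
from enum import Enum


class MultipleTermBehaviour(Enum):
    ANY = "any"
    ALL = "all"
    ADJ = "adj"
    SINGLETON = "singleton"


class QueryError(Exception):
    """Hypothetical error type for this sketch."""


# Hypothetical operator names for the combined query.
_OPERATORS = {
    MultipleTermBehaviour.ANY: "or",
    MultipleTermBehaviour.ALL: "and",
    MultipleTermBehaviour.ADJ: "adj",
}


def combine_terms(terms, behaviour):
    """Combine already-extracted terms according to multipleTermBehaviour."""
    if behaviour is MultipleTermBehaviour.SINGLETON:
        # 'singleton' does not change parsing; it only checks the result.
        if len(terms) != 1:
            raise QueryError(f"expected exactly one term, got {len(terms)}")
        return ("term", terms[0])
    if not terms:
        # Assumption: the discussion does not cover an empty term list.
        raise QueryError("no search terms extracted")
    if len(terms) == 1:
        # With a single term the attribute is simply ignored.
        return ("term", terms[0])
    return (_OPERATORS[behaviour], *terms)
```

Note that 'singleton' is applied after term extraction, so the same
query string may pass on one server and fail on another if their word
parsing rules differ.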

Coming back to the point above: I think each format/structure should
specify how multiple terms are extracted from the query string.

* words - server choice on how words are extracted from the query string.
  Different systems have different word parsing rules.

* string - query string is the single, complete term.

* geographic coordinates - each pair of numbers is a separate term.
  (This is just an example - I am not a GEO expert so don't know what
  the rule should really be.)
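
The three extraction rules listed above could be sketched like this in
Python. The function name extract_terms is hypothetical. Whitespace
splitting for 'words' is just one possible server choice, and the
number-pair rule for geographic coordinates is only the illustrative
rule from the list, not a real GEO profile.

```python
import re


def extract_terms(query, structure):
    """Extract search terms from a query string per format/structure."""
    if structure == "string":
        # The whole query string is the single, complete term.
        return [query]
    if structure == "words":
        # Server choice. Assumption: this server splits on whitespace.
        return query.split()
    if structure == "geographic coordinates":
        # Illustrative rule only: each pair of numbers is one term.
        numbers = re.findall(r"[-+]?\d+(?:\.\d+)?", query)
        if len(numbers) % 2 != 0:
            raise ValueError("coordinates must come in pairs")
        values = [float(n) for n in numbers]
        return list(zip(values[0::2], values[1::2]))
    raise ValueError(f"unknown format/structure: {structure!r}")
```

The point of the sketch is that only the 'words' branch is
server-dependent; the other two give the same terms on any server.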

Note that I think there are workable solutions that merge any/all/adj
into existing attribute types. I think it makes more sense to have it
separate (in a new attribute type), but I can understand hesitance in
introducing new attribute types.

Also note: I think the above also allows new format/structure values
to be introduced later with *precise* word parsing rules - IF REQUIRED.
E.g.: spaceSeparatedTerms, multipleAlphaNumericSequences,
oxfordDictionaryDefinitionOfAWord, etc.

Received on Tuesday, 26 August 2003 19:46:24 UTC
