Re: Attribute Architecture -- new type? from Ray Denenberg, Library of Congress on 2003-08-19 (www-zig@w3.org from August 2003)

From: Ray Denenberg, Library of Congress <rden@loc.gov>
Date: Tue, 19 Aug 2003 13:53:43 -0400
To: "ZIG" <www-zig@w3.org>
Message-ID: <05de01c3667a$d4da37a0$849c938c@lib.loc.gov>
I'm having trouble following the discussion because I can't find the messages
that Alan is quoting. In fact, I can't find any messages posted by Rob in the
past two months (not even in the archive). Could one of you (Alan or Rob)
post the applicable messages? Thanks. --Ray


----- Original Message -----
From: "Alan Kent" <ajk@mds.rmit.edu.au>
To: "ZIG" <www-zig@w3.org>
Sent: Monday, August 18, 2003 7:15 PM
Subject: Re: Attribute Architecture -- new type?


>
> On Mon, Aug 18, 2003 at 12:16:14PM +0100, Robert Sanderson wrote:
> > >     access=title
> > >     comparison=any
> > >     format=string
> > >     Term=child's book-case
> > > Does the second mean the title must equal 'child's' or 'book-case'?
> >
> > Exactly my point! :)  We don't know if the client meant two strings or
> > one. Thus it needs to say what it meant somehow.
>
> Yes, I thought that is the problem we were trying to come up with a
> precise spec to address. We know there is a problem - do you have a
> concrete proposed solution we can put into the spec?
>
> What I am trying to work out in my mind is what are the rules and
> when are they invoked. I believe this has to be clearly expressed
> in the spec.
>
> Being picky to highlight the point (not purposely trying to be obnoxious),
> examples I have seen in recent mail are:
>
> * If date/time then 1/2/3 12:34:56 2/3/4 12:34:56 is two values
> * If there are quotes around strings, treat each quoted string as
>   a separate value (implying you have to release quotes in strings)
> * If you specify multiple terms for format string, then the system
>   should "work out" what the terms are (Does this mean child's book-case
>   is 2 terms ("child's" and "book-case") if '2' is specified and 3 terms
>   ("child's" "book" "case") if '3' is specified?)
>
> (I will admit the last point I have purposely pushed what you said
> beyond what I suspect you intended. My point is simply that I would
> like to know the *precise* rules to use, whatever they are, otherwise
> people can bend and twist them in ways not intended.)
>
> The above seems to imply each 'format' value has different rules for
> extracting multiple terms from the Term=... value (which is fine).
> But further, the parsing rules change depending on what occurrence value
> is specified? (format=string + occurrence=single means grab whole
> string, but format=string + occurrence=multiple means split on white
> space, but don't split quoted strings and release quotes in strings?
> Or is it the presence of any/all/adj that indicates multiplicity?)
>
> I like trying to keep things orthogonal in the attribute types as
> much as possible. It seems like the concept of pulling out multiple
> terms from a single query string is a query-only concept - it is
> not relevant when scanning an index, for example. Should the new
> attribute therefore specify not only that there are multiple
> terms, but also how to pull them out of the supplied query string?
> (The default value for different formats would be different.)
>
>     access=title
>     format=string
>     parse-query-term=single-value    (the default for format=string)
>     comparison=equal                 (actually, it's irrelevant)
>     Term=The Fall of the Roman Empire
>
>     access=title
>     format=string
>     parse-query-term=space-separated-quoted-strings
>     comparison=any
>     Term="The Fall of the Roman Empire" "Batman forever" Jaws
>
>     access=title
>     format=word
>     parse-query-term=word-boundaries   (the default for format=word)
>     comparison=adj
>     Term=XML Schema
>
>     access=title
>     format=date/time
>     parse-query-term=space-separated-date/time-values    (the default)
>     comparison=all
>     Term=1/2/3 12:34:56 2/3/4 12:34:56
>
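
One possible reading of the space-separated-date/time-values rule in the last example, as a minimal Python sketch. It assumes every value is exactly one date token followed by one time token; that pairing rule and the function name are assumptions for illustration, not something the message specifies.

```python
def split_datetime_values(term: str) -> list[str]:
    """Split a Term into date/time values.

    Assumption: each value is a date token followed by a time token,
    and tokens are separated by whitespace.
    """
    tokens = term.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected date/time pairs, got an odd number of tokens")
    # Re-join each (date, time) pair into a single value.
    return [f"{tokens[i]} {tokens[i + 1]}" for i in range(0, len(tokens), 2)]


print(split_datetime_values("1/2/3 12:34:56 2/3/4 12:34:56"))
```

Under that assumption the example Term yields two values, which matches the "is two values" reading earlier in the message. A date-only or time-only value would need a different rule.
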
> The idea is the parse-query-term attribute type alone identifies how to
> parse the supplied Term=... into multiple terms. If this attribute is
> not specified, its default value is determined by the format=... value.
>
> Further, comparisons of any/all/adj should only be used with parsing
> rules that can return more than one value. Comparisons of
> equal/greater/... should only be used with parsing rules that return
> exactly one value.
>
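
The defaulting and compatibility rules in the two paragraphs above could be checked mechanically. The sketch below is only illustrative: the table contents are taken from the examples in this message, while the dictionary layout, the function names, and the choice to raise ValueError are assumptions.

```python
# Default parse-query-term value per format, as given in the examples.
DEFAULT_PARSE_RULE = {
    "string": "single-value",
    "word": "word-boundaries",
    "date/time": "space-separated-date/time-values",
}

# Parsing rules that can return more than one term.
MULTI_VALUE_RULES = {
    "space-separated-quoted-strings",
    "word-boundaries",
    "space-separated-date/time-values",
}

# Comparisons that only make sense over several terms.
MULTI_COMPARISONS = {"any", "all", "adj"}


def resolve_parse_rule(attrs: dict) -> str:
    """Use parse-query-term if given, else fall back to the format default."""
    if "parse-query-term" in attrs:
        return attrs["parse-query-term"]
    return DEFAULT_PARSE_RULE[attrs["format"]]


def check_comparison(attrs: dict) -> None:
    """Raise ValueError if the comparison does not fit the parsing rule."""
    rule = resolve_parse_rule(attrs)
    multi_rule = rule in MULTI_VALUE_RULES
    multi_comparison = attrs["comparison"] in MULTI_COMPARISONS
    if multi_comparison != multi_rule:
        raise ValueError(
            f"comparison={attrs['comparison']} cannot be used with {rule}"
        )


check_comparison({"format": "string", "comparison": "equal"})  # accepted
check_comparison({"format": "word", "comparison": "adj"})      # accepted
```

With these tables, format=string with comparison=any and no explicit parse-query-term is rejected, because the default rule for strings returns exactly one term.
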
> Candidate values for the parse-query-term attribute type:
>
>     single-value
>         Takes input string verbatim.
>         Returns exactly one term.
>
>     space-separated-quoted-strings
>         Finds all the quoted strings. Any non-quoted text is separated
>         on whitespace boundaries.
>         Can return multiple terms.
>
>     word-boundaries
>         Parses as words using the same word parsing rules as the access
>         point uses. Can return multiple terms.
>
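
To make the three candidate values concrete, here is a rough Python sketch of each rule. The function names are hypothetical, quote handling is deliberately simple (double quotes only, no escaping), and the word pattern is a stand-in: the real word-boundaries rule would use whatever word parsing the access point uses, which is not defined here.

```python
import re

# A double-quoted string (group 1) or a run of non-whitespace (group 2).
_QUOTED_OR_BARE = re.compile(r'"([^"]*)"|(\S+)')

# Stand-in word pattern (assumption): letters, digits and apostrophes.
_WORD = re.compile(r"[A-Za-z0-9']+")


def parse_single_value(term: str) -> list[str]:
    """single-value: take the input verbatim, exactly one term."""
    return [term]


def parse_space_separated_quoted_strings(term: str) -> list[str]:
    """Quoted strings become one term each (quotes removed);
    unquoted text is split on whitespace."""
    terms = []
    for match in _QUOTED_OR_BARE.finditer(term):
        quoted, bare = match.group(1), match.group(2)
        terms.append(quoted if quoted is not None else bare)
    return terms


def parse_word_boundaries(term: str) -> list[str]:
    """word-boundaries: split into words using the stand-in pattern."""
    return _WORD.findall(term)


print(parse_single_value("The Fall of the Roman Empire"))
print(parse_space_separated_quoted_strings(
    '"The Fall of the Roman Empire" "Batman forever" Jaws'))
print(parse_word_boundaries("child's book-case"))
```

With this stand-in pattern, "child's book-case" splits into three words, while the single-value rule keeps it as one term. That difference is exactly the ambiguity the spec would need to pin down.
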
> I am not saying we should do the above, but it is an option. Previously
> I had been suggesting the format=... value alone should specify how
> to identify multiple terms from the query string. If that is the case,
> then I would not allow multiple quoted strings with format=string.
> format=string means grab the whole value. If people think there is
> an advantage in being able to have multiple string values in a single
> query term, then we could come up with a syntax - but I don't see
> clearly how this would fit in with CQL etc. Quoted strings in quoted
> strings?
>
>
> But I am strongly of the opinion that the rules for breaking the query
> string into multiple search terms should be clear in the spec. I don't
> mind the system working out the terms from an occurrence count, if the
> algorithm for doing so is included in the spec. If you still prefer
> a null/single/multiple style attribute type, did you have a specific
> algorithm in mind for extracting terms from query strings? Could you write
> it down? There is a danger of wandering around discussing options too long.
> My personal goal is to get an acceptable approach signed off on. Looking
> at different options can help find a better approach, but only if the
> proposal is pretty concrete (in my personal opinion).
>
>
> Thanks!
> Alan
>
Received on Saturday, 23 August 2003 08:07:02 UTC