- From: Mike Taylor <mike@indexdata.com>
- Date: Mon, 21 Jul 2003 10:08:49 +0100
- To: www-zig@w3.org
Just want to say I agree 100% with all Alan's said here.  We need the
AA to provide total orthogonality.

 _/|_     _______________________________________________________________
/o ) \/  Mike Taylor  <mike@indexdata.com>  http://www.miketaylor.org.uk
)_v__/\  "No pearl grows without a grain of irritation at its heart.  The
         trick is to grow a pearl and not an ulcer" -- Neil Peart.

--
Listen to my wife's new CD of kids' music, _Child's Play_, at
http://www.pipedreaming.org.uk/childsplay/

> Date: Mon, 21 Jul 2003 10:06:35 +1000
> From: Alan Kent <ajk@mds.rmit.edu.au>
>
> Hi Ray,
>
> I have replied to your email below.
>
> Sorry to be a broken record, but I think it's *critical* to get scanning
> to work. Once scanning is fixed, the query stuff falls out in the wash.
> The attribute architecture is not only for querying - it's for everything
> that uses attributes. This includes scanning.
>
> If you step away from doing searches for a moment, and just look at
> scanning indexes, you immediately and clearly hit the problem (in my
> opinion anyway! ;-). If I have term-lists that contain words from titles
> and the complete values of titles, then how do I express an attribute
> list for scanning?
>
> If I read the textual descriptions of the various attribute types, then
> format/structure sounds ideal.
> The Bib-2 attribute values make complete
> sense for scanning. It's the Util attribute values that are strange.
> Why specify 'any of these words' when doing a scan, just to indicate that
> I want to scan the title as words? It's semantically wrong. You want to
> identify the fact that you want words independently of the search-oriented
> operator that says how to handle multiple terms in a search.
>
> Once you reach this point, you realise the descriptions for the attribute
> types are good. The overall architecture is good. It's just that the utility
> attribute set has not defined words vs strings, and that some comparison
> operators (any/all/adj) have slipped into format/structure by mistake.
>
> So I strongly recommend forgetting searches for a moment, and thinking
> about SCAN requests. What are the attribute lists for scanning title
> as keywords and title as complete values?
>
> On Fri, Jul 18, 2003 at 05:46:57PM -0400, Ray Denenberg wrote:
> > There's consensus (among those who have participated in this
> > discussion) that allTheseWords, anyOfTheseWords, adjacentWords should
> > be changed from Structure/format to Comparison attributes.
> >
> > There's less consensus about adding two new Structure/format
> > attributes, (1) word(s), and (2) string (or 'completeValue'). Mike
> > feels strongly that they should be added, and I don't feel strongly but
> > am somewhat uncomfortable about adding them (without clarifying certain
> > other parts of the proposal). I don't know how strongly Alan feels. And
> > I'd like to get other opinions.
>
> Actually, I think the discussion has been the opposite. I think there is
> strong consensus that word and string should be in format/structure.
> This is because they should be talking about the format or interpretation
> of the structure of the value supplied. This is ideal for doing index
> scanning too, as the attribute also applies to what is returned by a SCAN
> request - it describes the format/structure of the returned scan terms.
> It is not purely a query attribute.
>
> As a *result* of this consensus, it was realised and agreed that all/any/adj
> words should move out - they are in the wrong spot. Comparison is a more
> correct place. It makes sense with scanning too. Comparisons are query
> operators, not scanning stuff (this is a little hand-wavy here, which is
> always dangerous as I know I can come up with example applications where
> this is not true).
>
> But I am happy to get other people's opinions. I think I have finally
> managed to express what I meant clearly so that Mike and Rob understand
> and agree with what I am saying (they may have actually reached where
> I am at before me as a result of the CQL work).
>
> > This is how I see it: if the query term is a set of words, and the
> > comparison attribute is one of the above three, then clearly a
> > structure/format attribute to indicate "words" is not necessary.
>
> To keep arguments simpler, I have tried to avoid the different
> term extraction rules side of things. But I want to support different
> definitions of what a 'word' is. To me, allWords, anyWords etc. really
> should be allTerms, anyTerms, etc. They define what to do if there
> are multiple terms extracted from the query. The terms can be words.
> But the terms could also be floating point numbers defining a line
> segment, or coordinate pairs, or special things in chemical formulas etc.
> I think it's better if comparison operators, whenever possible,
> define how to *compare* values, not how to extract values to be
> compared. Orthogonality is good. It allows new term extraction rules
> to be added orthogonally.
>
> For example, Bib-2 already defines additional format/structure attributes.
> So it's not a possibility, it's a current reality. It's not just words
> and strings we are talking about - it's the ability to define multiple
> ways to structure terms extracted from records (and queries), and
> then keeping this independent of comparison operators.
>
> I would love to change any/all/adj from 'words' to 'terms' in
> general. I think they will make sense when people define other
> concepts of how to extract terms from record content. However, this
> seemed too hard for people to swallow, so I backed off in order to
> get at least the major problem fixed.
>
> > Conversely, if the desire is to search for words (as opposed to a
> > complete string) then can the comparison attribute be anything but one
> > of these three?
>
> Probably not. However, I believe a goal of the AA is to be extensible,
> and I can see cases where different projects may want different concepts
> of what a 'word' is. I am thinking more of chemical formulas, geographic
> coordinates, other rich and complex data types etc.
>
> This would be done by defining a new 'chemical' attribute set with a set
> of chemistry-specific access point names and new format/structure attributes
> related to chemical formulas. (Note: I know almost nothing about chemistry.
> I am using it as an example only.)
>
> > However, what if the term is a single word? If the intent is to
> > search for it as a word (not a string), I don't think Alan's proposal
> > addresses whether this should fit within the three attributes proposed
> > - all three would mean the same thing, and so there may be sentiment
> > for separating out the single-word case. If so, then I can see a
> > stronger argument for having 'word' and 'string' format/structure
> > values.
>
> I think what you are saying is a good example of why comparison operators
> should not be used to define what terms are. It's a good example of one
> of the many nasty little side effects that come up. That is why I
> strongly believe any/all/adj words should not imply term structure.
>
> > So I see two possibilities:
> >
> > 1. A single-word search would be handled by one of the
> > word-comparison attributes (one of these would be "singled out" for
> > this use), no format/structure attribute included.
> > If the term is a
> > single word but is to be searched as a string, then another comparison
> > would be used. [aside: I'm not sure which one though. "Equal" seems to
> > be precluded, since the Utility set prose says that it cannot be used
> > with expansion/interpretation. On the other hand, Bath uses it. This
> > may be another defect that we should address.]
> >
> > 2. When the term is a single word, the comparison attribute may not
> > be one of the above three (they can only be used for multiple words)
> > and the format/structure 'word' or 'string' is supplied.
> >
> > I think we need to nail down one of these two, and I don't really care which.
>
> I think the above assumes there is only one definition of what a 'word'
> is, and I think the goal of the AA is to be a framework for expansion,
> not a restriction. So I don't think you can ever preclude using a
> format/structure attribute in a query. I (personally) think it makes
> sense to leave format/structure open to identify different ways to
> extract multiple terms from a record (even different term extraction
> rules).
>
> I understand where you are coming from, but I don't think either (1) or (2)
> above should be mandated. The problem is that both options assume the client
> *knows* whether the search term contains one or more words. But is
> 'book-case' one word or two? Clients do not know the word extraction
> rules used by a server (there is no formal agreed interpretation of
> what a word is anywhere), so clients cannot know if a search string
> entered by a user is a single word or not.
>
> So I think
> * Clients must be allowed to send all/any/adj for single or multiple
>   word queries.
> * Clients must always be allowed to send a format/structure attribute.
>   If it is omitted, it's the server's choice as to what to do.
>
> I don't think it's necessary to define the preferred way to do single
> word queries as distinct from multi-word queries - as it implies the
> client has to understand how to extract words from strings using
> the same rules as a server. If this is considered important, then
> I would look at adding a new comparison operator, 'exactly one', to the
> any/all/adj list, which aborts the query with an error if there is not
> exactly one word (term) supplied. The responsibility is then given to
> the server rather than being on the client.
>
> Alan
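To make the orthogonality argument in the thread concrete, here is a minimal Python sketch, assuming a toy in-memory list of title strings. Everything in it is hypothetical: the function names (`extract_terms`, `matches`, `scan`, `search`), the two word-extraction rules, and the operator names `allTerms`/`anyTerms`/`adjacentTerms`/`exactlyOne` are illustrations of the proposal, not values from the Utility, Bib-1 or Bib-2 attribute sets or any Z39.50 toolkit API. Format/structure ('word' vs 'string') only decides how terms are extracted; comparison only decides how already-extracted terms are combined; and a scan needs the first without the second.

```python
import re
from typing import Callable, Dict, List

# HYPOTHETICAL word-extraction rules: servers may disagree on what a "word"
# is (e.g. 'book-case'); neither rule comes from any real attribute set.
WORD_RULES: Dict[str, Callable[[str], List[str]]] = {
    "split_on_hyphen": lambda s: re.findall(r"[a-z0-9]+", s.lower()),
    "keep_hyphen": lambda s: re.findall(r"[a-z0-9-]+", s.lower()),
}


def extract_terms(value: str, structure: str,
                  rule: str = "split_on_hyphen") -> List[str]:
    """Format/structure: decide HOW terms are pulled out of a value."""
    if structure == "word":
        return WORD_RULES[rule](value)
    if structure == "string":  # the complete value is one single term
        return [value.lower()]
    raise ValueError(f"unknown structure: {structure}")


def matches(query: List[str], record: List[str], comparison: str) -> bool:
    """Comparison: decide how already-extracted terms are combined."""
    if comparison == "allTerms":
        return all(t in record for t in query)
    if comparison == "anyTerms":
        return any(t in record for t in query)
    if comparison == "adjacentTerms":
        n = len(query)
        return any(record[i:i + n] == query
                   for i in range(len(record) - n + 1))
    if comparison == "exactlyOne":
        # Server-side check, after the server's own extraction rule ran.
        if len(query) != 1:
            raise ValueError(f"exactlyOne: got {len(query)} terms")
        return query[0] in record
    raise ValueError(f"unknown comparison: {comparison}")


def scan(values: List[str], structure: str,
         rule: str = "split_on_hyphen") -> List[str]:
    """A scan uses only format/structure; no comparison is involved."""
    terms = set()
    for v in values:
        terms.update(extract_terms(v, structure, rule))
    return sorted(terms)


def search(values: List[str], query: str, structure: str, comparison: str,
           rule: str = "split_on_hyphen") -> List[str]:
    """Apply the same extraction rule to the query and to each record."""
    q = extract_terms(query, structure, rule)
    return [v for v in values
            if matches(q, extract_terms(v, structure, rule), comparison)]


TITLES = ["The Book-Case Mystery", "Case Studies"]
# scan(TITLES, "word")   -> ['book', 'case', 'mystery', 'studies', 'the']
# scan(TITLES, "string") -> ['case studies', 'the book-case mystery']
```

In this sketch, `search(TITLES, "book-case", "word", "exactlyOne")` raises an error under the hyphen-splitting rule but matches under `keep_hyphen`, which illustrates why the single-word check is safer on the server than in the client.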
Received on Monday, 21 July 2003 05:38:34 UTC