- From: Babich, Alan <ABabich@filenet.com>
- Date: Tue, 30 Jun 1998 15:13:36 -0700
- To: "'DASL'" <www-webdav-dasl@w3.org>
Heretofore, this hasn't been as explicit as it needs to be. This is an attempt to clarify the situation.

LIST OF SCOPE ELEMENTS

Currently, in a DASL query, there is a list of scope elements. Each scope element is a DAV:href element and an optional DAV:depth element. A scope list element can, for example, reference a collection that is a whole document space under the control of a DMS, or it can reference a collection that is a particular (sub)folder in a document space, etc.

There is currently no constraint that the scope list elements all be subcollections of the same root collection or document space, or even that the collections involved have to be on the same server. Said another way, the search arbiter is currently free to forward the query to a list of heterogeneous document management systems on the same or separate servers and merge the results.

COLLECTION

A (root) collection is an encapsulation of (1) resources, (2) metadata describing those resources, and (3) a query engine that both understands the metadata and has direct access to the resources. A (root) collection can have subcollections, which can have subcollections, etc. Subcollections are obviously encapsulated by their root collection.

SEARCH ARBITER

The search arbiter performs the query by sending it to all the scope elements, merging the query results returned by each scope element, and returning the merged query results to the client.

If ordering is specified and is supported, then either (1) each scope element must return ordered results to the arbiter, or (2) the search arbiter does all the sorting. In case (1), the arbiter does no sorting. It simply merges the sorted results from each scope element. In case (2), the arbiter does the sorting. In either case, to perform the merge or sort, the search arbiter software must understand the datatypes of the properties of each scope element, since the sort is done on an element basis.
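
The two ordering cases can be sketched roughly as follows. This is only an illustration, not anything DASL defines: the result rows, the hrefs, and the "Author" values are made up, and each scope element's results are modeled as a plain list of dictionaries.

```python
import heapq
from operator import itemgetter

# Hypothetical result sets. In case (1) each scope element has already
# sorted its own results on the "Author" property.
results_a = [{"href": "/a/1", "Author": "Adams"}, {"href": "/a/2", "Author": "Moore"}]
results_b = [{"href": "/b/1", "Author": "Baker"}, {"href": "/b/2", "Author": "Young"}]


def merge_presorted(result_sets, prop):
    """Case (1): merge result sets that are already sorted; no re-sorting."""
    return list(heapq.merge(*result_sets, key=itemgetter(prop)))


def sort_at_arbiter(result_sets, prop):
    """Case (2): the arbiter gathers all results and sorts them itself."""
    combined = [row for result_set in result_sets for row in result_set]
    return sorted(combined, key=itemgetter(prop))


merged = merge_presorted([results_a, results_b], "Author")
sorted_by_arbiter = sort_at_arbiter([results_b, results_a], "Author")
```

Both functions have to compare property values, which is why the arbiter needs to know the datatypes of the properties in either case.
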
(*1: DASL is currently quiet on whether case (1) or case (2) pertains. DASL must become explicit on this issue.)

Implicit in all the above behavior is that the search arbiter software, when asked, (1) retrieves and saves the metadata of each scope list element, (2) merges the metadata of all the scope elements, and (3) returns the merged metadata to the client. The merged metadata is returned to the client when the client software requests the query capabilities of the particular scope list submitted to the arbiter software.

(*2: The exact details of how the client retrieves the metadata are work in progress. The search arbiter will use this same method to get the metadata of each individual scope list element.)

METADATA

The metadata of a particular query consists of (1) the properties, (2) the query operators, and (3) the query grammars supported by the scope list elements. The metadata of each scope list element can be exactly the same as, mostly similar to, or mostly different from the metadata of other scope list elements. The most useful case is where there is a fair amount of overlap in the metadata of the scope list elements, but the metadata is not identical.

For example, scope list element "A" may have the "Author" and "loan_number" properties, and scope list element "B" may have the "Author" and "purchase_order_number" properties. In this example, both scope list elements have some common properties and some different properties. Scope list element "A" supports the "contains" operator, and scope list element "B" does not support the "contains" operator. So, the scope list elements of this example have query operators in common, and query operators that are different.

MERGING METADATA

There are two reasonable approaches to merging the metadata of the scope list elements: (1) take the set intersection of the query capabilities of each scope list element, and (2) take the set union of the query capabilities of each scope list element.
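
As a rough sketch, the two approaches can be written as set operations over the "A" and "B" example above. The `Metadata` class, the "eq" operator, and the grammar names "grammar-1" and "grammar-2" are assumptions made for illustration only. Under union merge the grammars are still intersected here, on the assumption that every scope list element must understand the grammar a query is written in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metadata:
    """Hypothetical container for one scope list element's query capabilities."""
    properties: frozenset
    operators: frozenset
    grammars: frozenset


def merge_intersection(elements):
    """Keep only what every scope list element supports."""
    return Metadata(
        properties=frozenset.intersection(*(e.properties for e in elements)),
        operators=frozenset.intersection(*(e.operators for e in elements)),
        grammars=frozenset.intersection(*(e.grammars for e in elements)),
    )


def merge_union(elements):
    """Union of properties and operators; grammars are still intersected."""
    return Metadata(
        properties=frozenset.union(*(e.properties for e in elements)),
        operators=frozenset.union(*(e.operators for e in elements)),
        grammars=frozenset.intersection(*(e.grammars for e in elements)),
    )


# Scope list elements "A" and "B" from the example. "eq" and the grammar
# names are invented placeholders, not part of DASL.
a = Metadata(frozenset({"Author", "loan_number"}),
             frozenset({"eq", "contains"}),
             frozenset({"grammar-1", "grammar-2"}))
b = Metadata(frozenset({"Author", "purchase_order_number"}),
             frozenset({"eq"}),
             frozenset({"grammar-1"}))

intersection = merge_intersection([a, b])
union = merge_union([a, b])
```

With these inputs, the intersection exposes only "Author" and "eq", while the union exposes all three properties and both operators.
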

INTERSECTION: Under intersection rules, the merged metadata consists only of (1) the properties that are common to all scope list elements, (2) the query operators that are common to all scope list elements, and (3) the query grammars that are common to all scope list elements. This is the easy case for the search arbiter software, since client queries can be forwarded to each scope list element unmodified. A problem with this case is that, in general, one can't rely on the intersection being as large as one would like, unless the situation has been prearranged to make all the schemas identical. (This eliminates querying across heterogeneous legacy repositories, which, almost by definition, had their schemas defined independently of each other.)

UNION: Under union rules, the merged metadata consists of (1) the set union of the properties of all the scope list elements, (2) the set union of the query operators of all the scope list elements, and (3) the set intersection of the query grammars supported by the scope list elements. (The intersection of the grammars must be taken even under union merge, since the search arbiter software must send the query to all the scope list elements, so all the scope list elements must understand the grammar being used.) The union case probably has broader applicability in the real world than the intersection case.

Since there is only one glob of metadata returned to the client for any particular scope list, the client's view of the world is the same whether there is one or multiple scope list elements in the query.

In the intersection case, all client queries are fully understood by all scope list elements. In the union case, in general, client queries are only partially defined for each scope list element. The problems this causes are easily solved by the search arbiter's use of three-valued elimination. (As discussed before, three-valued elimination is an obvious, straightforward extension of ANSI standard SQL three-valued logic.)
Sensible query results are returned for all queries that are valid with respect to the metadata returned to the client.

(*3: DASL is currently quiet on whether union or intersection rules are used when merging metadata. DASL must be explicit about this.)

ALTERNATIVES

The current situation requires the search arbiter to be fully general: It must collect and merge metadata, perform three-valued elimination, merge query results from multiple scope list elements, etc.

Alternatives to the current situation are: (1) constrain the scope list to be a single element, and (2) constrain the scope list elements to be subcollections of the same collection, so that they all have exactly the same metadata. Then the search arbiter disappears. Said another way, the root collection becomes the arbiter for itself and its subcollections. In either case, there is only one metadata description for all the scope list elements of a query, so there is no concept of merging the metadata of the scope list elements, no three-valued elimination, no forwarding of the query to multiple different servers, etc.

If alternative (1) is chosen, it would nonetheless be possible for "anyone" to write a GENERIC SEARCH ARBITER that could interface to any list of DASL collections. The GSA would enhance the 1.0 DASL protocol to take a LIST of scope elements (instead of a single scope list element), collect the metadata from each scope list element, merge the metadata, return the merged metadata to the client, distribute queries to each scope list element, merge query results from all scope list elements, and return a single set of query results to the client. In fact, doing that wouldn't be difficult, because the generic arbiter would use the 1.0 DASL protocol unmodified to talk to the clients and the individual scope list elements. No special enhancements to the DASL 1.0 protocol would be necessary to enable the implementation of such a generic search arbiter.
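
A generic search arbiter along these lines might look roughly like the sketch below. Everything in it is hypothetical: the `get_metadata` and `search` methods stand in for whatever DASL 1.0 ends up defining, a query is modeled as a plain Python predicate, metadata is reduced to a set of property names, and three-valued elimination is left out entirely.

```python
class InMemoryCollection:
    """Hypothetical stand-in for a single DASL 1.0 collection."""

    def __init__(self, properties, rows):
        self.properties = set(properties)
        self.rows = list(rows)

    def get_metadata(self):
        return {"properties": set(self.properties)}

    def search(self, predicate):
        return [row for row in self.rows if predicate(row)]


class GenericSearchArbiter:
    """Offers the same two calls as a collection, over a list of collections."""

    def __init__(self, collections):
        self.collections = list(collections)

    def get_metadata(self):
        # Union merge of the property sets of all scope list elements.
        merged = set()
        for collection in self.collections:
            merged |= collection.get_metadata()["properties"]
        return {"properties": merged}

    def search(self, predicate):
        # Distribute the query to every collection and combine the results.
        results = []
        for collection in self.collections:
            results.extend(collection.search(predicate))
        return results


a = InMemoryCollection(["Author", "loan_number"],
                       [{"Author": "Smith", "loan_number": 7}])
b = InMemoryCollection(["Author", "purchase_order_number"],
                       [{"Author": "Smith", "purchase_order_number": 12},
                        {"Author": "Jones", "purchase_order_number": 13}])
gsa = GenericSearchArbiter([a, b])
hits = gsa.search(lambda row: row.get("Author") == "Smith")
```

Because the arbiter exposes the same interface as the collections it wraps, a client cannot tell whether it is talking to one collection or to many, which is the property that lets the 1.0 protocol be used unmodified on both sides.
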
(I am assuming the generic arbiter would always use union merge. If intersection merge were deemed desirable as well, the generic arbiter would enhance the 1.0 protocol one step further to allow the client to specify intersection versus union merge when retrieving the metadata.)

ISSUES (ACTION ITEMS)

Not counting selecting an alternative as an issue, there are three DASL issues, flagged as *1, *2, and *3 in the above.

---
Received on Tuesday, 30 June 1998 18:16:17 UTC