Re: Arbiter must merge the metadata of scope list elements

My comments are interspersed below.

Babich, Alan wrote:

> Heretofore, this hasn't been as explicit as it needs to be.
> This is an attempt to clarify the situation.
>
> LIST OF SCOPE ELEMENTS
>
> Currently, in a DASL query, there is a list of scope elements.
> Each scope element is a DAV:href element and an optional
> DAV:depth element. A scope list element can, for example,
> reference a collection that is a whole document
> space under the control of a DMS, or it can reference a
> collection that is a particular (sub)folder in a
> document space, etc.
>
> There is currently no constraint that the scope list elements
> all be subcollections of the same root collection or document
> space, or even that the collections involved have to be on the
> same server. Said another way, the search arbiter is currently
> free to forward the query to a list of heterogeneous
> document management systems on the same or separate servers
> and merge the results.
>
> COLLECTION
>
> A (root) collection is an encapsulation of (1) resources,
> (2) metadata describing those resources, and (3) a query
> engine that both understands the metadata and has direct
> access to the resources. A (root) collection can have
> subcollections, which can have subcollections, etc.
> Subcollections are obviously encapsulated by their
> root collection.
>
> SEARCH ARBITER
>
> The search arbiter performs the query by sending it to
> all the scope elements, merges the query results returned
> by each scope element, and returns the merged query results
> to the client.
>
> If ordering is specified and is supported, then either (1) each
> scope element must return ordered results to the arbiter, or
> (2) the search arbiter does all the sorting. In case (1),
> the arbiter does no sorting. It simply merges the sorted results
> from each scope element. In case (2), the arbiter does the
> sorting. In either case, to perform the merge or sort, the
> search arbiter software must understand the datatypes of
> the properties of each scope element, since the sort is done
> on an element basis.
>
> (*1: DASL is currently quiet on whether case (1) or
> case (2) pertains. DASL must become explicit on this issue.)

No, DASL is appropriately quiet. Whether (1), (2), or something else
applies is up to the implementation.
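
To make the two cases concrete, here is a minimal sketch in Python. The
function names and the sample rows are hypothetical, invented only for
illustration; nothing here is defined by DASL. It assumes each scope returns
its results as a list of rows and that the sort property has a datatype the
arbiter knows how to compare, which is the point made above about
understanding datatypes.

```python
import heapq
from operator import itemgetter


def merge_sorted_results(per_scope_results, sort_key):
    """Case (1): every scope already returned rows sorted by sort_key,
    so the arbiter only merges the sorted streams."""
    return list(heapq.merge(*per_scope_results, key=itemgetter(sort_key)))


def sort_all_results(per_scope_results, sort_key):
    """Case (2): scopes return rows in any order and the arbiter
    does all of the sorting itself."""
    rows = [row for results in per_scope_results for row in results]
    return sorted(rows, key=itemgetter(sort_key))


# Hypothetical results from two scopes, each already ordered by "Author".
scope_a = [{"href": "/a/1", "Author": "Adams"},
           {"href": "/a/2", "Author": "Moore"}]
scope_b = [{"href": "/b/1", "Author": "Baker"},
           {"href": "/b/2", "Author": "Young"}]

merged = merge_sorted_results([scope_a, scope_b], "Author")
```

Either function gives the client the same ordered list, which is why the
choice can stay an implementation detail.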

>
>
> Implicit in all the above behavior is that the search arbiter
> software, when asked, (1) retrieves and saves the metadata of
> each scope list element, (2) merges the metadata of all the
> scope elements, and (3) returns the merged metadata to the
> client.
>
> The merged metadata is returned to the client when the
> client software requests the query capabilities of the
> particular scope list submitted to the arbiter software.
>
> (*2: The exact details of how the client retrieves the metadata
> is work in progress. The search arbiter will use this same
> method to get the metadata of each individual scope list
> element.)

The search arbiter *might* use this same method, but it might already
know all about the scopes it can serve.  It's an implementation
choice.

>
>
> METADATA
>
> The metadata of a particular query consists of (1) the
> properties, (2) the query operators, and (3) the query grammars
> supported by the scope list elements.
>
> The metadata of each scope list element can be exactly the same,
> mostly similar to, or mostly different from the metadata of
> other scope list elements. The most useful case is where there
> is a fair amount of overlap in the metadata of the scope
> list elements, but the metadata is not identical. For example,
> scope list element "A" may have the "Author" and "loan_number"
> properties, and scope list element "B" may have the "Author"
> and "purchase_order_number" properties. In this example,
> both scope list elements have some common properties
> and some different properties. Scope list element "A" supports
> the "contains" operator, and scope list element "B" does
> not support the "contains" operator. So, the scope list elements
> of this example have query operators in common, and query
> operators that are different.
>
> MERGING METADATA
>
> There are two reasonable approaches to merging the metadata
> of the scope list elements: (1) take the set intersection
> of the query capabilities of each scope list element, and
> (2) take the set union of the query capabilities of each scope
> list element.
>
> INTERSECTION: Under intersection rules, the merged metadata
> consists only of (1) the properties that are common to all
> scope list elements, (2) the query operators that are common
> to all scope list elements, and (3) the query grammars
> that are common to all scope list elements. This is the easy
> case for the search arbiter software, since client queries
> can be forwarded to each scope list element unmodified.
> A problem with this case is that, in general, one can't
> rely on the intersection being as large as one would like,
> unless the situation has been prearranged to make all the
> schemas identical. (This eliminates querying across
> heterogeneous legacy repositories, which, almost by definition,
> had their schemas defined independently of each other.)
>
> UNION: Under union rules, the merged metadata consists of
> (1) the set union of the properties of all the scope list
> elements, (2) the set union of the query operators of all
> the scope list elements, and (3) the set intersection of
> the query grammars supported by the scope list elements.
> (The intersection of the grammars must be taken even under
> union merge, since the search arbiter software must send
> the query to all the scope list elements, so all the scope
> list elements must understand the grammar being used.)
> The union case probably has broader applicability in the
> real world than the intersection case.
>
> Since there is only one glob of metadata returned
> to the client for any particular scope list, the client's
> view of the world is the same whether there is one or
> multiple scope list elements in the query.
>
> In the intersection case, all client queries are fully
> understood by all scope list elements. In the union case,
> in general, client queries are only partially defined for
> each scope list element. The problems this causes are easily
> solved by the search arbiter's use of three valued elimination.
> (As discussed before, three valued elimination is an obvious
> straightforward extension of ANSI standard SQL three valued
> logic.) Sensible query results are returned for all queries
> that are valid with respect to the metadata returned to the
> client.
>
> (*3: DASL is currently quiet on whether union or intersection
> rules are used when merging metadata. DASL must be explicit
> about this.)

Or the work-in-progress means of getting the metadata may provide
information on each specified scope, allowing the client to do
intersection, union, or more complex operations, such as mapping
some properties onto each other for the benefit of the user.
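
As a rough illustration of the intersection and union rules quoted above,
here is a small Python sketch that works for either an arbiter or a client
holding per-scope metadata. The ScopeMetadata type, the "eq" operator, and
the "basicsearch" grammar name are assumptions made up for this example; the
property names and the "contains" operator come from the "A"/"B" example in
the quoted text. Note that the union merge still intersects the grammars, as
the quoted text requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopeMetadata:
    # Hypothetical container for one scope's query capabilities.
    properties: frozenset
    operators: frozenset
    grammars: frozenset


def merge_intersection(scopes):
    """Keep only what every scope supports."""
    return ScopeMetadata(
        frozenset.intersection(*(s.properties for s in scopes)),
        frozenset.intersection(*(s.operators for s in scopes)),
        frozenset.intersection(*(s.grammars for s in scopes)),
    )


def merge_union(scopes):
    """Union of properties and operators, but still the intersection
    of grammars, since every scope must parse the query."""
    return ScopeMetadata(
        frozenset.union(*(s.properties for s in scopes)),
        frozenset.union(*(s.operators for s in scopes)),
        frozenset.intersection(*(s.grammars for s in scopes)),
    )


# Scopes "A" and "B" from the quoted example; "eq" and "basicsearch"
# are illustrative placeholders.
a = ScopeMetadata(frozenset({"Author", "loan_number"}),
                  frozenset({"eq", "contains"}),
                  frozenset({"basicsearch"}))
b = ScopeMetadata(frozenset({"Author", "purchase_order_number"}),
                  frozenset({"eq"}),
                  frozenset({"basicsearch"}))

inter = merge_intersection([a, b])
union = merge_union([a, b])
```

With per-scope metadata in hand, a client could apply either function, or
something more elaborate such as a property mapping, without the protocol
having to pick one rule.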

>
>
> ALTERNATIVES
>
> The current situation requires the search arbiter to be
> fully general: It must collect and merge metadata, perform
> three valued elimination, merge query results from multiple
> scope list elements, etc.
>
> Alternatives to the current situation are:
>
> (1) constrain the scope list to be a single element, and
>
> (2) constrain the scope list elements to be subcollections
> of the same collection, so that they all have exactly the
> same metadata.
>
> Then the search arbiter disappears. Said another way, the
> root collection becomes the arbiter for itself and its
> subcollections. In either case, there is only one
> metadata description for all the scope list
> elements of a query, so there is no concept of merging the
> metadata of the scope list elements, no three valued
> elimination, no forwarding of the query to multiple
> different servers, etc.
>
> If alternative (1) is chosen, it would nonetheless be possible
> for "anyone" to write a GENERIC SEARCH ARBITER that could
> interface to any list of DASL collections. The GSA would enhance
> the 1.0 DASL protocol to take a LIST of scope elements
> (instead of a single scope list element), collect the metadata
> from each scope list element, merge the metadata, return the
> merged metadata to the client, distribute queries to each
> scope list element, merge query results from all scope list
> elements, and return a single set of query results to the
> client. In fact, doing that wouldn't be difficult, because
> the generic arbiter would use the 1.0 DASL protocol unmodified
> to talk to the clients and the individual scope list elements.
>
> No special enhancements to the DASL 1.0 protocol
> would be necessary to enable the implementation of such a
> generic search arbiter. (I am assuming the generic arbiter
> would always use union merge. If intersection merge were
> deemed desirable as well, the generic arbiter would enhance the
> 1.0 protocol one step further to allow the client to specify
> intersection versus union merge when retrieving the metadata.)
>
> ISSUES (ACTION ITEMS)
>
> Not counting selecting an alternative as an issue, there
> are three DASL issues, flagged as *1, *2, *3, in the above.
>
>                            ---



--
*************************************************
Rick Henderson            (Netscape)(650)937-3152
rickh@netscape.com
*************************************************

Received on Tuesday, 30 June 1998 20:48:08 UTC