RE: "contains" must be optional from Babich, Alan on 1998-07-01 (www-webdav-dasl@w3.org from July to September 1998)

From: Babich, Alan <ABabich@filenet.com>
Date: Wed, 1 Jul 1998 16:50:21 -0700
To: "'Fisher Mark'" <fisherm@tce.com>, "'Jim Davis'" <jdavis@parc.xerox.com>, "'www-webdav-dasl@w3.org'" <www-webdav-dasl@w3.org>
Message-ID: <72B1992276A9D111A20E00805FEAC96DCC412D@cm-expo1.filenet.com>
Mark, Jim:

In order to clarify the issues involved, I shall 
invent just a few bits of terminology for purposes of 
this discussion: ROOT COLLECTION, SUBCOLLECTION,
VIRTUAL ROOT COLLECTION, and MIDDLEWARE.

I assume the reader understands three valued elimination
(see my e-mail "Three Valued Logic and Elimination Defined,"
7/1/98).

Let us hypothesize that a document management system (DMS), A,
with a DASL interface exists. It advertises itself as a collection.
Let us hypothesize that this DMS has hierarchical folders, 
and that it advertises each folder as a collection. I shall
refer to the collection representing the DMS instance
as a whole as a ROOT COLLECTION, and a collection
representing a folder or subfolder within the root
collection as a SUBCOLLECTION.

Let us hypothesize another DMS instance, B, exists. 
This is a totally separate collection of documents. 
The DMS instance B may or may not be implemented by the 
same vendor, and may or may not be on the same server.
It may help to think of B as being implemented by a 
different vendor, and running on a different server. 
Thus, B represents a root collection and has subcollections 
as well.

Let us hypothesize that all the above collections advertise
their metadata, and give errors for properties and operators
that they don't understand, as you propose. (Advertising
their metadata and giving such errors is reasonable, 
because that is what the underlying DMS's did in the 
real world prior to DASL.)

Now let us suppose an enterprising company decides to
offer software that provides the functionality
to query across multiple heterogeneous DASL 1.0 root
collections. (After all, we probably couldn't stop that
from happening, even if we wanted to, could we?) Let us 
assume that this search arbiter software runs on some server 
somewhere. Suppose this search arbiter is asked to query 
across A and B. Let us hypothesize that the VIRTUAL ROOT
COLLECTION it advertises, representing a merging of the
metadata in A and B, is called C.

Let us call this search arbiter software MIDDLEWARE,
since it sits between the client software and the target
collections A and B.

The most logical thing to do, in my opinion, is for
the middleware to form the metadata for C from the set
union of the properties and the set union of the operators
of A and B. (The only other reasonable alternative, in
my opinion, is to take the set intersection of the
properties and operators, but I believe that to be less
useful in most cases. However, I would not object to
intersection being offered as an option.)
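To make the union/intersection choice concrete, here is a minimal Python sketch of how middleware might form C's metadata. It assumes that each collection's advertised metadata can be reduced to a set of property names and a set of operator names. The `Metadata` class, the `merge_metadata` function and all property/operator names are hypothetical illustrations, not part of any DASL draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metadata:
    """Hypothetical reduction of a collection's advertised metadata."""
    properties: frozenset
    operators: frozenset


def merge_metadata(targets, mode="union"):
    """Form the metadata of a virtual root collection from its targets.

    "union" keeps every property/operator that at least one target defines;
    "intersection" keeps only those that every target defines.
    """
    if not targets:
        raise ValueError("need at least one target collection")
    if mode == "union":
        combine = frozenset.union
    elif mode == "intersection":
        combine = frozenset.intersection
    else:
        raise ValueError(f"unknown merge mode: {mode!r}")
    return Metadata(
        properties=combine(*(t.properties for t in targets)),
        operators=combine(*(t.operators for t in targets)),
    )


# Usage with made-up property and operator names:
a = Metadata(frozenset({"title", "author"}), frozenset({"eq", "lt", "contains"}))
b = Metadata(frozenset({"title", "dept"}), frozenset({"eq", "lt"}))
c = merge_metadata([a, b])
# c.properties == {"title", "author", "dept"}
# c.operators  == {"eq", "lt", "contains"}
```

With union, a query against C may use "author" even though B does not define it. The cost is that C must then eliminate that condition before forwarding the query to B.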

So we have:

                |---|
                |   | client
                |---|
                  |
                  | DASL protocol (with an extension or two?)
                  |
                |---|
                | C | 
                |---|
                /   \
          DASL /     \ DASL protocol
              /       \
           |---|     |---|
           | A |     | B |
           |---|     |---|

OK. Now, if you understand three valued elimination, you
know that C gets the original query, and sends a possibly 
massaged copy to A, and a possibly massaged, possibly 
different, copy of the original query to B. In order to
massage the query and advertise the metadata for C, 
A and B must advertise their properties AND OPERATORS,
so that C can retrieve that information.
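Here is one possible sketch of the per-target "massaging" step in Python. The precise definition of three valued elimination is in the separate e-mail cited above, and this may differ from it in detail. The sketch assumes a made-up tuple syntax for queries rather than DASL's XML. A comparison whose property or operator the target does not define is UNKNOWN for every resource in that target. After NOT has been pushed down to the comparisons (De Morgan's laws hold in three valued logic), an UNKNOWN comparison can never make the whole condition TRUE. It can therefore be replaced by FALSE and folded away without changing which resources are selected. That substitution is this sketch's own device. The names `to_nnf` and `eliminate` are hypothetical.

```python
# Hypothetical query syntax (not DASL's XML):
#   ("cmp", operator, property, value)   a basic comparison
#   ("not", e) / ("and", l, r) / ("or", l, r)


def to_nnf(expr, negate=False):
    """Push NOT down to the comparisons (De Morgan holds in Kleene logic)."""
    kind = expr[0]
    if kind == "not":
        return to_nnf(expr[1], not negate)
    if kind in ("and", "or"):
        if negate:
            kind = "or" if kind == "and" else "and"
        return (kind, to_nnf(expr[1], negate), to_nnf(expr[2], negate))
    return ("not", expr) if negate else expr  # comparison leaf


def eliminate(expr, properties, operators):
    """Massage a query for one target; False means "skip this target"."""
    def walk(e):
        kind = e[0]
        if kind == "cmp":
            _, op, prop, _value = e
            if op in operators and prop in properties:
                return e
            return False  # really UNKNOWN, but it can never select a resource
        if kind == "not":  # in NNF, NOT only wraps a comparison
            inner = walk(e[1])
            # NOT UNKNOWN is still UNKNOWN, so it is also dropped as False
            return False if inner is False else ("not", inner)
        left, right = walk(e[1]), walk(e[2])
        if kind == "and":
            if left is False or right is False:
                return False
            if left is True:
                return right
            return left if right is True else ("and", left, right)
        if left is True or right is True:  # kind == "or"
            return True
        if left is False:
            return right
        return left if right is False else ("or", left, right)

    return walk(to_nnf(expr))


# Made-up example: A defines "title" and "author" but not "dept".
query = ("or", ("cmp", "eq", "author", "Babich"), ("cmp", "eq", "dept", "R&D"))
print(eliminate(query, {"title", "author"}, {"eq"}))
# → ('cmp', 'eq', 'author', 'Babich')
```

If `eliminate` returns False for a target, C need not send that target anything.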

We want C to give errors for anything in the query that
is not defined RELATIVE TO THE METADATA OF C.
Similarly, we want A and B to give errors if anything in the
massaged copy of the query that they see is undefined RELATIVE
TO THEIR OWN METADATA. However, A and B WILL NOT SEE ANYTHING
UNDEFINED in their copy of the query if the middleware has
performed the three valued elimination properly. So, if they
see anything undefined, they have caught a middleware bug.
Note that the behavior of A and B must be the same whether
they are called by middleware or the original client, since 
A and B don't know or care who is invoking them. 

OK. I think we're still on the same page -- all root 
collections, virtual or real, give errors for queries with 
elements undefined relative to their metadata.
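The rule just stated can be sketched as one check that every collection, virtual or real, runs against its own metadata, whoever the caller is. This is a minimal Python sketch using the same made-up tuple query syntax as above. `UndefinedElementError`, `iter_comparisons` and `validate` are hypothetical names. The actual DASL error codes are not defined here.

```python
class UndefinedElementError(Exception):
    """Hypothetical error for a query element this collection does not define."""


def iter_comparisons(expr):
    """Yield every ("cmp", operator, property, value) leaf of a query."""
    kind = expr[0]
    if kind == "cmp":
        yield expr
    elif kind == "not":
        yield from iter_comparisons(expr[1])
    else:  # "and" / "or"
        yield from iter_comparisons(expr[1])
        yield from iter_comparisons(expr[2])


def validate(expr, properties, operators):
    """Reject a query using anything undefined relative to this collection.

    C calls this with its merged metadata; A and B call it with their own.
    The check does not depend on whether the caller is a client or middleware.
    """
    for _, op, prop, _value in iter_comparisons(expr):
        if op not in operators:
            raise UndefinedElementError(f"undefined operator: {op}")
        if prop not in properties:
            raise UndefinedElementError(f"undefined property: {prop}")
```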

This leaves us with the following issues: 
(1) Should the scope list be one element only, or should it
allow multiple scope elements?
(2) Should the protocol be enhanced to provide the option
of specifying union versus intersection merging of metadata?

My proposal on these two issues is guided by the goal that
middleware should neither need nor want to offer extensions
to the DASL 1.0 protocol. To that end, what I
want is:

    (1) the scope list can have multiple elements (that's the
    way it is in the current draft), and 

    (2) the protocol can optionally provide for
    specification of union versus intersection merge in the
    case where there is more than one scope list element and more
    than one root collection is involved.
    (There is no merging if only subcollections of one root
    collection are involved. I assume that the metadata of
    all subcollections is exactly the same as the metadata
    for the root collection of which they are a part.
    Otherwise, we create a pointless barrier to entry.)
    This enhancement is not necessary if you believe
    that union merge is good enough for 1.0.
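Here is a minimal Python sketch of how a multi-element scope list and the optional merge mode might interact. It assumes that each scope element is a collection URI and that its root collection can be found by URI prefix. The function `scope_metadata`, the `roots` table and the example URIs are all hypothetical.

```python
def scope_metadata(scope, roots, mode="union"):
    """Metadata that governs a query over `scope` (a list of collection URIs).

    `roots` maps each root collection URI (ending in "/") to a pair
    (properties, operators). Subcollections share their root's metadata,
    so nothing is merged when the whole scope lies under one root.
    """
    if mode not in ("union", "intersection"):
        raise ValueError(f"unknown merge mode: {mode!r}")
    involved = []
    for uri in scope:
        matches = [r for r in roots if uri.startswith(r)]
        if not matches:
            raise ValueError(f"scope element not under a known root: {uri}")
        root = max(matches, key=len)  # most specific root wins
        if root not in involved:
            involved.append(root)
    props = [set(roots[r][0]) for r in involved]
    ops = [set(roots[r][1]) for r in involved]
    if len(involved) == 1:
        return props[0], ops[0]  # one root collection: no merging at all
    combine = set.union if mode == "union" else set.intersection
    return combine(*props), combine(*ops)


# Made-up roots for A and B:
roots = {
    "http://a.example/dms/": ({"title", "author"}, {"eq", "contains"}),
    "http://b.example/dms/": ({"title", "dept"}, {"eq"}),
}
print(scope_metadata(["http://a.example/dms/", "http://b.example/dms/"],
                     roots, mode="intersection"))
# → ({'title'}, {'eq'})
```

Defaulting `mode` to "union" reflects the point above: if union merge is good enough for 1.0, a client never has to say anything about merging.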

Note that in my proposal, existing DMS's aren't asked to do 
anything they aren't already doing. That is, they are NOT
asked to perform three valued elimination or merging of 
metadata. Only middleware search arbiters are expected to do 
that. (If middleware doesn't do that, it has no reason 
to exist.) 

If a collection is asked to do something it can't 
do, it should give an error. We need to make sure we 
document what the errors are for all such cases. For
example, a client could try to use A as middleware and
put elements from both A and B in the scope list. 
Since A is really only a DMS, not middleware, that must fail.
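The failure case just described might look like this on A's side. This is a minimal Python sketch, assuming that a plain DMS knows its own root URI (ending in "/") and that a prefix test is enough to tell whether a scope element lies inside it. `ScopeError`, `check_scope` and the URIs are hypothetical. As noted above, the real error to return still needs to be documented.

```python
class ScopeError(Exception):
    """Hypothetical error for a scope this collection cannot search."""


def check_scope(scope, own_root):
    """A plain DMS (not middleware) can only search inside its own root.

    Assumes `own_root` ends in "/" so that the prefix test does not confuse
    ".../dms/" with ".../dms2/".
    """
    foreign = [uri for uri in scope if not uri.startswith(own_root)]
    if foreign:
        raise ScopeError(f"{own_root} cannot search outside itself: {foreign}")


# A asked to act as middleware over A and B: this raises ScopeError.
# check_scope(["http://a.example/dms/f1/", "http://b.example/dms/"],
#             "http://a.example/dms/")
```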

(Note that DMS's ARE required to perform three
valued LOGIC. Any collections implemented on top of an SQL
engine get that for free. Any that aren't have to address
the exact same problems that three valued logic solves in any
event. Mandating a mature, widely known and accepted, industry
standard for the solution to those problems is much better
than allowing "one-off" proprietary solutions to be intermingled,
thereby negatively impacting interoperability.)
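For readers without an SQL engine at hand, here is a minimal Python sketch of the three valued logic that SQL uses (Kleene's strong logic). None stands in for UNKNOWN, the way a comparison against a NULL value yields UNKNOWN in SQL. The function names `eq3`, `and3`, `or3`, `not3` and `selected` are illustrative only.

```python
from typing import Optional

TV = Optional[bool]  # True, False, or None (= UNKNOWN)


def eq3(a, b) -> TV:
    """Comparison as in SQL: anything compared with a missing value is UNKNOWN."""
    return None if a is None or b is None else a == b


def and3(a: TV, b: TV) -> TV:
    if a is False or b is False:
        return False  # FALSE dominates AND
    if a is None or b is None:
        return None
    return True


def or3(a: TV, b: TV) -> TV:
    if a is True or b is True:
        return True  # TRUE dominates OR
    if a is None or b is None:
        return None
    return False


def not3(a: TV) -> TV:
    return None if a is None else not a


def selected(condition: TV) -> bool:
    """A WHERE-style search selects a resource only when the result is TRUE."""
    return condition is True


# A made-up document with no value for "dept":
dept = None
print(selected(not3(eq3(dept, "R&D"))))  # → False: UNKNOWN is not selected
```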


Alan Babich

> -----Original Message-----
> From:	Fisher Mark [SMTP:fisherm@tce.com]
> Sent:	July 01, 1998 10:44 AM
> To:	'Jim Davis'; 'www-webdav-dasl@w3.org'
> Subject:	RE: "contains" must be optional
> 
> Jim Davis writes:
> >What we have yet to agree on is the MEANS by which such optionality is
> >provided.  There are at least two alternatives:
> >
> >1) the operator itself is optional.  A server need not recognize it
> >and MUST return an error if it is used.
> >
> >2) Every DASL server MUST recognize every operator, and MUST treat the
> >results of applying unimplemented operators as unknown.  So implementation
> >is optional, but the syntax isn't.
> 
> There isn't a big gain from (2), as the UI result should be the same --
> an empty result set along with an error message.  If (2) is chosen, you
> still need to implement (1) for truly unknown (mistyped etc.) operators.
> So let's go with (1).
> > ==========================================================
> > Mark Leighton Fisher          Thomson Consumer Electronics
> > fisherm@indy.tce.com          Indianapolis, IN
> > "Browser Torture Specialist, First Class"
> >
Received on Wednesday, 1 July 1998 19:53:05 UTC