RE: Structure criteria for DASL queries from Babich, Alan on 1998-08-28 (www-webdav-dasl@w3.org from July to September 1998)

From: Babich, Alan <ABabich@filenet.com>
Date: Fri, 28 Aug 1998 14:03:54 -0700
To: "'Jim Davis'" <jdavis@parc.xerox.com>, www-webdav-dasl@w3.org
Message-ID: <C3AF5E329E21D2119C4C00805F6FF58F04AEA3@hq-expo2.filenet.com>
> The purpose of this message is to make (or reopen) the case for
> defining a structure criteria (also called "structured query") in
> DASL.

The purpose of this response is to state the case for
why I think that would be a bad idea for DASL 1.0.

> 2. Motivation
> 
> Several of the scenarios for DASL imply criteria that refer to
> structure.  These include:
> 
>  * resources that have been locked for more than 1 week.
>  * resources that are collections.

Yes, there are scenarios involving locks and collections. 
However, I believe the only critical scenario for 1.0 is
filtering query results on whether the resources are 
collections or not. Even if DASL 1.0 couldn't handle any of
the scenarios involving locks, DASL would still be
extremely useful and quite adequate for 1.0.

> The protocol defines an ad-hoc property dav:iscollection, which
> allows it to test whether a resource is a collection, but this is not
> the same as testing the resourcetype property itself.  Aside from that
> (which is a mere technicality), the real problems with this approach
> (define an ad-hoc property that 'shadows' the structured value) are
> 
> 1) It does not scale well when new structured properties are added.
> Consider that the Advanced Collection team has proposed at least two
> new structure properties, for example.

IMHO the resourcetype property should have been defined to
be a string (or an integer) in the first place. 
There are only two values for it in WebDAV, and Advanced 
Collections only adds two more. It is unfortunate that
resourcetype has an XML value. However, there is still time to
fix the problem with iscollection. The iscollection property
is currently defined to be a Boolean. Obviously, it should
be eliminated in favor of a property which is what the
resourcetype property should have been in the first place:

For example, we could replace iscollection with a membertype 
property of type string that has the possible values 
"collection", "indirect reference", "direct reference", and
"ordinary resource"; or we could have membertype be an 
integer property that has the values 0, 1, 2, and 3 defined 
for it. Clearly, this approach eliminates the scalability 
problem.
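
As a minimal sketch of the idea (in Python, and not taken from any
WebDAV or DASL draft): the names MemberType, Resource, and
filter_by_member_type are hypothetical, and the pairing of the
integers 0-3 with the four string values is an assumption, since only
the two value sets are given above. The point is only that a scalar
property can be filtered with an ordinary equality test.

```python
from dataclasses import dataclass
from enum import IntEnum


class MemberType(IntEnum):
    """Hypothetical integer encoding of the proposed membertype property.

    The order of the values is an assumption made for illustration.
    """
    COLLECTION = 0
    INDIRECT_REFERENCE = 1
    DIRECT_REFERENCE = 2
    ORDINARY_RESOURCE = 3


@dataclass
class Resource:
    href: str
    membertype: MemberType  # a scalar, so a plain comparison is enough


def filter_by_member_type(resources, wanted):
    """Return the resources whose scalar membertype equals `wanted`."""
    return [r for r in resources if r.membertype == wanted]


# Made-up sample data, for illustration only.
resources = [
    Resource("/docs/", MemberType.COLLECTION),
    Resource("/docs/spec.txt", MemberType.ORDINARY_RESOURCE),
    Resource("/docs/link", MemberType.DIRECT_REFERENCE),
]

collections = filter_by_member_type(resources, MemberType.COLLECTION)
print([r.href for r in collections])
```

Adding a new kind of resource then means adding one more value to the
enumeration, rather than defining another Boolean shadow property.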

> 
> 2) It is unworkable for structured values with repeated elements.
> Suppose, for example, one wishes to determine whether "Joe" is holding
> a lock on a resource.  You might think that you could define a
> 'shadow' property "lockowner" which holds the value of the owner
> element within the lockdiscovery, but unlike resourcetype, there can
> be a potentially unlimited number of locks on a resource (perhaps not
> more than one exclusive lock, but surely more than one shared lock).

It is true that the approach doesn't apply to array
properties, but, then, resourcetype is not an array
property -- it's a scalar. Only the locking stuff has arrays, 
and I have already argued above that we don't need to worry 
about querying locks, only returning them, for 1.0.

> I conclude that such criteria are desirable.

I agree that having the ability to filter results on
array properties and structured properties (I tend to
distinguish these as being separate) is useful and
desirable. However, this desirability does not imply
that we should attempt to standardize it in DASL 1.0.

> 
> At the least, I think it should go into the requirements document,
> and I think it should be mandatory, as there seems to be no way to
> meet all the scenarios without it.

I agree that it is reasonable to put querying arrays
and structured properties in the requirements document,
but not as mandatory requirements -- as "nice to have"
features. The scenarios can similarly be divided into
"critical" and "nice to have" scenarios, and filtering
on locks can be put in the "nice to have" section.

> 
> I see several possible objections to defining such a criteria:
> 
> 1) It is too hard to define.
> 
> We don't know until we try.  It may be easier than you think.
> 
> 2) It is too expensive to implement.
> 
> We don't know until it's defined.  If so, it can be made optional.

I'm sure that it would raise the barrier to entry for most
or all document management systems, SQL based systems,
and other, simpler systems, because server work would be
required in the existing server engines, new API entry points
would have to be defined, and the DASL wedge would have to
call the new API entry points. Therefore, if it were to be done,
it very clearly must be optional, because we want existing
systems to play with WebDAV and DASL. In other words,
we want the implementation effort involved to be limited
to writing a layer of software I call the DASL wedge
that runs on the server and translates between the API
of the existing server engine and the DASL XML on the 
wire -- we don't want to cause any impact on the 
existing server engines if we can avoid it.
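
A rough sketch of what such a wedge might look like for a scalar
comparison, in Python. Everything here is an assumption made for
illustration: the XML element names (eq, gt, lt, prop, literal), the
column names, and the function wedge_translate are hypothetical and
are not DASL syntax or any real engine's API. The sketch assumes the
existing engine accepts a parameterized SQL WHERE clause.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from property names on the wire to columns
# that the existing engine already stores.
COLUMN_FOR_PROPERTY = {
    "membertype": "member_type",
    "getcontentlength": "size_bytes",
}

# Hypothetical comparison elements and their SQL operators.
OPERATORS = {"eq": "=", "gt": ">", "lt": "<"}


def wedge_translate(query_xml):
    """Translate one scalar comparison into (where_clause, params).

    Raises KeyError for an operator or property the wedge does not
    know, which is where a server would report an unsupported query.
    """
    node = ET.fromstring(query_xml)
    operator = OPERATORS[node.tag]
    column = COLUMN_FOR_PROPERTY[node.findtext("prop")]
    literal = node.findtext("literal")
    # The literal is passed as a parameter, never pasted into the SQL.
    return f"{column} {operator} ?", [literal]


query = "<eq><prop>membertype</prop><literal>collection</literal></eq>"
where_clause, params = wedge_translate(query)
print(where_clause, params)
```

A scalar test like this maps onto what the engine can already do. A
test that reaches inside an array or structured value has no such
direct mapping, which is where the new engine work would come from.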

> 
> 3) Someone else is already doing it.
> 
> Possibly so, but who?

Major RDBMS vendors, for example.

> I can well believe someone is defining an XML query, but this might
> not necessarily fit the needs of the WebDAV object model.

I believe it probably will.

> Remember, the WebDAV object model is NOT XML.  Second, it's not clear
> that that other group or groups has the same schedule we do.  If as I
> claim, structure criteria is mandatory for DASL's success, 

I don't believe that structure criteria are mandatory for
DASL's success at all. SQL (through SQL 92) doesn't have
array or structure queries, or even the ability to
declare arrays or structured values, and look how successful 
SQL is.

> we may not want to wait
> for them, and we may not want our protocol held hostage to a design we
> don't control and which might change unexpectedly.  Finally, with all
> due humility, it's not clear that we couldn't do just as well, or even
> better, _especially_ if we also create some running code to go along
> with the design.

It may be that we are so smart that we end up doing a better
job than the RDBMS vendors (however, that's debatable), 
but, even if that were true, it would be irrelevant. When there 
are ANSI standard extensions to SQL, or even implementations 
by major RDBMS vendors that aren't yet standardized by ANSI, 
they will be what is followed, and what DASL defines will be 
ignored, if not easily mappable. In fact, what the RDBMS 
vendors provide is what will be demanded of DASL. A lot of 
implementations in the present, and, I boldly predict, in 
the future, will base their server engine on commercial RDBMSs 
and content search engines rather than trying to implement 
a significant part of that functionality themselves.
It is hard to justify doing that kind of work in house
any more. It generally makes better business sense to
leverage the commercial RDBMSs and content search engines.

I will send a separate e-mail putting things into a
broader perspective.

Alan Babich
Received on Friday, 28 August 1998 19:02:39 UTC