RE: design issue: query type checking via DTD from Babich, Alan on 1998-06-02 (www-webdav-dasl@w3.org from April to June 1998)

From: Babich, Alan <ABabich@filenet.com>
Date: Mon, 1 Jun 1998 20:43:22 -0700
To: "'Jim Davis'" <jdavis@parc.xerox.com>, www-webdav-dasl@w3.org
Cc: "Babich, Alan" <ABabich@felix.filenet.com>
Message-ID: <72B1992276A9D111A20E00805FEAC96DCC40B2@cm-expo1.filenet.com>
I would like to respond to Jim's mail.

P2 invalidates C2 as an argument against client
side error checking:

Quoting from Jim's mail below, P2 is:
"P2: It enables clients to do client side checking, which is good
because you get a better UI by detecting errors as early as possible.
Advanced UIs are likely to require strong typing anyway, when
prompting for values and accepting input (e.g. to accept only
correctly formatted dates or integers.)"

Similarly, C2 is:
"C2. Servers can't rely on clients to do the syntax checking anyway, as
some clients won't bother, and others may be either buggy or malicious"

It is a fact that servers can't rely on clients to do 
complete syntax and semantic checking. However, this 
in no way reduces the desirability or the validity
of client side error checking. It is a fact that
quality client side implementations will make as
many validity checks on the client side as they
reasonably can. Therefore, C2, while correct as
a statement of fact, is invalid as a line of
reasoning against client side error checking.

This eliminates C2.

Similarly, C3 is (almost) valid as a fact, but invalid as
a line of reasoning against client side error checking. 
C3 is:

"C3. A DTD may not be sufficient for type-checking, because WebDAV
properties are not typed."

In fact, in reality all properties, WebDAV or not, are
of some type. So, C3 is slightly misstated.
The correct statement of C3 would include something like
"WebDAV is silent on the types of properties."
It is a fact that my proposal will not make it possible
to catch all errors, but given XML, catching all errors is 
impossible, anyway, and, therefore, is not an issue.
You don't have to catch every single possible error
on the client side to have something very valuable
and very important.

This eliminates C3.

The reasoning stated within C4
is invalid. C4 is:

"C4. A DTD is not sufficient for all error checking, since for example
a property might not be defined on a given resource."

The first part, "a DTD is not sufficient for all 
error checking" is true. However, the second part 
does not give a valid reason why that is true, 
because it is not an error for a query to mention 
a property that doesn't exist on some of the 
resources. In fact, it is extremely valuable to 
be able to do so, and to have the query make sense.
Even if a correct example of support were given
within C4, C4 WOULD STILL BE AN INVALID LINE
OF REASONING as to why client side error checking
is not valuable, since by P2 it is, in fact, valuable.

This eliminates C4, leaving only C1.

Before we discuss C1, let us explore further 
my assertion above that missing properties are not a 
problem.

Suppose one wants to write a DASL query application
using pulldown menus. A query could
look something like the following:

|------------|-----------|-------------|-----------|
|  author    |     =     |  "Smith"    |   AND     |
|------------|-----------|-------------|-----------|
|  title     |  LIKE     |  "*Dogs*"   |           |
|------------|-----------|-------------|-----------|

This is a common UI style many of us are familiar
with. The end user pulls down column 1 to get
a list of all the possible properties. Let us
suppose that the user can not type the name of a
property "freehand" into a box in column 1 -- the user
is forced to make a selection from the pulldown.
Similarly, the user is forced to make a choice from
the pulldown for columns 2 and 4 as well. The
only box the user can type freehand in is column 3.
And, for column 3, the UI gives an error if the type 
of the constant the user types in is incorrect.

This is not an oddball case. This is a mainstream
case, and I believe we should make it possible
(possible, but not mandatory). In order for this
UI to be implementable, the collection must be
able to advertise its properties and the operators
it supports. Advertising of query capabilities 
is required by the DASL charter. The client UI code
pulls this information across and saves it in order
to enforce the constraints. THEN, THE END USER
IS PRECLUDED FROM MAKING A WHOLE BUNCH OF MISTAKES
by this UI. This is extremely valuable.
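
A minimal sketch of the checks such a UI could make, in Python.
The advertised tables below are assumptions for illustration
only: DASL does not define this format, and the "pages" and
"published" properties and the type tags are hypothetical.

```python
from datetime import date

# Hypothetical capabilities pulled across from the collection:
# property name -> type tag.
ADVERTISED_PROPERTIES = {"author": "string", "title": "string",
                         "pages": "integer", "published": "date"}
# Hypothetical operators offered for each type tag.
ADVERTISED_OPERATORS = {"string": ["=", "LIKE"],
                        "integer": ["=", "<", ">"],
                        "date": ["=", "<", ">"]}

def parse_constant(type_tag, text):
    """Convert the freehand text of column 3; raise ValueError on a type mismatch."""
    if type_tag == "string":
        return text
    if type_tag == "integer":
        return int(text)
    if type_tag == "date":
        return date.fromisoformat(text)  # expects YYYY-MM-DD
    raise ValueError("unknown type tag: " + type_tag)

def check_row(prop, operator, text):
    """Return an error message for one query row, or None if the row is acceptable."""
    type_tag = ADVERTISED_PROPERTIES.get(prop)
    if type_tag is None:
        return "property not in pulldown: " + prop
    if operator not in ADVERTISED_OPERATORS[type_tag]:
        return "operator %s not offered for %s" % (operator, type_tag)
    try:
        parse_constant(type_tag, text)
    except ValueError:
        return "%r is not a valid %s" % (text, type_tag)
    return None
```

With pulldowns, the first two checks can never fail; they are
kept here because the server still cannot assume every client
enforces them.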

I sort of wish this type of UI were always 
possible. Sort of. However, it is not always
possible, because not all collections are capable 
enough to advertise all their properties. We want 
DASL (and WebDAV) to apply to collections that are 
quite primitive by document management system 
standards, e.g., a collection of files in a 
hierarchical file system. In that case, it is
asking a bit much to REQUIRE the collection to
advertise all its properties.

In some file systems, files can have
properties. But files in a directory are a
heterogeneous set of things, and it is not
reasonable to expect any sort of discipline
to be enforced. So, users could throw in files
with never before seen properties any time.
The server software would have to scan every
file in the file system to discover all the
properties, and would either have to periodically
rescan all the files, or have a hook so that
every time a file was added, it would check
if any new properties were added.

So, what I expect DASL has to say about such
collections is that they are allowed (but not
required) to limit their advertisement to only 
the properties defined and
required by WebDAV, i.e., that the set of 
properties they advertise can be incomplete.

Therefore, users are sometimes going to
guess at the names of properties that might
exist on such collections and construct
queries against them. This can not be
distinguished by the software from the 
error case where the user mistypes the 
name of the property.

In theory, there is a way to eliminate 
the typing error: The client side software 
could discover all the properties that exist 
at an instant in time by performing a query 
that selects "*" (i.e., all the properties), 
and has a "where" condition of "TRUE".
The "where" condition being "TRUE" includes
all resources in the answer set, and every
resource returns all its properties. The client
side can make a list of all the different
properties encountered, and force the user to
choose properties from a pulldown. However, 
for a large collection, this would be bad 
from a performance and response time point 
of view, so I don't think DASL should require 
that approach.
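
The client-side bookkeeping for that approach is trivial; the
cost is in fetching every resource. A rough Python sketch,
where the list of dictionaries is only a stand-in for the
answer set (the sample values are made up):

```python
def discover_properties(rows):
    """Collect every distinct property name seen across all result rows."""
    seen = set()
    for row in rows:
        seen.update(row.keys())
    return sorted(seen)

# Stand-in for the answer set of: select "*" where TRUE.
# One dictionary per resource; the values are illustrative only.
rows = [
    {"title": "Dogs", "author": "Smith", "publisher": "Acme"},
    {"author": "Smith", "which_book": "Dogs"},
]

# The names the UI would offer in the column 1 pulldown.
pulldown = discover_properties(rows)
```

The loop visits every resource in the collection once, which
is exactly why this is unattractive for large collections.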

Therefore, DASL should allow queries where the
user guesses properties. Furthermore, since
the resources in a collection are permitted
to be heterogeneous, we want queries to make
sense on heterogeneous collections. We also
want to be able to query across multiple
collections in the same query, and that
increases the probability that the total
set of resources is heterogeneous. 

The point is this: IT MUST NOT BE A FATAL ERROR 
IF A QUERY MENTIONS A PROPERTY THAT DOES NOT 
EXIST ON A RESOURCE, whether because it is
null, or because it is not defined for that
resource at all. That is a situation which will
be normally encountered, and it is dealt with
in the industry standard way, i.e. ANSI standard
three valued logic. 
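
As an illustration, here is a small Python sketch of SQL-style
three valued logic. Representing the third truth value as None
and the helper names are my assumptions, not DASL syntax.

```python
UNKNOWN = None  # third truth value, alongside True and False

def and3(a, b):
    """Three valued AND: False dominates, then UNKNOWN."""
    if a is False or b is False:
        return False
    if a is UNKNOWN or b is UNKNOWN:
        return UNKNOWN
    return True

def or3(a, b):
    """Three valued OR: True dominates, then UNKNOWN."""
    if a is True or b is True:
        return True
    if a is UNKNOWN or b is UNKNOWN:
        return UNKNOWN
    return False

def not3(a):
    """Three valued NOT: UNKNOWN stays UNKNOWN."""
    return UNKNOWN if a is UNKNOWN else (not a)

def eq3(resource, prop, constant):
    """Compare a property with a constant.

    A property that is null or not defined on the resource
    yields UNKNOWN instead of raising an error.
    """
    value = resource.get(prop)
    if value is None:
        return UNKNOWN
    return value == constant

def selected(truth):
    """A resource is in the answer set only if the condition is definitely true."""
    return truth is True
```

A resource lacking a mentioned property simply drops out of
the answer set (unless another branch of an OR is true); the
query as a whole still succeeds.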

For example, suppose a collection contains 
books and reviews of those books.
(This reminds me very much of amazon.com .)
The properties on books include title, 
author, and publisher. The properties on 
reviews include author and which_book. 
(Reviews don't themselves have titles,
but they are about one particular book.)

Suppose we want all books and all reviews that
were written by "Smith". Then we would have
the query 

    SELECT title, publisher, which_book, author 
    FROM the whole collection
    WHERE author = "Smith"

When a resource doesn't have a value for a
property that is selected, that must be so indicated
in the result row. One thing DASL could do is to
follow the WebDAV approach and return an embedded
error in the result row like PropFind does.
I'm not familiar with the details, but I seem
to remember that 404 is "property not found".
We could return that embedded error, or we could 
return two different errors: one for 
"is defined but is null in this resource" and
a different one for "is not defined at all for this
resource". Personally, I prefer two separate errors.

For example, for books, "title", "author",
and "publisher" would be defined and have a value, but
"which_book" would not be defined. For reviews, author
and which_book would be defined and have a value, 
but title and publisher would not be defined.
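
A sketch of what a result row with two separate indications
might look like, in Python. The Status names are hypothetical;
neither WebDAV nor DASL defines them, and a real response would
carry them in XML rather than in a dictionary.

```python
from enum import Enum

class Status(Enum):
    OK = "ok"
    NULL = "defined but null in this resource"
    NOT_DEFINED = "not defined at all for this resource"

def result_row(resource, selected_props):
    """Build one result row: each selected property maps to (status, value)."""
    row = {}
    for prop in selected_props:
        if prop not in resource:
            row[prop] = (Status.NOT_DEFINED, None)
        elif resource[prop] is None:
            row[prop] = (Status.NULL, None)
        else:
            row[prop] = (Status.OK, resource[prop])
    return row

# A book whose publisher is defined but null; the values are illustrative.
book = {"title": "Dogs", "author": "Smith", "publisher": None}
row = result_row(book, ["title", "publisher", "which_book", "author"])
```

Here "publisher" comes back as NULL and "which_book" as
NOT_DEFINED, so the client can tell the two cases apart.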

Suppose we just want to find all books written
by Smith. Then I assume, for this example, that
we will have the "IS_NULL" and "PROPERTY_IS_DEFINED" 
operators in DASL. The query would be

    SELECT title, author, publisher
    FROM the whole collection
    WHERE author = "Smith" AND 
          PROPERTY_IS_DEFINED(publisher)

Similarly, if we want to find all reviews
written by Smith, we would have

    SELECT author, which_book
    FROM the whole collection
    WHERE author = "Smith" AND
          PROPERTY_IS_DEFINED(which_book)
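
To make the two queries above concrete, here is a toy Python
evaluation over an in-memory collection. The run_query helper
and the sample data are illustrative assumptions; only the
PROPERTY_IS_DEFINED idea comes from the text.

```python
def property_is_defined(resource, prop):
    """True if the property is defined on this resource (even if null)."""
    return prop in resource

def run_query(collection, select, where):
    """Return one row per resource for which the condition is true."""
    return [{p: r.get(p) for p in select}
            for r in collection if where(r) is True]

# Two books and one review; the values are made up.
collection = [
    {"title": "Dogs", "author": "Smith", "publisher": "Acme"},
    {"title": "Cats", "author": "Jones", "publisher": "Acme"},
    {"author": "Smith", "which_book": "Cats"},
]

# All books written by Smith.
books_by_smith = run_query(
    collection, ["title", "author", "publisher"],
    lambda r: r.get("author") == "Smith"
              and property_is_defined(r, "publisher"))

# All reviews written by Smith.
reviews_by_smith = run_query(
    collection, ["author", "which_book"],
    lambda r: r.get("author") == "Smith"
              and property_is_defined(r, "which_book"))
```

Both queries run over the same heterogeneous collection, and
neither fails because some resources lack the tested property.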

Having disposed of C2 through C4, only C1 remains
as potentially valid. However, this e-mail is already 
lengthy, so I will send another e-mail discussing C1.

                     ---


> -----Original Message-----
> From:	Jim Davis [SMTP:jdavis@parc.xerox.com]
> Sent:	May 29, 1998 3:26 PM
> To:	www-webdav-dasl@w3.org
> Subject:	design issue: query type checking via DTD
> 
> This message attempts to summarize a design issue that has been
> discussed at some length in this forum, but not yet resolved.  This
> message summarizes the design issue and the arguments that have thus
> far been advanced for and against it.
> 
> Appropriate responses are:
>  * The design issue is not correctly stated, but should instead be ...
>  * An Additional argument (pro or con) is ...
>  * Argument X is misstated, it should be ...
> 
> Issue: should the DASL DTD for query be defined such that one can do
> type-checking on a query simply by validating the query against the
> DTD?
> 
>   Context: This issue is, in some sense, a followon to an earlier
>   issue summarized in "Design issue: polymorphism".  That issue was
>   whether DASL operators should be polymorphic.  There, the chief
>   argument against such polymorphism was that it prevented type
>   checking.  We now have a proposed DTD (see Alan Babich's email of
>   Sun, 10 May 1998 18:45:18 PDT) that is polymorphic while still
>   allowing type checking, so that particular design issue is moot.
>   While some have objected to the complexity of that DTD, its claimed
>   advantage is that it supports type checking.  This raises the
>   logically prior issue of whether a DTD *should* support type
>   checking.  If we agree that it should, we can then debate specific
>   proposed DTDs that do that.  If we agree that it needn't (or
>   shouldn't) then we should consider different DTDs.
> 
> Arguments are labelled P for Pro and C for Con.
> 
> P1. It makes error checking easy, because it suffices to use a
> validating XML parser, which is easily obtained.
> 
> P2. It enables clients to do client side checking, which is good
> because you get a better UI by detecting errors as early as possible.
> Advanced UIs are likely to require strong typing anyway, when
> prompting for values and accepting input (e.g. to accept only
> correctly formatted dates or integers.)
> 
>   [This is really an argument for strongly typed queries in general,
>    not for the specific approach of using a DTD for such strong
>    typing.]
> 
> P3. It relies only on core XML features, as opposed to extensions or
> conventions such as XML-Data that are still being designed.
> 
> C1. It makes the query syntax more complicated.  Compare
> 
> <where>
>   <boolean_op>
>     <eq>
>       <string_expr>
>         <string_prop>dav:getcontenttype</string_prop>
>       </string_expr>
>       <string_expr>
>         <string_const>image/gif</string_const>
>       </string_expr>
>     </eq>
>   </boolean_op>
> </where>
> 
> with 
> 
> <where>
>     <eq>
>       <prop>dav:getcontenttype</prop>
>       <string_const>image/gif</string_const>
>     </eq>
> </where>
> 
> 
> C2. Servers can't rely on clients to do the syntax checking anyway, as
> some clients won't bother, and others may be either buggy or malicious
> 
> C3. A DTD may not be sufficient for type-checking, because WebDAV
> properties are not typed.
> 
> C4. A DTD is not sufficient for all error checking, since for example
> a property might not be defined on a given resource.
>
Received on Monday, 1 June 1998 23:45:39 UTC