RE: Does DASL need to support structured queries? from Babich, Alan on 1998-11-21 (w3c-dist-auth@w3.org from October to December 1998)

From: Babich, Alan <ABabich@filenet.com>
Date: Fri, 20 Nov 1998 18:21:28 -0800
To: "'jamsden@us.ibm.com'" <jamsden@us.ibm.com>, w3c-dist-auth@w3.org
Message-ID: <C3AF5E329E21D2119C4C00805F6FF58F04AFEA@hq-expo2.filenet.com>
Just to keep the discussion of this thread in perspective, 
let it be clear that I (along with everyone else, 
apparently) believe that it would be a good thing for DASL 
to query hierarchical WebDAV properties -- *eventually*. If 
that's true, then the real issue of this e-mail thread is 
"on what release of DASL". I personally believe that we 
should wait for 1.x . Others believe that we should try 
to jam it into 1.0.

Having clarified what I think is the real issue, I shall 
now attempt to unconfuse the *secondary* discussions about 
property models, the meaning of the word "structure", etc. .

<Jim Amsden>
The S in SQL stands for Structured Query Language. A table 
captures this structure, and its format is defined by a schema.
</Jim Amsden>

Jim and I apparently use different meanings for the
word "structure". When I say "structure" in this thread, 
I mean specifically the hierarchical organization of values 
of WebDAV properties.  In an RDBMS, that would map to 
hierarchical nesting of tables (i.e., a denormalized 
relational databases). It is a fact that no ANSI standard 
version of SQL through SQL 92 yet has the ability to define,
let alone query, nested tables. (I believe SQL 92 is the 
most recent ANSI standard version of SQL.) Another interesting 
issue for query is repeated values. It is a fact that no 
ANSI standard version of SQL has arrays of integers, 
floating or fixed point numbers, or datetimes. SQL has 
strings and bitfields, but these are not treated as arrays 
of characters or arrays of bits. They are treated as a whole,
i.e., they are treated as scalars. Standard SQL doesn't
have arrays.

Therefore, it is true that the "S" in "SQL" literally stands 
for "structured", but not structured in the sense of 
hierarchical WebDAV property values. In other words, the 
"S" in "SQL" means something like "the data is structured 
into tables with scalar valued columns".
This meaning of "structure" is less general than WebDAV 
"structured property values", since standard SQL tables are 
flat. Therefore, the argument that the "S" in "SQL" stands 
for "structured" is irrelevant to the question of DASL querying
"hierarchically structured" WebDAV properties.

Of course, many of you have seen recent releases of 
commercial RDBMS's with the ability to defined nested tables 
(i.e., denormalized relational databases) and the ability to 
query them. They typically have arrays as well. These features 
are proprietary extensions to SQL, and are not part of the 
SQL standard. Denormalized relational databases 
are useful, but the point is that the acronym SQL was coined 
*many years ago*, and had nothing to do with nested tables 
when it was coined. These proprietary implementations should 
be called "SQL with proprietary extensions" to avoid semantic 
confusion. A denormalized RDBMS can, of course, represent 
hierarchical WebDAV properties directly using nested tables.

In non normalized RDBMS's one typically achieves the effect 
of nested tables by using flat tables and a convention for
joining on a "row instance" property. This is indirect
representation of hierarchically structured property
values, and the "S" in "SQL" does not refer to this 
particular schema definition technique (or any other).

<Jim>
So I guess those years of experience support my claim.
</Jim>

Quite the opposite. The confusion arises from our different
meanings of the word "structure" as explained above. 
There are many, many more years of experience with SQL 
*without* nested tables (or arrays) that have proved its 
usefulness than there are months or a few of years of
experience with the relatively recent releases of RDBMS's 
with proprietary extensions for nested tables (and arrays). 
Obviously, hierarchies were not a 1.0 feature of SQL, and 
didn't exist for very many years. This makes it very plausible 
that DASL could get away without queries on hierarchically 
structured values for its first release.

<Jim>
With XML, we'll have the real business data and meta-data (in WebDAV
properties) to search on and can make much better searches.
</Jim>

This seems to conflate property values with metadata.
A WebDAV resource has property *values* and, possibly, content.
The metadata is information about the *form and meaning* 
of the property values -- data type (e.g., integer), 
maximum and minimum values, list of legal values, 
property name, domain of values (e.g., the subset of 
integers that are zipcodes), character set, natural 
language, whether string comparisons are case
sensitive, a narrative description of what the property
is and how it should be used, etc., etc. . All that is 
very different from  the property values themselves. 
The metadata is analogous to the schema in an RDBMS. 
It is invariant now matter what values are in the columns 
of the rows.

The usual paradigm is (1) suck up the metadata once when your
client program starts up (if it's not already hardwired 
into your client application), and (2) send and receive 
property values and content from then on, not metadata.

Searching on content is good. (Find documents containing 
"computer".) Searching on property values is good. (Find
documents authored by "Joe".) You find resources (e.g., 
documents) in both cases. Searching metadata is a 
qualitatively different operation: (Find all the 
properties having data type "integer". Or find all 
the properties whose domain is  "zipcode". Or find 
out all about the definition of the Author property.) 
Searching metadata is much less important that simply 
retrieving all of it at once for client program
initialization purposes.

Retrieval and maintenance of metadata is important, and
there are a lot of issues to address including the property
model. The normal cases are two: (1) High end document
management systems have well defined schemas, and only the
docbase administrator can change the schema, because
organization and discipline are critical for enterprise
wide applications. (2) Some low end systems have hard
wired properties and don't allow clients to define additional
properties. In both cases, the server has all the metadata
-- it knows about all the properties, and
enforces consistency (e.g, it won't allow the value of
the Author property to be an integer). 

The rare case is that any old client can define 
any old property any old time any time it wants to. 
This would probably only be workable in a personal 
document management system, or a very small work group. 
Even in that case, it is not strictly necessary or even 
a good idea to send the metadata along with the data 
at the same time. The property metadata only has to be
sent once no matter how many resources have or will have
that property. The high end document management systems 
will have some protocol for the docspace administrator 
to update and define the metadata. If and when some 
WebDAV subcommittee defines such a protocol, that would 
be a general solution that could be used for any WebDAV 
server that allowed clients to define their own properties.

We clearly can't take the time to address all these 
issues in DASL 1.0 .

<Jim>
SQL is exactly analogous to XML and DTDs. But since XML is a 
tagged language, it carries an instance of the schema with
each document.
</Jim>

No, not "exactly". For one thing, XML can describe nested
structures. ANSI standard SQL can not. Only the proprietary
denormalized RDBMS's can. For another, the database schema
is never factored out of a SQL database and stored separately. 
It is always embedded in the database. In 
contrast, the same DTD file could be shared by multiple 
XML documents. There are other differences as well.

*More importantly*, the XML representation of the value of a 
WebDAV property indicates only *small part* of the metadata 
of the property -- the hierarchical structure of the value. 
The hierarchical structure of the value is nowhere near the 
whole metadata. (That is important for efficiency. DBMS's 
have known that for decades, which is why they do *not* store 
the entire metadata of the table with each row, nor do they 
send the complete metadata over the network with each creation 
of a new row or retrieval of a row). XML, even with DTD, 
can not even represent the most basic part of the metadata, 
the base data type, without adding some conventions to the 
protocol or formally extending XML to include data types. 
Data types are just the beginning of defining the metadata.

Some people are erroneously conflating property value 
models with XML, conflating serialization formats with XML, 
and conflating metadata models with XML. These are all 
distinct. To untangle the confusion on those and other 
related issues would take far more time than DASL can 
afford on 1.0 .

Fortunately, that is not necessary -- DASL is extremely 
useful for the most common types of queries that WebDAV 
needs to support. 

So what if DASL 1.0 can't do any possible query that 
anyone could ever think of? Why should DASL 1.0
have to? No standard or proprietary implementation of SQL 
can do all possible queries anyone has ever thought of, and
SQL has been around a very, very long time. 

For example, in SQL 92 or 89, try to express "find the properties
of the employee ranked tenth in decreasing order of salary 
*and no others*" in a single query. (Remember that if two 
people make the maximal salary, there is no employee ranked 
second.) You can't do that in SQL with a single query (assuming 
there is no preexisting salary rank column). (If you think you 
can, send me e-mail, and if you're right, I'll find a different 
example.) The list of queries SQL can NOT do is endless. 
Obviously, that didn't stop SQL from being extremely useful. 
Nor would the temporary absence of the ability to query 
hierarchical WebDAV property values stop DASL 1.0 from being 
extremely useful.

Alan Babich
Received on Friday, 20 November 1998 21:21:44 UTC