- From: Babich, Alan <ABabich@filenet.com>
- Date: Fri, 20 Nov 1998 18:21:28 -0800
- To: "'jamsden@us.ibm.com'" <jamsden@us.ibm.com>, w3c-dist-auth@w3.org
Just to keep the discussion of this thread in perspective, let it be clear that I (along with everyone else, apparently) believe that it would be a good thing for DASL to query hierarchical WebDAV properties -- *eventually*. If that's true, then the real issue of this e-mail thread is "on what release of DASL". I personally believe that we should wait for 1.x . Others believe that we should try to jam it into 1.0. Having clarified what I think is the real issue, I shall now attempt to unconfuse the *secondary* discussions about property models, the meaning of the word "structure", etc. . <Jim Amsden> The S in SQL stands for Structured Query Language. A table captures this structure, and its format is defined by a schema. </Jim Amsden> Jim and I apparently use different meanings for the word "structure". When I say "structure" in this thread, I mean specifically the hierarchical organization of values of WebDAV properties. In an RDBMS, that would map to hierarchical nesting of tables (i.e., a denormalized relational databases). It is a fact that no ANSI standard version of SQL through SQL 92 yet has the ability to define, let alone query, nested tables. (I believe SQL 92 is the most recent ANSI standard version of SQL.) Another interesting issue for query is repeated values. It is a fact that no ANSI standard version of SQL has arrays of integers, floating or fixed point numbers, or datetimes. SQL has strings and bitfields, but these are not treated as arrays of characters or arrays of bits. They are treated as a whole, i.e., they are treated as scalars. Standard SQL doesn't have arrays. Therefore, it is true that the "S" in "SQL" literally stands for "structured", but not structured in the sense of hierarchical WebDAV property values. In other words, the "S" in "SQL" means something like "the data is structured into tables with scalar valued columns". This meaning of "structure" is less general than WebDAV "structured property values", since standard SQL tables are flat. Therefore, the argument that the "S" in "SQL" stands for "structured" is irrelevant to the question of DASL querying "hierarchically structured" WebDAV properties. Of course, many of you have seen recent releases of commercial RDBMS's with the ability to defined nested tables (i.e., denormalized relational databases) and the ability to query them. They typically have arrays as well. These features are proprietary extensions to SQL, and are not part of the SQL standard. Denormalized relational databases are useful, but the point is that the acronym SQL was coined *many years ago*, and had nothing to do with nested tables when it was coined. These proprietary implementations should be called "SQL with proprietary extensions" to avoid semantic confusion. A denormalized RDBMS can, of course, represent hierarchical WebDAV properties directly using nested tables. In non normalized RDBMS's one typically achieves the effect of nested tables by using flat tables and a convention for joining on a "row instance" property. This is indirect representation of hierarchically structured property values, and the "S" in "SQL" does not refer to this particular schema definition technique (or any other). <Jim> So I guess those years of experience support my claim. </Jim> Quite the opposite. The confusion arises from our different meanings of the word "structure" as explained above. There are many, many more years of experience with SQL *without* nested tables (or arrays) that have proved its usefulness than there are months or a few of years of experience with the relatively recent releases of RDBMS's with proprietary extensions for nested tables (and arrays). Obviously, hierarchies were not a 1.0 feature of SQL, and didn't exist for very many years. This makes it very plausible that DASL could get away without queries on hierarchically structured values for its first release. <Jim> With XML, we'll have the real business data and meta-data (in WebDAV properties) to search on and can make much better searches. </Jim> This seems to conflate property values with metadata. A WebDAV resource has property *values* and, possibly, content. The metadata is information about the *form and meaning* of the property values -- data type (e.g., integer), maximum and minimum values, list of legal values, property name, domain of values (e.g., the subset of integers that are zipcodes), character set, natural language, whether string comparisons are case sensitive, a narrative description of what the property is and how it should be used, etc., etc. . All that is very different from the property values themselves. The metadata is analogous to the schema in an RDBMS. It is invariant now matter what values are in the columns of the rows. The usual paradigm is (1) suck up the metadata once when your client program starts up (if it's not already hardwired into your client application), and (2) send and receive property values and content from then on, not metadata. Searching on content is good. (Find documents containing "computer".) Searching on property values is good. (Find documents authored by "Joe".) You find resources (e.g., documents) in both cases. Searching metadata is a qualitatively different operation: (Find all the properties having data type "integer". Or find all the properties whose domain is "zipcode". Or find out all about the definition of the Author property.) Searching metadata is much less important that simply retrieving all of it at once for client program initialization purposes. Retrieval and maintenance of metadata is important, and there are a lot of issues to address including the property model. The normal cases are two: (1) High end document management systems have well defined schemas, and only the docbase administrator can change the schema, because organization and discipline are critical for enterprise wide applications. (2) Some low end systems have hard wired properties and don't allow clients to define additional properties. In both cases, the server has all the metadata -- it knows about all the properties, and enforces consistency (e.g, it won't allow the value of the Author property to be an integer). The rare case is that any old client can define any old property any old time any time it wants to. This would probably only be workable in a personal document management system, or a very small work group. Even in that case, it is not strictly necessary or even a good idea to send the metadata along with the data at the same time. The property metadata only has to be sent once no matter how many resources have or will have that property. The high end document management systems will have some protocol for the docspace administrator to update and define the metadata. If and when some WebDAV subcommittee defines such a protocol, that would be a general solution that could be used for any WebDAV server that allowed clients to define their own properties. We clearly can't take the time to address all these issues in DASL 1.0 . <Jim> SQL is exactly analogous to XML and DTDs. But since XML is a tagged language, it carries an instance of the schema with each document. </Jim> No, not "exactly". For one thing, XML can describe nested structures. ANSI standard SQL can not. Only the proprietary denormalized RDBMS's can. For another, the database schema is never factored out of a SQL database and stored separately. It is always embedded in the database. In contrast, the same DTD file could be shared by multiple XML documents. There are other differences as well. *More importantly*, the XML representation of the value of a WebDAV property indicates only *small part* of the metadata of the property -- the hierarchical structure of the value. The hierarchical structure of the value is nowhere near the whole metadata. (That is important for efficiency. DBMS's have known that for decades, which is why they do *not* store the entire metadata of the table with each row, nor do they send the complete metadata over the network with each creation of a new row or retrieval of a row). XML, even with DTD, can not even represent the most basic part of the metadata, the base data type, without adding some conventions to the protocol or formally extending XML to include data types. Data types are just the beginning of defining the metadata. Some people are erroneously conflating property value models with XML, conflating serialization formats with XML, and conflating metadata models with XML. These are all distinct. To untangle the confusion on those and other related issues would take far more time than DASL can afford on 1.0 . Fortunately, that is not necessary -- DASL is extremely useful for the most common types of queries that WebDAV needs to support. So what if DASL 1.0 can't do any possible query that anyone could ever think of? Why should DASL 1.0 have to? No standard or proprietary implementation of SQL can do all possible queries anyone has ever thought of, and SQL has been around a very, very long time. For example, in SQL 92 or 89, try to express "find the properties of the employee ranked tenth in decreasing order of salary *and no others*" in a single query. (Remember that if two people make the maximal salary, there is no employee ranked second.) You can't do that in SQL with a single query (assuming there is no preexisting salary rank column). (If you think you can, send me e-mail, and if you're right, I'll find a different example.) The list of queries SQL can NOT do is endless. Obviously, that didn't stop SQL from being extremely useful. Nor would the temporary absence of the ability to query hierarchical WebDAV property values stop DASL 1.0 from being extremely useful. Alan Babich
Received on Friday, 20 November 1998 21:21:44 UTC