- From: Babich, Alan <ABabich@filenet.com>
- Date: Sun, 1 Nov 1998 19:07:29 -0800
- To: "'David G. Durand'" <dgd@cs.bu.edu>, w3c-dist-auth@w3.org
"By allowing dead properties to be expressed in XML, we are giving people a notation and telling them to use it. They need to know either that their notation will be preserved or what equivalence class of notations is being used."

Notation need *not* be preserved. Presumably the end user is totally oblivious to the protocol being sent over the wire. Thus, the end user doesn't even know what the notation on the wire is. Nor should he. The end user is only concerned with his property values as he understands them being preserved, with their semantics being preserved, and with querying them. Thus, the property *values* and their semantics are what need to be preserved, not some arbitrary notation buried in the software of the system that goes quietly over the wire in the dark of night. The value corresponds to your "equivalence class of notations", and is a better way to think about the problem, so that's what I'll do.

It may be that the disconnect in our communications centers around datatypes. You seem to have one view (which, to me, seems to be that the data type is a mystery to the server), while I have another (the server knows what it is).

I believe that there are two types of property values in the world: simple values and compound values. People think they *already know* what the syntax and semantics of simple property values are. They have been using them for decades. They are integers, strings, datetimes, etc. People know, and, therefore, servers know, exactly what to do with them. That is why doing anything different from what people think they already know will result in cognitive dissonance (negative transfer of training) at best, and failure of the standard at worst. It would be bad design in any case.

Compound values are just a hierarchical arrangement of simple values. Think of C structures (with no pointer valued fields). The WebDAV property model is harmonious with this view of property values.
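As a rough illustration of this view, here is a minimal Python sketch of a compound value built only from simple values, with a traversal that reaches every leaf. The property names, the `leaves` helper, and the exact membership of `SIMPLE_TYPES` are assumptions made for the example; they do not come from any WebDAV or DASL draft.

```python
from datetime import datetime, timezone

# Assumed set of "simple" datatypes. The message names integers, strings,
# and datetimes; including float here is an assumption.
SIMPLE_TYPES = (int, float, str, datetime)


def leaves(value, path=()):
    """Yield (path, simple_value) for every leaf of a compound value."""
    if isinstance(value, dict):
        # A compound value: recurse into each named member.
        for name, child in value.items():
            yield from leaves(child, path + (name,))
    elif isinstance(value, SIMPLE_TYPES):
        # A simple value: its datatype, and so its semantics, is known.
        yield path, value
    else:
        raise TypeError(
            f"not a simple or compound value at {path!r}: {type(value).__name__}"
        )


# Hypothetical compound property, analogous to a C struct with no pointer fields.
review = {
    "reviewer": "Example Reviewer",
    "score": 4,
    "when": datetime(1998, 11, 1, 19, 7, 29, tzinfo=timezone.utc),
    "contact": {"city": "Example City", "zip": "00000"},
}

for path, simple_value in leaves(review):
    print("/".join(path), type(simple_value).__name__)
```

Every leaf the traversal reaches has one of the known simple types, and anything else is rejected rather than stored as a mystery.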
By induction on the hierarchy, as the hierarchy is traversed, you encounter nothing but simple values, and you know the semantics of each one. Therefore, the server *does* know the semantics of a whole *dead* compound property value. (Dead properties do *not* have null semantics.) I feel very strongly that it should not be more complicated than this. By allowing arbitrary XML documents as property values, it becomes considerably more complicated than this, and there is no justification for complexifying the situation.

The WebDAV draft ducked the datatypes issue, unfortunately. When an end user does queries, he has very ingrained expectations about what strings, integers, datetimes, etc. do -- and we MUST match the end user's expectations to have a useful design. So, query forced the issue. Consequently, datatypes show up in the DASL draft. It turns out that there is a small universal set of simple datatypes adequate for most needs.

Furthermore, the XML Data effort is introducing datatypes to XML (note that I said XML, not just WebDAV). I am assuming that the XML Data effort will succeed. The XML Data effort uses an attribute to decorate the value (i.e., content) of an element with its datatype. XML Data defines the universal datatypes (integer, string, datetime, etc.) and refinements of them (eight bit signed integer, etc.). XML Data illustrates the best use of attributes -- as decorations of the value (i.e., the element content), not as part of the value. (Clearly, the datatype of a property is not part of its value, nor is its name.) So, the client can tell the server the datatype of each property in his hierarchical property.

For values that are just bits, servers can typically store arrays of binary bytes. If they are truly arrays of bytes, querying them is a useless thing to do in most cases.
However, if some compound data structure is obscured by being defined as an array of bytes, then the array of bytes should be defined as a compound property instead. Furthermore, in the case of arrays of bytes, byte ordering rears its ugly head. If a client is running on a system with a byte order different from that of the client that stored the value, it gets an array of binary bytes that has all the embedded longs, shorts, floats, doubles, integer64's, etc. byte swapped, which makes them unusable until they are unswapped. Byte ordering can be a good reason why an array of bytes should have been defined as a hierarchy of simple values. (If you think the value is a string, not an array of bytes, then you don't have byte order issues. But, of course, then you have the simple data type called String, and the server knows about its semantics.)

As for the server rewriting values, it does not -- it can, however, return them in a different format. The server merely stores and returns the abstract values of live and dead properties in a concrete serialized form. Any equivalent representation of a simple value will do as an input or output value. But, of course, the server can *not* "return any damn thing it wants". It must produce an equivalent value. Clients can reformat input values to what the server requires, and can reformat output values into whatever format they please.

Of course, defining a canonical form for values can simplify life by reducing possibilities. That is why WebDAV defines the "creationdate" property to be in ISO 8601 format, and even replicates a tiny part of the 8601 standard in an appendix. But DAV rejected forcing this format on servers, and left them free to define, accept, and return whatever format they like for specific datetime properties. IMHO, that is the right choice for DAV.
That also accommodates the way that many existing systems work -- they are liberal in accepting datetimes, but they always output them in a canonical way (which can usually be specified by the system administrator). This is very predictable behavior, not "unpredictable" behavior.

The way servers operate today is to store the property values in native binary format and discard any memory of the character format in which the property value was expressed on input. No commercial server I am aware of stores property values as XML documents. The property values get converted by software on the way in and the way out to some human readable form. There are no commercial implementations I am aware of that store property values as the literal input strings they got over the wire or from an application program. I strongly believe that this will continue to be true of commercial systems in the future, i.e., that the default behavior of most implementations will *not* be to store the literal XML as the property value.

RDBMS's will have no trouble in principle storing hierarchical property values as outlined above, because SQL supports the universal set of basic datatypes -- integers, strings, datetimes, etc. All you have to do is tell the server what the datatype is by using XML Data's approach, if the server doesn't already know the datatype. (There might be a default data type, and that default might be String. I'd have to look it up to be sure.)

The "game" the IETF plays (and should play) is that what is sent across the wire is merely an on-the-wire representation of the actual state of the thing being transported. One must think of the property model in the abstract, and forget about XML or any other serialization format when doing so. Once the property model (or set of property models) to be supported is chosen, then the adequacy of any particular serialization format, e.g., XML, can be evaluated. That is the proper way for the design to proceed.
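To make the distinction between abstract values and wire notation concrete, here is a minimal Python sketch (3.7 or later, for `%z` accepting "Z") of a server that decodes typed XML into native values and re-serializes them in one canonical form. The `datatype` attribute name, the type labels, the String default, the two accepted datetime notations, and the `decode`/`encode` helpers are all assumptions made for illustration; they are not the XML Data syntax or anything defined by a WebDAV draft.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def parse_datetime(text):
    """Liberal on input: accept two notations, return one abstract UTC value."""
    for fmt in ("%Y-%m-%dT%H:%M:%S%z", "%a, %d %b %Y %H:%M:%S %z"):
        try:
            return datetime.strptime(text.strip(), fmt).astimezone(timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"unrecognized datetime: {text!r}")


# Hypothetical datatype labels mapped to decoders that produce native values.
DECODERS = {
    "int": lambda text: int(text.strip()),
    "string": lambda text: text,
    "datetime": parse_datetime,
}


def decode(elem):
    """Turn an element into a native value; the input notation is discarded."""
    children = list(elem)
    if children:
        return {child.tag: decode(child) for child in children}
    # Assumed default datatype: string.
    return DECODERS[elem.get("datatype", "string")](elem.text or "")


def encode(tag, value):
    """Serialize a native value in a single canonical notation."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for name, child in value.items():
            elem.append(encode(name, child))
    elif isinstance(value, datetime):
        elem.set("datatype", "datetime")
        elem.text = value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    elif isinstance(value, int):
        elem.set("datatype", "int")
        elem.text = str(value)
    else:
        elem.set("datatype", "string")
        elem.text = str(value)
    return elem


# Two different notations for the same abstract compound value.
wire_a = ('<review><score datatype="int"> 007 </score>'
          '<when datatype="datetime">Sun, 1 Nov 1998 19:07:29 -0800</when></review>')
wire_b = ('<review><score datatype="int">7</score>'
          '<when datatype="datetime">1998-11-02T03:07:29Z</when></review>')

value_a = decode(ET.fromstring(wire_a))
value_b = decode(ET.fromstring(wire_b))
print(value_a == value_b)
print(ET.tostring(encode("review", value_a), encoding="unicode"))
```

In this sketch the server keeps only the native values, so both inputs are stored identically and come back in the same canonical form, which is the "liberal on input, canonical on output" behavior described above.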
All of the above should not be controversial.

Metadata (information about properties) other than property name and data type, on the other hand, is not directly addressed by any draft in the WebDAV draft family. So, retrieval and manipulation of property metadata should not be part of the current discussion. An industrial strength metadata effort would be a separate effort.

Alan Babich
Received on Sunday, 1 November 1998 22:07:01 UTC