RE: property value clarification from Babich, Alan on 1998-11-02 (w3c-dist-auth@w3.org from October to December 1998)

From: Babich, Alan <ABabich@filenet.com>
Date: Sun, 1 Nov 1998 19:07:29 -0800
To: "'David G. Durand'" <dgd@cs.bu.edu>, w3c-dist-auth@w3.org
Message-ID: <C3AF5E329E21D2119C4C00805F6FF58F04AF8E@hq-expo2.filenet.com>
"By allowing dead properties to be expressed in XML, we are giving 
people a notation and telling them to use it. They need to know either
that their notation will be preserved or what equivalence class of
notations is being used."

Notation need *not* be preserved. Presumably the
end user is totally oblivious to the protocol being
sent over the wire. Thus, the end user doesn't even know
what the notation on the wire is. Nor should he.
The end user is only concerned with his property values
as he understands them being preserved, and with
their semantics being preserved, and with querying
them. Thus, the property *values* and their semantics 
are what need to be preserved, not some arbitrary 
notation buried in the software of the system that 
goes quietly over the wire in the dark of night.
The value corresponds to your "equivalence class of 
notations", and is a better way to think about the
problem, so that's what I'll do.

It may be that the disconnect in our communications
centers around datatypes. You seem to have one view
(which, to me, seems to be that the data type is a
mystery to the server), while I have another (the
server knows what it is).

I believe that there are two types of property values
in the world, simple values, and compound values.

People think they *already know* what the syntax and
semantics of simple property values are. They have been 
using them for decades. They are integers, strings, 
datetimes, etc. People know, and, therefore, servers know,
exactly what to do with them. That is why doing anything 
different from what people think they already know will 
result in cognitive dissonance (negative transfer of 
training) at best, and failure of the standard at worst. 
It would be bad design in any case.

Compound values are just a hierarchical arrangement
of simple values. Think of C structures (with no
pointer valued fields).

The WebDAV property model is harmonious with this view
of property values.

By induction on the hierarchy, as the hierarchy
is traversed, you encounter nothing but simple values,
and you know the semantics of each one. Therefore,
the server *does* know the semantics of a whole 
*dead* compound property value. (Dead properties
do *not* have null semantics.)
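
The induction argument above can be sketched in a few lines.
This is only an illustration, not anything taken from the
WebDAV or DASL drafts: the set of simple types, the use of a
dict as the compound value, and the property contents are all
assumptions made for the example.

```python
from datetime import datetime

# Hypothetical set of simple types a server is assumed to understand.
SIMPLE_TYPES = (int, float, str, bool, bytes, datetime)


def leaves(value, path=()):
    """Yield (path, simple_value) for every simple value in a compound value.

    A compound value is modeled as a dict whose members are either
    simple values or further compound values (no pointers, no cycles).
    """
    if isinstance(value, dict):
        for name, child in value.items():
            yield from leaves(child, path + (name,))
    elif isinstance(value, SIMPLE_TYPES):
        yield path, value
    else:
        raise TypeError(f"not a simple or compound value: {value!r}")


# Illustrative compound property value (made-up data).
author = {
    "name": "A. Author",
    "born": datetime(1960, 5, 17),
    "address": {"city": "Example City", "zip": 12345},
}

flat = list(leaves(author))
```

Every entry in `flat` pairs a path with a value of a known
simple type, which is all the server needs in order to know
the semantics of the whole compound value.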

I feel very strongly that it should not be more
complicated than this. By allowing arbitrary XML
documents as property values, it becomes considerably
more complicated than this, and there is no
justification for complexifying the situation.

The WebDAV draft ducked the datatypes issue, 
unfortunately. When an end user does queries, he
has very ingrained expectations about what strings, 
integers, datetime, etc. do -- and we MUST match the end
user's expectations to have a useful design. So, query
forced the issue. Consequently, datatypes show up in 
the DASL draft. It turns out that there is a small 
universal set of simple datatypes adequate for most 
needs.

Furthermore, the XML Data effort is introducing
datatypes to XML (note that I said XML, not just WebDAV). 
I am assuming that the XML Data effort will succeed. 
The XML Data effort uses an attribute to decorate 
the value (i.e., content) of an element with its 
datatype. XML Data defines the universal datatypes 
(integer, string, datetime, etc.) and refinements of 
them (eight bit signed integer, etc.). XML Data 
illustrates the best use of attributes -- as decorations 
of the value (i.e., the element content), not as part 
of the value. (Clearly, the datatype of a property
is not part of its value, nor is its name.)
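
A rough sketch of the "attribute as decoration" idea follows.
The attribute name "dt", the type labels, and the element
names are placeholders chosen for the example and are not
claimed to be the XML Data syntax; falling back to string
when the attribute is absent is likewise an assumption.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical mapping from a datatype label to a converter.
CONVERTERS = {
    "int": int,
    "string": str,
    "dateTime": datetime.fromisoformat,
}


def typed_value(element):
    """Convert element content according to its 'dt' decoration.

    The attribute only says how to read the content; it is not
    part of the value. Missing attribute: assume string.
    """
    convert = CONVERTERS[element.get("dt", "string")]
    return convert(element.text or "")


doc = ET.fromstring(
    "<review>"
    '<pages dt="int">0042</pages>'
    '<due dt="dateTime">1998-11-02T19:07:29</due>'
    "<title>Draft</title>"
    "</review>"
)

values = {child.tag: typed_value(child) for child in doc}
```

Here "0042" and "42" would yield the same stored integer,
which is the sense in which the value, not the notation, is
what gets preserved.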

So, the client can tell the server the datatype
of each simple value in his hierarchical property.
For values that are just bits, servers can typically 
store arrays of binary bytes. If they are truly arrays 
of bytes, querying them is a useless thing to do in most 
cases. However, if some compound data structure is 
obscured by being defined as an array of bytes, then 
the array of bytes should be defined as a compound 
property instead.

Furthermore, in the case of arrays of bytes, byte 
ordering rears its ugly head. If a client is running on
a system whose byte order differs from that of the
client that stored the value, it gets an array of
binary bytes that has all the embedded longs, shorts,
floats, doubles, integer64's, etc. byte swapped,
which makes them unusable until they are unswapped.
Byte ordering can be a good reason why an array of 
bytes should have been defined as a hierarchy of simple
values.
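
The byte-order problem is easy to reproduce. The sketch below
uses Python's struct module with an arbitrary 32-bit integer;
the value and the choice of which side is little-endian are
assumptions made purely for illustration.

```python
import struct

# A 32-bit integer stored as raw bytes by a little-endian client.
stored = struct.pack("<i", 1998)

# A reader that assumes big-endian layout sees a different number.
misread = struct.unpack(">i", stored)[0]

# Reading with the writer's byte order recovers the value.
correct = struct.unpack("<i", stored)[0]
```

If the property had been declared as an integer rather than
as opaque bytes, the server could have handed each client the
value itself and the question of byte order would not arise.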

(If you think the value is a string, not an array
of bytes, then you don't have byteorder issues.
But, of course, then you have the simple data type called
String, and the server knows about its semantics.)

As for the server rewriting values, it does not -- it
can, however, return them in a different format. 
The server merely stores and returns the abstract 
values of live and dead properties in a concrete 
serialized form. Any equivalent representation of a simple 
value will do as an input or output value. But, of course, 
the server can *not* "return any damn thing it wants". It
must produce an equivalent value. Clients can reformat 
input values to what the server requires, and can reformat 
output values into whatever format they please. Of course, 
defining a canonical form for values can simplify life 
by reducing possibilities. That is why WebDAV defines the 
"creationdate" property to be in ISO 8601 format, and 
even replicates a tiny part of the 8601 standard in an 
appendix. But DAV rejected forcing this format on servers, 
and left them free to define, accept, and return whatever 
format they like for specific datetime properties. IMHO, 
that is the right choice for DAV. That also accommodates
the way that many existing systems work -- they are
liberal in accepting datetimes, but they always output 
them in a canonical way (which can usually be specified 
by the system administrator).
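
A small sketch of "liberal on input, canonical on output" for
datetimes. The list of accepted input formats is invented for
the example, and ISO 8601 is used as the canonical output only
because it is the format mentioned above; a real server would
define its own.

```python
from datetime import datetime

# Hypothetical input formats a server might be willing to accept.
ACCEPTED_FORMATS = (
    "%Y-%m-%dT%H:%M:%S",
    "%m/%d/%Y %H:%M:%S",
    "%Y%m%d %H%M%S",
)


def parse_datetime(text):
    """Return the abstract datetime value for any accepted notation."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized datetime notation: {text!r}")


def canonical(value):
    """Serialize the stored value in one fixed output format."""
    return value.isoformat()


inputs = ["1998-11-01T19:07:29", "11/01/1998 19:07:29", "19981101 190729"]
outputs = [canonical(parse_datetime(text)) for text in inputs]
```

All three notations denote the same value, so all three come
back in the same form, and the input notation is forgotten.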

This is very predictable behavior, not "unpredictable"
behavior.

The way servers operate today is to store the property
values in native binary format and discard any memory of
the character format in which the property value was
expressed on input. No commercial server I am aware of stores
property values as XML documents. The property values 
get converted by software on the way in and the 
way out to some human readable form. There are no 
commercial implementations I am aware of that store 
property values as the literal input strings they got 
over the wire or from an application program.
I strongly believe that this will continue to be true 
of commercial systems in the future, i.e., that the
default behavior of most implementations will *not*
be to store the literal XML as the property value.

RDBMS's will have no trouble in principle storing 
hierarchical property values as outlined above, 
because SQL supports the universal set of basic 
datatypes -- integers, strings, datetimes, etc. All 
you have to do is tell the server what the datatype 
is by using the XML Data approach, if the server 
doesn't already know the datatype. (There might be a 
default data type, and that default might be String. 
I'd have to look it up to be sure.)
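
One way this could look in a relational store is sketched
below with an in-memory SQLite database. The table layout
(one row per simple value, keyed by a path through the
hierarchy) and all names and data are assumptions for the
example, not a description of any existing server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prop ("
    " resource TEXT, path TEXT, datatype TEXT,"
    " int_value INTEGER, str_value TEXT)"
)

# Each simple value of a hierarchical property becomes one typed row.
rows = [
    ("/doc1", "author/name", "string", None, "A. Author"),
    ("/doc1", "author/zip", "int", 12345, None),
    ("/doc2", "author/zip", "int", 99, None),
]
conn.executemany("INSERT INTO prop VALUES (?, ?, ?, ?, ?)", rows)

# Integer comparison, as the end user expects: 99 < 1000 < 12345.
matches = [
    row[0]
    for row in conn.execute(
        "SELECT resource FROM prop"
        " WHERE path = ? AND int_value < ? ORDER BY resource",
        ("author/zip", 1000),
    )
]
```

Because the server knows the value is an integer, the query
compares numbers; compared as strings, "99" would sort after
"1000" and the result would surprise the user.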

The "game" the IETF plays (and should play) is that
what is sent across the wire is merely an on-the-wire 
representation of the actual state of the thing being
transported. One must think of the property model 
in the abstract, and forget about XML or any other
serialization format when doing so.
Once the property model (or set of property models)
to be supported is chosen, then the adequacy
of any particular serialization format, e.g., XML,
can be evaluated. That is the proper way for the
design to proceed.

All of the above should not be controversial.

Metadata (information about properties) other than
property name and data type, on the other hand, is not 
directly addressed by any draft in the WebDAV draft 
family. So, retrieval and manipulation of property 
metadata should not be part of the current discussion.
An industrial strength metadata effort would be
a separate effort.

Alan Babich
Received on Sunday, 1 November 1998 22:07:01 UTC