RE: property value clarification from David G. Durand on 1998-10-31 (w3c-dist-auth@w3.org from October to December 1998)

From: David G. Durand <dgd@cs.bu.edu>
Date: Sat, 31 Oct 1998 19:36:25 -0400
To: w3c-dist-auth@w3.org
Message-Id: <v04011701b26147afa625@[24.0.249.126]>
At 8:01 PM -0400 11/1/98, Babich, Alan wrote:
>"Frankly, accepting full XML properties seems like a no-brainer to me:
>the
>parsers are small, and the data model is simple. (RDBMs are going to
>have
>much more trouble handling nesting than they ever will with
>attributes)."

>I disagree with both sentences. It would be a very
>large mistake to burden the DAV property model with
>the artifacts of XML (e.g., saving original property
>value representation including whitespace,
>recording attributes, etc.), and to require servers to
>handle arbitrary XML documents as property values.
>Servers do not store property values as XML documents
>for very good reasons, e.g., efficiency of storage,
>efficiency of processing, and simplicity of concepts
>for implementers and end users. Servers should not
>have to be burdened with the requirement of storing
>arbitrary XML documents as property values.
>Just as XML is a poor format for storing some types
>of document content (e.g., images), it is not the
>best format for storing simple properties or
>supporting queries against them.

Right, for those properties with server-understood semantics, the server
can do anything that makes sense. That was what I was explicitly proposing.\

You refer to "simple" properties. But an unknown property is neither simple
nor complex -- it's semantically meaningless in the framework of the
server's ontology, and the server must have a clear definition of the
limits to any transformations that it may want to apply.

However, many clients will want to store specialized properties for
resources. All servers will not be aware of the semantics of those
properties. This is one of the reasons for dead properties, right?

When they do that, they are providing a representation of a structure. By
choosing a notation, we delimit the sorts of structures we allow people to
attach. Since the server does _not_ know the semantics of the data inside
the property tag, it must preserve the data inside in a way that is
fauthful to XML semantics. Otherwise, I _can't_ get useful behavior out of
the server. If it is legitimate for the server to rewrite any occurrence of
"04/28/61" to "April 28. 1961", then I can't count on the server. Less
fancifully, if I attach an XML-structured change log to a resource, the
server can't arbitrarily add or delete whitespace or anything else, unless
it knows the semantics of that property.

I don't see that this _can_ be controversial. If the server knows that a
property is a date it can do anything it wants with that date (within the
semantics of the date's function). But there are many kinds of properties
that a server will not deal with (like some kinds of metadata).

When asked to store an unkown property, the server should do exactly that:
store it!

>Fortunately, the DAV property model is *not* XML.
>Yaron tried to make that clear in his e-mail.

If it walks like XML, and quacks like XML, it is XML. The only way you can
avoid this issue is by banning the use of tags not explicitly defined in
the protocol -- because no predictable behavior will exist for tags not
defined in the protocol. This is the kind of thing people are doing for EDI
and other protocols. If we leave the "dead property" field open to
arbitrary XML, then we need to respect the data content inplicit in those
(small) XML documents.

We could simply disallow dead properties, and have a special tag that
refers a property value to a URI where that value is to be found: then
dead, user-defined properties would just be resources, and the server could
store them and send them back on request.

>The draft does not completely constrain the property
>model. IMHO, that is OK for this draft. The property
>model is and should be independent of any particular
>serialization format that happens to be chosen for
>the protocol.

However, it must be defined how the dead properties are treated by the
protocol, and that implies that whatever the serialization format is, it
must either:

1. return a byte-identical string to the one that appeared when the client
set a dead property; or
2. define in what precise ways the property set may be different from the
one retrieved later.

Point 1 is of course the simplest way to answer the definitional
requirement in point 2.

Not answering that question makes dead properties almost semantically null,
since the server can return any damn thing it wants, regardless of the
property value that was set. If this is not to be true, then we need to
define the limits on the transformations the server can apply. Character
encoding only is very resonable. Some might prefer to use the canonical XML
format that I mentioned, as a way to define the significant data in the
document. Some might prefer the DOM. Just saving the bytes literally is
always safe, and strikes me as the most likely default implementation of
many servers.

> The property model should not be
>overconstrained: It should accommodate what people
>have done and are doing, and much of what they
>would like to do. I would welcome a little bit
>more clarification of the property model in future
>work. For example, it would be helpful if the draft
>explicitly said that the property model is not XML.
>That would avoid a lot of confusion.

It sounds to me like a lot of confusion is being created by an insistence
on a strange and silly restriction that doesn't even have normative text to
substantiate it. If the text does exist, I think we need to delete it, for
the reasons I've given, or fix the standard in one of the ways I've
suggested, or some other way.

By allowing dead properties to be expressed in XML, we are giving people a
notation and telling them to use it. They need to know either that their
notation will be preserved or what equivalence class of notations is being
used.

>Regarding the second sentence, dealing with nesting
>is not more difficult than dealing with attributes for
>RDBMS's. Dealing with a statically defined property
>structure or a statically defined set of attributes
>is straightforward. What is a pain is dealing with
>arbitrary nested properties and arbitrary XML attributes
>discovered dynamically at run time, because columns and
>tables will have to be created dynamically by the server.

Right. XML is not limited to expressing such things. Therefore there is a
mismatch between the facilities available to a relational DBMS and the
"serialization format" we've chosen. If such a database wants to represent
arbitrary dead properties as anything other than a BLOB, they've got their
hands full.

It is, by the way, possible to make RDMS tables that do represent the
information in such documents, it's just hard to query them in sensible
ways.

So that was exactly my point, and rather than refuting it you are
supporting it.

  -- David
_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________
Received on Sunday, 1 November 1998 19:38:48 UTC