- From: David G. Durand <dgd@cs.bu.edu>
- Date: Sat, 31 Oct 1998 19:36:25 -0400
- To: w3c-dist-auth@w3.org
At 8:01 PM -0400 11/1/98, Babich, Alan wrote:
>"Frankly, accepting full XML properties seems like a no-brainer to me: the
>parsers are small, and the data model is simple. (RDBMs are going to have
>much more trouble handling nesting than they ever will with attributes)."
>
>I disagree with both sentences. It would be a very
>large mistake to burden the DAV property model with
>the artifacts of XML (e.g., saving original property
>value representation including whitespace,
>recording attributes, etc.), and to require servers to
>handle arbitrary XML documents as property values.
>Servers do not store property values as XML documents
>for very good reasons, e.g., efficiency of storage,
>efficiency of processing, and simplicity of concepts
>for implementers and end users. Servers should not
>have to be burdened with the requirement of storing
>arbitrary XML documents as property values.
>Just as XML is a poor format for storing some types
>of document content (e.g., images), it is not the
>best format for storing simple properties or
>supporting queries against them.

Right, for those properties with server-understood semantics, the server
can do anything that makes sense. That was what I was explicitly
proposing.

You refer to "simple" properties. But an unknown property is neither
simple nor complex -- it's semantically meaningless in the framework of
the server's ontology, and the server must have a clear definition of the
limits to any transformations that it may want to apply.

However, many clients will want to store specialized properties for
resources. Not all servers will be aware of the semantics of those
properties. This is one of the reasons for dead properties, right? When
clients do that, they are providing a representation of a structure. By
choosing a notation, we delimit the sorts of structures we allow people
to attach.

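To make the live/dead distinction concrete, here is a minimal sketch in
Python. It is not taken from the draft: the PropertyStore class, the
"x-title" property name, and its normalizer are all hypothetical, chosen
only to illustrate a server that transforms values it understands and
leaves everything else alone.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry of properties whose semantics this server understands.
# "x-title" and its normalizer are illustrative, not defined by any draft.
LIVE_NORMALIZERS: Dict[str, Callable[[bytes], bytes]] = {
    "x-title": lambda raw: raw.strip(),
}


class PropertyStore:
    """Toy in-memory store keyed by (resource URI, property name)."""

    def __init__(self) -> None:
        self._values: Dict[Tuple[str, str], bytes] = {}

    def set(self, resource: str, name: str, raw: bytes) -> None:
        normalize = LIVE_NORMALIZERS.get(name)
        # Known semantics: the server may transform the value.
        # Unknown semantics (a dead property): keep the bytes untouched.
        self._values[(resource, name)] = normalize(raw) if normalize else raw

    def get(self, resource: str, name: str) -> bytes:
        return self._values[(resource, name)]
```

With this sketch, setting "x-title" to b"  Hello  " reads back as
b"Hello", while any property name missing from the registry reads back
byte for byte as it was written.
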
Since the server does _not_ know the semantics of the data inside the
property tag, it must preserve the data inside in a way that is faithful
to XML semantics. Otherwise, I _can't_ get useful behavior out of the
server. If it is legitimate for the server to rewrite any occurrence of
"04/28/61" to "April 28, 1961", then I can't count on the server. Less
fancifully, if I attach an XML-structured change log to a resource, the
server can't arbitrarily add or delete whitespace or anything else,
unless it knows the semantics of that property.

I don't see that this _can_ be controversial. If the server knows that a
property is a date it can do anything it wants with that date (within the
semantics of the date's function). But there are many kinds of properties
that a server will not deal with (like some kinds of metadata). When
asked to store an unknown property, the server should do exactly that:
store it!

>Fortunately, the DAV property model is *not* XML.
>Yaron tried to make that clear in his e-mail.

If it walks like XML, and quacks like XML, it is XML. The only way you
can avoid this issue is by banning the use of tags not explicitly defined
in the protocol -- because no predictable behavior will exist for tags
not defined in the protocol. This is the kind of thing people are doing
for EDI and other protocols.

If we leave the "dead property" field open to arbitrary XML, then we need
to respect the data content implicit in those (small) XML documents. We
could simply disallow dead properties, and have a special tag that refers
a property value to a URI where that value is to be found: then dead,
user-defined properties would just be resources, and the server could
store them and send them back on request.

>The draft does not completely constrain the property
>model. IMHO, that is OK for this draft. The property
>model is and should be independent of any particular
>serialization format that happens to be chosen for
>the protocol.

However, it must be defined how the dead properties are treated by the
protocol, and that implies that whatever the serialization format is, it
must either:

1. return a byte-identical string to the one that appeared when the
   client set a dead property; or
2. define in what precise ways the property value retrieved later may
   differ from the one that was set.

Point 1 is of course the simplest way to answer the definitional
requirement in point 2. Not answering that question makes dead properties
almost semantically null, since the server can return any damn thing it
wants, regardless of the property value that was set. If this is not to
be true, then we need to define the limits on the transformations the
server can apply. Allowing only character-encoding changes is very
reasonable. Some might prefer to use the canonical XML format that I
mentioned, as a way to define the significant data in the document. Some
might prefer the DOM. Just saving the bytes literally is always safe, and
strikes me as the most likely default implementation of many servers.

> The property model should not be
>overconstrained: It should accommodate what people
>have done and are doing, and much of what they
>would like to do. I would welcome a little bit
>more clarification of the property model in future
>work. For example, it would be helpful if the draft
>explicitly said that the property model is not XML.
>That would avoid a lot of confusion.

It sounds to me like a lot of confusion is being created by an insistence
on a strange and silly restriction that doesn't even have normative text
to substantiate it. If the text does exist, I think we need to delete it,
for the reasons I've given, or fix the standard in one of the ways I've
suggested, or some other way.

By allowing dead properties to be expressed in XML, we are giving people
a notation and telling them to use it. They need to know either that
their notation will be preserved or what equivalence class of notations
is being used.

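The two choices above, byte identity or a defined equivalence class, can
be sketched as two comparison functions in Python. This is only an
illustration under stated assumptions: the function names are
hypothetical, and the standard library's canonicalize() (Python 3.8 or
later, implementing C14N 2.0) stands in for whatever canonical form a
specification might actually pick.

```python
from xml.etree.ElementTree import canonicalize


def same_bytes(stored: str, returned: str) -> bool:
    """Option 1: the server must hand back exactly what it was given."""
    return stored.encode("utf-8") == returned.encode("utf-8")


def same_canonical_xml(stored: str, returned: str) -> bool:
    """Option 2: values are equivalent if their canonical forms match.

    Uses C14N 2.0 as the (assumed) equivalence class: attribute order and
    empty-element syntax are ignored, but text content, including
    whitespace, still counts.
    """
    return canonicalize(stored) == canonicalize(returned)


sent = '<log b="2" a="1"><entry/></log>'
back = '<log a="1" b="2"><entry></entry></log>'
padded = '<log a="1" b="2"> <entry></entry> </log>'

print(same_bytes(sent, back))            # differs byte for byte
print(same_canonical_xml(sent, back))    # same canonical form
print(same_canonical_xml(sent, padded))  # added whitespace is a change
```

Under option 1 a server that reorders attributes has broken the
contract; under option 2 it has not, but one that pads the content with
whitespace still has. Either rule is workable, as long as the
specification says which one applies.
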
>Regarding the second sentence, dealing with nesting
>is not more difficult than dealing with attributes for
>RDBMS's. Dealing with a statically defined property
>structure or a statically defined set of attributes
>is straightforward. What is a pain is dealing with
>arbitrary nested properties and arbitrary XML attributes
>discovered dynamically at run time, because columns and
>tables will have to be created dynamically by the server.

Right. XML is not limited to expressing such things. Therefore there is a
mismatch between the facilities available to a relational DBMS and the
"serialization format" we've chosen. If such a database wants to
represent arbitrary dead properties as anything other than a BLOB, it has
its hands full. It is, by the way, possible to make RDBMS tables that do
represent the information in such documents; it's just hard to query them
in sensible ways.

So that was exactly my point, and rather than refuting it you are
supporting it.

-- David
_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________
Received on Sunday, 1 November 1998 19:38:48 UTC