RE: property value clarification from Babich, Alan on 1998-11-02 (w3c-dist-auth@w3.org from October to December 1998)

From: Babich, Alan <ABabich@filenet.com>
Date: Mon, 2 Nov 1998 13:05:53 -0800
To: "'Greg Stein'" <gstein@lyra.org>
Cc: w3c-dist-auth@w3.org
Message-Id: <98Nov2.125437pst.55814@firewall.saros.com>
Greg Stein wrote:
"Allowing a server to rewrite properties that it knows nothing about
would seem quite dangerous."

Yes, it would, wouldn't it? I never said that servers 
should do arbitrary "rewriting" of properties they
know nothing about, thereby causing unknown problems.
(Note that the use of the term "rewriting" implies
XML centric thinking instead of abstract-property-model
centric thinking. At the property model level, one thinks
of various serializations that are possible to express
the abstract value. The abstract value is central,
not any particular serialization of it. That is why
I don't like to use the term "rewrite". It leads to
confusion immediately by implicitly accepting some 
unstated assumptions and orientations.)
(1) Servers take a position on the data type of the
values for dead and live properties that they store.
I never met a server that didn't know the data types
of its values. Such a hypothetical server would not
be able to do an acceptable job on query, so nobody
would be interested in it.
(2) The default data type has to be String. I.e., if you
don't tell the server the data type, the server must 
assume it is String. No other data type seems reasonable.
So, there are no simple values with unknown datatypes.
If nothing says otherwise, it is String.
(3) Servers don't get to do much "rewriting" of strings.
(a)XML documents often include whitespace that is just 
there so humans can read them more conveniently. XML has 
rules to tell applications (e.g., the server) about 
ignoring such whitespace, and even comments. We have 
to assume the applications follow the rules. So, I 
do not consider this bad or arbitrary "rewriting". 
It is good, well defined "rewriting".
(b) It is increasingly important to do a better 
job on localization than has sometimes been done 
in the past, especially in the context of the 
internet. In a perfect world, there would be no 
losses, perfect collation, high efficiency for compares, 
etc. . Translation losses could be considered a 
potential type of bad "rewriting". I've reconsidered, 
and do not advocate accepting losses in translation.
(c) Servers store integers, real numbers, Booleans,
and datetimes in binary format. Whitespace, comma
delimiters for numbers, etc. are clearly not part
of the abstract value. They are just an artifact
of a particular serialization of the value.
The variations of the syntax of numbers, datetimes,
and Booleans don't need to be specified in the WebDAV 
draft, because they are extremely well known. This is
not bad "rewriting". (Format masks and conventions for 
serialization are a separate discussion from abstract
property values for numbers, Booleans, and Datetimes. 
I don't want to get into that discussion here.)
(d) The conclusion is that the server knows the
data types and hierarchical structure of its properties. 
There are no totally unknown properties.

Thus, I am agreeing with you. I hope this
puts fears of arbitrary bad "rewriting" to bed.

Greg Stein wrote:
"I would much rather see the server reject
the unknown property, than to believe that I can store the property
without fear of losing my (application's) semantics on that property."

High end document management systems have well defined
schemas for the properties of documents, and documents
are grouped into document classes. Only the database
administrator can futz with the schema. This discipline
is absolutely required in a large organization.
It is necessary for such systems to reject
a client that tries to set a property that is not defined
for the particular document resource at hand.

Some low end systems have only hardwired properties. 
They, also, must reject clients that try to add a new 
property to a resource.

The case that most of the discussion seems
to focus on, is that clients can add any new, client
defined, simple-or-compound property value to any old 
resource at any old time. Implicit in this is that there 
probably isn't an overall schema for the resources in the 
collection. Ironically, this is the rarest case of all in 
my experience. I am familiar with only one commercial 
system that allows that, and it is dying. 

There is no problem, even in this case. Even if the
value is compound, the server knows the data types of
all the simple values, and from the structure of the
XML tags, it knows the structure of the values. (If you
don't tell it anything about the datatypes of the
simple values, and the server doesn't already know about
the property, the server assumes String for the simple
values.) So, the server has no trouble in principle in 
storing or retrieving the value, or in preserving the 
abstract value, or in querying it.

Generally, the server doesn't need to know the 
semantics of properties other than the simple 
data types, and their hierarchical structure. 
With String as the default data type, 
and XML providing hierarchical structure
for hierarchical values, the server can faithfully 
implement the abstract value, whether simple or compound,
and whether predefined in an underlying schema, or
seen for the first time in a PROPPATCH.

So, I am agreeing with you again. I would reword your 
sentence to be slightly stronger: "The server must either 
reject the property or implement its abstract value 
correctly."

Greg Stein wrote:
"My initial question which created this thread was more about how far
the
property value extended within the XML framework which carries it,
versus what the semantic nature of those properties are."

The right order to consider the questions is (1) what is the
property model or models we want to support, and (2) what
makes sense to express property values in XML. With the
property model in mind, it is clear that XML attributes
are best used for decorations of literal values, e.g.,
data type and language, and not for the abstract value 
itself.

Alan Babich
Received on Monday, 2 November 1998 16:05:30 UTC