RE: property value clarification from Babich, Alan on 1998-11-02 (w3c-dist-auth@w3.org from October to December 1998)

From: Babich, Alan <ABabich@filenet.com>
Date: Sun, 1 Nov 1998 16:36:09 -0800
To: "'David G. Durand'" <dgd@cs.bu.edu>, w3c-dist-auth@w3.org
Message-ID: <C3AF5E329E21D2119C4C00805F6FF58F04AF8D@hq-expo2.filenet.com>

David Durand wrote:
"There are issues here: most character encodings are a subset of Unicode,
so that there are legitimate properties that are unlikely to be translatable
into likely server encodings. This is not to say that we should prohibit
internal translation, but we should require 
that _if_ the encoding changes, _then_ the characters will be 
preserved in the result returned by the server.

Transcoding is a reality, but it's also a minefield, and servers 
that do it should be held accountable for not corrupting data.
Furthermore, even transcodable character-set encodings are not 
always invertible: the precomposed characters in unicode are a 
canonical example."

RDBMS's typically store things in one, maybe two, character sets.
(Sometimes you can use both ASCII and Unicode in the same 
database.) If you want to put data in or get it out in another
character set, a character set translation is invoked.
Unfortunately, the world isn't perfect: Unicode is very
popular, but as you have pointed out, losses in character
set translation can and do occur. Facing reality, that fact
must be accepted. So, the draft must allow that "best 
efforts" behavior.

It is reasonable to translate character sets mechanically
and on the fly, but not languages. The best that you can do 
is to use an appropriate collation for the characters in 
queries. If losses occur in character set translation, then
you can expect improper collation for those strings.
This is reality, and the draft must accept this "best
efforts" behavior.

"XML explicitly does _not_ ignore whitespace in any situation. When
validating against a DTD (and only when validating against a DTD) it 
flags data (whitespace) that would be ignored by an SGML processor, 
and that many applications will also want to ignore.

The flagging often confuses people, but it is an application issue as to
whether the data can be ignored. Since WebDAV _has_ no DTD for all
properties (since additional tags are not an error), there is _no_
ignorable whitespace in any DAV property."

I did not say XML ignored whitespace. I said whitespace is
ignored. I meant that applications often ignore it.
This is true whether or not you have a DTD. (Applications
do not always follow the rules perfectly.) If I give
a DAV store an integer property value:

    <AgeInYears> 023 </AgeInYears>

I would bet that the whitespace would be ignored by 
the server, the value would be stored in binary, and 
the value returned later by PROPFIND or query would be

    <AgeInYears>23</AgeInYears>

There is nothing in the draft that prevents the server 
from doing that. And that, IMHO, is a very good thing -- it
is the solution, not a problem. In order to avoid
making this a longer e-mail, I'll refrain 
from listing all the reasons why, except to point out that 
as far as query is concerned, considering the following 
representations of the property value on the wire to be 
equivalent is a very good thing:

    <AgeInYears>23</AgeInYears>

    <AgeInYears> 023 </AgeInYears>

    <AgeInYears> 23
    </AgeInYears>

"Many of the issues you raise are only appropriate for live properties."

I disagree. In the above example, AgeInYears is a dead property.

Alan Babich

Received on Sunday, 1 November 1998 19:35:42 UTC