- From: Babich, Alan <ABabich@filenet.com>
- Date: Sun, 1 Nov 1998 16:36:09 -0800
- To: "'David G. Durand'" <dgd@cs.bu.edu>, w3c-dist-auth@w3.org
David Durand wrote: "There are issues here: most character encodings are a subset of Unicode, so that there are legitimate properties that are unlikely to be translatable into likely server encodings. This is not to say that we should prohibit internal translation, but we should require that _if_ the encoding changes, _then_ the characters will be preserved in the result returned by the server. Transcoding is a reality, but it's also a minefield, and servers that do it should be held accountable for not corrupting data. Furthermore, even transcodable character-set encodings are not always invertible: the precomposed characters in unicode are a canonical example." RDBMS's typically store things in one, maybe two, character sets. (Sometimes you can use both ASCII and Unicode in the same database.) If you want to put data in or get it out in another character set, a character set translation is invoked. Unfortunately, the world isn't perfect: Unicode is very popular, but as you have pointed out, losses in character set translation can and do occur. Facing reality, that fact must be accepted. So, the draft must allow that "best efforts" behavior. It is reasonable to translate character sets mechanically and on the fly, but not languages. The best that you can do is to use an appropriate collation for the characters in queries. If losses occur in character set translation, then you can expect improper collation for those strings. This is reality, and the draft must accept this "best efforts" behavior. "XML explicitly does _not_ ignore whitespace in any situation. When validating against a DTD (and only when validating against a DTD) it flags data (whitespace) that would be ignored by an SGML processor, and that many applications will also want to ignore. The flagging often confuses people, but it is an application issue as to whether the data can be ignored. Since WebDAV _has_ no DTD for all properties (since additional tags are not an error), there is _no_ ignorable whitespace in any DAV property." I did not say XML ignored whitespace. I said whitespace is ignored. I meant that applications often ignore it. This is true whether or not you have a DTD. (Applications do not always follow the rules perfectly.) If I give a DAV store an integer property value: <AgeInYears> 023 </AgeInYears> I would bet that the whitespace would be ignored by the server, the value would be stored in binary, and the value returned later by PROPFIND or query would be <AgeInYears>23</AgeInYears> There is nothing in the draft that prevents the server from doing that. And that, IMHO, is a very good thing -- it is the solution, not a problem. In order to avoid making this a longer e-mail, I'll refrain from listing all the reasons why, except to point out that as far as query is concerned, considering the following representations of the property value on the wire to be equivalent is a very good thing: <AgeInYears>23</AgeInYears> <AgeInYears> 023 </AgeInYears> <AgeInYears> 23 </AgeInYears> "Many of the issues you raise are only appropriate for live properties." I disagree. In the above example, AgeInYears is a dead property. Alan Babich
Received on Sunday, 1 November 1998 19:35:42 UTC