ContentLength in Unicode Strings from Doug Daniels on 2003-03-04 (www-annotation@w3.org from January to June 2003)

From: Doug Daniels <rainking@rice.edu>
Date: Tue, 04 Mar 2003 15:08:52 -0600
To: www-annotation@w3.org
Message-ID: <3E6515E4.5080603@rice.edu>

Hi,

In the process of trying to post annotations whose bodies use Unicode, I 
came across a problem with the current way of embedding annotation 
bodies into the RDF.  The problem is that the ContentLength property for 
the body text has to be in bytes, not in number of characters, and 
accessing byte-count information in most high-level languages is 
difficult if not impossible.

Whenever you use Unicode characters inside an annotation body, the 
length of the string (which has units==number of characters) may not 
correspond to its ContentLength (which has units==number of bytes).  In 
JavaScript, as well as Java, there is no way to determine the byte
length of a string, and thus no way to set the ContentLength correctly.
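
To make the mismatch concrete, here is a minimal sketch. It is written in
Python purely for illustration (not one of the languages discussed above),
and it assumes the body is sent as UTF-8, which the protocol text does not
confirm. The helper name body_lengths and the sample strings are
hypothetical.

```python
def body_lengths(body, encoding="utf-8"):
    """Return (character count, byte count) for an annotation body.

    The encoding is an assumption; the byte count depends entirely on
    which encoding is actually used on the wire.
    """
    char_count = len(body)                   # units: characters
    byte_count = len(body.encode(encoding))  # units: bytes
    return char_count, byte_count


if __name__ == "__main__":
    # ASCII-only text: the two counts agree.
    print(body_lengths("plain ascii"))
    # Text with non-ASCII characters (e-acute and a snowman): they diverge.
    print(body_lengths("caf\u00e9 \u2603"))
```

For the second string the character count is 6 but the UTF-8 byte count is
9, so a ContentLength filled in from the string length would be wrong.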

This looks like a layering problem: the application-level code is being
asked to do the job of the HTTP layer, and isn't equipped with the
proper tools.

The good news is that omitting the ContentLength in the body seems to
cause no problems for either the W3C server or the Zope ZAnnot server.
The protocol is unclear as to whether the ContentLength is actually
required, though.

I would suggest that we make the ContentLength optional, since it often 
*cannot* be computed.

Thanks,
Doug

Received on Tuesday, 4 March 2003 16:09:26 UTC