- From: santhanakrishnan <santhana@huawei.com>
- Date: Tue, 25 Mar 2008 18:08:25 +0530
- To: 'Jaakko Kangasharju' <jkangash@hiit.fi>
- Cc: public-exi@w3.org
Hi Jaakko,

Thanks for your explanation. I understood it well.

When a partition is optimized for frequent use of compact identifiers and we "hit" the uri or prefix, we encode the compact identifier incremented by 1. When we miss, we encode the length-prefixed string as such.

When a partition is optimized for frequent use of string literals and we "miss" the localname or value, we encode the length of the string incremented by 1 or 2. When we hit, we encode 0 or 1 followed by the compact identifier.

But how exactly does this encoding help optimize for frequent use of compact identifiers (uri or prefix) or string literals (localname or value)?

Thanks in advance,
Santhanakrishnan

-----Original Message-----
From: Jaakko Kangasharju [mailto:jkangash@hiit.fi]
Sent: Tuesday, March 25, 2008 3:14 PM
To: santhana@huawei.com
Cc: public-exi@w3.org
Subject: Re: [EXI] String Encoding in case of a string table miss

santhanakrishnan <santhana@huawei.com> writes:

> In case of a string table miss and value table miss we encode the
> string or value as
>
> Length prefixed, length incremented by 1 string
>
> Length prefixed, length incremented by 2 string
>
> Can anybody explain the reason why the length of the string or value is
> incremented by 1 or 2.

The encoding needs to indicate in some manner whether there is a string table hit or miss. The string encoding always begins with a non-negative integer that can be used to determine this. Of the range of this integer, some values are reserved to indicate only a hit or a miss, and the rest of the values are used to directly encode the case that the partition is optimized for.

In the case of the local-name and value partitions, which you are asking about, the optimization is for the frequent use of string literals, so the reserved integer values are for string table hits. In the local-name case there is only one partition, so only one integer (0) is required to indicate a string table hit, and the rest, from 1 upwards, are used for string table misses.
As noted, these other integers are already a part of the encoding of the string, that is, they denote the encoded string's length. Since all strings, including the empty string of length 0, still need to be representable, 1 must be subtracted from the encoded value to get it into the range from 0 upwards. Viewed from the encoder side, this translates into having to add 1 to the string length when encoding.

In the value case, there are two partitions, local and global, so two integer values are needed for string table hits: 0 for local values and 1 for global values. Thus the available range to indicate a string literal length is from 2 upwards, so 2 has to be subtracted to get it into the range from 0 upwards.

The same applies to the partitions optimized for frequent use of compact identifiers, except that there the reserved value 0 is used to indicate a string table miss, that is, the 0 is followed by a normal length-prefixed encoding of a string. Again, values from 1 upwards denote compact identifiers, and 1 must be subtracted to get them into the range from 0 upwards.

Ultimately, the reason for having the partitions optimized for either hits or misses is to achieve better compactness, since this technique avoids the use of separate indicator values in the optimized-for case.

Hope this helps,

--
Jaakko Kangasharju, Helsinki University of Technology
Paper is suitable only for doodling and wiping
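
To make the scheme in the reply above concrete, here is a minimal Python sketch of the leading-integer logic only. It is not the EXI bit-level encoding: the function names, the use of plain lists as string table partitions, the token-list output, and the choice to check the local value partition before the global one are all assumptions made for illustration, and the sketch does not model how tables are updated after a miss.

```python
# Illustrative sketch only: models the leading integer described in the
# thread, not the actual EXI bit-level format. All names are hypothetical.

def encode_local_name(name, table):
    """Partition optimized for string literals (one partition)."""
    if name in table:
        # 0 is the single reserved value for a hit; the compact id follows.
        return [0, table.index(name)]
    # Miss: the leading integer is the length itself, shifted by 1
    # so that it never collides with the reserved value 0.
    return [len(name) + 1, name]


def encode_value(value, local_table, global_table):
    """Partition optimized for string literals (local and global)."""
    if value in local_table:      # assumed lookup order: local first
        return [0, local_table.index(value)]
    if value in global_table:
        return [1, global_table.index(value)]
    # Miss: two values (0 and 1) are reserved, so the length is shifted by 2.
    return [len(value) + 2, value]


def encode_uri(uri, table):
    """Partition optimized for compact identifiers."""
    if uri in table:
        # Hit: the compact id itself is the leading integer, shifted by 1.
        return [table.index(uri) + 1]
    # Miss: reserved 0, then an ordinary length-prefixed string.
    return [0, len(uri), uri]


def decode_local_name(tokens, table):
    """Inverse of encode_local_name: subtract 1 to recover the length."""
    first = tokens[0]
    if first == 0:
        return table[tokens[1]]
    literal = tokens[1]
    assert len(literal) == first - 1
    return literal
```

In this sketch the optimized-for case in each partition (a literal for local names and values, a compact identifier for uris) needs no separate indicator token, while the other case pays for one extra leading value.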
Received on Tuesday, 25 March 2008 12:40:05 UTC