- From: Jaakko Kangasharju <jkangash@hiit.fi>
- Date: Tue, 25 Mar 2008 16:15:07 +0200
- To: santhana@huawei.com
- Cc: public-exi@w3.org
santhanakrishnan <santhana@huawei.com> writes:

> Thanks for your explanation. I could appreciate it well.
> When optimized for frequent use of compact identifiers and when "hit" the
> uri or prefix we encode the compact identifier incremented by 1. When we
> miss we encode the length prefixed string as such.
> When optimized for frequent use of string literals and when "miss" the
> localname or value we encode the length of the string incremented by 1 or 2.
> When we hit we encode 0 or 1 followed by the compact identifier.
> But how exactly this encoding helps in the optimization of frequent
> use of compact identifier(uri or prefix) or string literal(localname or
> value) ?

It helps with compactness. Say that we have the string "hello" and the string table partition used for it has 10 entries in it. Now, if the partition is optimized for frequent use of compact identifiers, the string will be encoded as follows:

  hit:  compact identifier + 1 = 4 bits total
  miss: 0 in 4 bits + length in 8 bits + characters = 52 bits total

On the other hand, if the partition is optimized for frequent use of string literals, the string will be encoded as follows:

  hit:  0 (or 1) in 8 bits + compact identifier in 4 bits = 12 bits total
  miss: length + 1 (or 2) in 8 bits + characters = 48 bits total

(The 4-bit parts come from the partition having 10 entries, which requires 4 bits for indexing into it, and the 8-bit parts (including the characters in the string) from the unsigned integer encoding, which represents an integer as a variable-length sequence of octets (here, the integers are small enough to be represented in one octet, but a sufficiently long string or non-ASCII characters would require more octets).)

As you can see, the case which each partition is optimized for is encoded in a smaller number of bits than the other case, and also a smaller number of bits than it would be if every encoding started with just an indicator of hit or miss.
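For anyone who wants to check the numbers, here is a rough Python sketch of the bit counting for the "hello" example. It is only an illustration under the assumptions stated in this message: the function names (index_bits, uint_bits, compact_id_optimized, literal_optimized) are made up for this sketch and do not come from the EXI specification, the unsigned integer encoding is modelled as 7 payload bits per octet, and each character is counted as the unsigned integer encoding of its code point.

```python
def index_bits(n_values):
    """Bits needed to distinguish n_values distinct values (assumed fixed width)."""
    return (n_values - 1).bit_length()


def uint_bits(value):
    """Bits used by a variable-length unsigned integer, 7 payload bits per octet."""
    octets = max(1, (value.bit_length() + 6) // 7)
    return 8 * octets


def char_bits(text):
    """Each character is counted as the unsigned integer encoding of its code point."""
    return sum(uint_bits(ord(c)) for c in text)


def compact_id_optimized(text, entries, hit):
    """Partition optimized for compact identifiers (e.g. uri, prefix)."""
    # Identifiers are shifted by 1 so that 0 can signal a miss,
    # hence entries + 1 distinct values.
    id_bits = index_bits(entries + 1)
    if hit:
        return id_bits
    return id_bits + uint_bits(len(text)) + char_bits(text)


def literal_optimized(text, entries, hit, offset=1):
    """Partition optimized for string literals (e.g. localname, value).

    offset is 1 or 2, matching the "incremented by 1 or 2" cases above.
    """
    if hit:
        # A small marker (0, or 1 when offset is 2) followed by the identifier.
        return uint_bits(offset - 1) + index_bits(entries)
    return uint_bits(len(text) + offset) + char_bits(text)


if __name__ == "__main__":
    for name, fn in (("compact-id optimized", compact_id_optimized),
                     ("literal optimized", literal_optimized)):
        print(name,
              "hit:", fn("hello", 10, hit=True),
              "miss:", fn("hello", 10, hit=False))
```

With "hello" and a 10-entry partition this gives 4 and 52 bits for the first partition type and 12 and 48 bits for the second, the same totals as above.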
The partition types for each kind of string are selected according to which case is expected to be more common in documents, so that the smaller encoding is usually used when it's more appropriate.

-- 
Jaakko Kangasharju, Helsinki University of Technology
Why choose the lesser evil? Cthulhu for president!
Received on Tuesday, 25 March 2008 14:15:58 UTC