- From: Takuki Kamiya <tkamiya@us.fujitsu.com>
- Date: Thu, 23 Jul 2015 14:44:58 -0700
- To: Rick van Rein <rick@openfortress.nl>, "public-exi-comments@w3.org" <public-exi-comments@w3.org>
Hi Rick, Thank you for your comments. With regards to the opportunity "PRELOADED STRING TABLES" we do see the rational of having shared string values between sender and receiver. This is something we plan to re-consider for the next version of EXI. One simple trick that may be useful already now is to make use of enumerated values. Note: this is not what you were asking for but may be useful also. Let's suppose you have to represent the brand name of a car. One way to do so is to type the brand name as xs:string. Another way could be to type it as enumerated value. <xs:element name="brandName"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="Audi"/> <xs:enumeration value="Golf"/> <xs:enumeration value="BMW"/> </xs:restriction> </xs:simpleType> </xs:element> In the case the actual brand is part of the enumerated values, the value is encoded as n-bit Unsigned Integer [1]. If this is not successful one can still use schema-invalid productions. As for MODULO COMPACTION, Using the following event codes as an example, I calculated the numbers in a manner you suggested. 0 0 = 0 1.0 1 + 3 * 0 = 1 1.1 1 + 3 * 1 = 4 2.0 2 + 3 * 0 = 2 2.1 2 + 3 * 1 = 5 2.2 2 + 3 * 2 = 8 2.3 2 + 3 * 3 = 11 2.4.0 2 + 3 * 4 + 8 * 0 = 14 2.4.1 2 + 3 * 4 + 8 * 1 = 22 2.4.2 2 + 3 * 4 + 8 * 2 = 30 2.5 2 + 3 * 5 = 17 2.6 2 + 3 * 6 = 20 2.7.0 2 + 3 * 7 + 8 * 0 = 23 2.7.1 2 + 3 * 7 + 8 * 1 = 31 In order to differentiate values ranging from 0 to 31, we need 5 bits. Then it means it even requires 5 bits for representing the most likely occurrence of event code 0 in 5 bits instead of 2 bits. Similarly, 1.0 takes 5 bits instead of 3 bits. Besides this observation, there is expected some slightly extra computation involved to do mod and div operation, which is an additional cost. Hope this helps, Takuki Kamiya [1] http://www.w3.org/TR/exi/#encodingEnumerations -----Original Message----- From: Rick van Rein [mailto:rick@openfortress.nl] Sent: Tuesday, June 23, 2015 8:18 AM To: public-exi-comments@w3.org Subject: Profile-specific URI/string tables -- Modulo compaction Hello, Thanks for the work on EXI, it looks very good! If I am correct there are two opportunities for compaction that have been missed. I hope I am not repeating earlier discussions, but by all means refer me if I've failed at finding any. PRELOADED STRING TABLES: EXI makes good use of static knowledge such as Schemas. It is smart enough to preload a number of URIs and other strings. The reflective nature of RDF triples stretches the requirements to EXI, by placing information into URI fields, rather than schema elements. A similar thing applies to applications that avoid using attributes but instead setup reflective attributes holding string values that are application-interpreted. Imagine wanting to compress RDF, and use it within an application that mostly uses a certain set of URIs. This would ideally preload a custom-defined set of URIs for that specific application profile. A similar thing would apply to string tables. As far as I could tell, this is not supported, although it might be practical. Pros: * Better compaction. * Apps do not need to match the URI/string and/or assign dynamic identifiers to them based on document occurrence order to their fixed set of used URIs. * As a result, RDF/EXI profiles can be better equiped for embedded applications. Cons: * More parameterisation of the conversion process (a URI table and/or string table) similar to Schema preloading. * Profiles would need to be described / standardised and perhaps recognised. MODULO COMPACTION: When a grammar could produce (say) 0, 1.0, 1.1 and 2 the number of bits for the first term is 2. This reserves unused space for value 3. One might consider multiplying the values following it by 3, and adding the first value (0, 1 or 2). Reversing this action would require splitting the value with DIV and MOD operations, which are often paired. The amount of following values could be capped off to fit in 32 bits, or any other practical boundary. This means that reserved unused space only occurs once per boundary, instead of multiple times. Pros: * Better compaction, ballpark figure 10% to 20% improvement? Cons: * Tedious to implement on 8-bit platforms. Some 32-bit platforms might only support DIVMOD into 16 bit fragments? * Even worse readability of binary code. Again, if these things were discussed and I failed finding them, then I apologise. I am only writing in the hope to improve your highly interesting work! Cheers, Rick van Rein ARPA2.net
Received on Thursday, 23 July 2015 21:45:54 UTC