- From: Rick van Rein <rick@openfortress.nl>
- Date: Tue, 23 Jun 2015 17:17:36 +0200
- To: public-exi-comments@w3.org
Hello,

Thanks for the work on EXI, it looks very good! If I am correct, there are two opportunities for compaction that have been missed. I hope I am not repeating earlier discussions; if I am, by all means refer me to them.

PRELOADED STRING TABLES:

EXI makes good use of static knowledge such as Schemas. It is smart enough to preload a number of URIs and other strings. The reflective nature of RDF triples stretches the requirements on EXI, by placing information into URI fields rather than schema elements. A similar thing applies to applications that avoid using attributes but instead set up reflective attributes holding string values that are application-interpreted.

Imagine wanting to compress RDF, and use it within an application that mostly uses a certain set of URIs. This would ideally preload a custom-defined set of URIs for that specific application profile. A similar thing would apply to string tables. As far as I could tell, this is not supported, although it might be practical.

Pros:
* Better compaction.
* Apps do not need to match each URI/string against their fixed set of used URIs, or map identifiers that are assigned dynamically in document occurrence order onto that set.
* As a result, RDF/EXI profiles can be better equipped for embedded applications.

Cons:
* More parameterisation of the conversion process (a URI table and/or string table), similar to Schema preloading.
* Profiles would need to be described / standardised and perhaps recognised.

MODULO COMPACTION:

When a grammar could produce (say) 0, 1.0, 1.1 and 2, the number of bits for the first term is 2. This reserves unused space for value 3. One might consider multiplying the values following it by 3, and adding the first value (0, 1 or 2). Reversing this action would require splitting the value with DIV and MOD operations, which are often paired. The number of following values could be capped so that the result fits in 32 bits, or any other practical boundary.
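
The multiply-and-add idea can be illustrated with a minimal sketch in Python. The helper names (pack, unpack, bits_separate, bits_packed) are hypothetical and not part of EXI; each symbol is assumed to be given as a (value, radix) pair, where radix is the number of alternatives the grammar allows at that point. The sketch only shows the arithmetic, not an actual EXI encoder.

```python
def pack(symbols):
    """Fold (value, radix) pairs into one integer.

    The first symbol ends up in the lowest position, so that
    result = v0 + r0 * (v1 + r1 * (v2 + ...)).
    """
    acc = 0
    for value, radix in reversed(symbols):
        if not 0 <= value < radix:
            raise ValueError("value out of range for its radix")
        acc = acc * radix + value
    return acc


def unpack(acc, radices):
    """Recover the values with one paired DIV/MOD step per symbol."""
    values = []
    for radix in radices:
        acc, value = divmod(acc, radix)
        values.append(value)
    return values


def bits_separate(radices):
    """Bits needed when every symbol gets its own fixed-width field."""
    return sum((radix - 1).bit_length() for radix in radices)


def bits_packed(radices):
    """Bits needed for the largest possible packed integer."""
    total = 1
    for radix in radices:
        total *= radix
    return (total - 1).bit_length()


if __name__ == "__main__":
    values = [2, 0, 1, 1, 2, 0, 0, 1, 2, 2]   # ten three-way choices
    radices = [3] * len(values)
    packed = pack(list(zip(values, radices)))
    assert unpack(packed, radices) == values
    print(bits_separate(radices), bits_packed(radices))
```

For ten three-way choices this needs 16 bits instead of 20. The sketch does not enforce the 32-bit cap, since Python integers are unbounded; a real encoder would start a new packed word whenever the running product of radices would overflow the chosen boundary.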
This means that reserved unused space only occurs once per boundary, instead of multiple times.

Pros:
* Better compaction, ballpark figure 10% to 20% improvement?

Cons:
* Tedious to implement on 8-bit platforms. Some 32-bit platforms might only support DIVMOD into 16-bit fragments?
* Even worse readability of binary code.

Again, if these things were discussed and I failed to find them, then I apologise. I am only writing in the hope of improving your highly interesting work!

Cheers,

Rick van Rein
ARPA2.net
Received on Tuesday, 23 June 2015 15:18:10 UTC