- From: Gunnar Andersson <gandersson@genivi.org>
- Date: Tue, 1 Sep 2020 12:18:26 +0000
- To: Ulf Bjorkengren <ulfbjorkengren@geotab.com>, public-automotive <public-automotive@w3.org>
On Wed, 2020-08-26 at 11:25 +0200, Ulf Bjorkengren wrote:
> I tried some online compression tools to see what kind of compression
> standard compression algorithms can achieve on a typical Gen2 response
> payload, shown below.
> The results show that they do not perform well on this type of short
> payloads, and cannot compete with a tailormade algorithm.
> As a comparison, version two of the proprietary algorithm I mentioned in
> the presentation will compress the same payload to 17 bytes.
> If there is interest for it, this algorithm will be implemented and
> available on
> https://github.com/MEAE-GOT/W3C_VehicleSignalInterfaceImpl
> in both a Go impl and a JS impl.
>
> Payload:
> {“action”:”get”, “timestamp”:”2020-08-25T13:37:00Z”, “value”:”123”,
> “requestId”:”999”}
> The above payload is 86 chars.
>
> http://www.txtwizard.net/compression
> GZ: Execution time: 11875 us Compression ratio: 112 % Original size: 118
> bytes Result size: 105 bytes

I noticed that here it says the original size is 118, but above it is 86. It doesn't change anything fundamental, just pointing it out. It could be that something happened when pasting into the web tool; there might be some other encoding of the text going on. I actually noticed, when I copied the example from your HTML-formatted email, that it was using left- and right-leaning “ characters instead of plain ASCII ", and I then got a 120-byte file with UTF-8 encoding, which suggests some multi-byte characters snuck in. After fixing that and running gzip on the command line on pure ASCII, gzip increases the size from 86 to 100 bytes.

But none of those details really matter - the results are expected behavior on very small files, and we already agreed on that yesterday. I don't want to waste time on the exact number of bytes in that comparison. It is well known that much better compression is possible with any method where a predefined dictionary exists than with a general-purpose compressor that is not allowed to agree on a lookup dictionary beforehand (which you have done for the keywords, and kind of indirectly also for the shortened UUID). A small Go sketch below illustrates both points.

One thing that might still be useful, just to see whether sticking to plain HTTP-supported compression is an option (which would be A LOT easier), is to compare a large response to a large query. I'm thinking that is also where compression matters most?

If the goal is to truly minimize transfers, both large and small, then I wonder why the approach is not to use a full binary encoding instead. (Or if there are other goals, let's discuss them?)

As you probably understand, I am sceptical of the idea of creating a custom "compression algorithm" without clarifying what the point of it is. It might be a matter of definition, but I already tend to think of what you created as an /alternative encoding/ more than compression, and I'm thinking that a mind-shift towards that term might uncover even better alternatives. A super-tailored encoding for any task could be made optimal, but building on standards is worthwhile. It doesn't hurt to use a formal language to describe the message schema anyway (the Gen2 specification probably ought to do that for the original JSON too?). And as I mentioned on our previous call, Avro and Protobuf have languages to describe such schemas. If you then use the associated tooling, the resulting binary encoding could be studied.
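To make those two points concrete (gzip's fixed overhead on tiny payloads, and how much a pre-agreed dictionary changes things), here is a minimal, self-contained Go sketch, Go being one of the languages of the repo above. The dictionary contents below are purely my own guess and not anything from your algorithm; raw DEFLATE with a preset dictionary (compress/flate's NewWriterDict) is simply the nearest standard mechanism to what you described:

```go
// Rough comparison: general-purpose gzip vs. DEFLATE with a preset
// dictionary, on the 86-byte Gen2 payload from this thread.
package main

import (
	"bytes"
	"compress/flate"
	"compress/gzip"
	"fmt"
	"log"
)

func main() {
	// Plain-ASCII version of the payload quoted above.
	payload := []byte(`{"action":"get", "timestamp":"2020-08-25T13:37:00Z", "value":"123", "requestId":"999"}`)

	// Plain gzip: on inputs this small, the fixed header/trailer
	// overhead (magic bytes, flags, CRC32, length) outweighs the savings.
	var gz bytes.Buffer
	zw := gzip.NewWriter(&gz)
	if _, err := zw.Write(payload); err != nil {
		log.Fatal(err)
	}
	zw.Close()
	fmt.Printf("original:   %3d bytes\n", len(payload))
	fmt.Printf("gzip:       %3d bytes\n", gz.Len())

	// Raw DEFLATE with a preset dictionary agreed on out of band.
	// The dictionary is a hypothetical Gen2-specific one (just the
	// keywords); the receiver must inflate with the same dictionary.
	dict := []byte(`{"action":"get", "timestamp":"", "value":"", "requestId":""}`)
	var fl bytes.Buffer
	fw, err := flate.NewWriterDict(&fl, flate.BestCompression, dict)
	if err != nil {
		log.Fatal(err)
	}
	if _, err := fw.Write(payload); err != nil {
		log.Fatal(err)
	}
	fw.Close()
	fmt.Printf("flate+dict: %3d bytes\n", fl.Len())
}
```

It will not reach 17 bytes, of course, since there is still per-message DEFLATE framing, but it gives a standards-based baseline to measure a tailor-made encoding against.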
There are /MANY/ other options out there, many of which were also discussed in a previous GENIVI project named "Generic Communication Protocols Evaluation". I'm sure this is not a new discussion in W3C either, and I may have heard Ted mention that as well.

Sincerely,
- Gunnar
Received on Tuesday, 1 September 2020 12:18:41 UTC