CBOR-LD from Ivan Herman on 2019-02-19 (public-json-ld-wg@w3.org from February 2019)

From: Ivan Herman <ivan@w3.org>
Date: Tue, 19 Feb 2019 18:05:00 +0100
To: W3C JSON-LD Working Group <public-json-ld-wg@w3.org>
Message-Id: <7AB5CBF7-E40B-478F-896F-4F400E7BF49A@w3.org>

Because we discussed it at the F2F meeting, I was looking at the CBOR spec[1] and some other documents to see what it would mean for us to write a note. The answer is: it is probably trivial:-). The simplest approach is not to refer to the abstract data model and concepts but to start from the JSON serialization. Indeed, the spec includes a section on JSON<->CBOR conversion (section 4) and it would be foolish not to use that. (The RFC text says it is non-normative, but that is not a real problem for us, because we are considering a note only…)

[1] https://www.rfc-editor.org/rfc/rfc7049.txt

## Conversions

### Converting CBOR to JSON

This is the bit which is not 100% obvious, because CBOR is a superset of JSON in terms of expressivity. It allows the storage of binary data, allows for non-string and possibly repeated keys for dictionaries (maps, as they call them), etc. However, the section describes a possible strategy to take care of each of those, and we can simply say 'do what is in the RFC!'. (E.g., binary data is base64url encoded and stored as a string, non-string keys are dropped, etc.)
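As a rough illustration of the two examples in the parenthesis, here is a minimal Python sketch. It assumes the CBOR data has already been decoded into native Python values (`bytes`, `dict`, `list`, ...); the function name `to_json_value` is hypothetical, and dropping non-string keys follows the summary above, whereas an application might choose another strategy.

```python
import base64
import json


def to_json_value(value):
    """Map an already-decoded CBOR value onto something JSON can hold."""
    if isinstance(value, bytes):
        # Binary data becomes a base64url string, without '=' padding.
        return base64.urlsafe_b64encode(value).rstrip(b"=").decode("ascii")
    if isinstance(value, dict):
        # Assumption: entries whose key is not a string are dropped.
        return {k: to_json_value(v) for k, v in value.items() if isinstance(k, str)}
    if isinstance(value, (list, tuple)):
        return [to_json_value(v) for v in value]
    return value


decoded = {"name": "x", 1: "dropped", "blob": b"\x01\x02\x03\x04"}
print(json.dumps(to_json_value(decoded)))  # {"name": "x", "blob": "AQIDBA"}
```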

### Converting JSON to CBOR

There are some notes there on how numbers should be stored; my impression is that there is nothing special for our case, the issues are more about how to choose among semantically equivalent representations of numbers.
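To make the direction concrete, here is a minimal sketch of a JSON-to-CBOR encoder in Python. It is an illustration only, not a full implementation of the RFC: the names `_head` and `json_to_cbor` are hypothetical, integers use the shortest encoding, and every float is written as a 64-bit double, which is just one of the equivalent number representations mentioned above.

```python
import json
import struct


def _head(major: int, n: int) -> bytes:
    """Encode a major type (0-7) with an unsigned argument, shortest form."""
    if n < 24:
        return bytes([(major << 5) | n])
    for extra, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | extra]) + n.to_bytes(size, "big")
    raise ValueError("integer too large for this sketch")


def json_to_cbor(value) -> bytes:
    """Encode a value obtained from json.loads() as CBOR bytes."""
    if value is None:
        return b"\xf6"
    if value is True:
        return b"\xf5"
    if value is False:
        return b"\xf4"
    if isinstance(value, int):
        # Major type 0 for non-negative integers, 1 for negative ones.
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, float):
        # Assumption: always use a double, never a shorter float.
        return b"\xfb" + struct.pack(">d", value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(json_to_cbor(v) for v in value)
    if isinstance(value, dict):
        pairs = b"".join(json_to_cbor(k) + json_to_cbor(v) for k, v in value.items())
        return _head(5, len(value)) + pairs
    raise TypeError(f"not a JSON value: {type(value).__name__}")


print(json_to_cbor(json.loads('{"a": 1, "b": [2, 3]}')).hex())  # a26161016162820203
```

Note that strings are copied as UTF-8 bytes behind a short length header, so text-heavy data shrinks very little.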

## Canonical CBOR

There is a concept of "canonical CBOR" (section 3.9): "…two encoder implementations starting with the same input data will produce the same CBOR output". (E.g., choose a specific number representation, order the keys, etc.)

I am not sure this is important for us, although there may be corner cases where roundtripping requires it (nothing comes to my mind right now).
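For reference, the key-ordering part of that canonical form can be sketched as follows. In RFC 7049 keys are compared on their encoded bytes, with shorter encodings sorting first and ties broken bytewise. The sketch assumes all keys are text strings shorter than 65536 bytes, and the function names are hypothetical.

```python
def encoded_text_key(key: str) -> bytes:
    """CBOR encoding of a text-string key (major type 3)."""
    data = key.encode("utf-8")
    n = len(data)
    if n < 24:
        head = bytes([0x60 | n])
    elif n < 256:
        head = bytes([0x78, n])
    elif n < 65536:
        head = b"\x79" + n.to_bytes(2, "big")
    else:
        raise ValueError("key too long for this sketch")
    return head + data


def canonical_key_order(keys):
    """Sort keys by encoded length first, then by encoded bytes."""
    return sorted(keys, key=lambda k: (len(encoded_text_key(k)), encoded_text_key(k)))


print(canonical_key_order(["@type", "name", "@id", "@context", "id"]))
# ['id', '@id', 'name', '@type', '@context']
```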

## CBOR-LD as Binary RDF?

I was wondering about the compression ratio of CBOR; I have not found real data. I have tested some of my JSON-LD files and, on average, the CBOR encoding came to around 50% of the size of the JSON data. But my files are small, i.e., this may not be significant. Does anyone have some bigger data that one can test with?

As a comparison, a minified version of the same JSON files was about 70% of the original, but a simple gzip was around 25%. (As far as I could see, Unicode character strings remain unchanged in the CBOR encoded file, which may explain this.) I.e., CBOR is not all that great in terms of compression; the stated goal of the CBOR spec is to give higher priority to being able to write a very light coder/decoder with a very small processing footprint, even if that makes the compression less efficient. I guess this would be of interest for our WoT friends, but may not make CBOR-LD very interesting for those who want to achieve better compression for JSON-LD data storage...
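A comparison of this kind can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the "original" is taken to be the JSON pretty-printed with two-space indentation, the sample document is made up, and `size_report` and its `extra_encoders` parameter are hypothetical names. A CBOR encoder can be passed in through `extra_encoders` to add its ratio to the report.

```python
import gzip
import json


def size_report(doc, extra_encoders=None):
    """Return each encoding's size as a fraction of the pretty-printed JSON."""
    pretty = json.dumps(doc, indent=2, ensure_ascii=False).encode("utf-8")
    encoders = {
        "minified": lambda d: json.dumps(
            d, separators=(",", ":"), ensure_ascii=False
        ).encode("utf-8"),
        "gzip": lambda d: gzip.compress(pretty),
    }
    # Any callable mapping the document to bytes can be added here.
    encoders.update(extra_encoders or {})
    return {name: len(encode(doc)) / len(pretty) for name, encode in encoders.items()}


# Made-up sample; real measurements need larger files.
sample = {
    "@context": "https://schema.org/",
    "@type": "Person",
    "name": "Example Person",
    "knows": [{"@type": "Person", "name": "Another Person"}],
}
for name, ratio in size_report(sample).items():
    print(f"{name}: {ratio:.0%} of the original size")
```

On very small inputs gzip's fixed header overhead can dominate, which is one more reason to test with bigger data.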

I am not sure where we would go from here…

Ivan

----
Ivan Herman, W3C
Publishing@W3C Technical Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: https://orcid.org/0000-0003-0782-2704

Received on Tuesday, 19 February 2019 17:05:05 UTC