Re: RDF/CBOR: A CBOR-based Serialization of RDF

Hello Miel,

Miel Vander Sande <miel.vandersande@meemoo.be> writes:

> - it's inspired by HDT, does that mean it's self-indexed and queryable?

The structures used are exactly the same as in HDT (front-coded
dictionary and BitMapTriples).
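
To illustrate what front-coding means, here is a minimal Python sketch. It
only shows the general technique (store each sorted term as the length of
the prefix it shares with the previous term, plus the remaining suffix). It
is not the actual byte layout of HDT or RDF/CBOR, and the function names
and example IRIs are made up for illustration.

```python
def front_encode(terms):
    """Front-code a sorted list of strings as (shared_prefix_len, suffix) pairs."""
    out = []
    prev = ""
    for term in terms:
        # Count how many leading characters this term shares with the previous one.
        shared = 0
        limit = min(len(prev), len(term))
        while shared < limit and prev[shared] == term[shared]:
            shared += 1
        out.append((shared, term[shared:]))
        prev = term
    return out


def front_decode(pairs):
    """Rebuild the original sorted list from (shared_prefix_len, suffix) pairs."""
    terms = []
    prev = ""
    for shared, suffix in pairs:
        term = prev[:shared] + suffix
        terms.append(term)
        prev = term
    return terms


# Hypothetical dictionary terms; sorting is what makes neighbours share prefixes.
terms = sorted([
    "http://example.org/alice",
    "http://example.org/bob",
    "http://example.org/name",
    "http://example.org/knows",
])
encoded = front_encode(terms)
print(encoded)
```

Only the first term is stored in full; the others keep just a short suffix,
which is where the space saving comes from.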

However, RDF/CBOR uses variable-length encoding of the integers that
reference dictionary terms (we inherit this from CBOR). Thus, it is not
possible to compute fixed offsets and jump around the structures on disk.
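
As a small illustration of the variable-length encoding, here is a Python
sketch of how CBOR (RFC 8949) encodes unsigned integers. The helper name
encode_uint and the example ids are made up for illustration; this is not
code from the RDF/CBOR implementation.

```python
import struct


def encode_uint(n: int) -> bytes:
    """Encode a non-negative integer as a CBOR unsigned integer (major type 0)."""
    if n < 0 or n >= 1 << 64:
        raise ValueError("value does not fit in a CBOR unsigned integer")
    if n < 24:
        return bytes([n])                      # value lives in the initial byte
    if n < 1 << 8:
        return bytes([0x18, n])                # 1-byte argument follows
    if n < 1 << 16:
        return b"\x19" + struct.pack(">H", n)  # 2-byte argument follows
    if n < 1 << 32:
        return b"\x1a" + struct.pack(">I", n)  # 4-byte argument follows
    return b"\x1b" + struct.pack(">Q", n)      # 8-byte argument follows


# Hypothetical dictionary ids of increasing magnitude.
ids = [1, 23, 24, 500, 70000]
sizes = [len(encode_uint(i)) for i in ids]

# The byte offset of each id depends on the sizes of all ids before it.
offsets = []
pos = 0
for size in sizes:
    offsets.append(pos)
    pos += size

print(sizes)    # [1, 1, 2, 3, 5]
print(offsets)  # [0, 1, 2, 4, 7]
```

Because the encoded size depends on the value, the position of the k-th id
cannot be computed as k times a fixed width; a reader has to scan from the
start.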

RDF/CBOR was designed for small pieces of content that can be handled in
memory. On-disk query-ability was not a design goal and has been lost. On
the other hand, the variable-length encoding should in principle allow
more compact encodings.

> - How does it compare to https://rdf4j.org/documentation/reference/rdf4j-binary/,
> https://jena.apache.org/documentation/io/rdf-binary ?

As far as I understand, the encoding you link requires an Apache Thrift
or Google Protocol Buffers definition and tools to generate serializers.

RDF/CBOR is defined (via CBOR) directly in bytes and bits. There is no
external tooling required to implement the serialization.

We do use CDDL to define the serialization; however, this is just a
documentation tool.

Unfortunately, I have not been able to perform quantitative performance
tests. So I can't make any statements on efficiency compared to other
binary serializations.

> - Does it integrate with any of the existing frameworks for handling
> RDF? Can you work with the OCaml implementation in Python, Java or
> RDF.js?

Unfortunately, I don't think integrating the OCaml implementation in any
other language would be a good way of using the serialization.

A major objective of RDF/CBOR is that it is re-implementable from
scratch (this is a design goal shared with CBOR).

Although this has yet to be done, I believe it should be feasible to
re-implement RDF/CBOR in your favorite language with reasonable
effort. CBOR libraries that handle the low-level encoding already
exist. I would be very happy to assist you in such an endeavor.

Best regards,
pukkamustard


> Op zo 18 sep. 2022 om 22:03 schreef pukkamustard <pukkamustard@posteo.net>:
>
>  Hello semantic-web,
>
>  I'd like to share some recent work towards a binary serialization of RDF
>  using CBOR:
>
>  https://openengiadina.codeberg.page/rdf-cbor/
>
>  CBOR (RFC 8949) is a binary data serialization that provides basic data
>  types (string, integer, arrays, etc.) as well as extendable tags for
>  annotating more complex data types. RDF/CBOR encodes RDF into CBOR
>  types. CBOR types are re-used for efficient binary serialization of
>  literal values and certain binary IRIs (e.g. UUIDs).
>
>  RDF/CBOR is very much inspired by the HDT serialization and uses a very
>  similar encoding (front-coded dictionaries and BitMapTriples). Unlike
>  HDT, RDF/CBOR is optimized for small pieces of content that are created,
>  transported and read by possibly constrained devices.
>
>  The serialization is defined using the Concise Data Definition Language
>  (CDDL; RFC 8610) which allows a very concise and precise specification.
>
>  RDF/CBOR also allows groups of RDF statements to be content-addressed,
>  i.e. identifiers are the cryptographic hash of the serialized
>  statements. This can be used for cryptographic signature schemes and
>  makes RDF viable on distributed, peer-to-peer systems.
>
>  I look forward to your feedback and comments.
>
>  Best regards,
>  pukkamustard

Received on Wednesday, 21 September 2022 12:08:41 UTC