About the `rdf-types` library from Timothée Haudebourg on 2024-08-22 (public-r2c2@w3.org from August 2024)

From: Timothée Haudebourg <timothee.haudebourg@spruceid.com>
Date: Thu, 22 Aug 2024 17:31:03 +0200
To: public-r2c2@w3.org
Message-ID: <e1893390-b912-48df-9e9b-9cb5657844e9@spruceid.com>
Hi everyone,

As I said in my introduction, I am the author of the `rdf-types` library 
[1] defining types and traits for RDF (triples, quads, datasets, etc.). 
In this email I would like to explain the philosophy and design choices 
I adopted while writing `rdf-types`.

I wanted `rdf-types` to be both strongly-typed and unopinionated, but 
with "good default opinions".

  * By "strongly-typed" I mean that it provides and uses precise
    dedicated types for each RDF concept. Never use `str` or `String`
    when we can be more precise. I created the `iref` crate [2] to
    define IRIs (we can discuss why not `iri-string` another time), but
    `rdf-types` by itself also provides a `BlankId`
    (unsized)/`BlankIdBuf` (sized) type for blank node identifiers,
    `Literal` for literal values with strongly typed lang tags, etc.
  * By "unopinionated" I mean that it uses type parameters to provide
    flexibility, allowing the use of even more refined types, or just
    different types. It still tries to provide good defaults so type
    parameters can be ignored; for instance, the `Quad` type is
    equivalent to `Quad<Id, IriBuf, Term, Id>` (a lexical RDF quad).
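To make the two points above concrete, here is a minimal, self-contained sketch of dedicated term types combined with default type parameters. All types below are simplified stand-ins written for this illustration (the validation in particular is deliberately naive); they are assumptions, not the actual definitions from `rdf-types` or `iref`.

```rust
// Stand-in types for illustration only; these are NOT the definitions
// found in `rdf-types` or `iref`.

/// Owned IRI. The validation is deliberately naive (hypothetical helper).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct IriBuf(String);

impl IriBuf {
    /// Accepts only strings that start with a scheme followed by ':'.
    pub fn new(s: &str) -> Option<Self> {
        let (scheme, _rest) = s.split_once(':')?;
        let mut chars = scheme.chars();
        let starts_with_letter = chars.next()?.is_ascii_alphabetic();
        let scheme_ok = starts_with_letter
            && chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.'));
        scheme_ok.then(|| IriBuf(s.to_owned()))
    }
}

/// Owned blank node identifier of the form `_:label`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BlankIdBuf(String);

impl BlankIdBuf {
    /// Accepts only strings with the `_:` prefix and a non-empty label.
    pub fn new(s: &str) -> Option<Self> {
        let label = s.strip_prefix("_:")?;
        (!label.is_empty()).then(|| BlankIdBuf(s.to_owned()))
    }
}

/// A node identifier: either an IRI or a blank node identifier.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Id {
    Iri(IriBuf),
    Blank(BlankIdBuf),
}

/// A lexical term (literals are reduced to a plain string in this sketch).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Term {
    Id(Id),
    Literal(String),
}

/// Subject, predicate, object and optional graph, with default type parameters.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Quad<S = Id, P = IriBuf, O = Term, G = Id>(pub S, pub P, pub O, pub Option<G>);

/// Plain `Quad` here means `Quad<Id, IriBuf, Term, Id>`.
pub fn lexical_quad() -> Option<Quad> {
    let s = Id::Blank(BlankIdBuf::new("_:b0")?);
    let p = IriBuf::new("http://example.org/name")?;
    let o = Term::Literal("Alice".to_owned());
    Some(Quad(s, p, o, None))
}

/// The same struct, instantiated with cheap integer components.
pub fn compact_quad() -> Quad<u32, u32, u32, u32> {
    Quad(0, 1, 2, None)
}
```

The point of the defaults is that code which does not care about the parameters can simply write `Quad`, while code that needs other component types names them explicitly.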

# Vocabularies

Because it can be really expensive to manipulate strings directly 
(linear time comparisons, memory usage), it is even possible to replace 
IRIs, blank ids and literal values with cheap types, like `u32` (but it 
can be any type really).
To find the term associated with each `u32`, `rdf-types` defines various 
`Vocabulary*` traits, mapping each value to a term.
I think this is a really important feature to have when manipulating 
large datasets.
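
As a rough illustration of the idea (not the `Vocabulary*` traits themselves, whose exact signatures are not shown in this email), a vocabulary can be thought of as an interner that hands out a `u32` per distinct IRI and can map it back. The `ToyVocabulary` type and its methods are hypothetical names chosen for this sketch.

```rust
use std::collections::HashMap;

/// Hypothetical interner mapping IRIs to `u32` indexes and back.
/// Only an illustration of the concept, not the `rdf-types` API.
#[derive(Default)]
pub struct ToyVocabulary {
    by_index: Vec<String>,
    by_iri: HashMap<String, u32>,
}

impl ToyVocabulary {
    /// Returns the index of `iri`, allocating a new one on first sight.
    pub fn insert(&mut self, iri: &str) -> u32 {
        if let Some(&index) = self.by_iri.get(iri) {
            return index;
        }
        let index = self.by_index.len() as u32;
        self.by_index.push(iri.to_owned());
        self.by_iri.insert(iri.to_owned(), index);
        index
    }

    /// Maps an index back to the IRI it stands for, if any.
    pub fn iri(&self, index: u32) -> Option<&str> {
        self.by_index.get(index as usize).map(String::as_str)
    }
}
```

Once terms are interned, comparing two of them is an integer comparison rather than a linear-time string comparison, and each distinct IRI is stored once in each of the two maps, however many quads refer to it.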

# Interpretations

`rdf-types` is not only about terms (the syntactic representations of 
resources), but also interpretations.
Users can use any type to represent interpreted resources (like `u32` 
again), and define custom interpretations using the `Interpretation*` 
traits.
In practice this means that a user will define what are lexical quads 
and interpreted quads like this:

type LexicalQuad = Quad<Id, Iri, Term, Id>;
type InterpretedQuad = Quad<Resource>;

Then it's possible to go from `Quad` to `LexicalQuad` using a vocabulary, 
and from `LexicalQuad` to `InterpretedQuad` using an interpretation.
Of course interpretations can be reversible, so we can retrieve the 
(potentially multiple) lexical representation(s) of a resource.
In practice I find that it is much much much easier to manipulate 
interpreted datasets.
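
To sketch what a reversible interpretation could look like, the example below maps lexical terms (plain strings here) to opaque resources and keeps the reverse mapping, so one resource may have several lexical representations. `Resource`, `ToyInterpretation` and their methods are hypothetical names for this illustration and are not the `Interpretation*` traits.

```rust
use std::collections::HashMap;

/// Opaque interpreted resource (hypothetical stand-in).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Resource(u32);

/// Hypothetical reversible interpretation: terms to resources and back.
#[derive(Default)]
pub struct ToyInterpretation {
    term_to_resource: HashMap<String, Resource>,
    resource_to_terms: Vec<Vec<String>>,
}

impl ToyInterpretation {
    /// Returns the resource denoted by `term`, creating one if unknown.
    pub fn interpret(&mut self, term: &str) -> Resource {
        if let Some(&resource) = self.term_to_resource.get(term) {
            return resource;
        }
        let resource = Resource(self.resource_to_terms.len() as u32);
        self.resource_to_terms.push(vec![term.to_owned()]);
        self.term_to_resource.insert(term.to_owned(), resource);
        resource
    }

    /// Declares `term` as another name for an existing resource.
    /// Returns `false` if the resource is unknown or the term is already used.
    pub fn alias(&mut self, term: &str, resource: Resource) -> bool {
        let known_resource = (resource.0 as usize) < self.resource_to_terms.len();
        if !known_resource || self.term_to_resource.contains_key(term) {
            return false;
        }
        self.term_to_resource.insert(term.to_owned(), resource);
        self.resource_to_terms[resource.0 as usize].push(term.to_owned());
        true
    }

    /// Reverse direction: every lexical representation of `resource`.
    pub fn terms_of(&self, resource: Resource) -> &[String] {
        self.resource_to_terms
            .get(resource.0 as usize)
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}
```

With such a mapping, two quads that use different terms for the same resource become equal once interpreted, which is one reason interpreted datasets can be simpler to work with.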

# Datasets

`rdf-types` provides traits for interpreted datasets and graphs. Each 
trait defines a feature of the dataset, for instance:
- `DatasetMut`: a dataset with an `insert` method
- `TraversableDataset`: a dataset that can provide an iterator over the quads
- `PatternMatchingDataset`: a dataset that provides a pattern matching method
- etc.

Again, users can implement those traits as they see fit, but `rdf-types` 
also provides good defaults with `BTreeDataset`.
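
The "one trait per feature" design can be sketched as follows. The trait names, method signatures and the `ToyDataset` type are assumptions made up for this example (prefixed with `Toy` to make that clear); they only mirror the roles described above and are not the actual `rdf-types` traits or its `BTreeDataset`.

```rust
use std::collections::BTreeSet;

/// Interpreted quad with `u32` resources: subject, predicate, object, graph.
pub type Quad = (u32, u32, u32, Option<u32>);

/// Feature: quads can be inserted (hypothetical trait).
pub trait ToyDatasetMut {
    /// Returns `true` if the quad was not already present.
    fn insert(&mut self, quad: Quad) -> bool;
}

/// Feature: quads can be iterated over (hypothetical trait).
pub trait ToyTraversableDataset {
    fn quads(&self) -> Box<dyn Iterator<Item = Quad> + '_>;
}

/// Feature: quads can be selected with a pattern, where `None` matches
/// anything (hypothetical trait). The default implementation is a linear
/// scan; an indexed dataset could override it with something faster.
pub trait ToyPatternMatchingDataset: ToyTraversableDataset {
    fn matching(&self, s: Option<u32>, p: Option<u32>, o: Option<u32>) -> Vec<Quad> {
        self.quads()
            .filter(|q| s.map_or(true, |s| s == q.0))
            .filter(|q| p.map_or(true, |p| p == q.1))
            .filter(|q| o.map_or(true, |o| o == q.2))
            .collect()
    }
}

/// A simple ordered-set-backed dataset implementing all three features.
#[derive(Default)]
pub struct ToyDataset(BTreeSet<Quad>);

impl ToyDatasetMut for ToyDataset {
    fn insert(&mut self, quad: Quad) -> bool {
        self.0.insert(quad)
    }
}

impl ToyTraversableDataset for ToyDataset {
    fn quads(&self) -> Box<dyn Iterator<Item = Quad> + '_> {
        Box::new(self.0.iter().copied())
    }
}

impl ToyPatternMatchingDataset for ToyDataset {}
```

Generic code can then ask for exactly the capability it needs, for example a function bounded by `D: ToyTraversableDataset` works with any dataset that can be iterated, whether or not it supports insertion.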

In conclusion, I know `rdf-types` is far from perfect and I welcome any 
improvement, be it in `rdf-types` directly or a brand new W3C-approved 
library, but I really wish to preserve the same kind of type-oriented 
and flexible API that lets me deal with interpreted datasets with the 
custom types that best fit my application.

[1]: https://github.com/timothee-haudebourg/rdf-types
[2]: https://github.com/timothee-haudebourg/iref
Received on Thursday, 22 August 2024 15:31:10 UTC