- From: Timothée Haudebourg <timothee.haudebourg@spruceid.com>
- Date: Thu, 22 Aug 2024 17:31:03 +0200
- To: public-r2c2@w3.org
- Message-ID: <e1893390-b912-48df-9e9b-9cb5657844e9@spruceid.com>
Hi everyone,
As I said in my introduction, I am the author of the `rdf-types` library
[1] defining types and traits for RDF (triples, quads, datasets, etc.).
In this email I would like to explain the philosophy and design choices
I adopted while writing `rdf-types`.
I wanted `rdf-types` to be both strongly-typed and unopinionated, but
with "good default opinions".
* By "strongly-typed" I mean that it provides and uses precise
dedicated types for each RDF concept. Never use `str` or `String`
when we can be more precise. I created the `iref` crate [2] to
define IRIs (we can discuss why not `iri-string` another time), but
`rdf-types` by itself also provides a `BlankId`
(unsized)/`BlankIdBuf` (sized) type for blank node identifiers,
`Literal` for literal values with strongly typed lang tags, etc.
* By "unopinionated" I mean that it uses type parameters to provide
flexibility, allowing the use of even more refined types, or just
different types. It still tries to provide good defaults so type
parameters can be ignored, for instance the `Quad` type is
equivalent to `Quad<Id, IriBuf, Term, Id>` (a lexical RDF quad).
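To illustrate how default type parameters keep the simple case simple, here is a minimal sketch. The types below are simplified stand-ins written for this example, not the actual `rdf-types` definitions (in particular, IRIs, blank ids and literals are plain `String` wrappers here, which the real library avoids).

```rust
// Hypothetical stand-ins for illustration only; NOT the real `rdf-types` types.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct IriBuf(pub String);

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BlankIdBuf(pub String);

/// A node identifier: either an IRI or a blank node identifier.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Id {
    Iri(IriBuf),
    Blank(BlankIdBuf),
}

/// A term: an identifier or a literal (simplified to a bare string here).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Term {
    Id(Id),
    Literal(String),
}

/// A quad whose four components are type parameters with defaults,
/// so that writing `Quad` alone means `Quad<Id, IriBuf, Term, Id>`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Quad<S = Id, P = IriBuf, O = Term, G = Id>(pub S, pub P, pub O, pub Option<G>);

/// Uses the defaults: the return type is `Quad<Id, IriBuf, Term, Id>`.
pub fn lexical_example() -> Quad {
    Quad(
        Id::Blank(BlankIdBuf("b0".to_owned())),
        IriBuf("http://example.org/name".to_owned()),
        Term::Literal("Alice".to_owned()),
        None,
    )
}

/// Overrides every parameter with a cheap integer type.
pub fn indexed_example() -> Quad<u32, u32, u32, u32> {
    Quad(0, 1, 2, None)
}
```

Code that only needs lexical quads never has to mention the parameters, while code that wants cheaper or more refined component types can spell them out.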
# Vocabularies
Because it can be really expensive to manipulate strings directly
(linear time comparisons, memory usage), it is even possible to replace
IRIs, blank ids and literal values with cheap types, like `u32` (but it
can be any type really).
To find the term associated with each `u32`, `rdf-types` defines various
`Vocabulary*` traits, mapping each value to a term.
I think this is a really important feature to have when manipulating
large datasets.
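As a rough illustration of the idea, the sketch below interns IRIs as `u32` indices and maps them back to strings. The trait and type names (`TinyIriVocabulary`, `TinyIriVocabularyMut`, `IndexVocabulary`) are made up for this example and the signatures are simplified (plain `&str` instead of a dedicated IRI type); they are not the `Vocabulary*` traits of `rdf-types`.

```rust
use std::collections::HashMap;

/// Hypothetical read-only vocabulary: maps a cheap identifier back to an IRI.
pub trait TinyIriVocabulary {
    /// The cheap type used in place of the IRI itself.
    type Iri;

    /// Returns the IRI text associated with `id`, if any.
    fn iri(&self, id: &Self::Iri) -> Option<&str>;
}

/// Hypothetical mutable vocabulary: can register new IRIs.
pub trait TinyIriVocabularyMut: TinyIriVocabulary {
    /// Returns the identifier of `iri`, allocating one if it is new.
    fn insert(&mut self, iri: &str) -> Self::Iri;
}

/// A vocabulary that represents each IRI by its insertion index.
#[derive(Default)]
pub struct IndexVocabulary {
    iris: Vec<String>,
    index: HashMap<String, u32>,
}

impl TinyIriVocabulary for IndexVocabulary {
    type Iri = u32;

    fn iri(&self, id: &u32) -> Option<&str> {
        self.iris.get(*id as usize).map(String::as_str)
    }
}

impl TinyIriVocabularyMut for IndexVocabulary {
    fn insert(&mut self, iri: &str) -> u32 {
        // Reuse the existing index so equal IRIs always get equal identifiers.
        if let Some(&id) = self.index.get(iri) {
            return id;
        }
        let id = self.iris.len() as u32;
        self.iris.push(iri.to_owned());
        self.index.insert(iri.to_owned(), id);
        id
    }
}
```

With such a vocabulary, comparing two IRIs becomes an integer comparison, and the string is only looked up when the lexical form is actually needed (for serialization, for instance).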
# Interpretations
`rdf-types` is not only about terms (the syntactic representations of
resources), but also interpretations.
Users can use any type to represent interpreted resources (like `u32`
again), and define custom interpretations using the `Interpretation*`
traits.
In practice this means that a user will define lexical quads and
interpreted quads like this:
type LexicalQuad = Quad<Id, Iri, Term, Id>;
type InterpretedQuad = Quad<Resource>;
Then it's possible to go from `Quad` to `LexicalQuad` using a vocabulary,
and from `LexicalQuad` to `InterpretedQuad` using an interpretation.
Of course interpretations can be reversible, so we can retrieve the
(potentially multiple) lexical representation(s) of a resource.
In practice I find that it is much much much easier to manipulate
interpreted datasets.
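To make the notion of a reversible interpretation more concrete, here is a minimal sketch. `LexicalTerm`, `Resource` and `TableInterpretation` are hypothetical names invented for this example, and the methods do not correspond to the `Interpretation*` traits of `rdf-types`; the point is only to show several lexical terms denoting one resource, and the reverse lookup.

```rust
use std::collections::HashMap;

/// Interpreted resources are plain integers in this sketch (an assumption).
pub type Resource = u32;

/// Simplified lexical term; real code would use precise types instead of `String`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum LexicalTerm {
    Iri(String),
    Blank(String),
    Literal(String),
}

/// A table-based interpretation that remembers both directions.
#[derive(Default)]
pub struct TableInterpretation {
    by_term: HashMap<LexicalTerm, Resource>,
    terms_of: Vec<Vec<LexicalTerm>>,
}

impl TableInterpretation {
    /// Returns the resource denoted by `term`, allocating a fresh one if unknown.
    pub fn interpret(&mut self, term: &LexicalTerm) -> Resource {
        if let Some(&resource) = self.by_term.get(term) {
            return resource;
        }
        let resource = self.terms_of.len() as Resource;
        self.by_term.insert(term.clone(), resource);
        self.terms_of.push(vec![term.clone()]);
        resource
    }

    /// Declares `term` as another lexical representation of `resource`.
    /// Returns `false` if the term is already interpreted or the resource is unknown.
    pub fn assign(&mut self, term: LexicalTerm, resource: Resource) -> bool {
        if self.by_term.contains_key(&term) {
            return false;
        }
        match self.terms_of.get_mut(resource as usize) {
            Some(terms) => {
                terms.push(term.clone());
                self.by_term.insert(term, resource);
                true
            }
            None => false,
        }
    }

    /// Reverse direction: every known lexical representation of `resource`.
    pub fn terms_of(&self, resource: Resource) -> &[LexicalTerm] {
        self.terms_of
            .get(resource as usize)
            .map(Vec::as_slice)
            .unwrap_or_default()
    }
}
```

Once terms are interpreted, two quads that mention the same resource through different terms (an IRI and a blank node, say) compare equal component by component, which is a large part of why interpreted datasets are easier to work with.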
# Datasets
`rdf-types` provides traits for interpreted datasets and graphs. Each
trait defines a feature of the dataset, for instance:
- DatasetMut: a dataset with an `insert` method
- TraversableDataset: a dataset that can provide an iterator over the quads
- PatternMatchingDataset: a dataset that provides a pattern matching method
- etc.
Again, users can implement those traits as they see fit, but `rdf-types`
also provides good defaults with `BTreeDataset`.
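The "one trait per feature" design can be sketched as follows. The traits below (`Insert`, `Traverse`, `PatternMatch`) and the `SetDataset` type are simplified, hypothetical analogues written for this email; they do not reproduce the signatures of `DatasetMut`, `TraversableDataset`, `PatternMatchingDataset` or `BTreeDataset`, and the pattern matching here is a naive linear scan.

```rust
use std::collections::BTreeSet;

/// Interpreted resources are plain integers in this sketch (an assumption).
pub type Resource = u32;
pub type Quad = (Resource, Resource, Resource, Option<Resource>);

/// Feature: quads can be inserted.
pub trait Insert {
    /// Returns `true` if the quad was not already present.
    fn insert(&mut self, quad: Quad) -> bool;
}

/// Feature: quads can be iterated over.
pub trait Traverse {
    fn quads(&self) -> Box<dyn Iterator<Item = Quad> + '_>;
}

/// A quad pattern; `None` means "any value" for that component.
#[derive(Debug, Default, Clone, Copy)]
pub struct Pattern {
    pub s: Option<Resource>,
    pub p: Option<Resource>,
    pub o: Option<Resource>,
    /// `Some(None)` matches only the default graph.
    pub g: Option<Option<Resource>>,
}

impl Pattern {
    pub fn matches(&self, q: &Quad) -> bool {
        self.s.map_or(true, |s| s == q.0)
            && self.p.map_or(true, |p| p == q.1)
            && self.o.map_or(true, |o| o == q.2)
            && self.g.map_or(true, |g| g == q.3)
    }
}

/// Feature: pattern matching, with a default built on top of traversal.
pub trait PatternMatch: Traverse {
    fn matching(&self, pattern: Pattern) -> Vec<Quad> {
        self.quads().filter(|q| pattern.matches(q)).collect()
    }
}

/// A small ordered-set dataset implementing all three features.
#[derive(Default)]
pub struct SetDataset(BTreeSet<Quad>);

impl Insert for SetDataset {
    fn insert(&mut self, quad: Quad) -> bool {
        self.0.insert(quad)
    }
}

impl Traverse for SetDataset {
    fn quads(&self) -> Box<dyn Iterator<Item = Quad> + '_> {
        Box::new(self.0.iter().copied())
    }
}

impl PatternMatch for SetDataset {}

/// Generic code asks only for the feature it needs.
pub fn count_with_predicate<D: PatternMatch>(dataset: &D, p: Resource) -> usize {
    dataset
        .matching(Pattern { p: Some(p), ..Pattern::default() })
        .len()
}
```

Functions such as `count_with_predicate` work with any dataset that offers pattern matching, so a user can swap in a custom storage type without touching the calling code.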
In conclusion, I know `rdf-types` is far from perfect and I welcome any
improvement, be it in `rdf-types` directly or a brand new W3C-approved
library, but I really wish to preserve the same kind of type-oriented
and flexible API that lets me deal with interpreted datasets with the
custom types that best fit my application.
[1]: https://github.com/timothee-haudebourg/rdf-types
[2]: https://github.com/timothee-haudebourg/iref
Received on Thursday, 22 August 2024 15:31:10 UTC