Re: About the `rdf-types` library from Pierre-Antoine Champin on 2025-02-25 (public-r2c2@w3.org from February 2025)

From: Pierre-Antoine Champin <pierre-antoine@w3.org>
Date: Tue, 25 Feb 2025 12:56:28 +0100
To: Timothée Haudebourg <timothee.haudebourg@spruceid.com>, public-r2c2@w3.org
Message-ID: <65169e49-2e1b-496c-863d-ddd8eb92189a@w3.org>

Hi Timothée,

I've been meaning to respond to this email for ages... Now that the charter 
proposal is out of the way, let me give it a shot.

On 22/08/2024 17:31, Timothée Haudebourg wrote:
>
> Hi everyone,
>
> As I said in my introduction, I am the author of the `rdf-types` 
> library [1] defining types and traits for RDF (triples, quads, 
> datasets, etc.). In this email I would like to take the time to 
> explain the philosophy and design choices I adopted while writing 
> `rdf-types`.
>
> I wanted `rdf-types` to be both strongly-typed and unopinionated, but 
> with "good default opinions".
>
>   * By "strongly-typed" I mean that it provides and uses precise
>     dedicated types for each RDF concept. Never use `str` or `String`
>     when we can be more precise. I created the `iref` crate [2] to
>     define IRIs (we can discuss why not `iri-string` another time),
>     but `rdf-types` by itself also provides a `BlankId`
>     (unsized)/`BlankIdBuf` (sized) type for blank node identifiers,
>     `Literal` for literal values with strongly typed lang tags, etc.
>   * By "unopinionated" I mean that it uses type parameters to
>     provide flexibility, allowing the use of even more refined types,
>     or just different types. It still tries to provide good defaults so
>     type parameters can be ignored; for instance, the `Quad` type is
>     equivalent to `Quad<Id, IriBuf, Term, Id>` (a lexical RDF quad).
>
> # Vocabularies
>
> Because it can be really expensive to manipulate strings directly 
> (linear time comparisons, memory usage), it is even possible to 
> replace IRIs, blank ids and literal values with cheap types, like 
> `u32` (but it can be any type really).
> To find the term associated with each `u32`, `rdf-types` defines various 
> `Vocabulary*` traits, mapping each value to a term.
> I think this is a really important feature to have when manipulating 
> large datasets.
>
> # Interpretations
>
> `rdf-types` is not only about terms (the syntactic representations of 
> resources), but also interpretations.
> Users can use any type to represent interpreted resources (like `u32` 
> again), and define custom interpretations using the `Interpretation*` 
> traits.
> In practice this means that a user will define what are lexical quads 
> and interpreted quads like this:
>
> type LexicalQuad = Quad<Id, Iri, Term, Id>;
> type InterpretedQuad = Quad<Resource>;
>
> Then it's possible to go from `Quad` to `LexicalQuad` using a 
> vocabulary, and from `LexicalQuad` to `InterpretedQuad` using an 
> interpretation.
> Of course interpretations can be reversible, so we can retrieve the 
> (potentially multiple) lexical representation(s) of a resource.
> In practice I find that it is much much much easier to manipulate 
> interpreted datasets.
>
This is a very nice design indeed. May I suggest that you reuse this 
text in the documentation of the crate itself, by the way, because I 
confess I also found this design a bit confusing when I first tried to 
use `rdf-types`, and this kind of explanation would have been helpful :-D
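
For readers who want something concrete, here is a minimal sketch of the
two ideas quoted above: a quad type whose type parameters have lexical
defaults, and a vocabulary that replaces strings with cheap `u32`
indices. It is not the actual `rdf-types` API. The names
`IndexVocabulary` and `intern_quad` are hypothetical, and plain `String`
stands in for the dedicated IRI, blank node and literal types.

```rust
use std::collections::HashMap;

/// A quad that is generic over its component types. The defaults make a
/// bare `Quad` a purely lexical quad. (Illustrative only: `String` stands
/// in for the precise term types a real library would use.)
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Quad<S = String, P = String, O = String, G = String> {
    pub subject: S,
    pub predicate: P,
    pub object: O,
    pub graph: Option<G>,
}

/// Hypothetical vocabulary that interns lexical terms as `u32` indices.
#[derive(Default)]
pub struct IndexVocabulary {
    by_index: Vec<String>,
    by_term: HashMap<String, u32>,
}

impl IndexVocabulary {
    /// Returns the index of `term`, allocating a new one if needed.
    pub fn insert(&mut self, term: &str) -> u32 {
        if let Some(&index) = self.by_term.get(term) {
            return index;
        }
        let index = self.by_index.len() as u32;
        self.by_index.push(term.to_owned());
        self.by_term.insert(term.to_owned(), index);
        index
    }

    /// Maps an index back to its lexical term.
    pub fn term(&self, index: u32) -> Option<&str> {
        self.by_index.get(index as usize).map(String::as_str)
    }
}

/// Converts a lexical quad into a quad of indices, so that later
/// comparisons are integer comparisons instead of string comparisons.
pub fn intern_quad(vocab: &mut IndexVocabulary, quad: &Quad) -> Quad<u32, u32, u32, u32> {
    Quad {
        subject: vocab.insert(&quad.subject),
        predicate: vocab.insert(&quad.predicate),
        object: vocab.insert(&quad.object),
        graph: quad.graph.as_deref().map(|g| vocab.insert(g)),
    }
}
```

An interpretation could be sketched along the same lines, as a second
mapping from terms (or their indices) to resource identifiers, possibly
many-to-one.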

That being said, this design goes a little beyond what I have in mind 
for R2C2:

- the idea is that values produced by an implementation could be (as 
much as possible) reused directly by another implementation, with no or 
very little conversion needed. To achieve that, I believe we should 
mostly define traits, rather than structs/enums (even generic ones).

- the notion of interpreted graph/dataset is interesting, but goes 
beyond the RDF spec in my opinion. Furthermore, the *interpretation* of 
a graph does not have to be an isomorphic graph... For example, many 
properties with literal values may be internalized as attributes of the 
resources...

Of course, those opinions reflect my own biases; it does not mean that 
the CG has to follow them :)
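
To make the first point above more tangible, here is a rough sketch of
what "traits rather than structs" could look like. Everything in it is
hypothetical (the `Term` trait, `TermKind`, and both implementations); it
is not a proposal for the actual R2C2 API, only an illustration of two
unrelated term types being used together without conversion.

```rust
/// The three kinds of RDF terms.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TermKind {
    Iri,
    BlankNode,
    Literal,
}

/// Hypothetical shared trait that independent libraries could implement.
pub trait Term {
    fn kind(&self) -> TermKind;
    /// The IRI string, blank node label, or literal lexical form.
    fn value(&self) -> &str;
}

/// Implementation A: an owned enum, as one library might define it.
pub enum OwnedTerm {
    Iri(String),
    Blank(String),
    Literal(String),
}

impl Term for OwnedTerm {
    fn kind(&self) -> TermKind {
        match self {
            OwnedTerm::Iri(_) => TermKind::Iri,
            OwnedTerm::Blank(_) => TermKind::BlankNode,
            OwnedTerm::Literal(_) => TermKind::Literal,
        }
    }

    fn value(&self) -> &str {
        match self {
            OwnedTerm::Iri(s) | OwnedTerm::Blank(s) | OwnedTerm::Literal(s) => s.as_str(),
        }
    }
}

/// Implementation B: a borrowed IRI, as a parser in another library
/// might produce it.
pub struct IriRef<'a>(pub &'a str);

impl Term for IriRef<'_> {
    fn kind(&self) -> TermKind {
        TermKind::Iri
    }

    fn value(&self) -> &str {
        self.0
    }
}

/// Generic code only depends on the trait, so it accepts values from
/// both implementations directly.
pub fn same_iri<A: Term, B: Term>(a: &A, b: &B) -> bool {
    a.kind() == TermKind::Iri && b.kind() == TermKind::Iri && a.value() == b.value()
}
```

The comparison is restricted to IRIs on purpose: comparing literals
would also need the datatype and language tag, which this sketch leaves
out.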

>
> # Datasets
>
> `rdf-types` provides traits for interpreted datasets and graphs. Each 
> trait defines a feature of the dataset, for instance:
> - DatasetMut: a dataset with an `insert` method
> - TraversableDataset: a dataset that can provide an iterator over the 
> quads
> - PatternMatchingDataset: a dataset that provides a pattern matching 
> method
> - etc.
>
> Again, users can implement those traits as they see fit, but 
> `rdf-types` also provides a good default with `BTreeDataset`.
>
This approach is pretty much aligned with the one I took in Sophia:

https://pchampin.github.io/sophia_rs/
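
As an illustration of this "one trait per capability" style, here is a
simplified sketch. The trait names are borrowed from the quoted message,
but the signatures are assumptions made up for this example and match
neither `rdf-types` nor Sophia. Resources are plain `u32` values, and
`SetDataset` and `count_with_predicate` are hypothetical.

```rust
use std::collections::BTreeSet;

/// An interpreted quad: (subject, predicate, object, optional graph).
pub type Quad = (u32, u32, u32, Option<u32>);

/// Capability: quads can be inserted.
pub trait DatasetMut {
    /// Returns `true` if the quad was not already present.
    fn insert(&mut self, quad: Quad) -> bool;
}

/// Capability: all quads can be iterated over.
pub trait TraversableDataset {
    fn quads(&self) -> Box<dyn Iterator<Item = Quad> + '_>;
}

/// Capability: quads can be looked up by pattern (`None` is a wildcard;
/// the graph component is left unconstrained in this sketch).
pub trait PatternMatchingDataset {
    fn matching(&self, s: Option<u32>, p: Option<u32>, o: Option<u32>) -> Vec<Quad>;
}

/// Fallback: any traversable dataset can match patterns by linear scan.
impl<D: TraversableDataset> PatternMatchingDataset for D {
    fn matching(&self, s: Option<u32>, p: Option<u32>, o: Option<u32>) -> Vec<Quad> {
        self.quads()
            .filter(|q| {
                s.map_or(true, |s| s == q.0)
                    && p.map_or(true, |p| p == q.1)
                    && o.map_or(true, |o| o == q.2)
            })
            .collect()
    }
}

/// A simple default implementation backed by an ordered set.
#[derive(Default)]
pub struct SetDataset(BTreeSet<Quad>);

impl DatasetMut for SetDataset {
    fn insert(&mut self, quad: Quad) -> bool {
        self.0.insert(quad)
    }
}

impl TraversableDataset for SetDataset {
    fn quads(&self) -> Box<dyn Iterator<Item = Quad> + '_> {
        Box::new(self.0.iter().copied())
    }
}

/// Client code asks only for the capability it actually needs.
pub fn count_with_predicate<D: PatternMatchingDataset>(dataset: &D, p: u32) -> usize {
    dataset.matching(None, Some(p), None).len()
}
```

A real implementation would presumably answer patterns from an index
rather than by scanning; the blanket implementation here only shows how
the capabilities compose.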

>
> In conclusion, I know `rdf-types` is far from perfect and I welcome 
> any improvement, be it in `rdf-types` directly or a brand new 
> W3C-approved library, but I really wish to preserve the same kind of 
> type-oriented and flexible API that lets me deal with interpreted 
> datasets with the custom types that best fit my application.
>
> [1]: https://github.com/timothee-haudebourg/rdf-types
> [2]: https://github.com/timothee-haudebourg/iref
>
Received on Tuesday, 25 February 2025 11:56:30 UTC