- From: Timothée Haudebourg <timothee.haudebourg@spruceid.com>
- Date: Thu, 22 Aug 2024 17:31:03 +0200
- To: public-r2c2@w3.org
- Message-ID: <e1893390-b912-48df-9e9b-9cb5657844e9@spruceid.com>
Hi everyone,

As I said in my introduction, I am the author of the `rdf-types` library [1], which defines types and traits for RDF (triples, quads, datasets, etc.). In this email I would like to take some time to explain the philosophy and design choices I adopted while writing `rdf-types`.

I wanted `rdf-types` to be both strongly-typed and unopinionated, but with "good default opinions".

* By "strongly-typed" I mean that it provides and uses precise dedicated types for each RDF concept. Never use `str` or `String` when we can be more precise. I created the `iref` crate [2] to define IRIs (we can discuss why not `iri-string` another time), but `rdf-types` by itself also provides a `BlankId` (unsized)/`BlankIdBuf` (sized) type for blank node identifiers, `Literal` for literal values with strongly typed lang tags, etc.
* By "unopinionated" I mean that it uses type parameters to provide flexibility, allowing the use of even more refined types, or just different types. It still tries to provide good defaults so type parameters can be ignored: for instance, the `Quad` type is equivalent to `Quad<Id, IriBuf, Term, Id>` (a lexical RDF quad).

# Vocabularies

Because it can be really expensive to manipulate strings directly (linear-time comparisons, memory usage), it is even possible to replace IRIs, blank ids and literal values with cheap types, like `u32` (but it can be any type really). To find the term associated with each `u32`, `rdf-types` defines various `Vocabulary*` traits, mapping each value to a term. I think this is a really important feature to have when manipulating large datasets.

# Interpretations

`rdf-types` is not only about terms (the syntactic representations of resources), but also about interpretations. Users can use any type to represent interpreted resources (like `u32` again), and define custom interpretations using the `Interpretation*` traits.
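To make the vocabulary and interpretation ideas above more concrete, here is a minimal, std-only sketch. It does not use the actual `rdf-types` API: `IriIndex`, `IndexVocabulary`, `Resource` and `IndexInterpretation` are hypothetical names invented for this illustration, and the real `Vocabulary*` and `Interpretation*` traits may be shaped quite differently.

```rust
use std::collections::HashMap;

/// Hypothetical cheap stand-in for an IRI (not an `rdf-types` type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct IriIndex(u32);

/// Hypothetical vocabulary: maps each IRI string to an index and back.
#[derive(Default)]
pub struct IndexVocabulary {
    by_index: Vec<String>,
    by_iri: HashMap<String, IriIndex>,
}

impl IndexVocabulary {
    /// Returns the index of `iri`, inserting it if it is not known yet.
    pub fn insert(&mut self, iri: &str) -> IriIndex {
        if let Some(&index) = self.by_iri.get(iri) {
            return index;
        }
        let index = IriIndex(self.by_index.len() as u32);
        self.by_index.push(iri.to_owned());
        self.by_iri.insert(iri.to_owned(), index);
        index
    }

    /// Returns the IRI associated with `index`, if any.
    pub fn iri(&self, index: IriIndex) -> Option<&str> {
        self.by_index.get(index.0 as usize).map(String::as_str)
    }
}

/// Hypothetical interpreted resource (what a term denotes).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Resource(u32);

/// Hypothetical reversible interpretation: several terms may denote
/// the same resource, and a resource remembers all of its terms.
#[derive(Default)]
pub struct IndexInterpretation {
    resource_of: HashMap<IriIndex, Resource>,
    terms: Vec<Vec<IriIndex>>,
}

impl IndexInterpretation {
    /// Interprets `term`, allocating a fresh resource on first use.
    pub fn interpret(&mut self, term: IriIndex) -> Resource {
        if let Some(&resource) = self.resource_of.get(&term) {
            return resource;
        }
        let resource = Resource(self.terms.len() as u32);
        self.terms.push(vec![term]);
        self.resource_of.insert(term, resource);
        resource
    }

    /// Declares that a not-yet-interpreted `term` denotes `resource`.
    /// Returns `false` if `term` is already interpreted or `resource` is unknown.
    pub fn assign(&mut self, term: IriIndex, resource: Resource) -> bool {
        let slot = resource.0 as usize;
        if self.resource_of.contains_key(&term) || slot >= self.terms.len() {
            return false;
        }
        self.terms[slot].push(term);
        self.resource_of.insert(term, resource);
        true
    }

    /// Returns every term known to denote `resource` (the reverse direction).
    pub fn terms_of(&self, resource: Resource) -> &[IriIndex] {
        self.terms
            .get(resource.0 as usize)
            .map(Vec::as_slice)
            .unwrap_or_default()
    }
}
```

In this sketch, comparing two `IriIndex` or `Resource` values is a single integer comparison, and the original strings are only needed when a term has to be printed or serialized.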

In practice, a user defines lexical quads and interpreted quads like this:

    type LexicalQuad = Quad<Id, Iri, Term, Id>;
    type InterpretedQuad = Quad<Resource>;

Then it's possible to go from `Quad` to `LexicalQuad` using a vocabulary, and from `LexicalQuad` to `InterpretedQuad` using an interpretation. Of course interpretations can be reversible, so we can retrieve the (potentially multiple) lexical representation(s) of a resource. In practice I find that it is much, much easier to manipulate interpreted datasets.

# Datasets

`rdf-types` provides traits for interpreted datasets and graphs. Each trait defines a feature of the dataset, for instance:

- `DatasetMut`: a dataset with an `insert` method
- `TraversableDataset`: a dataset that can provide an iterator over the quads
- `PatternMatchingDataset`: a dataset that provides a pattern matching method
- etc.

Again, users can implement those traits as they see fit, but `rdf-types` also provides good defaults with `BTreeDataset`.

In conclusion, I know `rdf-types` is far from perfect and I welcome any improvement, be it in `rdf-types` directly or in a brand new W3C-approved library, but I really wish to preserve the same kind of type-oriented and flexible API that lets me deal with interpreted datasets using the custom types that best fit my application.

[1]: https://github.com/timothee-haudebourg/rdf-types
[2]: https://github.com/timothee-haudebourg/iref
Received on Thursday, 22 August 2024 15:31:10 UTC