Dataset spec versioning, repo for testing/keeping track of implementations

I noticed this comment
<https://github.com/blake-regalia/graphy.js/issues/14#issue-528283274> that
indicated Graphy is compliant with the RDF/JS Dataset spec.

However, I recently started writing my own TypeScript declarations for the
library, as I'm trying to use it in a TS project and there are none publicly
available. I may publish them soon.

While writing the TS declarations based on Graphy docs, I noticed that
there were many methods from the RDF/JS Dataset spec that weren't
implemented by Graphy.

It's possible that Graphy was compliant at the time but then methods were
added to the Dataset spec after the fact, leaving Graphy in its current
state non-compliant. However, if the spec had been versioned throughout
that period, the divergence wouldn't have happened.

I propose that the RDF/JS community do two things to foster compliance with
the spec, one fairly simple, the other more involved:
1. Commit to strict versioning of the spec. I noticed several breaking
changes, such as new methods being added
<https://github.com/rdfjs/dataset-spec/commit/4f121a552f75214dd54d1788438f3c07a50b14d5>
to the Dataset interface without the version (still 1.0) being updated. This
likely led to the situation I mentioned: an implementation (Graphy) was
compliant, the spec was then updated without a proper version bump or
notification, and clients of the new types may now call functions that don't
actually exist if they assume Graphy implements Dataset. I have some more
thoughts on this here
<https://github.com/rdfjs/dataset-spec/issues/60>.
2. Create a GitHub repository whose sole purpose is to test compliance with
the RDF/JS Dataset spec. We could implement a few basic test cases (like
adding, retrieving, and matching quads) and run them against every library
that claims to implement the API. The repo should also handle partial
implementations, e.g. a library that is just missing a few methods, and
report exactly what is missing (see the sketch after this list). The main
purpose of this would be to make sure libraries stay up to date with the
spec.
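
To make the "partial implementation" idea concrete, here is a minimal sketch
of how the testing repo could detect which spec methods a given dataset
object actually exposes before running behavioral tests against it. The
method names come from my reading of the current Dataset spec draft, and the
report shape is just an assumption for illustration:

```typescript
// Hypothetical compliance check: report which Dataset spec methods an
// implementation exposes, so partial implementations can be reported
// instead of crashing the whole test suite.

// Method names from the RDF/JS Dataset interfaces (DatasetCore + Dataset).
const DATASET_METHODS = [
  "add", "delete", "has", "match", // DatasetCore
  "addAll", "contains", "deleteMatches", "difference", "equals", "every",
  "filter", "forEach", "import", "intersection", "map", "reduce", "some",
  "toArray", "toCanonical", "toStream", "union",
] as const;

interface ComplianceReport {
  implemented: string[];
  missing: string[];
}

// `dataset` is any object claiming to implement the Dataset interface.
export function checkDatasetMethods(dataset: object): ComplianceReport {
  const implemented: string[] = [];
  const missing: string[] = [];
  for (const name of DATASET_METHODS) {
    const value = (dataset as Record<string, unknown>)[name];
    (typeof value === "function" ? implemented : missing).push(name);
  }
  return { implemented, missing };
}
```

Behavioral tests (add/match round-trips and so on) would then only run for
the methods that are actually present, and the resulting report could feed
the badge idea below.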

For libraries that have been added to that testing repo and pass the tests,
library authors could then say something like "RDF/JS Dataset v2.4
Verified". We could also add an icon to the rdf.js.org site to indicate to
users which libraries are more thoroughly tested.

Having a single testing repo would be more work, but it could also do a few
things:
- Eliminate duplication of testing code/maintenance from each library (e.g.
allow N3 and Graphy to be tested with the same code)
- Potentially, expand to include benchmarking and comparing RDF/JS
implementations
- Provide spec authors with real insight into how implementations are faring
in complying with the spec. It would be a central place where
implementations have committed to abiding by the spec, and those
implementations would have input on new versions. In fact, a new version of
the spec might be marked as a "dev" version for a while so that
implementations can add the appropriate features and the testing repo can be
updated, with the release happening once the implementations are done. This
would better integrate the libraries and the spec, so that things like the
`contains` method wouldn't be added without the libraries' feedback and
actual implementations of it. This would also discourage extraneous
modifications and feature creep in the spec.
- Finally, and most out-there: it could allow testing the interoperability
of RDF/JS implementations.

Some functions may be fine for this already, e.g. `addAll` on a Dataset. If
implementation B simply turns the incoming Dataset into a Stream and
consumes it, then it should interoperate with other implementations (e.g.
implementation A), because it relies only on well-known methods straight
from the spec.
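
As a rough illustration, here is a sketch of an `addAll` written that way.
It uses synchronous iteration over the argument rather than the Stream
interface, and reduces quads to a placeholder type, but the point is that it
doesn't care which library backs its argument:

```typescript
// Sketch: an addAll that relies only on the fact that a spec-compliant
// Dataset is iterable over its quads, so its argument can be backed by
// any implementation. Quad handling is simplified to an `equals` check.
interface QuadLike {
  equals(other: QuadLike): boolean;
}

class SimpleDataset {
  private quads: QuadLike[] = [];

  add(quad: QuadLike): this {
    // Naive duplicate check; a real implementation would index quads.
    if (!this.quads.some((q) => q.equals(quad))) {
      this.quads.push(quad);
    }
    return this;
  }

  // Accepts any iterable of quads, including a Dataset from another library.
  addAll(quads: Iterable<QuadLike>): this {
    for (const quad of quads) {
      this.add(quad);
    }
    return this;
  }

  *[Symbol.iterator](): IterableIterator<QuadLike> {
    yield* this.quads;
  }
}
```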
However, what about calling `D1.equals(D2)`, where D1 and D2 are backed by
N3 and Graphy, for example? These libraries use optimized in-memory storage
representations that are likely quite different from each other, and they
implement the comparison and other more algorithmic parts of the RDF/JS spec
by relying on their own internal structures for performance. Here, it might
not even be desirable to allow interoperability. Theoretically, though, one
could write some of the higher-level algorithms solely in terms of
lower-level constructs like `match` and `has`. A library could then indicate
to users that it accepts Datasets from other implementations for these
functions.
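
For instance, a `contains` check (is the current dataset a superset of the
given one?) can be written purely against low-level spec operations. This is
only a sketch under that assumption: it iterates the other dataset and calls
`has` on the current one, ignores blank-node subtleties, and would be slower
than a version that exploits a library's internal indexes:

```typescript
// Sketch: a higher-level operation (`contains`) built only from low-level
// spec operations (iteration + `has`), so the argument can come from a
// different implementation. Types are simplified placeholders.
interface QuadLike {
  equals(other: QuadLike): boolean;
}

interface MinimalDataset extends Iterable<QuadLike> {
  has(quad: QuadLike): boolean;
}

// True if every quad in `other` is also present in `self`,
// i.e. `self` is a superset of `other`.
function contains(self: MinimalDataset, other: MinimalDataset): boolean {
  for (const quad of other) {
    if (!self.has(quad)) {
      return false;
    }
  }
  return true;
}
```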

There's a lot here, and versioning might be all that's needed for now. But
I'd love any feedback you have.

Thanks for all the awesome libraries and this great community!
