Re: Dataset spec versioning, repo for testing/keeping track of implementations

Hey Alex

First of all, do you follow the discussion on Gitter [1]? There was a proposal to move the @types/rdf-js [2] package from DefinitelyTyped [3] into the @rdfjs GitHub organization. A new repository has already been set up, but no work has commenced yet [4]. It's not directly about your message below, but it is a related effort which might interest you.


On 24 October 2020 at 18:24:36, Alex Kreidler (alexkreidler2020@gmail.com) wrote:
> I noticed this comment
> that  
> indicated Graphy is compliant with the RDF/JS Dataset spec.
>  
> However I recently started writing my own Typescript declarations for the
> library as I'm trying to use it in a TS project and there are none publicly
> available. I may publish them soon.

Are you going to contribute them to DefinitelyTyped?

>  
> While writing the TS declarations based on Graphy docs, I noticed that
> there were many methods from the RDF/JS Dataset spec that weren't
> implemented by Graphy.
>  
> It's possible that Graphy was compliant at the time but then methods were
> added to the Dataset spec after the fact, leaving Graphy in its current
> state non-compliant. However, if the spec had been versioned throughout
> that period, the divergence wouldn't have happened.

Compliance is a bit more than just implementing interfaces. I used Graphy's serialisers a little and noticed that, unlike those in the @rdfjs scope on NPM, they are stateful and a single instance cannot be used to process multiple streams. I think this is currently not covered by the spec, but it would indeed be caught by a test suite like the one you propose.

>  
> I propose the RDF/JS community does a few things to foster compliance with
> the spec, one fairly simple, another more involved:
> 1. Commit to strict versioning of the spec. For example, I noticed several
> breaking changes, e.g. adding new methods
>  
> to the Dataset interface, where the version (still 1.0) hasn't been
> updated. This likely led to the situation I mentioned where an
> implementation (Graphy) was compliant, but the spec was updated without a
> proper version bump/notification, and clients of the new types now may call
> functions that don't actually exist if they assume Graphy implements
> Dataset. I have some more thoughts on this here

So, @rdfjs/data-model [5] and @rdfjs/dataset [6] do actually come with a script to run a set of test cases against the DataFactory and DatasetCore respectively.

However, do note that:

1. they are hosted in a different GitHub org, run by Thomas Bergwinkl,
2. the latter neither tests nor implements the Dataset interface.

To the best of my knowledge, there exists no fully compliant implementation of Dataset. I’m not even sure that Dataset is considered stable at this moment.

> .
> 2. Create a Github repository whose sole purpose is to test compliance with
> the RDF/JS Dataset Spec. We can implement a few basic test cases (like
> adding, retrieving, matching quads), and then test them across all
> libraries that properly implement the API. This repo should also handle
> partial implementations, e.g. a library just missing a few methods, and
> notify the user. The main purpose of this would be to make sure libraries
> are up to date with the spec.

Sounds reasonable. I would start by migrating the test cases from [5] and [6] into @rdfjs and packaging them as test harness(es).

Work on such a test harness would always have to be done in sync with changes to the spec itself.
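
To make the idea a bit more concrete, here is a rough sketch of what the core of such a harness might look like. It is only an illustration: `checkDataset`, `ComplianceReport` and `ToyDataset` are names I made up, the `Term`/`Quad` shapes are simplified stand-ins for the real RDF/JS types, and the method lists are my reading of the current draft, so they should be checked against the spec. It reports missing methods (the partial-implementation case you mention) separately from behavioural failures.

```typescript
// Simplified, hypothetical stand-ins for RDF/JS terms and quads.
interface Term { termType: string; value: string }
interface Quad { subject: Term; predicate: Term; object: Term; graph: Term }

// Method names as I read them in the current draft; verify against the spec.
const CORE_METHODS = ['add', 'delete', 'has', 'match'];
const DATASET_METHODS = [
  'addAll', 'contains', 'deleteMatches', 'difference', 'equals', 'every',
  'filter', 'forEach', 'import', 'intersection', 'map', 'reduce', 'some',
  'toArray', 'toCanonical', 'toStream', 'union',
];

interface ComplianceReport {
  missing: string[];   // expected methods that are not functions on the instance
  failures: string[];  // names of behavioural checks that failed or threw
}

function checkDataset(create: () => any, quad: Quad): ComplianceReport {
  const dataset = create();
  const missing = CORE_METHODS.concat(DATASET_METHODS)
    .filter(name => typeof dataset[name] !== 'function');
  const failures: string[] = [];
  const check = (name: string, run: () => boolean): void => {
    try {
      if (!run()) failures.push(name);
    } catch (e) {
      failures.push(name);
    }
  };
  // Behavioural checks only make sense if the core methods exist at all.
  if (!CORE_METHODS.some(name => missing.indexOf(name) !== -1)) {
    check('add then has', () => { dataset.add(quad); return dataset.has(quad) === true; });
    check('size after add', () => dataset.size === 1);
    check('match by subject', () => dataset.match(quad.subject).size === 1);
    check('delete then has', () => { dataset.delete(quad); return dataset.has(quad) === false; });
  }
  return { missing, failures };
}

// A toy, array-backed partial implementation (core methods only) to try it on.
const sameTerm = (a: Term, b: Term): boolean =>
  a.termType === b.termType && a.value === b.value;
const sameQuad = (a: Quad, b: Quad): boolean =>
  sameTerm(a.subject, b.subject) && sameTerm(a.predicate, b.predicate) &&
  sameTerm(a.object, b.object) && sameTerm(a.graph, b.graph);
const fits = (pattern: Term | null | undefined, term: Term): boolean =>
  !pattern || sameTerm(pattern, term);

class ToyDataset {
  private quads: Quad[] = [];
  get size(): number { return this.quads.length; }
  add(quad: Quad): this { if (!this.has(quad)) this.quads.push(quad); return this; }
  delete(quad: Quad): this {
    this.quads = this.quads.filter(q => !sameQuad(q, quad));
    return this;
  }
  has(quad: Quad): boolean { return this.quads.some(q => sameQuad(q, quad)); }
  match(s?: Term | null, p?: Term | null, o?: Term | null, g?: Term | null): ToyDataset {
    const result = new ToyDataset();
    this.quads.forEach(q => {
      if (fits(s, q.subject) && fits(p, q.predicate) && fits(o, q.object) && fits(g, q.graph)) {
        result.add(q);
      }
    });
    return result;
  }
}

const ex = (value: string): Term => ({ termType: 'NamedNode', value });
const sampleQuad: Quad = {
  subject: ex('http://example.org/s'),
  predicate: ex('http://example.org/p'),
  object: ex('http://example.org/o'),
  graph: { termType: 'DefaultGraph', value: '' },
};
const report = checkDataset(() => new ToyDataset(), sampleQuad);
console.log(report);
```

For the toy implementation the report should list all the Dataset-only methods as missing and no behavioural failures, which is roughly the "partial implementation" notice you describe. A real harness would of course take its term and quad instances from a DataFactory rather than plain objects.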

>  
> For libraries that have been added to that testing repo and pass the tests,
> we could have library authors say something like RDF/JS Dataset v2.4
> Verified. We could also then add an icon to the rdf.js.org site to indicate
> to users which libraries are more thoroughly tested.
>  
> Having a single testing repo would be more work, but it could also do a few
> things:
> - Eliminate duplication of testing code/maintenance from each library (e.g.
> allow N3 and Graphy to be tested with the same code)
> - Potentially, expand to include benchmarking and comparing RDF/JS
> implementations
> - Provide real insights to spec authors how implementations are faring in
> complying with the spec. It would centralize a place where implementations
> have committed to abiding by the spec, and those implementations would have
> input on new versions. In fact, new versions of the spec might be marked as
> a "dev" version for a while so that implementations can add the appropriate
> features and then the testing repo can be updated, and then released when
> the impls are done. This would better integrate the libraries and the spec,
> so that things like the `contains` method wouldn't be added without the
> libraries' feedback and actual implementation of it. This would discourage
> extraneous modifications/feature creep in the spec.
> - Finally, and most out-there: it could allow testing the interoperability
> of RDF/JS implementations.
>  
> Some functions may be fine for this already: e.g. addAll on a Dataset. If
> the implementation B simply turns the Dataset into a Stream and consumes
> it, then it should interoperate with other implementations (e.g. impl A),
> because it simply relies on well-known methods straight from the spec.
> However, what about calling `D1.equals(D2)`, where D1 and D2 are backed by
> N3 and Graphy for example
> These libraries use optimized in-memory storage representations that are
> likely unique from each other, and implement those comparison and other
> more algorithmic aspects of the RDF/JS spec by relying on their own
> internal structure to increase performance. Here, it might not be desirable
> to even allow interoperability. Theoretically, though, one could write some
> of the higher-level algorithms solely based on lower-level constructs like
> the `match` and other functions. A library could then indicate to users
> that it accepts Datasets from other implementations for these functions.
>  
> There's a lot here, and versioning might be all that's needed for now. But
> I'd love any feedback you have.
>  
> Thanks for all the awesome libraries and this great community!
> 
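
On the interoperability point, a sketch of what a "generic" equality written only against DatasetCore members (`size`, `has` and iteration) could look like. Everything here is illustrative: `genericEquals`, `CoreLike`, `ArrayBacked` and `MapBacked` are hypothetical names, and the `Term`/`Quad` shapes are simplified (literal language and datatype are ignored). Note that it compares blank nodes by label, so it is not the normalising comparison that Dataset's `equals` describes.

```typescript
// Simplified, hypothetical stand-ins for RDF/JS terms and quads.
interface Term { termType: string; value: string }
interface Quad { subject: Term; predicate: Term; object: Term; graph: Term }

// Only members that DatasetCore already defines: size, has and iteration.
interface CoreLike extends Iterable<Quad> {
  readonly size: number;
  has(quad: Quad): boolean;
}

// Equality by size plus containment, with no knowledge of internal storage.
// Blank nodes are compared by label, so this is NOT an isomorphism check.
function genericEquals(a: CoreLike, b: CoreLike): boolean {
  if (a.size !== b.size) return false;
  for (const quad of a) {
    if (!b.has(quad)) return false;
  }
  return true;
}

// String key used by both toy stores to compare quads by value.
const key = (q: Quad): string =>
  JSON.stringify([q.subject, q.predicate, q.object, q.graph].map(t => [t.termType, t.value]));

// Two toy "implementations" with different internal storage.
class ArrayBacked implements CoreLike {
  private quads: Quad[] = [];
  get size(): number { return this.quads.length; }
  add(q: Quad): this { if (!this.has(q)) this.quads.push(q); return this; }
  has(q: Quad): boolean { return this.quads.some(x => key(x) === key(q)); }
  [Symbol.iterator](): Iterator<Quad> { return this.quads[Symbol.iterator](); }
}

class MapBacked implements CoreLike {
  private index = new Map<string, Quad>();
  get size(): number { return this.index.size; }
  add(q: Quad): this { this.index.set(key(q), q); return this; }
  has(q: Quad): boolean { return this.index.has(key(q)); }
  [Symbol.iterator](): Iterator<Quad> { return this.index.values(); }
}

const iri = (value: string): Term => ({ termType: 'NamedNode', value });
const defaultGraph: Term = { termType: 'DefaultGraph', value: '' };
const q1: Quad = {
  subject: iri('http://example.org/a'), predicate: iri('http://example.org/p'),
  object: iri('http://example.org/b'), graph: defaultGraph,
};
const q2: Quad = {
  subject: iri('http://example.org/b'), predicate: iri('http://example.org/p'),
  object: iri('http://example.org/c'), graph: defaultGraph,
};

const d1 = new ArrayBacked().add(q1).add(q2);
const d2 = new MapBacked().add(q2).add(q1);
console.log(genericEquals(d1, d2)); // true
```

An implementation could keep its optimised path for datasets of its own kind and fall back to something like this when handed a foreign one.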

[1]: https://gitter.im/rdfjs/Representation-Task-Force?at=5f70c4008fe6f11963656c31
[2]: https://www.npmjs.com/package/@types/rdf-js
[3]: https://github.com/DefinitelyTyped/DefinitelyTyped/tree/master/types/rdf-js
[4]: https://github.com/rdfjs/types 
[5]: https://github.com/rdfjs-base/data-model
[6]: https://github.com/rdfjs-base/dataset

Received on Saturday, 24 October 2020 17:39:23 UTC