Re: Dataset spec versioning, repo for testing/keeping track of implementations

Hi Alex,

> However I recently started writing my own Typescript declarations for the
> library as I'm trying to use it in a TS project and there are none publicly
> available. I may publish them soon.


Happy to have a contributor! However, the next major release of graphy is
in the works as we speak, and I have already implemented typings for all
public methods (some sourced in TypeScript, others added by hand). I am
working on getting an alpha version published soon for testing and getting
the branch online once it's cleaned up a bit so that others are able to
build it. I can ping the list once that's up.

> I noticed that there were many methods from the RDF/JS Dataset spec that
> weren't implemented by Graphy.


The `Dataset` interface is experimental (see the warning in the spec) and
has changed several times without versioning. I agree this is bad practice
but the spec has simply not received much attention. We should definitely
try to be more diligent going forward.
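
As a rough illustration of how a consumer could guard against such drift,
here is a minimal TypeScript sketch that reports which expected methods an
object lacks. The helper `missingMethods` and the stand-in object
`partialDataset` are hypothetical, and the method list is only the subset
of names mentioned in this thread, not the full `Dataset` interface.

```typescript
// Hypothetical helper: return the names in `expected` that are not
// callable members of `candidate`.
function missingMethods(
  candidate: unknown,
  expected: readonly string[],
): string[] {
  if (typeof candidate !== "object" || candidate === null) {
    return [...expected];
  }
  const obj = candidate as Record<string, unknown>;
  return expected.filter((name) => typeof obj[name] !== "function");
}

// Illustrative subset only: method names that come up in this thread.
const expectedMethods = ["match", "addAll", "contains", "equals"] as const;

// Stand-in for a library that implements only part of the interface.
const partialDataset = {
  match: () => undefined,
  addAll: () => undefined,
};

const missing = missingMethods(partialDataset, expectedMethods);
console.log(missing.join(", ")); // contains, equals
```

A check like this only detects absent methods. It says nothing about
whether the methods that are present behave as the spec requires.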

> However, what about calling `D1.equals(D2)`, where D1 and D2 are backed by
> N3 and Graphy for example


These cases are actually covered by the spec, and implementations SHOULD
be able to compare against datasets from other implementations safely,
even if it means falling back to a less efficient algorithm.
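
A minimal sketch of what such a fallback could look like, in TypeScript,
relying only on `size`, `has`, and iteration. The interfaces `TermLike`,
`QuadLike`, and `DatasetLike`, the function `genericEquals`, and both toy
classes are hypothetical stand-ins, not code from N3 or graphy. The sketch
compares terms by `termType` and `value` only and does no blank node
normalization, so it is not a complete `equals`.

```typescript
// Simplified local stand-ins for RDF/JS terms and quads.
interface TermLike { termType: string; value: string; }
interface QuadLike {
  subject: TermLike; predicate: TermLike; object: TermLike; graph: TermLike;
}
interface DatasetLike extends Iterable<QuadLike> {
  readonly size: number;
  has(quad: QuadLike): boolean;
}

// Fallback comparison that never touches either side's internal storage.
function genericEquals(a: DatasetLike, b: DatasetLike): boolean {
  if (a.size !== b.size) return false;
  for (const q of a) {
    if (!b.has(q)) return false;
  }
  return true;
}

// String key for a quad (simplification: ignores literal language/datatype).
const quadKey = (q: QuadLike): string =>
  [q.subject, q.predicate, q.object, q.graph]
    .map((t) => `${t.termType}:${t.value}`)
    .join(" ");

// Toy dataset backed by a plain array.
class ArrayBacked implements DatasetLike {
  private quads: QuadLike[] = [];
  get size(): number { return this.quads.length; }
  add(q: QuadLike): this {
    if (!this.has(q)) this.quads.push(q);
    return this;
  }
  has(q: QuadLike): boolean {
    return this.quads.some((x) => quadKey(x) === quadKey(q));
  }
  [Symbol.iterator](): Iterator<QuadLike> {
    return this.quads[Symbol.iterator]();
  }
}

// Toy dataset backed by a Map keyed on the quad's string form.
class MapBacked implements DatasetLike {
  private index = new Map<string, QuadLike>();
  get size(): number { return this.index.size; }
  add(q: QuadLike): this {
    this.index.set(quadKey(q), q);
    return this;
  }
  has(q: QuadLike): boolean { return this.index.has(quadKey(q)); }
  [Symbol.iterator](): Iterator<QuadLike> { return this.index.values(); }
}

const iri = (value: string): TermLike => ({ termType: "NamedNode", value });
const defaultGraph: TermLike = { termType: "DefaultGraph", value: "" };
const makeQuad = (s: string, p: string, o: string): QuadLike => ({
  subject: iri(s), predicate: iri(p), object: iri(o), graph: defaultGraph,
});

const first = makeQuad("http://example.org/a", "http://example.org/p", "http://example.org/b");
const second = makeQuad("http://example.org/b", "http://example.org/p", "http://example.org/c");

const d1 = new ArrayBacked().add(first);
const d2 = new MapBacked().add(first);
const sameContent = genericEquals(d1, d2);  // true: same single quad
d2.add(second);
const afterExtra = genericEquals(d1, d2);   // false: sizes differ
console.log(sameContent, afterExtra);
```

An implementation could keep its optimized path for its own dataset type
and use something like this only when handed a foreign dataset.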

I agree with Tomasz's response about the test suite, and would add that the
Dataset spec needs to resolve some standing issues first.

 - Blake


On Sat, Oct 24, 2020 at 10:40 AM Tomasz Pluskiewicz <tomasz@t-code.pl>
wrote:

> Hey Alex
>
> First of all, do you follow the discussion on Gitter [1]? There was a
> proposal to move the @types/rdf-js [2] package from DefinitelyTyped [3]
> into the @rdfjs GitHub organization. A new repository has already been set
> up but no work has commenced yet [4]. This is not exactly related to your
> message below, but it is a related effort which might interest you.
>
>
> On 24 October 2020 at 18:24:36, Alex Kreidler (alexkreidler2020@gmail.com)
> wrote:
> > I noticed this comment that indicated Graphy is compliant with the
> > RDF/JS Dataset spec.
> >
> > However I recently started writing my own Typescript declarations for the
> > library as I'm trying to use it in a TS project and there are none
> > publicly available. I may publish them soon.
>
> Going to contribute to DefinitelyTyped?
>
> >
> > While writing the TS declarations based on Graphy docs, I noticed that
> > there were many methods from the RDF/JS Dataset spec that weren't
> > implemented by Graphy.
> >
> > It's possible that Graphy was compliant at the time but then methods were
> > added to the Dataset spec after the fact, leaving Graphy in its current
> > state non-compliant. However, if the spec had been versioned throughout
> > that period, the divergence wouldn't have happened.
>
> The compliance is a bit more than just implementing interfaces. I used
> Graphy serialisers a little and noticed that, unlike those of the @rdfjs
> scope on NPM, they are stateful and cannot be used to parse multiple
> streams. I think this is currently not covered by the spec, but it would
> indeed be caught by the testing suite you propose.
>
> >
> > I propose the RDF/JS community does a few things to foster compliance
> > with the spec, one fairly simple, another more involved:
> > 1. Commit to strict versioning of the spec. For example, I noticed
> > several breaking changes, e.g. adding new methods to the Dataset
> > interface, where the version (still 1.0) hasn't been updated. This
> > likely led to the situation I mentioned where an implementation
> > (Graphy) was compliant, but the spec was updated without a proper
> > version bump/notification, and clients of the new types now may call
> > functions that don't actually exist if they assume Graphy implements
> > Dataset. I have some more thoughts on this here
>
> So, @rdfjs/data-model [5] and @rdfjs/dataset [6] do actually come with a
> script to run a set of test cases against the DataFactory and DatasetCore
> respectively.
>
> However do note that
>
> 1. they are hosted in a different GitHub org, by Thomas Bergwinkl
> 2. the latter neither tests nor implements the Dataset interface.
>
> To the best of my knowledge, there exists no fully compliant
> implementation of Dataset. I’m not even sure that Dataset is considered
> stable at this moment.
>
> > .
> > 2. Create a GitHub repository whose sole purpose is to test compliance
> > with the RDF/JS Dataset Spec. We can implement a few basic test cases
> > (like
> > adding, retrieving, matching quads), and then test them across all
> > libraries that properly implement the API. This repo should also handle
> > partial implementations, e.g. a library just missing a few methods, and
> > notify the user. The main purpose of this would be to make sure libraries
> > are up to date with the spec.
>
> Sounds reasonable. I would start by migrating the test cases from [5] and
> [6] into @rdfjs and packaging them as test harness(es).
>
> Work on such a test harness would always have to be done in sync with
> changes to the spec itself.
>
> >
> > For libraries that have been added to that testing repo and pass the
> > tests, we could have library authors say something like RDF/JS Dataset
> > v2.4 Verified. We could also then add an icon to the rdf.js.org site to
> > indicate to users which libraries are more thoroughly tested.
> >
> > Having a single testing repo would be more work, but it could also do a
> > few things:
> > - Eliminate duplication of testing code/maintenance from each library
> > (e.g. allow N3 and Graphy to be tested with the same code)
> > - Potentially, expand to include benchmarking and comparing RDF/JS
> > implementations
> > - Provide real insights to spec authors how implementations are faring in
> > complying with the spec. It would centralize a place where
> > implementations have committed to abiding by the spec, and those
> > implementations would have input on new versions. In fact, new versions
> > of the spec might be marked as a "dev" version for a while so that
> > implementations can add the appropriate features and then the testing
> > repo can be updated, and then released when the impls are done. This
> > would better integrate the libraries and the spec, so that things like
> > the `contains` method wouldn't be added without the libraries' feedback
> > and actual implementation of it. This would discourage extraneous
> > modifications/feature creep in the spec.
> > - Finally, and most out-there: it could allow testing the
> > interoperability of RDF/JS implementations.
> >
> > Some functions may be fine for this already: e.g. addAll on a Dataset. If
> > the implementation B simply turns the Dataset into a Stream and consumes
> > it, then it should interoperate with other implementations (e.g. impl A),
> > because it simply relies on well-known methods straight from the spec.
> > However, what about calling `D1.equals(D2)`, where D1 and D2 are backed
> > by N3 and Graphy for example
> > These libraries use optimized in-memory storage representations that are
> > likely unique from each other, and implement those comparison and other
> > more algorithmic aspects of the RDF/JS spec by relying on their own
> > internal structure to increase performance. Here, it might not be
> > desirable to even allow interoperability. Theoretically, though, one
> > could write some of the higher-level algorithms solely based on
> > lower-level constructs like the `match` and other functions. A library
> > could then indicate to users that it accepts Datasets from other
> > implementations for these functions.
> >
> > There's a lot here, and versioning might be all that's needed for now.
> > But I'd love any feedback you have.
> >
> > Thanks for all the awesome libraries and this great community!
> >
>
> [1]:
> https://gitter.im/rdfjs/Representation-Task-Force?at=5f70c4008fe6f11963656c31
> [2]: https://www.npmjs.com/package/@types/rdf-js
> [3]:
> https://github.com/DefinitelyTyped/DefinitelyTyped/tree/master/types/rdf-js
> [4]: https://github.com/rdfjs/types
> [5]: https://github.com/rdfjs-base/data-model
> [6]: https://github.com/rdfjs-base/dataset
>
>

Received on Saturday, 24 October 2020 18:13:31 UTC