Re: Release of ShEx.ex 0.1 from Jose Emilio Labra Gayo on 2019-07-15 (public-shex@w3.org from July 2019)

From: Jose Emilio Labra Gayo <jelabra@gmail.com>
Date: Tue, 16 Jul 2019 01:02:46 +0200
To: Marcel Otto <marcelotto.de@googlemail.com>
Cc: public-shex@w3.org
Message-ID: <CAJadXXJ18E1Sq3c-q5d9F_FjQ-JPiWEgk5=QwWcNdXO1xxfkTA@mail.gmail.com>
Congratulations on the new ShEx implementation. It is really great to have
a new one.

About public data and examples for benchmarks: two years ago, we wrote a
paper in which we proposed a possible benchmark based on a real project
called the WebIndex. I implemented a program that generates both valid and
invalid RDF data according to a data model inspired by the WebIndex
model.

That work was published as a draft paper [1]. I was planning to resume it
once there were more implementations and/or I had more time. Maybe you want
to reuse part of that work and test your system with it. If you do, let me
know if I can help.

The source code of the benchmark data generation tool is here:
http://labra.weso.es/wiGen/
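
To give an idea of what generated benchmark data of this kind could look
like, here is a minimal sketch in Python. It is not the wiGen code and does
not use the WebIndex model: the `http://example.org/` namespace, the two
properties (`value`, `country`) and the function names are all assumptions
made for illustration. Each node is emitted as N-Triples together with the
expected validation outcome, so a validator's answers can be checked.

```python
import random

EX = "http://example.org/"  # hypothetical namespace, not the WebIndex one
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def observation(i, valid, rng):
    """Return N-Triples lines for one observation node.

    A valid node has exactly one ex:value (xsd:decimal) and one ex:country
    (an IRI). An invalid node breaks exactly one of those two constraints.
    """
    node = f"<{EX}obs{i}>"
    value = f'"{rng.uniform(0, 100):.2f}"^^<{XSD_DECIMAL}>'
    country = f"<{EX}country{rng.randrange(5)}>"
    triples = [
        f"{node} <{EX}value> {value} .",
        f"{node} <{EX}country> {country} .",
    ]
    if not valid:
        if rng.random() < 0.5:
            triples.pop(0)  # violation: ex:value is missing
        else:
            # violation: ex:country is a literal instead of an IRI
            triples[1] = f'{node} <{EX}country> "not an IRI" .'
    return triples


def generate(n, invalid_ratio=0.2, seed=0):
    """Generate n nodes; return (N-Triples lines, {node IRI: expected validity})."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    lines, expected = [], {}
    for i in range(n):
        valid = rng.random() >= invalid_ratio
        lines.extend(observation(i, valid, rng))
        expected[f"{EX}obs{i}"] = valid
    return lines, expected
```

Calling `generate(1000, invalid_ratio=0.1)` would give a data set in which
roughly one node in ten is expected to fail; the `expected` mapping plays
the role of a result ShapeMap to compare against.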

[1] Validating and describing linked data portals using shapes, Jose-Emilio
Labra-Gayo, Eric Prud'hommeaux, Harold Solbrig, Iovka Boneva,
arXiv:1701.08924 [cs.DB] https://arxiv.org/abs/1701.08924


Best regards, Jose Labra

On Mon, Jul 15, 2019 at 10:45 PM Marcel Otto <marcelotto.de@googlemail.com>
wrote:

> Hi,
>
> I'm very happy to announce the first release of ShEx.ex, an Elixir
> implementation of the ShEx and ShapeMap specs. You can find the source at
> https://github.com/marcelotto/shex-ex and a short guide at
> https://rdf-elixir.dev/shex-ex/
>
> One distinguishing feature of ShEx.ex might be its support for parallel
> processing of larger numbers of nodes out of the box. This feature is,
> however, still considered experimental, as it currently lacks empirically
> founded parameters for the workload distribution (batch sizes etc.). For
> this reason, I want to ask if there are some public example data and
> schemas for testing and comparison purposes.
>
> ShEx.ex already passes large parts of the official test suite. However,
> here are the ones that still fail:
>
> - the `negativeStructure` tests are not passing yet, because the schema is
> not yet checked for the respective problems at schema creation time (I
> hope to deliver this soon)
> - the following features are in general not supported yet, so all tests
> with the respective traits are not passing: imports, external shapes,
> annotations, semantic actions
> - `1literalPattern_with_ascii_boundaries_fail` and
> `1literalPattern_with_all_controls_fail` are failing because of some issues
> with non-ascii characters
> - `nPlus1` and `PTstar-greedy-fail` are failing because of an issue with
> greediness
> - `FocusIRI2groupBnodeNested2groupIRIRef` and
> `FocusIRI2EachBnodeNested2EachIRIRef` are failing
> - A major issue for now is the limited set of supported datatypes in
> RDF.ex (on top of which ShEx.ex is implemented): xsd:boolean, xsd:integer,
> xsd:decimal, xsd:double, xsd:time, xsd:date, xsd:dateTime.  This limits the
> applicability of numeric value constraints and the lexical form checks for
> datatype constraints and makes 29 tests using unsupported datatypes in
> these circumstances fail. Addressing the limited set of supported datatypes
> is one of the next planned features for RDF.ex, but will take some time, as
> I'm generally not happy with the current implementation of the XSD
> datatypes and want to do a rewrite of that part.
> - I also had some struggles with the JSON-based format for ShapeMaps used
> in the tests with the `sht:ShapeMap` trait, as I couldn't find any
> information about it. The only place it seems to be mentioned is in
> "Example 1" of the ShapeMap spec, but that doesn't give any further details
> on how to encode, for example, literals. Is this format even intended to be
> used outside of the test suite?
>
> I hope you find ShEx.ex nevertheless useful already and would be happy to
> hear your thoughts on it.
>
> Best,
> Marcel
>
>

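On the batch-size question in the quoted announcement, one way to get
empirical numbers is to time the same workload under several candidate
batch sizes. The following is only a sketch of such a measurement harness
in Python, not ShEx.ex code: `validate_node` is a hypothetical stand-in
for validating one node against a shape, and the worker count and function
names are assumptions. Python threads do not speed up CPU-bound work, so
the timings here only illustrate the procedure.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def batches(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def validate_node(node):
    # Hypothetical stand-in for validating one node against a shape.
    return node % 7 != 0


def validate_batch(batch):
    """Validate every node of one batch sequentially."""
    return [validate_node(n) for n in batch]


def run(nodes, batch_size, workers=4):
    """Validate all nodes in parallel batches; return (results, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields batch results in submission order
        chunks = pool.map(validate_batch, batches(nodes, batch_size))
        results = [r for chunk in chunks for r in chunk]
    return results, time.perf_counter() - start


def sweep(nodes, sizes):
    """Time each candidate batch size on the same workload."""
    return {size: run(nodes, size)[1] for size in sizes}
```

A call such as `sweep(list(range(100_000)), [10, 100, 1000])` returns the
elapsed seconds per batch size. Repeating the sweep on real data and
schemas would be needed before drawing any conclusion.
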
-- 
-- Jose Labra
Received on Monday, 15 July 2019 23:03:22 UTC