Thoughts on validation requirements from Dimitris Kontokostas on 2014-07-22 (public-rdf-shapes@w3.org from July 2014)

From: Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de>
Date: Tue, 22 Jul 2014 18:24:02 +0300
To: "public-rdf-sha." <public-rdf-shapes@w3.org>
Message-ID: <CA+u4+a0Mey7DLbOxJerxx3jm=9uUJD7Ohu+6p38oVwu8d2C--A@mail.gmail.com>

Although I do not have any industry experience in this field, I have the
following to note from my related research.

If we want RDF to become mainstream, we shouldn't expect people to learn
OWL, logics and Manchester syntax in order to write or understand a
simple constraint.
These should still exist somehow, but they should be pushed as many levels
of abstraction away from the user as possible. The same holds for SPARQL.

Regarding ShEx:

- I am also uncomfortable with untyped validation, but I also see the
need to support it, unless of course RDF somewhere specifies that every
resource MUST have an rdf:type. However, this should not be the primary
focus of ShEx, since it is not the common case.

- Shapes related to types (as described in Resource Shapes) should be
specified more explicitly and promoted. In general, these rules are easier
to validate, since their selectivity can be defined by the type, and they
are more common in practice.

- I also agree with Antoine Isaac that more emphasis should be given
to OWL.

- Further modularization of the syntax is needed. In almost all cases,
foaf:name has the same range (and the same domain) throughout a single
document/graph. Stating these rules separately makes rule execution more
efficient.
e.g. I can check the range (and domain) of foaf:name independently, and
inside the shape only check its existence (if specified).
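A minimal Python sketch of that split, assuming an in-memory list of
triples, prefixed names instead of full IRIs, and that foaf:name values
must be literals. The function names are hypothetical and this is not how
RDFUnit or ShEx actually implement it: the range rule runs once over the
whole graph, while the shape rule only checks existence.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Literal:
    """Minimal stand-in for an RDF literal; IRIs are plain strings."""
    value: str


# (subject, predicate, object); prefixed names stand in for full IRIs.
Triple = tuple[str, str, Union[str, Literal]]


def range_violations(graph: list[Triple], predicate: str) -> list[Triple]:
    """Graph-wide range rule, run once per predicate (hypothetical helper):
    every object of `predicate` must be a literal."""
    return [t for t in graph if t[1] == predicate and not isinstance(t[2], Literal)]


def existence_violations(graph: list[Triple], cls: str, predicate: str) -> list[str]:
    """Shape rule (hypothetical helper): every instance of `cls` needs at
    least one `predicate` value. Only existence is checked here; the
    value's range is left to range_violations."""
    instances = {s for s, p, o in graph if p == "rdf:type" and o == cls}
    with_value = {s for s, p, _ in graph if p == predicate}
    return sorted(instances - with_value)


graph: list[Triple] = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:name", Literal("Alice")),
    ("ex:bob", "rdf:type", "foaf:Person"),          # typed, but no foaf:name
    ("ex:carol", "foaf:name", "ex:notALiteral"),    # untyped, IRI where a literal is expected
]

print(range_violations(graph, "foaf:name"))
print(existence_violations(graph, "foaf:Person", "foaf:name"))
```

Because the range rule does not depend on any shape, it also catches the
untyped resource ex:carol, while the shape rule only reports ex:bob.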

General requirements for a validation solution

- Rule severity level. Not all errors are equal, and we need some way to
distinguish them. RDFUnit uses rlog [RLOG], but anything similar (e.g.
part of RFC 2119) would do (see [LEVEL]).

- Annotations: there should be a (standard) way for people to define
annotations on top of rules. These annotations could serve many purposes,
from error classification to instructions on how to process the errors.

- Descriptions: every rule should carry an error message for the end user.
Some messages can be generated automatically, but some cannot, and the
language must provide this facility.

- Results & execution level. There should be different execution models
with different result serializations. e.g. I may want only success/fail,
only the error count per rule, all individual erroneous resources, or
error instances with annotations. (I know that we need to fix the
validation language first.)

- I have also written earlier about OWL reuse for automatic rule
generation and rules attached to vocabularies [REUSE], as well as type
inference [INFERENCE].
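The severity, annotation, description and result-level requirements above
can be sketched together in Python. Everything here is hypothetical: the
Severity levels only loosely mirror RFC 2119 keywords and are not the RLOG
vocabulary, and the rule, mode and helper names are invented for
illustration. Rules are evaluated once; the mode only changes how the
results are reported.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable

Triple = tuple[str, str, str]  # prefixed names and plain strings, for brevity


class Severity(Enum):
    # Hypothetical levels loosely mirroring RFC 2119 keywords (not RLOG).
    ERROR = "MUST"
    WARNING = "SHOULD"
    INFO = "MAY"


class Mode(Enum):
    STATUS = "status"        # pass/fail only
    COUNTS = "counts"        # error count per rule
    RESOURCES = "resources"  # every offending resource
    FULL = "full"            # violations with severity, message and annotations


@dataclass
class Rule:
    rule_id: str
    severity: Severity
    message: str                                # end-user description
    find: Callable[[list[Triple]], list[str]]   # returns offending resources
    annotations: dict[str, str] = field(default_factory=dict)


def validate(graph: list[Triple], rules: list[Rule], mode: Mode):
    # Evaluate every rule once; each mode only changes the serialization.
    found = [(rule, rule.find(graph)) for rule in rules]
    if mode is Mode.STATUS:
        # Design choice: only ERROR-level violations make validation fail.
        return not any(bad for rule, bad in found if rule.severity is Severity.ERROR)
    if mode is Mode.COUNTS:
        return {rule.rule_id: len(bad) for rule, bad in found}
    if mode is Mode.RESOURCES:
        return [(rule.rule_id, r) for rule, bad in found for r in bad]
    return [
        {"rule": rule.rule_id, "resource": r, "severity": rule.severity.name,
         "message": rule.message, "annotations": rule.annotations}
        for rule, bad in found for r in bad
    ]


def missing(graph: list[Triple], cls: str, predicate: str) -> list[str]:
    """Instances of `cls` without any value for `predicate`."""
    instances = {s for s, p, o in graph if p == "rdf:type" and o == cls}
    return sorted(instances - {s for s, p, _ in graph if p == predicate})


graph = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "rdf:type", "foaf:Person"),
]
rules = [
    Rule("person-name", Severity.ERROR, "Every foaf:Person must have a foaf:name.",
         lambda g: missing(g, "foaf:Person", "foaf:name")),
    Rule("person-mbox", Severity.WARNING, "A foaf:Person should have a foaf:mbox.",
         lambda g: missing(g, "foaf:Person", "foaf:mbox"), {"category": "completeness"}),
]

for mode in Mode:
    print(mode.name, validate(graph, rules, mode))
```

Here the COUNTS mode reports one missing name and two missing mailboxes,
but only the missing name (an ERROR) makes STATUS fail; a WARNING on its
own would still pass.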

RDFUnit in the middle too

I try to tackle all these issues in my implementation, but I had to develop
my own RDF model, and it's quite hard to write RDF & SPARQL manually.
We support OWL (partially), so I used it where possible, but it is not
straightforward either.
If OSLC Resource Shapes had been submitted earlier, I might have used it
instead for common cases (although it could be extended further).
Off the top of my head, implementing OSLC would be as easy as providing a
configuration file such as this one [OWL-CONFIG] to cover the (typed) spec.
SPIN was also limiting for our approach, not only because of the
aforementioned requirements, but for the reasons described in [RDFUNIT],
section 7. However, RDFUnit could easily export everything to SPIN as well.
My point is that all three existing solutions are more or less
interoperable in terms of verifying constraints.

RDFUnit is a one-year R&D project, and of course I do not dare to compare
it to full-stack enterprise solutions like SPIN & ICV. We reused concepts
from both approaches, but I think neither of them is perfect as is. What I
miss is an easy & compact syntax for writing validation rules, and it looks
like ShEx has good potential to provide that.
(Note that this refers to writing/reading rules in a text editor; behind a
rich user interface everything looks nice & easy.)

Best,
Dimitris

[RLOG] http://persistence.uni-leipzig.org/nlp2rdf/ontologies/rlog#
[LEVEL] http://lists.w3.org/Archives/Public/public-rdf-shapes/2014Jun/0009.html
[OWL-CONFIG] https://github.com/AKSW/RDFUnit/blob/master/rdfunit-core/src/main/resources/org/aksw/rdfunit/testAutoGenerators.ttl
[RDFUNIT] http://svn.aksw.org/papers/2014/WWW_Databugger/public.pdf
[INFERENCE] http://lists.w3.org/Archives/Public/public-rdf-shapes/2014Jul/0088.html
[REUSE] http://lists.w3.org/Archives/Public/public-rdf-shapes/2014Jul/0019.html
--
Dimitris Kontokostas
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org
Homepage: http://aksw.org/DimitrisKontokostas

Received on Tuesday, 22 July 2014 15:24:59 UTC