Re: AW: Thoughts on validation requirements from Dimitris Kontokostas on 2014-07-28 (public-rdf-shapes@w3.org from July 2014)

From: Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de>
Date: Mon, 28 Jul 2014 10:38:41 +0300
To: "Eric Prud'hommeaux" <eric@w3.org>
Cc: "Bosch, Thomas" <Thomas.Bosch@gesis.org>, "public-rdf-sha." <public-rdf-shapes@w3.org>
Message-ID: <CA+u4+a2R3yrMMYTW4exJOLy2P0tNj5SY3C_YgaJNb_2f1piviQ@mail.gmail.com>
On Sun, Jul 27, 2014 at 10:21 PM, Eric Prud'hommeaux <eric@w3.org> wrote:

>
> On Jul 27, 2014 11:37 AM, "Bosch, Thomas" <Thomas.Bosch@gesis.org> wrote:
> >
> > Hi Dimitris
> >
> >
> >
> > Although I do not have any industry experience in this field, I have the
> following to note from my related research.
> >
> > If we want RDF to become mainstream we shouldn't expect people to learn
> OWL, logics & Manchester syntax in order to formulate or understand a
> simple constraint.
> > They should exist somehow but should be moved as many levels up as
> possible. Similarly for SPARQL.
> >
> > Regarding ShEx:
> >
> > - I am also uncomfortable with the un-typed validation but I also see
> the need to support it. Unless of course RDF somewhere specifies that every
> resource MUST have an rdf:type. This however should not be the primary focus
> of ShEx since it is not the common case.
> >
> >
> > -->
> http://lelystad.informatik.uni-mannheim.de/rdf-validation/?q=R-143-CONDITIONAL-TYPED-VALIDATION
> >
> >
> > - Shapes related to types (as described in Resource Shapes) should be
> specified and promoted more explicitly. In general, these rules are easier to
> validate since you can define the selectivity based on the type, and they are
> more common in practice.
>
>  I think we need to discriminate between the case where a shape requires
> that a particular type must be present versus where the type is sufficient
> to uniquely identify the shape which must be applied. E.g. of the former -- the
> object of dc:creator must be a foaf:Person. E.g. of the latter -- some
> service where one submits a mysrvc:NewlyMintedIssue. I think you were
> talking about the latter.
>
exactly
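
To make the two cases concrete, here is a minimal sketch, assuming a graph is just a set of (subject, predicate, object) string tuples. The function names, the example resources and the "IssueShape" label are hypothetical and not part of ShEx, Resource Shapes or RDFUnit.

```python
RDF_TYPE = "rdf:type"

def check_required_type(graph, prop, required_type):
    """Former case: every object of `prop` must carry `required_type`.

    Returns the sorted objects that lack the required type.
    """
    typed = {s for (s, p, o) in graph if p == RDF_TYPE and o == required_type}
    return sorted(o for (s, p, o) in graph if p == prop and o not in typed)

def select_shapes(graph, shapes_by_type):
    """Latter case: the rdf:type alone decides which shape applies to a node.

    Returns sorted (node, shape name) pairs.
    """
    return sorted(
        (s, shapes_by_type[o])
        for (s, p, o) in graph
        if p == RDF_TYPE and o in shapes_by_type
    )

# Hypothetical example data.
graph = {
    ("ex:doc1", "dc:creator", "ex:alice"),
    ("ex:doc2", "dc:creator", "ex:bob"),
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:issue7", RDF_TYPE, "mysrvc:NewlyMintedIssue"),
}

# ex:bob is used as a dc:creator but is not typed as foaf:Person.
violations = check_required_type(graph, "dc:creator", "foaf:Person")

# ex:issue7 is picked up for validation purely because of its type.
targets = select_shapes(graph, {"mysrvc:NewlyMintedIssue": "IssueShape"})
```

In the first function the type is a constraint to be checked; in the second it is only a selector, and the actual constraints would live in the selected shape.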

> > -->
> http://lelystad.informatik.uni-mannheim.de/rdf-validation/?q=R-191-SHAPES-RELATED-TO-TYPES
> >
> >
> > - I also agree with Antoine Isaac that some more emphasis should be
> given to OWL
> >
> > - Further modularization of the syntax is needed. In almost all cases
> a foaf:name has the same range (and the same domain) in a single
> document/graph. Stating these rules separately makes the rule execution more
> efficient.
> > E.g. I can independently check the range (and domain) of foaf:name and
> inside the shape I only check its existence (if specified).
> >
> >
> > -->
> http://lelystad.informatik.uni-mannheim.de/rdf-validation/?q=R-136-MODULARITY-OF-CONSTRAINT-DEFINITIONS
>
> >
> >
> > General requirements from a validation solution
> >
> > - Rule severity level. Not all errors are equal and we need somehow to
> distinguish them. RDFUnit uses rlog [RLOG] but anything related (e.g. part
> of RFC2119) could do. (see [LEVEL])
> >
> >
> > -->
> http://lelystad.informatik.uni-mannheim.de/rdf-validation/?q=R-158-SEVERITY-LEVELS
> >
> >
> > - Annotations: There should be a (standard) way for people to define
> annotations on top of rules. These annotations could serve many purposes
> from error classification to commands on how to process the errors.
> >
> >
> > -->
> http://lelystad.informatik.uni-mannheim.de/rdf-validation/?q=R-192-DEFINE-ANNOTATIONS-FOR-CONSTRAINTS
> >
> >
> > - Descriptions: Every rule should attach an error message for the end
> user. Some messages can be generated automatically but some cannot, and the
> language must provide this facility.
> >
> >
> > -->
> http://lelystad.informatik.uni-mannheim.de/rdf-validation/?q=R-159-EXPLAIN-REASONS-OF-CONSTRAINT-VIOLATIONS
> >
> >
> > - Results & execution level. There should be different execution models
> with different result serializations, e.g. I want only a success / fail,
> only the error count per rule, all the individual erroneous resources, or
> error instances with annotations. (I know that we need to fix the
> validation language first)
> >
> >
> > -->
> http://lelystad.informatik.uni-mannheim.de/rdf-validation/?q=R-193-MULTIPLE-EXECUTION-LEVELS
> >
> >
> > - I also mentioned earlier OWL reuse for automatic rule generation
> and rules attached to vocabularies [REUSE] as well as type inference
> [INFERENCE].
> >
> > RDFUnit in the middle too
> >
> > I try to tackle all these issues in my implementation but I had to
> develop my own RDF model and it's quite hard to write RDF & SPARQL
> manually.
> > We support OWL (partially) so I used it when possible but it is not so
> straightforward either.
> > If OSLC Resource Shapes had been submitted earlier I might have used that
> instead for common cases (although it can be further extended).
> > Off the top of my head, implementing OSLC would be as easy as providing
> a configuration file such as this [OWL-CONFIG] to cover the (typed) spec.
> > SPIN was also limiting in our approach, not only for the aforementioned
> requirements, but for reasons described in [RDFUNIT section 7]. However,
> RDFUnit could easily export everything to SPIN as well. My point is that
> all three existing solutions are more or less interoperable in terms of
> verifying constraints.
> >
> > RDFUnit is a 1 year R&D project and of course I do not dare to compare
> it to full-stack enterprise solutions like SPIN & ICV. We reused concepts
> from both approaches but I think neither of them is perfect as is. What I
> miss is an easy & compact syntax to write validation rules, and it looks like
> ShEx has good potential to provide that.
> > (also note that this refers to writing/reading rules in a text editor;
> behind a rich user interface everything looks nice & easy)
> >
> > Best,
> > Dimitris
> >
> > [RLOG] http://persistence.uni-leipzig.org/nlp2rdf/ontologies/rlog#
> > [LEVEL]
> http://lists.w3.org/Archives/Public/public-rdf-shapes/2014Jun/0009.html
> > [OWL-CONFIG]
> https://github.com/AKSW/RDFUnit/blob/master/rdfunit-core/src/main/resources/org/aksw/rdfunit/testAutoGenerators.ttl
> > [RDFUNIT] http://svn.aksw.org/papers/2014/WWW_Databugger/public.pdf
> > [INFERENCE]
> http://lists.w3.org/Archives/Public/public-rdf-shapes/2014Jul/0088.html
> > [REUSE]
> http://lists.w3.org/Archives/Public/public-rdf-shapes/2014Jul/0019.html
> > --
> > Dimitris Kontokostas
> > Department of Computer Science, University of Leipzig
> > Research Group: http://aksw.org
> > Homepage:http://aksw.org/DimitrisKontokostas
>
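
The "severity level" and "results & execution level" requirements quoted above could be sketched as follows. This is only an illustration under stated assumptions: the Violation record, the level names ("status", "counts", "resources", "full"), the "ERROR"/"WARN" labels and the rule that only ERROR-level violations cause a fail are all hypothetical, and none of it is the RDFUnit, SPIN or ShEx API.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Violation:
    rule: str       # identifier of the violated rule
    resource: str   # the erroneous resource
    severity: str   # placeholder labels: "ERROR" or "WARN"
    message: str    # end-user description attached to the rule

def report(violations, level):
    """Serialize the same validation run at different levels of detail."""
    if level == "status":
        # Assumption: only ERROR-level violations make the run fail.
        return not any(v.severity == "ERROR" for v in violations)
    if level == "counts":
        # Error count per rule.
        return dict(Counter(v.rule for v in violations))
    if level == "resources":
        # All individual erroneous resources, without duplicates.
        return sorted({v.resource for v in violations})
    if level == "full":
        # Every error instance with its annotations.
        return list(violations)
    raise ValueError(f"unknown level: {level}")

# Hypothetical results of one validation run.
sample = [
    Violation("range-foaf-name", "ex:a", "ERROR", "foaf:name must be a literal"),
    Violation("range-foaf-name", "ex:b", "ERROR", "foaf:name must be a literal"),
    Violation("missing-label", "ex:a", "WARN", "rdfs:label is recommended"),
]

passed = report(sample, "status")
per_rule = report(sample, "counts")
bad_resources = report(sample, "resources")
```

The point of the design is that the violations are computed once and the level only changes how much of that result is exposed.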



-- 
Dimitris Kontokostas
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org
Homepage:http://aksw.org/DimitrisKontokostas
Received on Monday, 28 July 2014 07:39:43 UTC