Re: Proposal to close ISSUE-51 as specified in shacl-ref from Dimitris Kontokostas on 2015-08-03 (public-data-shapes-wg@w3.org from August 2015)

From: Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de>
Date: Mon, 3 Aug 2015 14:01:35 +0300
To: Holger Knublauch <holger@topquadrant.com>
Cc: RDF Data Shapes Working Group <public-data-shapes-wg@w3.org>
Message-ID: <CA+u4+a3m8JjR2v5x1YWQd3_S60UTDKLDRqtF2fUKEV-Xbg0ffQ@mail.gmail.com>
On Mon, Aug 3, 2015 at 4:05 AM, Holger Knublauch <holger@topquadrant.com>
wrote:

> Hi Dimitris,
>
> On 7/31/2015 23:38, Dimitris Kontokostas wrote:
>
> Coming back to this after yesterday's call.
>
> The most common use case that the current approach cannot easily answer is
> "get me all violation results".
> To do this we would have to explicitly enumerate all severity
> levels in the SHACL hierarchy (and possibly the user's extensions), use
> RDFS reasoning, or merge the results into the same dataset as the SHACL +
> user ontology in order to use property paths in the SPARQL query.
>
>
> The way that this was designed to work in my draft is that the engine is
> invoked with a minimum severity (see ?minSeverity in
> http://w3c.github.io/data-shapes/shacl/#operation-validateNode). The
> engine does always have full access to the shapes graph, and the shapes
> graph may include user-defined extensions, e.g. subclasses of sh:Info. RDFS
> reasoning is not needed here - similar to our handling of subclasses
> elsewhere it is sufficient to just walk the rdfs:subClassOf triples. So I
> cannot follow your line of reasoning above, as the question can be answered
> already. The SPARQL queries do not need access to the result classes - they
> just return SELECT variable bindings and the engine turns them into actual
> results.
>
> Having said this, my draft is too vague right now and implicitly assumes
> that there is a natural ordering of result classes. In response to this, I
> have added a numeric severity index to each severity level, which can be
> used to determine the ordering. Once this number exists, the subClassOf
> hierarchy becomes less relevant, and rather a mechanism to communicate the
> shape of supported properties.
>

Note that, independent of whether we allow access to the shapes graph during
validation, I don't think we should also require access to the shapes graph /
SHACL ontology when one reads the validation results.
Otherwise a very simple query such as "get me all results" gets quite complex,
especially when results are stored offline and processed later.

I agree that access to the shapes graph / SHACL ontology can be
handy for more advanced processing, but we shouldn't have this dependency
for simple selections.
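
To make the difference concrete, here is a minimal Python sketch, with plain
tuples standing in for triples. It assumes that, in the class-based encoding,
the severity classes are subclasses of a common root (written here as
sh:ConstraintViolation); all ex: names and both helper functions are made up
for this illustration and are not part of any draft.

```python
# Triples are plain (subject, predicate, object) tuples.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
ROOT = "sh:ConstraintViolation"  # assumed common root of the severity classes

# SHACL + user ontology: only the class-based encoding depends on it.
ontology = {
    ("sh:Error", SUBCLASS_OF, ROOT),
    ("sh:Warning", SUBCLASS_OF, ROOT),
    ("ex:FatalError", SUBCLASS_OF, "sh:Error"),  # hypothetical user extension
}

# Encoding A: the severity is expressed through the class of each result.
results_by_class = {
    ("ex:r1", RDF_TYPE, "sh:Error"),
    ("ex:r2", RDF_TYPE, "ex:FatalError"),
    ("ex:r3", RDF_TYPE, "sh:Warning"),
}

# Encoding B: one result class, severity as an explicit property value.
results_by_property = {
    ("ex:r1", RDF_TYPE, "sh:ValidationResult"),
    ("ex:r1", "sh:severity", "sh:Error"),
    ("ex:r2", RDF_TYPE, "sh:ValidationResult"),
    ("ex:r2", "sh:severity", "ex:Fatal"),  # hypothetical user-defined level
    ("ex:r3", RDF_TYPE, "sh:ValidationResult"),
    ("ex:r3", "sh:severity", "sh:Warning"),
}


def subclasses_of(cls, graph):
    """Return cls plus its transitive subclasses by walking rdfs:subClassOf."""
    found, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for s, p, o in graph:
            if p == SUBCLASS_OF and o == current and s not in found:
                found.add(s)
                todo.append(s)
    return found


def all_results_by_class(results, graph):
    """Encoding A: needs the ontology to know which classes are result classes."""
    classes = subclasses_of(ROOT, graph)
    return sorted({s for s, p, o in results if p == RDF_TYPE and o in classes})


def all_results_by_type(results):
    """Encoding B: a simple selection over the stored results alone."""
    return sorted({s for s, p, o in results
                   if p == RDF_TYPE and o == "sh:ValidationResult"})
```

With encoding A, the selection only works when the ontology is at hand; with
encoding B, the stored results are self-contained for this kind of query.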



> I have created a branch where we can hopefully finalize the revised design
> before making it a submission:
>
> https://github.com/w3c/data-shapes/tree/ISSUE-51
>
> (Only the turtle file is updated - I didn't want to go too far ahead on
> speculative grounds)
>
> See the rest of this email, I believe I have captured the spirit of your
> proposal, albeit with some minor and syntactical differences (see below).
> Please correct me if I have missed the point.
>

Looks good to me. I suggest we move sh:root, sh:subject, sh:predicate and
sh:object to the sh:ValidationResult class; see also some minor comments inline.


>
> On the other hand, Holger's use case about attaching different properties
> based on the severity level could possibly be handled with shapes,
> using sh:ConstraintViolation as scope and filtering based on the severity
> level with sh:hasValue
>
>
> While I would not object to using shapes here, I think classes already
> provide a very natural way of modeling this. However, now having read
> your email on ISSUE-75 and the list of SQL errors enumerated by Ted in the
> previous call, I can see better why having severity level as a separate
> entity can also have its advantages - especially to better distinguish
> error handling from proper results. Furthermore, although the subclasses
> may in theory add new properties, there is no such example in the spec, so
> the issue is probably not as important as I thought.
>
>
> I re-propose my suggestion from
> https://lists.w3.org/Archives/Public/public-data-shapes-wg/2015May/0145.html
> with renaming based on the current draft
>
> ========================
> #remove sh:ResultClass
>
>
> sh:ResultClass is now replaced by sh:Severity in my branch.
>
>
> sh:Result a rdfs:Class . # the super class of all results (abstract)
>
> sh:severity a owl:ObjectProperty ; rdfs:domain sh:Result;
> rdfs:range sh:SeverityLevel .
>
> #sh:Result also contains sh:source (maybe sh:detail too)
>
> #Severity definitions
> sh:SeverityLevel a rdfs:Class .
> sh:Error a sh:SeverityLevel, owl:NamedIndividual .
> sh:Warn a sh:SeverityLevel, owl:NamedIndividual .
>
>
> I am very much opposed to adding an (unnecessary) dependency on the OWL
> namespace here. If someone wants to treat the error objects as
> owl:NamedIndividual, then they can add this triple themselves in their
> local copies. But nothing in SHACL requires them to be named OWL
> individuals.
>

I used OWL as it is a simpler / shorter way of stating the proposed
resolution; I am open to using shapes directly, as you do in your draft.


> Likewise I don't see why we should use rdfs:domain and rdfs:range here.
> Either we believe SHACL is suitable to communicate a data structure or we
> don't. rdfs:domain and ranges open up implications about inferencing that
> are unnecessary red herrings here. All we want to communicate is
> "sh:Results can have a sh:severity". SHACL is perfectly capable of doing
> this via sh:property. Again, if anyone wants to use this with RDFS tools,
> they can add these domain triples themselves. Or maybe the WG wants to
> produce an alternative version of the SHACL namespace especially for
> backward compatibility with pure RDFS/OWL tools - this would be quite easy
> to produce based on the shape definitions.
>
>
> #We could attach an integer/float property to sh:SeverityLevel, e.g.
> sh:severityFactor, that could be used for ordering severity levels
>
>
> The term "severity factor" is used for different purposes elsewhere (
> https://en.wikipedia.org/wiki/Severity_factor) and I guess "factor"
> implies some kind of multiplication. In my draft I am using
> sh:severityIndex for now:
>
> sh:Info  0.0
> sh:Warning  1.0
> sh:Error 2.0
>
> (all xsd:decimal so that people can place new values in between)
>

I think we should start with 10.0 for sh:Info, with a 10.0 step between
values, to make it easier for others to define intermediate levels.
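
As a rough sketch of the idea in Python: the index values below are only the
suggestion above, not what is in the branch, and ex:Notice and the at_least
helper are hypothetical names. With a step of 10.0, a new level can be slotted
in between two existing ones using a round number, and a ?minSeverity-style
filter becomes a plain numeric comparison.

```python
from decimal import Decimal

# Suggested sh:severityIndex values (xsd:decimal); an assumption, not the draft's values.
severity_index = {
    "sh:Info": Decimal("10.0"),
    "sh:Warning": Decimal("20.0"),
    "sh:Error": Decimal("30.0"),
}

# A user-defined level placed between sh:Info and sh:Warning (hypothetical name).
severity_index["ex:Notice"] = Decimal("15.0")


def at_least(results, min_severity):
    """Keep (result, severity) pairs whose index is >= that of min_severity."""
    threshold = severity_index[min_severity]
    return [(r, s) for r, s in results if severity_index[s] >= threshold]


# Levels ordered from least to most severe, derived from the index alone.
ordered_levels = sorted(severity_index, key=severity_index.get)

results = [
    ("ex:r1", "sh:Info"),
    ("ex:r2", "ex:Notice"),
    ("ex:r3", "sh:Warning"),
    ("ex:r4", "sh:Error"),
]
```

For example, at_least(results, "sh:Warning") keeps only ex:r3 and ex:r4, and no
subClassOf walk is needed to decide where ex:Notice sits in the ordering.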


> sh:ConstraintViolation a rdfs:Class; # the existing class in the spec
>    rdfs:subClassOf sh:Result .
> # sh:ConstraintViolation contains sh:root, sh:subject, sh:object, ...
> =============================
>
> Notes:
>
> I would also propose a minor renaming that would result in more accurate
> meaning
> sh:Result -> sh:AbstractResult
>
>
> Yes renaming to sh:AbstractResult makes sense here.
>
> sh:ConstraintViolation -> sh:ViolationInstance
>
>
> For now I picked sh:ValidationResult rdfs:subClassOf sh:AbstractResult. I
> believe the term violation no longer fits the bill because results may also
> include INFO or DEBUG messages. And everything is an "instance"...
>



> Then I added a class sh:Failure to enumerate the various reasons for
> "unexpected" situations such as timeouts:
> - sh:IOFailure
> - sh:UnsupportedRecursionFailure
> - Are there any other identifiable causes?
>

What about syntax errors or timeouts?
Q: For syntax errors (esp. in SPARQL), do we report them during loading or
during execution?

Thanks,
Dimitris



> They are used via sh:failure by the class sh:FailureResult rdfs:subClassOf
> sh:AbstractResult.
>
> Other people may add other result classes such as your accumulated results.
>
>
> sh:Result / sh:AbstractResult
> This can be used in case someone wants to provide alternative results for
> SHACL and means that the minimum information one should have is a severity
> level and a link to the source (shape/facet/...) this result came from
>
>
> Another small change I made was to rename sh:source to sh:sourceConstraint
> and to add sh:sourceShape. The reason for this is that multiple shapes may
> share the same sh:Constraint, so we want to remember the context (if we
> can).
>
> Cheers,
> Holger
>
>
-- 
Dimitris Kontokostas
Department of Computer Science, University of Leipzig & DBpedia Association
Projects: http://dbpedia.org, http://aligned-project.eu,
http://rdfunit.aksw.org
Homepage: http://aksw.org/DimitrisKontokostas
Research Group: http://aksw.org
Received on Monday, 3 August 2015 11:02:33 UTC