Re: Human-readable error messages in an R2RML validator from Richard Cyganiak on 2015-03-23 (public-data-shapes-wg@w3.org from March 2015)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Mon, 23 Mar 2015 12:05:12 +0000
To: Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de>
Cc: Holger Knublauch <holger@topquadrant.com>, "public-data-shapes-wg@w3.org" <public-data-shapes-wg@w3.org>
Message-Id: <F8A23000-6EFC-4DE9-A5A2-9CD90324CC36@cyganiak.de>
Hi Dimitris,

> On 23 Mar 2015, at 07:45, Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de> wrote:
> 
> What we further need to investigate is the different types of messages a SHACL engine would produce. I see the following possible types of messages:
> 
> 1) violations of the high level vocabulary facets. e.g. minCount
> In this case we can either let each SHACL engine produce the messages they want i.e. property {x} has less than {y} occurrences in shape {z}
> or we can define a message vocabulary where each engine must|should pick up the template messages to use. 
> The advantage in the latter approach is that all engines will generate the same messages to the end user and by accepting outside contributions the vocabulary will be easily translated to many languages (as Jose also noted).
> I don't think there is any reason for an end-user to redefine these messages since their interpretation is very well defined.
> 
> 2) messages for a sh:property / sh:Shape
> I am not sure what type of messages these should produce if we provide messages for the facets, maybe some high level comments only? 
> Also by wanting to display this level of messages we would need to somehow aggregate errors at the property / shape level and we'd miss the error details facets provide. 

I think for some facets, in particular regular expression facets, having custom messages is necessary. I want to tell users that “this isn’t a valid username” instead of “this doesn’t match the regex [a-zA-Z0-9_-]+”.

Here’s an example of a message that might be associated with a shape:

    “The property {property} is only allowed on term maps that generate literals, but the rr:termType of {resource} is not rr:Literal.”

Technically speaking this only says that under certain conditions there is a cardinality constraint, and that constraint has been violated, but that’s much easier to explain to a user with a custom message.
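
As a rough illustration of how such a template could be filled in, here is a minimal Python sketch. The `render` helper and the bindings `rr:language` and `<#NameMap>` are made up for this example; they are not taken from the R2RML validator or from any SHACL draft.

```python
import re

# Matches placeholders of the form {name}.
PLACEHOLDER = re.compile(r"\{(\w+)\}")

def render(template, bindings):
    """Replace each {name} placeholder with the value bound to that name."""
    return PLACEHOLDER.sub(lambda m: str(bindings[m.group(1)]), template)

template = (
    "The property {property} is only allowed on term maps that "
    "generate literals, but the rr:termType of {resource} is not rr:Literal."
)

# Hypothetical bindings for one violation.
message = render(template, {"property": "rr:language", "resource": "<#NameMap>"})
print(message)
```

A real engine would presumably pretty-print the bound nodes rather than call `str()` on them; that step is left out here.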

> 3) messages for sparql queries
> if the query does not specify the message in the sparql body how would it differ from the shape message or a message from another sparql query in the same shape? If we want this functionality we'd need to put SPARQL queries in intermediate/blank nodes

SPARQL queries might bind extra variables that could become available for use in message templates.
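
A small Python sketch of what that could look like, combined with the rule quoted further down in this thread that only templates whose placeholders are all bound produce a message. The variable name `suggestion`, the example rows, and the helper functions are assumptions for illustration only; a result row is modelled as a plain dict.

```python
from string import Formatter

def placeholders(template):
    """Return the set of placeholder names used in a template string."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}

def messages(templates, row):
    """Render only those templates whose placeholders are all bound in row."""
    return [t.format(**row) for t in templates if placeholders(t).issubset(row)]

templates = [
    "{value} is not a valid value for {predicate}.",
    "{value} is not a valid value for {predicate}; try {suggestion}.",
]

# One row with the standard variables only, one with an extra bound variable.
row_basic = {"root": "ex:alice", "predicate": "ex:username", "value": '"al ice"'}
row_extra = dict(row_basic, suggestion='"al_ice"')

basic_messages = messages(templates, row_basic)  # only the first template applies
extra_messages = messages(templates, row_extra)  # both templates apply
```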

> 4) messages for property templates.
> If we define a message vocabulary we can re-use it for defining these types of messages
> 
> Finally, I also don't see any reason to formalize an intermediate node structure for the creation of messages.
> Since a SHACL engine can return sh:root, sh:predicate & sh:value as results, we don't need anything else for an agent to post-process the results.
> messages are for human consumption only and each engine can create its internal intermediate structure for the message generation

Formalising a data structure for validation reports may be a very simple affair. It may just be a matter of defining some names like ?root and ?value that have special meaning. A validation report is then simply some key-value pairs with keys like ?root and ?value. The output of a SELECT-style SPARQL query would directly produce a validation report, but other non-SPARQL methods could of course be used to produce an equivalent data structure.
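
To make the "equivalent data structure" point concrete, here is a hedged Python sketch of a non-SPARQL check that produces rows of the same shape a SELECT query binding ?root and ?value would. The triples, the `ex:` names, and the key `predicate` are assumptions for this example; the regular expression is the username pattern mentioned earlier in this message.

```python
import re

# The username pattern from the regex example above.
USERNAME = re.compile(r"[a-zA-Z0-9_-]+")

def check_usernames(triples, predicate):
    """Return one key-value row per value of predicate that fails the pattern."""
    return [
        {"root": s, "predicate": p, "value": o}
        for s, p, o in triples
        if p == predicate and not USERNAME.fullmatch(o)
    ]

# Hypothetical data, as (subject, predicate, object) tuples.
triples = [
    ("ex:alice", "ex:username", "alice_01"),
    ("ex:bob", "ex:username", "bob smith"),
]

report = check_usernames(triples, "ex:username")
for row in report:
    print("{value} isn't a valid username (on {root}).".format(**row))
```

Each row is just a set of key-value pairs, so a message template can be applied to it regardless of how the row was produced.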

Best,
Richard


> 
> Best,
> Dimitris
> 
> 
> On Mon, Mar 23, 2015 at 3:36 AM, Holger Knublauch <holger@topquadrant.com> wrote:
> On 3/20/2015 22:01, Richard Cyganiak wrote:
> 
> An advantage of using templates for the validation messages, rather than producing a string message through a SPARQL expression, is that we can format the nodes nicely or make things interactive. For example, the {object} placeholder in the R2RML validator will intelligently pretty-print the node in Turtle style as a prefixed name, full URI, literal, or blank node, using the prefix mapping of the file under validation. And in a hypothetical graphical environment, URI nodes in the rendered message could be rendered using their rdfs:label, and still be made clickable.
> 
> Absolutely. Indeed we use SPIN label templates for the same purpose in various TopBraid user interfaces.
> 
> Not having thought about it too much, my intuition is that instead of conditional insertion, I’d prefer the option of having multiple template strings on a single constraint. Only those where all placeholders have bound values in the validation data structure would produce a message. 
> 
> This sounds like a good idea!
> 
> I will add some TODO item to the sh:message to make sure we revisit this topic once there is time for such details.
> 
> Thanks,
> Holger
> 
> 
> 
> 
> 
> -- 
> Dimitris Kontokostas
> Department of Computer Science, University of Leipzig
> Research Group: http://aksw.org
> Homepage: http://aksw.org/DimitrisKontokostas
Received on Monday, 23 March 2015 12:05:38 UTC