Re: Human-readable error messages in an R2RML validator

> On 20 Mar 2015, at 03:20, Holger Knublauch <holger@topquadrant.com> wrote:
> 
> Hi Richard,
> 
> thanks for sharing this, which is relevant for
> 
> https://www.w3.org/2014/data-shapes/wiki/Requirements#Human-readable_Violation_Messages
> 
> which currently has some objections. I notice that you are (also) using a string template mechanism with placeholders such as {varName}. I believe we could/should use something similar for the values of sh:message. This may address some concerns by Jose, because it would allow the constraints/templates to only produce the human-readable messages on demand, from an abstract validation data structure (sh:root, sh:path etc).

This makes sense to me.

> SHACL templates already have a simple outline for such a mechanism, inherited from SPIN:
> 
>    http://w3c.github.io/data-shapes/shacl/#template-labelTemplate
> 
> In many examples that I have seen though, simple value replacements are not sufficient, and messages need to be assembled more dynamically. So I think there remains value in having the fallback to produce sh:message strings as part of the WHERE clause.

An advantage of using templates for the validation messages, rather than producing a string message through a SPARQL expression, is that we can format the nodes nicely or make things interactive. For example, the {object} placeholder in the R2RML validator pretty-prints the node in Turtle style as a prefixed name, full URI, literal, or blank node, using the prefix mapping of the file under validation. And in a hypothetical graphical environment, each URI node in the rendered message could be displayed using its rdfs:label and still be made clickable.

The templates are also more readable than a SPARQL concatenation expression, and might even serve as a form of documentation for the constraint when one is looking at the raw SHACL file.

And one can always fall back to a trivial template of the form “{?theMessage}”, and compute ?theMessage in the SPARQL query.

> Question: do you have any specific grammar for the string substitution templates,

In this R2RML validator, the templates all render the same underlying data structure, so there’s a limited, fixed set of placeholder names. We’ve grown this template system to the point where it could support everything we wanted to do for R2RML validation, but no further. The supported placeholders are:

{resource}
{property}
{properties}
{properties|or}
{object}
{objects}
{objects|or}
{object|nodetype}
{string}
{details}
{details|withcode}
{detailcode}

The singular forms {property} and {object} are only replaced if there’s exactly one value. The plural forms {properties} and {objects} are replaced if there are one or more values, using “A, B and C” form. The “|or” version outputs “A, B or C”. “|and” is the default if “|or” is absent.

The plural forms are handy for things like:

    “{resource} is missing one of the properties {properties|or}”
    “{resource} can’t be of types {objects|and} at the same time”
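
To illustrate the plural rule, here is a minimal Java sketch. It is not the validator's actual code; the class and method names (PluralPlaceholderSketch, joinList, renderPlural) are made up for this example, and values are assumed to be already pretty-printed strings.

```java
import java.util.List;

// Illustrative sketch only; names are hypothetical, not taken from the validator.
final class PluralPlaceholderSketch {

    // Joins items as "A", "A and B" or "A, B and C" (or the same with "or").
    static String joinList(List<String> items, String conjunction) {
        int n = items.size();
        if (n == 0) return "";
        if (n == 1) return items.get(0);
        String head = String.join(", ", items.subList(0, n - 1));
        return head + " " + conjunction + " " + items.get(n - 1);
    }

    // Replaces {name}, {name|and} and {name|or} with the joined values.
    // With no values, the template is returned unchanged.
    static String renderPlural(String template, String name, List<String> values) {
        if (values.isEmpty()) return template;
        return template
            .replace("{" + name + "|or}", joinList(values, "or"))
            .replace("{" + name + "|and}", joinList(values, "and"))
            .replace("{" + name + "}", joinList(values, "and"));
    }

    public static void main(String[] args) {
        System.out.println(renderPlural(
            "{resource} is missing one of the properties {properties|or}",
            "properties",
            List.of("rr:column", "rr:template", "rr:constant")));
    }
}
```

The sketch only handles one placeholder name per call, so {resource} in the example is left in place for a separate substitution step.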

{object|nodetype} produces one of the following: “IRI”, “blank node”, “language-tagged string”, “plain literal”, “string literal”, “non-string typed literal”. These made sense in the context of R2RML validation. Typical use: “{property} of {resource} must have an integer value but is {object|nodetype}.”
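
As a rough Java sketch of how such a classification could work: the Node class below is a hypothetical stand-in for an RDF library's node type, and the rules for telling the literal categories apart (no datatype means “plain literal”, xsd:string means “string literal”) are my assumption here, not a description of the validator's code.

```java
// Illustrative sketch only; Node is a hypothetical minimal node model.
final class NodeTypeSketch {

    static final String XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";

    enum Kind { IRI, BLANK, LITERAL }

    static final class Node {
        final Kind kind;
        final String language;     // null if the literal has no language tag
        final String datatypeIri;  // null if the literal has no datatype

        Node(Kind kind, String language, String datatypeIri) {
            this.kind = kind;
            this.language = language;
            this.datatypeIri = datatypeIri;
        }
    }

    // Maps a node to one of the six {object|nodetype} labels.
    static String nodeType(Node node) {
        if (node.kind == Kind.IRI) return "IRI";
        if (node.kind == Kind.BLANK) return "blank node";
        if (node.language != null && !node.language.isEmpty()) {
            return "language-tagged string";
        }
        if (node.datatypeIri == null) return "plain literal";
        if (XSD_STRING.equals(node.datatypeIri)) return "string literal";
        return "non-string typed literal";
    }
}
```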

{string} is only replaced if the object is a literal, with just the lexical form. Used if we don’t want double quotes, language tag, etc. in the error message.

The various {details} forms are usually used when the error was reported by some black-box subsystem such as the SQL parser or the IRI parser. Typical use: “The rr:sqlQuery of {resource} is not a valid SQL query: {details}”. Here, {details} is whatever the SQL parser reported.

> and do you see a realistic way to also support conditional insertions, e.g. "Count must be > {?minCount}. Count must be < {?maxCount}" so that the min and max sections only appear if a template has a value for ?min/maxCount?

In this particular project I would probably have handled that as two separate constraints. Sometimes, splitting a complex constraint into several simpler ones can be worth it, even if the result is more verbose, because it allows generating more specific validation messages.

Not having thought about it too much, my intuition is that instead of conditional insertion, I’d prefer the option of having multiple template strings on a single constraint. Only those where all placeholders have bound values in the validation data structure would produce a message. So,

    sh:violationMessageTemplate "Count must be > {?minCount}";
    sh:violationMessageTemplate "Count must be < {?maxCount}";

would produce zero, one or two validation messages.
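
A minimal Java sketch of that rule, to make the idea concrete. This is only an illustration under my own assumptions: the class and method names are invented, the bindings are modelled as a plain map from variable name to an already formatted string, and placeholders are assumed to have the {?name} form used above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; names are hypothetical.
final class MultiTemplateSketch {

    // Matches placeholders of the form {?varName}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\?(\\w+)\\}");

    // Renders one template, or returns null if any placeholder is unbound.
    static String render(String template, Map<String, String> bindings) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = bindings.get(m.group(1));
            if (value == null) return null;
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Keeps only the messages whose templates were fully bound.
    static List<String> messages(List<String> templates, Map<String, String> bindings) {
        List<String> result = new ArrayList<>();
        for (String template : templates) {
            String message = render(template, bindings);
            if (message != null) result.add(message);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> templates = List.of(
            "Count must be > {?minCount}",
            "Count must be < {?maxCount}");
        // Only ?minCount is bound, so only the first template yields a message.
        System.out.println(messages(templates, Map.of("minCount", "1")));
    }
}
```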

Richard




> 
> Holger
> 
> 
> On 3/19/2015 23:36, Richard Cyganiak wrote:
>> Here’s some input that might be relevant to the discussion about human-readable validation reports.
>> 
>> This is part of a validator for R2RML documents, implemented in Java. R2RML documents are often authored by hand in Turtle. Therefore, to be helpful, a validator must provide concrete, detailed and specific error messages. I think that most of what this particular validator does, should be doable in SHACL.
>> 
>> The following file contains all the error messages that the validator can generate:
>> 
>> https://github.com/d2rq/d2rq/blob/develop/src/org/d2rq/validation/Message.java
>> 
>> Some of these are “high-level” stuff, where one could imagine the error message being hardcoded in the SHACL processor, for example:
>> 
>>     DUPLICATE_VALUE(Level.Error,
>>         "Duplicate value for {property}",
>>         "The resource {resource} has multiple values for {property} ({objects}); only one is allowed."),
>> 
>> But most are specific to R2RML, and the human-readable error message would have to be supplied in the SHACL document for R2RML, for example:
>> 
>>     INVALID_COLUMN_NAME(Level.Error, "Malformed column name {string}",
>>         "Malformed column name {string} in property {property} of {resource}: {details}."),
>> 
>>     ONLY_ALLOWED_IF_TERM_TYPE_LITERAL(Level.Error,
>>         "{property} not allowed for this term type",
>>         "The property {property} is only allowed on term maps that generate literals, but the rr:termType of {resource} is not rr:Literal."),
>> 
>>     POSSIBLE_UNSAFE_SEPARATOR_IN_IRI_TEMPLATE(Level.Warning,
>>         "Possible unsafe separator in IRI template",
>>         "Column references in the {property} of {resource} ({object}) are separated by a possibly unsafe delimiter. It is recommended that IRI sub-delim characters are used to delimit column references in IRI templates."),
>> 
>> For reference, each validation message has the following fields:
>> 
>>     Level level;
>>     String messageTitle;
>>     String messageTemplate;
>>     Resource subject;
>>     List<Property> predicates;
>>     List<RDFNode> objects;
>>     String detailCode;
>>     String details;
>> 
>> Best,
>> Richard
> 
> 

Received on Friday, 20 March 2015 12:01:46 UTC