Shapes/ShEx or the worrying issue of yet another syntax and lack of validated vision. from Jerven Tjalling Bolleman on 2014-07-15 (public-rdf-shapes@w3.org from July 2014)

From: Jerven Tjalling Bolleman <jerven.bolleman@isb-sib.ch>
Date: Tue, 15 Jul 2014 17:35:25 +0200
To: public-rdf-shapes@w3.org
Message-ID: <53C54A3D.6080604@isb-sib.ch>
Dear All,

Let me apologize in advance for the rude tone of this e-mail.

I am looking at the current work/direction of the work-group and am 
really worried.

Issues

First of all, you decided to focus not on the problem of validating data 
in RDF but on a solution called shapes. I think you need to go back and 
first collect what validation should do, rather than what the solution 
looks like, because I don't think ShEx/Shapes does enough.

Secondly, I have the feeling that the work-group is confounding the 
issues of syntax and user interface, as well as ignoring a lot of 
engineering effort out there in the world.

Concerns

My current concerns are mostly based on this document 
http://www.w3.org/2013/ShEx/Primer.

Concern 1.

First of all, it's yet another syntax with strange variations on Turtle.

<IssueShape> {
     ex:state (ex:unassigned ex:assigned),
     ex:reportedBy @<UserShape>,
     ex:reportedOn xsd:dateTime,
     ( ex:reproducedBy @<EmployeeShape>,
       ex:reproducedOn xsd:dateTime      )?,
     ex:related @<IssueShape>*
}

Why the brackets and the @ for some kind of pointer? Why not write nice, 
simple Turtle and do this?


:IssueShape
     ex:state (ex:unassigned ex:assigned) ;
     ex:reportedBy [ a :UserShape ] ;
     ex:reportedOn xsd:dateTime ;
     ( ex:reproducedBy :EmployeeShape ;
       ex:reproducedOn xsd:dateTime  )?,
     ex:related :IssueShape

OK, we still have the strange '?' and a collection whose meaning differs 
from Turtle's; let me come to that.

Now change that to:

:IssueShape
     shex:oneOf ( [ ex:state ex:unassigned ]
                  [ ex:state ex:assigned ] ) ;
     ex:reportedBy [ a :UserShape ] ;
     ex:reportedOn xsd:dateTime ;
     shex:eitherNoneOforAllOf [ ex:reproducedBy :EmployeeShape ;
                                ex:reproducedOn xsd:dateTime ] ;
     ex:related :IssueShape .

Now, is that completely different in readability?
No, it's not.
Did you gain a lot of usability from yet another syntax?
No, you didn't.
Will you make life difficult for everyone using it because you have yet 
another syntax?
Yes, you will.
Did your syntax make life a lot easier for users?
No, because it's yet another syntax to learn.

Aside:
   Did you notice that your use of the question mark is not consistent
   with any other commonly used syntax, e.g. regex, globs, ternary logic
   etc.? This is sure to lead to a lot of confusion.
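
To make the intended meaning of the optional group concrete (the 
`( ... )?` in the ShEx example, spelled `shex:eitherNoneOforAllOf` in the 
Turtle rendering), here is a minimal Python sketch. It assumes a purely 
hypothetical in-memory representation where each node is a dict from 
predicate name to a list of values; the function name and the sample 
issues are made up for illustration and are not part of ShEx or any 
library.

```python
# Hypothetical representation: a node is a dict mapping predicate names
# to lists of values. None of these names come from a real ShEx library.
GROUP = ("ex:reproducedBy", "ex:reproducedOn")


def none_or_all(node, group=GROUP):
    """Return True if the node has none of the group's predicates, or all of them."""
    present = [p for p in group if node.get(p)]
    return len(present) in (0, len(group))


# Made-up sample data for illustration only.
issues = {
    "issue1": {"ex:state": ["ex:unassigned"]},
    "issue2": {
        "ex:state": ["ex:assigned"],
        "ex:reproducedBy": ["emp7"],
        "ex:reproducedOn": ["2014-07-01T10:00:00"],
    },
    # Only half of the group is present, so this one should fail.
    "issue3": {"ex:state": ["ex:assigned"], "ex:reproducedBy": ["emp7"]},
}

invalid = [name for name, node in issues.items() if not none_or_all(node)]
print(invalid)
```

Only `issue3` is reported, since it has a reproducer but no reproduction 
date.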

Concern 2.

The second issue is that the work-group seems to have confounded user 
interface with constraint interchange, and so has forgotten where all 
the engineering and much of the training effort has gone in the last 
few years. Why is SPARQL 1.1 not the majority of the solution? Why are 
you not building on OWL where it is needed?

ShEx already shows that you can't solve the problems with it alone, 
because you are punting to other languages, including SPARQL. That means 
your users still need to use SPARQL anyway! A major issue IMHO.

Concern 3.

Shapes is not enough for real-world data validation. I have worked for a 
while on Dutch healthcare systems and had to deal with the fact that 
data in the database could be incorrect while the data being provided 
might be correct, so we need humans in the loop to figure out the truth 
of it, e.g. two people with the same citizen service number (BSN) (due 
to a typo or fraud). ShEx can tell us that we have an issue, but it 
can't generate a work item.

SPIN, for example, can do that, because SPIN is not just a constraint 
language but also an inference language (e.g. I can infer that a manual 
data intervention is required given two people with the same BSN). OWL 
can do similar things.
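
As a rough illustration of the difference between "detecting a problem" 
and "inferring a work item", here is a minimal sketch in plain Python. 
It is not SPIN; in SPIN this would typically be written as a SPARQL 
CONSTRUCT rule. The record layout, the `ManualDataIntervention` type 
name, and the BSN values are all assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical records: (person identifier, BSN). All values are made up.
people = [
    ("person/1", "111222333"),
    ("person/2", "444555666"),
    ("person/3", "111222333"),  # same BSN as person/1: typo or fraud?
]


def find_work_items(records):
    """Group people by BSN and emit one work item per BSN held by several people."""
    by_bsn = defaultdict(list)
    for person, bsn in records:
        by_bsn[bsn].append(person)
    return [
        # The "inference" step: new data describing work for a human,
        # rather than a bare pass/fail validation result.
        {"type": "ManualDataIntervention", "bsn": bsn, "people": sorted(persons)}
        for bsn, persons in sorted(by_bsn.items())
        if len(persons) > 1
    ]


work_items = find_work_items(people)
for item in work_items:
    print(item)
```

The point of the sketch is only that the output is new data (a work 
item a human can pick up), not merely a report that validation failed.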

Concern 4.

Because data and rules do not share the same syntax or model, it is 
difficult to write rules about your rules, something that is trivial in 
SPIN and really helps rule maintenance. E.g. checking that all 
predicates mentioned in your rules are present in a limited set of 
ontologies is easy in SPIN. It's hard in ShEx because your model is not 
simple to translate to RDF.
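
A minimal sketch of such a "rule about rules", assuming the rules are 
already available as plain (subject, predicate, object) triples, which 
is exactly what an RDF-based rule model gives you for free. The 
namespaces, the rule triples, and the function name are hypothetical, 
and membership in an ontology is approximated here by a namespace 
prefix check.

```python
# Hypothetical whitelist of ontology namespaces the rules may draw from.
ALLOWED_NAMESPACES = (
    "http://example.org/issues#",
    "http://www.w3.org/2001/XMLSchema#",
)

# Hypothetical rule set expressed as triples, i.e. rules are just data.
rules = [
    ("IssueShape", "http://example.org/issues#reportedBy", "UserShape"),
    ("IssueShape", "http://example.org/issues#reportedOn",
     "http://www.w3.org/2001/XMLSchema#dateTime"),
    # Predicate from an unexpected namespace; it should be flagged.
    ("IssueShape", "http://example.com/typo#relatd", "IssueShape"),
]


def unknown_predicates(triples, allowed=ALLOWED_NAMESPACES):
    """Return the sorted predicates that fall outside every allowed namespace."""
    # str.startswith accepts a tuple of prefixes.
    return sorted({p for _, p, _ in triples if not p.startswith(allowed)})


flagged = unknown_predicates(rules)
print(flagged)
```

With a custom syntax, the same check first needs a parser for that 
syntax before any of this can run.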

Concern 5.

As you disregard SPARQL, you disregard SERVICE calls. This means I can't 
easily validate using data held in multiple systems. By looking at data 
as files you process in isolation, you have lost a lot of power, as well 
as an easy way to extend the capabilities of the system in 
standards-compliant ways (e.g. using a SADI service to compute values 
needed in your validation on the fly).


Conclusions.

ShEx -> SPARQL is fine; it places ShEx as a UI, not as an interchange 
standard.
ShEx -> is not powerful enough to do more than simple validation.
ShEx -> should not invent yet another syntax. ShEx should be modeled in 
RDF and use existing syntaxes.
Workgroup -> you too quickly discarded the two solutions widely adopted 
in industry, SPIN (SPARQL) and closed-world OWL, on the outcome of a 
single workshop.
Workgroup -> you don't have a good goal document that states what 
validation needs to do.

I hope you will seriously reconsider your chosen direction because it is 
breaking the first rule of a good standard -> depend on other existing 
standards.

Regards,
Jerven Bolleman
Received on Tuesday, 15 July 2014 15:35:57 UTC