- From: Jerven Tjalling Bolleman <jerven.bolleman@isb-sib.ch>
- Date: Tue, 15 Jul 2014 17:35:25 +0200
- To: public-rdf-shapes@w3.org
Dear All,
Let me apologize in advance for the rude tone of this e-mail.
I am looking at the current work/direction of the work-group and am
really worried.
Issues
First of all, you decided to focus not on the problem of validating RDF
data but on a solution called shapes. I think you need to go back and
first collect what validation should do, rather than what the solution
looks like, because I don't think ShEx/Shapes does enough.
Secondly, I have the feeling that the work-group is conflating syntax
with user interfaces, as well as ignoring a lot of engineering
effort out there in the world.
Concerns
My current concerns are mostly based on this document
http://www.w3.org/2013/ShEx/Primer.
Concern 1.
First of all, it's yet another syntax with strange variations on Turtle.
<IssueShape> {
    ex:state (ex:unassigned ex:assigned),
    ex:reportedBy @<UserShape>,
    ex:reportedOn xsd:dateTime,
    ( ex:reproducedBy @<EmployeeShape>,
      ex:reproducedOn xsd:dateTime )?,
    ex:related @<IssueShape>*
}
Why the brackets and the @ as some kind of pointer? Why not write nice
and simple Turtle and do this?
:IssueShape
    ex:state (ex:unassigned ex:assigned) ;
    ex:reportedBy [ a :UserShape ] ;
    ex:reportedOn xsd:dateTime ;
    ( ex:reproducedBy :EmployeeShape ;
      ex:reproducedOn xsd:dateTime )?,
    ex:related :IssueShape
OK, we still have the strange '?' and a collection whose meaning differs
from Turtle's; let me come to that.
Now change that to:
:IssueShape
    shex:oneOf ( [ ex:state ex:unassigned ]
                 [ ex:state ex:assigned ] ) ;
    ex:reportedBy [ a :UserShape ] ;
    ex:reportedOn xsd:dateTime ;
    shex:eitherNoneOforAllOf [ ex:reproducedBy :EmployeeShape ;
                               ex:reproducedOn xsd:dateTime ] ;
    ex:related :IssueShape .
Now is that completely different in readability?
No, it's not.
Did you gain a lot of usability by yet another syntax?
No, you didn't.
Will you make life difficult for everyone using it because you have yet
another syntax?
Yes, you will.
Did your syntax make life a lot easier for users?
No, because it's yet another syntax to learn.
Aside:
Did you notice that your use of the question mark is not consistent
with any other commonly used syntax, e.g. regex, globs, ternary logic
etc.? That is sure to lead to a lot of confusion.
Concern 2.
The second issue is that the work-group seems to have confused
user interface with constraint interchange, and has forgotten where
all the engineering and much of the training effort has gone in the last
few years. Why is SPARQL 1.1 not the majority of the solution? Why are
you not building on OWL where it is needed?
ShEx itself already shows that it can't solve the problems, because it
punts to other languages, including SPARQL. That means your users
still need to use SPARQL anyway! A major issue IMHO.
Concern 3.
Shapes is not enough for real-world data validation. I worked for a
while on Dutch healthcare systems and had to deal with the fact that
data already in the database could be incorrect while newly provided
data might be correct, so we needed humans in the loop to figure out
the truth, e.g. two people with the same citizen service number (BSN)
(due to a typo or fraud). ShEx can tell us that we have an issue, but it
can't generate a work item.
That is something SPIN, for example, can do, because SPIN is not just a
constraint language but also an inference language (e.g. I can infer
that a manual data intervention is required given two people with the
same BSN). OWL can do similar things.
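To make the difference between "detecting" and "inferring a work item"
concrete, here is a minimal Python sketch. It is not SPIN and not how any
real system does it; the predicate names ex:bsn and ex:needsManualReview
and the sample data are made up for this illustration. It only shows
that the same rule that finds the violation can also emit new triples
describing the follow-up work.

```python
from collections import defaultdict

# Hypothetical predicate names, used only for this illustration.
BSN = "ex:bsn"
NEEDS_REVIEW = "ex:needsManualReview"


def infer_work_items(triples):
    """Return new triples flagging every subject that shares a BSN with another subject."""
    subjects_by_bsn = defaultdict(set)
    for s, p, o in triples:
        if p == BSN:
            subjects_by_bsn[o].add(s)

    work_items = []
    for bsn, subjects in sorted(subjects_by_bsn.items()):
        if len(subjects) > 1:  # constraint violated: this BSN is not unique
            for s in sorted(subjects):
                # The "inference": a new triple a human work queue could pick up.
                work_items.append((s, NEEDS_REVIEW, bsn))
    return work_items


# Made-up sample data: person1 and person2 share a BSN.
data = [
    ("ex:person1", BSN, "123456782"),
    ("ex:person2", BSN, "123456782"),
    ("ex:person3", BSN, "987654321"),
]
work_items = infer_work_items(data)
```

A pure constraint check would stop at "BSN 123456782 is duplicated"; the
returned triples are what turns that finding into something a person can
act on.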
Concern 4.
Because data and rules do not share the same syntax or model, it is
difficult to write rules about your rules, something that is trivial in
SPIN and really helps rule maintenance. For example, checking that all
predicates mentioned in your rules come from a limited set of
ontologies is easy in SPIN. It's hard in ShEx because its model is not
simple to translate to RDF.
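As a rough illustration of "rules about rules": once the rules
themselves are plain triples, a check over them is just another query
over data. The Python sketch below assumes a made-up property
rule:onPredicate that links each rule to the predicate it constrains,
and a made-up list of allowed namespace prefixes; neither comes from
SPIN or ShEx.

```python
# Hypothetical namespaces and rule vocabulary, used only for this illustration.
ALLOWED_NAMESPACES = ("ex:", "xsd:")
RULE_PREDICATE = "rule:onPredicate"  # assumed link from a rule to the predicate it constrains

# Rules stored as ordinary triples, just like the data they validate.
rules = [
    ("rule:r1", RULE_PREDICATE, "ex:reportedBy"),
    ("rule:r2", RULE_PREDICATE, "ex:reportedOn"),
    ("rule:r3", RULE_PREDICATE, "foo:unknownProperty"),
]


def predicates_outside_allowed(rule_triples, allowed=ALLOWED_NAMESPACES):
    """List predicates used by rules that do not come from an allowed namespace."""
    return sorted(
        o
        for _, p, o in rule_triples
        if p == RULE_PREDICATE and not o.startswith(allowed)
    )


offending = predicates_outside_allowed(rules)
```

With a custom non-RDF syntax, the same check first needs a parser and a
mapping into some queryable model before it can even be expressed.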
Concern 5.
As you disregard SPARQL, you disregard SERVICE calls. This means I can't
easily validate using data held in multiple systems. By looking at data
as files you process in isolation, you have lost a lot of power, as well
as an easy way to extend the capabilities of the system in
standards-compliant ways (e.g. using a SADI service to compute values
needed in your validation on the fly).
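For readers who have not used federation: the sketch below builds the
text of a SPARQL 1.1 query that uses a SERVICE clause to compare local
data against a second endpoint. It only constructs the query string and
does not execute anything; the ex: namespace, the ex:bsn property and the
endpoint URL are placeholders assumed for this example.

```python
def federated_bsn_check(remote_endpoint):
    """Build a SPARQL 1.1 query that finds local people whose BSN is also
    attached to a different resource at a remote endpoint."""
    # ex:bsn and the ex: namespace are hypothetical names for this sketch.
    return f"""
PREFIX ex: <http://example.org/>
SELECT ?person ?other WHERE {{
  ?person ex:bsn ?bsn .
  SERVICE <{remote_endpoint}> {{
    ?other ex:bsn ?bsn .
  }}
  FILTER (?person != ?other)
}}
""".strip()


# Placeholder endpoint URL; any SPARQL 1.1 endpoint IRI could be passed here.
query = federated_bsn_check("http://example.org/registry/sparql")
```

The point is that the cross-system part of the validation is one clause
in an existing standard, not something a new language has to reinvent.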
Conclusions.
ShEx -> SPARQL is fine; it places ShEx as a UI, not as an interchange
standard.
ShEx -> is not powerful enough to do more than simple validation.
ShEx -> should not invent yet another syntax. ShEx should be modeled in
RDF and use existing syntaxes.
Workgroup -> you too quickly discarded the two widely adopted solutions
in industry, SPIN (SPARQL) and closed-world OWL, on the outcome of a
single workshop.
Workgroup -> you don't have a good goal document that states what
validation needs to do.
I hope you will seriously reconsider your chosen direction because it is
breaking the first rule of a good standard -> depend on other existing
standards.
Regards,
Jerven Bolleman
Received on Tuesday, 15 July 2014 15:35:57 UTC