Re: Shapes/ShEx or the worrying issue of yet another syntax and lack of validated vision.

From: Paul Hermans <paul@proxml.be>
Date: 15 Jul 2014 19:48:30 +0200
To: Jerven Tjalling Bolleman <jerven.bolleman@isb-sib.ch>
Cc: <public-rdf-shapes@w3.org>
Message-Id: <D60CE67C-EDE8-4D46-B282-4280F1986D5A@proxml.be>
Jerven,

>the two widely adopted solutions 
>in industry SPIN (SPARQL) and OWL closed worlds

Do you have facts on this?

How many users are we talking about?

Paul

Kind Regards,
Paul Hermans


-------------------------


OpenCube – Linked Open Statistical Data - http://opencube-project.eu/

> On Jul 15, 2014, at 5:35 PM, Jerven Tjalling Bolleman <jerven.bolleman@isb-sib.ch> wrote:
> 
> 
> Dear All,
> 
> 
> Let me apologize in advance for the rude tone of this e-mail.
> 
> 
> I have been looking at the current work and direction of the work-group
> and am really worried.
> 
> 
> Issues
> 
> 
> First of all, you decided to focus not on the problem of validating
> RDF data but on a solution called shapes. I think you need to go back
> and first collect what validation should do, instead of deciding what
> the solution looks like, because I don't think ShEx/Shapes does enough.
> 
> 
> Secondly, I have the feeling that the work-group is conflating the
> issues of syntax and user interface, as well as ignoring a lot of
> engineering effort already out there in the world.
> 
> 
> Concerns
> 
> 
> My current concerns are mostly based on this document
> http://www.w3.org/2013/ShEx/Primer.
> 
> 
> Concern 1.
> 
> 
> First of all, it's yet another syntax, with strange variations on Turtle.
> 
> 
> <IssueShape> {
>     ex:state (ex:unassigned ex:assigned),
>     ex:reportedBy @<UserShape>,
>     ex:reportedOn xsd:dateTime,
>     ( ex:reproducedBy @<EmployeeShape>,
>       ex:reproducedOn xsd:dateTime )?,
>     ex:related @<IssueShape>*
> }
> 
> 
> Why the brackets, and the @ as some kind of pointer? Why not make it
> nice and simple Turtle and do this?
> 
> :IssueShape
>     ex:state (ex:unassigned ex:assigned) ;
>     ex:reportedBy [ a :UserShape ] ;
>     ex:reportedOn xsd:dateTime ;
>     ( ex:reproducedBy :EmployeeShape ;
>       ex:reproducedOn xsd:dateTime )? ;
>     ex:related :IssueShape .
> 
> 
> OK, we still have the strange '?' and a collection whose meaning differs
> from Turtle's; let me come to that.
> 
> 
> Now change that to
> 
> 
> :IssueShape
>     shex:oneOf ( [ ex:state ex:unassigned ]
>                  [ ex:state ex:assigned ] ) ;
>     ex:reportedBy [ a :UserShape ] ;
>     ex:reportedOn xsd:dateTime ;
>     shex:eitherNoneOforAllOf [ ex:reproducedBy :EmployeeShape ;
>                                ex:reproducedOn xsd:dateTime ] ;
>     ex:related :IssueShape .
> 
> 
> Now, is that completely different in readability?
> No, it's not.
> Did you gain a lot of usability with yet another syntax?
> No, you didn't.
> Will you make life difficult for everyone using it because you have yet
> another syntax?
> Yes, you will.
> Did your syntax make life a lot easier for users?
> No, because it's yet another syntax to learn.
> 
> 
> Aside:
> Did you notice that your use of the question mark is not consistent
> with any other commonly used syntax, e.g. regex, globs, trinary logic,
> etc.? That will surely lead to a lot of confusion.
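
As a rough illustration of what the `<IssueShape>` example in Concern 1 asks of a node, here is a minimal Python sketch. It is not a ShEx implementation: it assumes a hypothetical in-memory node (a dict from prefixed predicate names to lists of values), reads unmarked constraints as "exactly one", the `?` on the parenthesised group as "the whole group is optional" and `*` as "zero or more", and it does not check the referenced `UserShape`/`EmployeeShape` shapes.

```python
from datetime import datetime

# Hypothetical node representation: prefixed predicate name -> list of values.
Node = dict[str, list]

STATE_VALUES = {"ex:unassigned", "ex:assigned"}


def check_issue_shape(node: Node) -> list[str]:
    """Return violation messages for <IssueShape>; an empty list means the node conforms."""
    errors = []
    # ex:state (ex:unassigned ex:assigned): exactly one value, taken from the value set.
    states = node.get("ex:state", [])
    if len(states) != 1 or states[0] not in STATE_VALUES:
        errors.append("ex:state must be exactly one of ex:unassigned, ex:assigned")
    # ex:reportedBy @<UserShape>: exactly one value (UserShape itself is not checked here).
    if len(node.get("ex:reportedBy", [])) != 1:
        errors.append("ex:reportedBy must occur exactly once")
    # ex:reportedOn xsd:dateTime: exactly one value, modelled here as a Python datetime.
    reported_on = node.get("ex:reportedOn", [])
    if len(reported_on) != 1 or not isinstance(reported_on[0], datetime):
        errors.append("ex:reportedOn must be exactly one xsd:dateTime")
    # ( ex:reproducedBy ..., ex:reproducedOn ... )?: the group as a whole is optional,
    # but if either property is present, both must be present exactly once.
    by = node.get("ex:reproducedBy", [])
    on = node.get("ex:reproducedOn", [])
    if (by or on) and (len(by) != 1 or len(on) != 1 or not isinstance(on[0], datetime)):
        errors.append("ex:reproducedBy and ex:reproducedOn must appear together, once each")
    # ex:related @<IssueShape>*: zero or more values, so there is no count to check.
    return errors
```

For example, a node that has `ex:state`, `ex:reportedBy` and `ex:reportedOn` but only `ex:reproducedBy` from the optional group yields exactly one message, the "must appear together" one.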
> 
> 
> Concern 2.
> 
> 
> The second issue is that, because the work-group seems to have confounded
> user interface with constraint interchange, it has forgotten where all
> the engineering and much of the training effort has gone in the last few
> years. Why is SPARQL 1.1 not the majority of the solution? Why are you
> not building on OWL where it is needed?
> 
> 
> ShEx already shows that it can't solve the problems on its own, because
> it punts to other languages, including SPARQL. This means that your
> users still need to use SPARQL anyway! A major issue IMHO.
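
To make the "ShEx -> SPARQL" route concrete, a minimal Python sketch that compiles the simplest kind of shape constraint (each listed predicate must be present at least once) into a SPARQL 1.1 SELECT query returning the offending nodes. The target class `ex:Issue`, the `ex:` namespace IRI and the function name are hypothetical; a real translation would also have to handle maximum cardinalities, value sets and nested shape references.

```python
def missing_property_query(target_class: str, required: list[str],
                           prefixes: dict[str, str]) -> str:
    """Build a SPARQL 1.1 SELECT listing instances of target_class that lack at
    least one of the required predicates (i.e. a minimum cardinality of 1)."""
    if not required:
        raise ValueError("at least one required predicate is needed")
    prefix_lines = "".join(f"PREFIX {p}: <{iri}>\n" for p, iri in prefixes.items())
    # One NOT EXISTS test per predicate; the node is reported if any test is true.
    conditions = " || ".join(
        f"NOT EXISTS {{ ?node {pred} ?v{i} }}" for i, pred in enumerate(required)
    )
    return (
        prefix_lines
        + "SELECT ?node WHERE {\n"
        + f"  ?node a {target_class} .\n"
        + f"  FILTER ({conditions})\n"
        + "}\n"
    )
```

Called with `"ex:Issue"`, `["ex:reportedBy", "ex:reportedOn"]` and `{"ex": "http://example.org/"}`, it produces a query whose filter reads `NOT EXISTS { ?node ex:reportedBy ?v0 } || NOT EXISTS { ?node ex:reportedOn ?v1 }`.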
> 
> 
> Concern 3.
> 
> 
> Shapes alone are not enough for real-world data validation. I worked for
> a while on Dutch healthcare systems and had to deal with the fact that
> data in the database could be incorrect while newly provided data might
> be correct, so we needed humans in the loop to figure out the truth,
> e.g. two people with the same citizen service number (BSN), due to a
> typo or fraud. ShEx can tell us that we have an issue, but it can't
> generate a work item.
> 
> 
> That is something SPIN, for example, can do, because SPIN is not just a
> constraint language but also an inference language (e.g. given two
> people with the same BSN, I can infer that a manual data intervention
> is required). OWL can do similar things.
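
The distinction drawn here, between reporting a violation and inferring a new fact such as "manual intervention required", can be sketched in a few lines of Python. The person identifiers, the `WorkItem` type and the BSN strings are hypothetical placeholders (no BSN check-digit validation is done); the point is only that the rule's output is new data a human workflow can pick up, rather than a pass/fail verdict.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkItem:
    """Hypothetical task record asking a human to resolve a data conflict."""
    bsn: str
    person_ids: tuple[str, ...]
    reason: str


def duplicate_bsn_work_items(bsn_by_person: dict[str, str]) -> list[WorkItem]:
    """Infer one work item per BSN that is recorded for more than one person."""
    people_by_bsn: dict[str, list[str]] = defaultdict(list)
    for person_id, bsn in bsn_by_person.items():
        people_by_bsn[bsn].append(person_id)
    return [
        WorkItem(bsn, tuple(sorted(ids)), "same BSN on several people: typo or fraud?")
        for bsn, ids in sorted(people_by_bsn.items())
        if len(ids) > 1
    ]
```

Given `{"p1": "000000001", "p2": "000000001", "p3": "000000002"}`, it returns a single work item for `"000000001"` naming `p1` and `p2`.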
> 
> 
> Concern 4.
> 
> 
> Because data and rules do not share the same syntax or model, it is
> difficult to write rules about your rules. That is trivial in SPIN and
> really helps rule maintenance: e.g. checking that all predicates
> mentioned in your rules come from a limited set of ontologies is easy
> in SPIN. It's hard in ShEx because your model is not simple to
> translate to RDF.
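
A minimal Python sketch of the "rules about your rules" check described above, assuming each rule is available as data listing the predicates it mentions (the property the email attributes to SPIN, where rules and data share one model). The rule names, the `nl:` prefix and the allowed-vocabulary set are hypothetical, and membership in an ontology is approximated by the prefix of each prefixed name.

```python
def predicates_outside(rules: list[dict], allowed_prefixes: set[str]) -> list[str]:
    """Return, sorted and de-duplicated, every predicate mentioned by a rule
    whose prefix is not in allowed_prefixes."""
    offending = set()
    for rule in rules:
        for predicate in rule["predicates"]:
            prefix = predicate.split(":", 1)[0]
            if prefix not in allowed_prefixes:
                offending.add(predicate)
    return sorted(offending)


# Hypothetical rules, each described as data rather than as opaque text.
RULES = [
    {"name": "issue-shape", "predicates": ["ex:state", "ex:reportedBy", "ex:reportedOn"]},
    {"name": "duplicate-bsn", "predicates": ["foaf:name", "nl:bsn"]},
]
```

With `{"ex", "foaf"}` as the allowed set, the check flags `nl:bsn`; adding `"nl"` to the set makes the result empty.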
> 
> 
> Concern 5.
> 
> 
> As you disregard SPARQL, you disregard SERVICE calls. This means I
> can't easily run validation using data from multiple systems. By
> treating data as files you process in isolation, you have lost a lot of
> power, as well as an easy way to extend the capabilities of the system
> in standards-compliant ways (e.g. using a SADI service to compute values
> needed in your validation on the fly).
> 
> 
> Conclusions.
> 
> 
> ShEx -> SPARQL is fine: it places ShEx as a UI, not as an interchange
> standard.
> ShEx -> is not powerful enough to do more than simple validation.
> ShEx -> should not invent yet another syntax. ShEx should be modeled in
> RDF and use existing syntaxes.
> Workgroup -> you too quickly discarded the two widely adopted solutions
> in industry SPIN (SPARQL) and OWL closed worlds on the outcome of a
> single workshop.
> Workgroup -> you don't have a good goal document that states what
> validation needs to do.
> 
> 
> I hope you will seriously reconsider your chosen direction because it is
> breaking the first rule of a good standard -> depend on other existing
> standards.
> 
> 
> Regards,
> Jerven Bolleman
> 
Received on Tuesday, 15 July 2014 17:49:02 UTC