- From: Jerven Tjalling Bolleman <jerven.bolleman@isb-sib.ch>
- Date: Tue, 15 Jul 2014 17:35:25 +0200
- To: public-rdf-shapes@w3.org
Dear All,

Let me apologize in advance for the rude tone of this e-mail. I am looking at the current work and direction of the working group and am really worried.

Issues

First of all, you decided not to focus on the problem of validating RDF data, but on a solution called shapes. I think you need to go back and first collect what validation should do, instead of deciding what the solution looks like, because I don't think ShEx/Shapes does enough.

Secondly, I have the feeling that the working group is conflating the issues of syntax and user interfaces, as well as ignoring a lot of engineering effort out there in the world.

Concerns

My current concerns are mostly based on this document: http://www.w3.org/2013/ShEx/Primer

Concern 1. First of all, it's yet another syntax, with strange variations on Turtle:

    <IssueShape> {
        ex:state (ex:unassigned ex:assigned),
        ex:reportedBy @<UserShape>,
        ex:reportedOn xsd:dateTime,
        ( ex:reproducedBy @<EmployeeShape>,
          ex:reproducedOn xsd:dateTime )?,
        ex:related @<IssueShape>*
    }

Why the angle brackets, and the @ as some kind of pointer? Why not make it nice and simple Turtle and do this?

    :IssueShape
        ex:state (ex:unassigned ex:assigned) ;
        ex:reportedBy [ a :UserShape ] ;
        ex:reportedOn xsd:dateTime ;
        ( ex:reproducedBy :EmployeeShape ;
          ex:reproducedOn xsd:dateTime )? ;
        ex:related :IssueShape .

OK, we still have the strange '?' and a collection whose meaning differs from Turtle's; let me come to that. Now change that to:

    :IssueShape
        shex:oneOf ( [ ex:state ex:unassigned ] [ ex:state ex:assigned ] ) ;
        ex:reportedBy [ a :UserShape ] ;
        ex:reportedOn xsd:dateTime ;
        shex:eitherNoneOforAllOf [ ex:reproducedBy :EmployeeShape ;
                                   ex:reproducedOn xsd:dateTime ] ;
        ex:related :IssueShape .

Is that completely different in readability? No, it's not. Did you gain a lot of usability with yet another syntax? No, you didn't. Will you make life difficult for everyone using it, because you have yet another syntax? Yes, you will. Did your syntax make life a lot easier for users?
No, because it's yet another syntax to learn.

Aside: did you notice that your use of the question mark is not consistent with any other commonly used syntax, e.g. regexes, globs, trinary logic, etc.? That is sure to lead to a lot of confusion.

Concern 2. The second issue is that the working group seems to have conflated user interface with constraint interchange, and so has forgotten where all the engineering, and much of the training effort, has gone in the last few years. Why is SPARQL 1.1 not the majority of the solution? Why are you not building on OWL where it is needed? ShEx itself already shows that you can't solve the problems this way, because it punts to other languages, including SPARQL. This means your users still need to use SPARQL anyway! A major issue, IMHO.

Concern 3. Shapes are not enough for real-world data validation. I have worked for a while on Dutch healthcare systems and had to deal with the fact that data in the database could be incorrect while newly provided data might be correct, so we needed humans in the loop to figure out the truth, e.g. two people with the same citizen service number (BSN), due to a typo or fraud. ShEx can tell us that we have an issue, but it can't generate a work item, something that SPIN, for example, can do, because SPIN is not just a constraint language but also an inference language (e.g. I can infer that a manual data intervention is required given two people with the same BSN). OWL can do similar things.

Concern 4. Because data and rules do not share the same syntax or model, it is difficult to write rules about your rules. This is trivial in SPIN and really helps rule maintenance: e.g. checking that all predicates mentioned in your rules come from a limited set of ontologies is easy in SPIN. It's hard in ShEx, because its model is not simple to translate to RDF.

Concern 5. Because you disregard SPARQL, you disregard SERVICE calls. This means I can't easily validate using data held in multiple systems.
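
To make Concerns 3 and 5 a little more concrete, here is a minimal Python sketch of a rule that reads data from two separate systems and, instead of only flagging a violation, infers new triples describing a work item for a human to resolve. It assumes plain tuples as triples and uses hypothetical names throughout (`ex:bsn`, `ex:ManualDataIntervention`, `ex:concerns`, the `reg:`/`hosp:` prefixes, and the helper functions); no RDF library is involved. In SPARQL, reading both sources in one query is roughly what a `SERVICE` clause provides, and emitting triples is roughly what a CONSTRUCT-style rule (as SPIN supports) does.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# A triple is (subject, predicate, object); prefixed names are plain strings.
Triple = Tuple[str, str, str]

# Hypothetical extracts from two separate systems. With SPARQL, a SERVICE clause
# would let one query read both; here they are simply two Python lists.
REGISTRY: List[Triple] = [
    ("reg:p1", "ex:bsn", "111222333"),
    ("reg:p2", "ex:bsn", "444555666"),
]
HOSPITAL: List[Triple] = [
    ("hosp:p7", "ex:bsn", "111222333"),  # same BSN as reg:p1: typo, fraud, or same person?
    ("hosp:p8", "ex:bsn", "777888999"),
]


def find_shared_bsn(*sources: Iterable[Triple]) -> Dict[str, Set[str]]:
    """Map each BSN that is used by more than one subject to those subjects."""
    subjects_by_bsn: Dict[str, Set[str]] = defaultdict(set)
    for source in sources:
        for subject, predicate, obj in source:
            if predicate == "ex:bsn":
                subjects_by_bsn[obj].add(subject)
    return {bsn: subs for bsn, subs in subjects_by_bsn.items() if len(subs) > 1}


def infer_work_items(*sources: Iterable[Triple]) -> List[Triple]:
    """Derive triples describing one manual-review work item per shared BSN,
    rather than only reporting a constraint violation."""
    inferred: List[Triple] = []
    shared = sorted(find_shared_bsn(*sources).items())
    for n, (bsn, subjects) in enumerate(shared, start=1):
        item = f"ex:workItem{n}"
        inferred.append((item, "rdf:type", "ex:ManualDataIntervention"))
        inferred.append((item, "ex:sharedBsn", bsn))
        for subject in sorted(subjects):
            inferred.append((item, "ex:concerns", subject))
    return inferred


if __name__ == "__main__":
    for triple in infer_work_items(REGISTRY, HOSPITAL):
        print(triple)
```

Run as a script, this prints four triples for `ex:workItem1`: its type, the shared BSN `111222333`, and one `ex:concerns` link each to `hosp:p7` and `reg:p1`. Neither system alone shows the conflict, and because the result is itself data, it can be stored and routed to a person, which is the difference drawn above between reporting an issue and generating a work item.
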
Looking at data as files you process in isolation, you have lost a lot of power, as well as an easy way to extend the capabilities of the system in standards-compliant ways (e.g. using a SADI service to compute values needed in your validation on the fly).

Conclusions

ShEx -> SPARQL is fine; it places ShEx as a UI, not as an interchange standard.
ShEx -> is not powerful enough to do more than simple validation.
ShEx -> should not invent yet another syntax. ShEx should be modeled in RDF and use existing syntaxes.
Working group -> you too quickly discarded the two solutions widely adopted in industry, SPIN (SPARQL) and OWL with closed-world semantics, on the outcome of a single workshop.
Working group -> you don't have a good goals document stating what validation needs to do.

I hope you will seriously reconsider your chosen direction, because it is breaking the first rule of a good standard: depend on other existing standards.

Regards,
Jerven Bolleman
Received on Tuesday, 15 July 2014 15:35:57 UTC