Re: Shapes/ShEx or the worrying issue of yet another syntax and lack of validated vision.

Paul,
On 15 Jul 2014, at 19:48, Paul Hermans <paul@proxml.be> wrote:

> Jerven,
> 
> >the two widely adopted solutions 
> >in industry SPIN (SPARQL) and OWL closed worlds
> 
> Do you have facts on this?
> How many users are we talking?
Maybe "widely adopted" is over the top.
But when I talk to industrial users rather than people in academia, for example
at Oracle Open World, about 8 out of 10 of the people coming to the RDF/SemWeb meetup there used TopBraid/SPIN
in one way or another.

The other commercially supported option has recently been Stardog ICV, which, according to my grapevine, seems to have reasonable pickup.

Until very recently I was not aware of any deployed solutions other than home-grown ones.
Data integration is the key market, and most validation and cleaning is done with scripts and SPARQL queries.

But I don't do market research, so take this with a grain of salt.

Regards,
Jerven
> 
> 
> Paul
> 
> 
> Kind Regards,
> Paul Hermans
> 
> -------------------------
> 
> OpenCube – Linked Open Statistical Data - http://opencube-project.eu/
> 
> 
> 
>> On Jul 15, 2014, at 5:35 PM, Jerven Tjalling Bolleman <jerven.bolleman@isb-sib.ch> wrote:
>> 
>> Dear All,
>> 
>> Let me apologize in advance for the rude tone of this e-mail.
>> 
>> I am looking at the current work/direction of the work-group and am
>> really worried.
>> 
>> Issues
>> 
>> First of all, you decided to focus not on the problem of validating data
>> in RDF but on a solution called shapes. I think you need to go back and
>> first collect what validation should do, instead of deciding what the
>> solution looks like, because I don't think ShEx/Shapes does enough.
>> 
>> Secondly, I have the feeling that the work-group is conflating syntax
>> with user interfaces, as well as ignoring a lot of engineering effort
>> out there in the world.
>> 
>> Concerns
>> 
>> My current concerns are mostly based on this document
>> http://www.w3.org/2013/ShEx/Primer.
>> 
>> Concern 1.
>> 
>> First of all, it's yet another syntax with strange variations on Turtle.
>> 
>> <IssueShape> {
>>     ex:state (ex:unassigned ex:assigned),
>>     ex:reportedBy @<UserShape>,
>>     ex:reportedOn xsd:dateTime,
>>     ( ex:reproducedBy @<EmployeeShape>,
>>       ex:reproducedOn xsd:dateTime )?,
>>     ex:related @<IssueShape>*
>> }
>> 
>> Why the brackets, and the @ as some kind of pointer? Why not use nice,
>> simple Turtle and do this?
>> 
>> 
>> :IssueShape
>>     ex:state (ex:unassigned ex:assigned) ;
>>     ex:reportedBy [ a :UserShape ] ;
>>     ex:reportedOn xsd:dateTime ;
>>     ( ex:reproducedBy :EmployeeShape ;
>>       ex:reproducedOn xsd:dateTime )? ;
>>     ex:related :IssueShape .
>> 
>> OK, we still have the strange '?' and a collection whose meaning differs
>> from Turtle's; let me come to that.
>>
>> Now change that to:
>> 
>> :IssueShape
>>     shex:oneOf ( [ ex:state ex:unassigned ]
>>                  [ ex:state ex:assigned ] ) ;
>>     ex:reportedBy [ a :UserShape ] ;
>>     ex:reportedOn xsd:dateTime ;
>>     shex:eitherNoneOrAllOf [ ex:reproducedBy :EmployeeShape ;
>>                              ex:reproducedOn xsd:dateTime ] ;
>>     ex:related :IssueShape .
>> 
>> Is that now completely different in readability?
>> No, it's not.
>> Did you gain a lot of usability with yet another syntax?
>> No, you didn't.
>> Will you make life difficult for everyone using it because you have yet
>> another syntax?
>> Yes, you will.
>> Does your syntax make life a lot easier for users?
>> No, because it's yet another syntax to learn.
>> 
>> Aside:
>> Did you notice that your use of the question mark is not consistent
>> with any other commonly used syntax, e.g. regexes, globs, ternary logic,
>> etc.? That will surely lead to a lot of confusion.
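To make the optional group concrete: under a simplified reading, ( ex:reproducedBy ..., ex:reproducedOn ... )? means an issue has either both properties or neither, which the Turtle variant above spells out as an "either none or all of" property. A minimal Python sketch of that check, assuming triples are held as plain string tuples; the function name none_or_all and the sample data are hypothetical and not part of ShEx or any library:

```python
from collections import Counter
from typing import List, Tuple

# A triple is (subject, predicate, object); the prefixed names are illustrative.
Triple = Tuple[str, str, str]


def none_or_all(triples: List[Triple], subject: str, predicates: List[str]) -> bool:
    """Simplified check for an optional group such as
    ( ex:reproducedBy ..., ex:reproducedOn ... )?: the subject must have none
    of the listed predicates, or exactly one triple for each of them."""
    counts = Counter(p for s, p, _ in triples if s == subject and p in predicates)
    if not counts:
        return True  # group absent altogether: allowed by the '?'
    return all(counts[p] == 1 for p in predicates)


data = [
    ("ex:issue1", "ex:reproducedBy", "ex:emp1"),
    ("ex:issue1", "ex:reproducedOn", '"2014-07-15T10:00:00"^^xsd:dateTime'),
    ("ex:issue2", "ex:reproducedBy", "ex:emp2"),  # ex:reproducedOn missing
]
group = ["ex:reproducedBy", "ex:reproducedOn"]

for issue in ["ex:issue1", "ex:issue2", "ex:issue3"]:
    print(issue, none_or_all(data, issue, group))
```

Here ex:issue1 passes (both present), ex:issue2 fails (only one of the pair) and ex:issue3 passes (neither present).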
>> 
>> Concern 2.
>> 
>> The second issue is that, because the work-group seems to have conflated
>> the user interface with constraint interchange, it has forgotten where
>> all the engineering and much of the training effort has gone in the last
>> few years. Why is SPARQL 1.1 not the majority of the solution? Why are
>> you not building on OWL where it is needed?
>> 
>> ShEx already shows that you can't solve the problems with it alone,
>> because you are punting to other languages, including SPARQL. That means
>> your users still need to use SPARQL anyway! A major issue IMHO.
>> 
>> Concern 3.
>> 
>> Shapes alone are not enough for real-world data validation. I worked for
>> a while on Dutch healthcare systems and had to deal with the fact that
>> data in the database could be incorrect while newly provided data might
>> be correct, so we needed humans in the loop to figure out the truth,
>> e.g. two people with the same citizen service number (BSN) due to a
>> typo or fraud. ShEx can tell us that we have an issue, but it can't
>> generate a work item.
>> 
>> That is something SPIN, for example, can do, because SPIN is not just a
>> constraint language but also an inference language (e.g. I can infer
>> that a manual data intervention is required given two people with the
>> same BSN). OWL can do similar things.
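The difference between flagging a violation and inferring a work item can be sketched in a few lines. A minimal Python sketch, assuming triples are plain string tuples and using hypothetical terms ex:bsn, ex:WorkItem, ex:reason and ex:concerns with made-up BSN values; in SPIN the same logic would typically be written as a rule (a SPARQL CONSTRUCT), not as Python:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def infer_work_items(triples: List[Triple]) -> List[Triple]:
    """For every BSN shared by more than one person, infer a work item that
    asks a human to resolve the conflict. All ex: terms are hypothetical."""
    people_by_bsn: Dict[str, List[str]] = defaultdict(list)
    for s, p, o in triples:
        if p == "ex:bsn":
            people_by_bsn[o].append(s)

    inferred: List[Triple] = []
    for bsn, people in sorted(people_by_bsn.items()):
        if len(people) > 1:  # a conflict that validation alone cannot settle
            item = f"ex:workitem-{bsn}"
            inferred.append((item, "rdf:type", "ex:WorkItem"))
            inferred.append((item, "ex:reason", '"duplicate BSN"'))
            for person in sorted(people):
                inferred.append((item, "ex:concerns", person))
    return inferred


# Made-up records: two people share one BSN (typo or fraud).
data = [
    ("ex:alice", "ex:bsn", "123456782"),
    ("ex:bob", "ex:bsn", "123456782"),
    ("ex:carol", "ex:bsn", "111222333"),
]
for triple in infer_work_items(data):
    print(triple)
```

The output is new data (a work item linked to both people) that a human can pick up, rather than only a pass/fail validation report.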
>> 
>> Concern 4.
>> 
>> Because data and rules do not share the same syntax or model, it is
>> difficult to write rules about your rules, something that is trivial in
>> SPIN and really helps rule maintenance. For example, checking that all
>> predicates mentioned in your rules come from a limited set of
>> ontologies is easy in SPIN. It's hard in ShEx because your model is not
>> simple to translate to RDF.
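When rules are themselves RDF, a check like this is just another query over the rule graph. A minimal Python sketch of the idea, assuming the rules are already available as triples and loosely modeled on SPIN's practice of encoding each triple pattern as a resource with an sp:predicate property; the allowed prefixes and the sample rule graph are hypothetical:

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Prefixes of the ontologies that rules are allowed to use (illustrative).
ALLOWED_PREFIXES: Tuple[str, ...] = ("ex:", "rdf:", "xsd:")


def predicates_used(rule_triples: List[Triple]) -> Set[str]:
    """Predicates mentioned by the rules' triple patterns, assuming each
    pattern is a node linked to its predicate via sp:predicate."""
    return {o for _, p, o in rule_triples if p == "sp:predicate"}


def disallowed_predicates(
    rule_triples: List[Triple], allowed: Tuple[str, ...] = ALLOWED_PREFIXES
) -> Set[str]:
    """Predicates used in the rules that fall outside the allowed ontologies."""
    return {pred for pred in predicates_used(rule_triples) if not pred.startswith(allowed)}


# A made-up rule graph with two triple patterns.
rules = [
    ("_:p1", "rdf:type", "sp:TriplePattern"),
    ("_:p1", "sp:predicate", "ex:bsn"),
    ("_:p2", "rdf:type", "sp:TriplePattern"),
    ("_:p2", "sp:predicate", "foaf:name"),
]
print(disallowed_predicates(rules))  # {'foaf:name'}
```

The rdf:type triples describe the rule graph itself, so they are not counted; only the objects of sp:predicate are checked.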
>> 
>> Concern 5.
>> 
>> As you disregard SPARQL, you disregard SERVICE calls. This means I can't
>> easily do validation using data held in multiple systems. By looking at
>> data as files that you process in isolation, you have lost a lot of
>> power, as well as an easy way to extend the capabilities of the system
>> in standards-compliant ways (e.g. using a SADI service to compute values
>> needed in your validation on the fly).
>> 
>> 
>> Conclusions.
>> 
>> ShEx -> SPARQL is fine; it places ShEx as a UI, not as an interchange standard.
>> ShEx -> is not powerful enough to do more than simple validation.
>> ShEx -> should not invent yet another syntax. ShEx should be modeled in
>> RDF and use existing syntaxes.
>> Workgroup -> you too quickly discarded the two widely adopted solutions
>> in industry, SPIN (SPARQL) and OWL closed worlds, based on the outcome
>> of a single workshop.
>> Workgroup -> you don't have a good goal document that states what
>> validation needs to do.
>> 
>> I hope you will seriously reconsider your chosen direction, because it
>> breaks the first rule of a good standard: depend on other existing
>> standards.
>> 
>> Regards,
>> Jerven Bolleman

-------------------------------------------------------------------
Jerven Bolleman                        Jerven.Bolleman@isb-sib.ch
SIB Swiss Institute of Bioinformatics      Tel: +41 (0)22 379 58 85
CMU, rue Michel Servet 1               Fax: +41 (0)22 379 58 58
1211 Geneve 4,
Switzerland     www.isb-sib.ch - www.uniprot.org
Follow us at https://twitter.com/#!/uniprot
-------------------------------------------------------------------

Received on Tuesday, 15 July 2014 19:36:57 UTC