Re: Shapes/ShEx or the worrying issue of yet another syntax and lack of validated vision. from Sandro Hawke on 2014-07-15 (public-rdf-shapes@w3.org from July 2014)

From: Sandro Hawke <sandro@w3.org>
Date: Tue, 15 Jul 2014 12:29:50 -0400
To: Jerven Tjalling Bolleman <jerven.bolleman@isb-sib.ch>, public-rdf-shapes@w3.org
Message-ID: <53C556FE.4000605@w3.org>

On 07/15/2014 11:35 AM, Jerven Tjalling Bolleman wrote:
> Dear All,
>
> Let me apologize in advance for the rude tone of this e-mail.
>

:-)  I for one appreciate your passion on this.  I'm not the right
person to reply to all of this, but let me just make a few points.

> I am looking at the current work/direction of the work-group and am 
> really worried.
>
> Issues
>
> First of all, you decided not to focus on the problem of validating
> data in RDF but on a solution called shapes. I think you need to go 
> back and collect what validation should do first instead of what the 
> solution looks like. Because I don't think ShEx/Shapes does enough.
>
> Secondly I have the feeling that the work-group is confounding the 
> issue of syntax and user interfaces as well as ignoring a lot of 
> engineering effort out there in the world.
>
> Concerns
>
> My current concerns are mostly based on this document 
> http://www.w3.org/2013/ShEx/Primer.
>
> Concern 1.
>
> First of all, it's yet another syntax with strange variations on Turtle.
>
> <IssueShape> {
>     ex:state (ex:unassigned ex:assigned),
>     ex:reportedBy @<UserShape>,
>     ex:reportedOn xsd:dateTime,
>     ( ex:reproducedBy @<EmployeeShape>,
>       ex:reproducedOn xsd:dateTime      )?,
>     ex:related @<IssueShape>*
> }
>
> Why the brackets and @ for some kind of pointers? Why not make nice
> and simple Turtle and do this?
>
>
> :IssueShape
>     ex:state (ex:unassigned ex:assigned) ;

Here you've adopted the Turtle syntax, but taken up a different 
semantics, which is a pretty big problem.  For example, if the 
rdfs:domain of ex:state is ex:Issue, you've now forced :IssueShape to be 
an instance of class ex:Issue, which is surely not correct.
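
To make the inference concrete, here is a minimal sketch in Python of the
RDFS domain rule (rdfs2) applied to the triple from the example above.  It
uses plain tuples rather than a real RDF library; the rdfs:domain triple is
the hypothetical assumption under discussion, and the blank node label
_:list1 simply stands in for the (ex:unassigned ex:assigned) collection.

```python
# Sketch of RDFS rule rdfs2: if (P rdfs:domain C) and (S P O), then (S rdf:type C).
# Triples are plain string tuples; this is an illustration, not a real RDF reasoner.

RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def apply_rdfs_domain(triples):
    """Return the rdf:type triples entailed by the rdfs:domain statements."""
    # Collect (property, class) pairs declared via rdfs:domain.
    domains = {(s, o) for s, p, o in triples if p == RDFS_DOMAIN}
    inferred = set()
    for s, p, _o in triples:
        for prop, cls in domains:
            if p == prop:
                inferred.add((s, RDF_TYPE, cls))
    return inferred


graph = {
    # Assumption for illustration: some ontology declares this domain.
    ("ex:state", RDFS_DOMAIN, "ex:Issue"),
    # The shape written as ordinary Turtle asserts this triple about the shape itself.
    (":IssueShape", "ex:state", "_:list1"),
}

inferred = apply_rdfs_domain(graph)
for triple in sorted(inferred):
    print(triple)  # (':IssueShape', 'rdf:type', 'ex:Issue')
```

The shape description ends up typed as an issue, which is the unintended
consequence of reusing ex:state as a predicate on the shape.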

I agree there should be an RDF representation for data shapes, but I 
think it has to be somewhat more complicated than the example you 
provide, so the need for a syntax like ShEx is somewhat greater than you 
suggest.

>
>     ex:reportedBy [ a :UserShape ] ;
>     ex:reportedOn xsd:dateTime ;
>     ( ex:reproducedBy :EmployeeShape ;
>       ex:reproducedOn xsd:dateTime  )?,
>     ex:related :IssueShape
>
> OK, we still have the strange '?' and a collection with a meaning
> different from Turtle's; let me come to that.
>
> Now change that to
>
> :IssueShape
>     shex:oneOf ( [] ex:state ex:unassigned .
>                  [] ex:state ex:assigned ) ;
>     ex:reportedBy [ a :UserShape ] ;
>     ex:reportedOn xsd:dateTime ;
>     shex:eitherNoneOforAllOf [  ex:reproducedBy :EmployeeShape ;
>                          ex:reproducedOn xsd:dateTime  ] ,
>     ex:related :IssueShape
>
> Now is that completely different in readability?
> No, it's not.
> Did you gain a lot of usability by yet another syntax?
> No, you didn't.
> Will you make life difficult for everyone using it because you have
> yet another syntax?
> Yes, you will.
> Did your syntax make life a lot easier for users?
> No, because it's yet another syntax to learn.
>
> Aside:
>   Did you notice that your use of the question mark is not consistent
>   with any other commonly used syntax, e.g. regex, globs, trinary logic
>   etc.?  That is sure to lead to a lot of confusion.
>
> Concern 2.
>
> The second issue is that because the work-group seems

Minor detail, what you're responding to is a proposed Working Group 
Charter.  That charter is being drafted mostly by W3C staff member Eric 
Prud'hommeaux, although I've contributed some text, as have others on 
and off the W3C staff, based on input from the community, mostly via 
last year's Workshop and discussion on this list.

At some point, if this goes forward, there will be a Working Group which 
can have opinions, be confused, etc, but for now the thing to be trying 
to correct is the Charter (and maybe the people editing the charter).

> to have confounded User Interface with constraints interchange. They 
> have forgotten where all the engineering and much of the training 
> effort has gone in the last few years. Why is SPARQL 1.1 not the
> majority of the solution? Why are you not building on OWL where it is
> needed?
>
> ShEx already shows that you can't solve the problems, because you
> are punting to other languages, including SPARQL. That means your
> users still need to use SPARQL anyway! A major issue IMHO.
>
> Concern 3.
>
> Shapes is not enough for real world data validation. I have worked for 
> a while on Dutch healthcare systems and had to deal with the fact that
> data in the database could be incorrect and data that is provided 
> might be correct and we need to have humans in the loop to figure out 
> the truth of it e.g. two people with the same citizen service number 
> (BSN) (due to typo or fraud). ShEx can tell us that we have an issue 
> but it can't generate a work item.
>
> That is something SPIN, for example, can do, because SPIN is not just a
> constraint language but also an inference language (e.g. I can infer
> that a manual data intervention is required given two people with the
> same BSN). OWL can do similar things.
>
> Concern 4.
>
> Because data and rules do not have the same syntax or model, it is
> difficult to write rules about your rules, something that is trivial
> in SPIN and really helps rule maintenance. For example, checking that
> all predicates mentioned in your rules are present in a limited set of
> ontologies is easy in SPIN. It's hard in ShEx because your model is
> not simple to translate to RDF.
>
> Concern 5.
>
> As you disregard SPARQL you disregard SERVICE calls. This means I
> can't easily have validation using data in multiple systems. By looking
> at data as files you process in isolation, you have lost a lot of
> power, as well as an easy way to extend the capabilities of the system
> in standards-compliant ways (e.g. using a SADI service to compute
> values needed in your validation on the fly).
>
>
> Conclusions.
>
> ShEx -> SPARQL is fine; it places ShEx as a UI, not as an interchange
> standard.
> ShEx -> is not powerful enough to do more than simple validation.

I personally happen to favor less expressive (and thus simpler and more 
efficient) solutions in this space.  I'm not trying to convince you, but 
please understand that as often seen in computer science, expressivity 
is not an unmitigated good, but rather there is a tradeoff.  There's a
lot to be said for simple validation.

>
> ShEx -> Should not invent yet another syntax. ShEx should be modeled 
> in RDF and use existing syntaxes.

So you're not okay with even a syntactic sugar language, what's called a
"compact syntax" in the charter?  Something like OWL's Functional Syntax
or Manchester Syntax?

Why was it okay to invent a SPARQL syntax?  We could have expressed
queries in RDF itself.

(You see that kind of thing in SPIN, and also in RIF-in-RDF.)

>
> Workgroup -> you too quickly discarded the two solutions widely adopted
> in industry, SPIN (SPARQL) and OWL closed worlds, based on the outcome
> of a single workshop.

You say "the outcome of a single workshop" as if that were a small
thing.  The workshop was widely advertised for months as the time and
the place for people who cared about this subject to come forward and be
a part of figuring this out.  And many people did.

Still, the outcome of that workshop is not intended to limit so much as
focus.  It showed a direction to go; it didn't pick particular
technologies or solutions.  That's for the anticipated Working Group
to do.

The question now is what guidance is to be given to that Working
Group.  On this, I mostly hear you saying that the Working Group should
define an RDF syntax for shapes, and have a more expressive solution
than ShEx.

On the first of these, the heart of the charter is this line:

    Syntax and semantics *RDF Data Shapes Language*: W3C
    Recommendation(s) defining the language semantics, an RDF syntax and
    a compact syntax as described in Scope
    <http://www.w3.org/2014/rds/charter#scope>.

which clearly says "an RDF syntax", which I think is what you want.

On the expressivity question, it seems clear to me that's left to the WG 
to decide.

> Workgroup -> you don't have a good goal document that states what
> validation needs to do.
>

That's for the WG to write, the "Use Case and Requirements" document.

> I hope you will seriously reconsider your chosen direction because it 
> is breaking the first rule of a good standard -> depend on other 
> existing standards.
>

Honestly, I'd say the first rule of standards is to make something that 
people will want to use.  :-)

       -- Sandro (W3C staff, but not assigned to this WG)

> Regards,
> Jerven Bolleman
>
Received on Tuesday, 15 July 2014 16:29:52 UTC