Re: Shapes/ShEx or the worrying issue of yet another syntax and lack of validated vision. from Evren Sirin on 2014-07-20 (public-rdf-shapes@w3.org from July 2014)

From: Evren Sirin <evren@clarkparsia.com>
Date: Sat, 19 Jul 2014 22:55:19 -0400
To: Sandro Hawke <sandro@w3.org>
Cc: Kendall Clark <kendall@clarkparsia.com>, Jerven Bolleman <jerven.bolleman@isb-sib.ch>, Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de>, Jose Emilio Labra Gayo <jelabra@gmail.com>, "Dam, Jesse van" <jesse.vandam@wur.nl>, "public-rdf-shapes@w3.org" <public-rdf-shapes@w3.org>
Message-ID: <CAFC3-Qru-diFmVBVBvbMJ=A4CN5hGXsiBF6fQZrGHY0EAE6Upw@mail.gmail.com>

What I said at the workshop is what is written in our position paper.
I said we are not obsessed with the syntax of constraints and that
there can even be multiple syntaxes for representing constraints. This
does not mean we should come up with a new syntax when there are
already three deployed solutions (Stardog ICV, IBM Resource Shapes,
TopQuadrant SPIN). Note that the OWL constraints implemented in Stardog
have the benefit of being already supported by existing tools; they are
directly representable in RDF, and there is a concise human-friendly
representation (Manchester syntax). We have many examples of
constraints in both RDF and Manchester syntax [1].

At the workshop, I focused on other points that we think are more
important: expressivity and semantics. We think the expressivity of
constraints should be equivalent to SPARQL and the semantics should be
defined via translation to SPARQL. Defining semantics in terms of
SPARQL solves the issue of how reasoning interacts with constraints
since there are SPARQL entailment regimes for RDFS, OWL 2 and RIF.
The semantics of Stardog ICV is given in terms of a model theory [2]
but it can alternatively be described via SPARQL translation and that
is how our implementation works.
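
To make "semantics via translation to SPARQL" concrete, here is a
minimal sketch in Python. It is not the actual Stardog ICV translation;
the function name, the example IRIs and the single constraint form
("every instance of a class has at least one value for a property")
are assumptions chosen for illustration. The generated query selects
the violating resources, so an empty result means the constraint holds
under whatever entailment regime the query is evaluated with.

```python
def min_one_value_query(cls: str, prop: str) -> str:
    """Return a SPARQL query selecting violations of the constraint
    'every instance of cls has at least one value for prop'.

    Hypothetical helper for illustration only.
    """
    return (
        "SELECT ?x WHERE {\n"
        f"  ?x a <{cls}> .\n"
        f"  FILTER NOT EXISTS {{ ?x <{prop}> ?v }}\n"
        "}"
    )


# Example IRIs are assumptions, not taken from any real dataset.
query = min_one_value_query(
    "http://example.org/Person",
    "http://xmlns.com/foaf/0.1/name",
)
print(query)
```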

I must also emphasize that having the ability to translate from an
arbitrary syntax to SPARQL is not enough by itself. As an example, one
common feature in all three of the solutions mentioned above is the
ability to associate constraints/shapes with an existing type. If I'd
like to define a constraint that should be satisfied by all instances
of Person type, I can do it with any of these systems:

[ICV] ex:Person rdfs:subClassOf ...
[ResSh] ex:PersonShape oslc:describes ex:Person ; ...
[SPIN] ex:Person spin:constraint "ASK {...}"

In each system a SPARQL query is generated and every Person instance
is validated against it. With ShEx, the problem is reversed: one tries
to find the resources that match a shape. So I can define a PersonShape
in ShEx and a SPARQL query is generated, but the query is used in a
completely different way. As a result, there is no guarantee that every
Person instance satisfies that shape (a Person instance can satisfy a
different, irrelevant shape and would still be considered valid).
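
The difference can be sketched with a toy example in Python. This is a
deliberately simplified model, not how any of the four systems is
implemented: triples are plain tuples, a "constraint" or "shape" is
just a list of required properties, and all names (ex:bob, DeviceShape,
ex:serialNumber) are made up for illustration.

```python
RDF_TYPE = "rdf:type"


def has_property(triples, node, prop):
    """True if node has at least one value for prop."""
    return any(s == node and p == prop for s, p, _ in triples)


def instances_of(triples, cls):
    """All nodes explicitly typed as cls, in sorted order."""
    return sorted({s for s, p, o in triples if p == RDF_TYPE and o == cls})


def validate_by_type(triples, cls, required):
    """Type-driven: every instance of cls is checked; return the violators."""
    return [
        node
        for node in instances_of(triples, cls)
        if not all(has_property(triples, node, p) for p in required)
    ]


def match_any_shape(triples, node, shapes):
    """Shape-driven: return the names of all shapes the node matches."""
    return [
        name
        for name, required in shapes.items()
        if all(has_property(triples, node, p) for p in required)
    ]


triples = [
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:bob", "ex:serialNumber", "42"),  # bob has no foaf:name
]
shapes = {"PersonShape": ["foaf:name"], "DeviceShape": ["ex:serialNumber"]}

violations = validate_by_type(triples, "ex:Person", ["foaf:name"])
bob_matches = match_any_shape(triples, "ex:bob", shapes)
print(violations)   # ex:bob is reported as an invalid Person
print(bob_matches)  # ex:bob matches only the unrelated DeviceShape
```

In the type-driven check ex:bob is flagged because he is a Person
without a name. In the shape-matching check nothing ties ex:bob to
PersonShape, so matching the unrelated DeviceShape is enough for him to
pass.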

In summary, ignoring the existing solutions that have been in use
for quite some time and starting from scratch with a new syntax and
completely new semantics is not the right way to go.

Best,
Evren

[1] http://docs.stardog.com/icv/#sd-ICV-Examples
[2] http://docs.stardog.com/icv/icv-specification.html

On Fri, Jul 18, 2014 at 6:39 PM, Sandro Hawke <sandro@w3.org> wrote:
> On 07/18/2014 06:00 PM, Kendall Clark wrote:
>
> Why take out all of them instead of removing the one that's immature?  Near
> as I can tell ShEx is less than a year old. Does W3 Team really think it
> should be promoted in place of something like SPIN or ICV, which are 5 or 6
> years old? That's indefensible.
>
>
> As I recall, there was consensus at the RDF Validation Workshop against
> using either SPIN or ICV.   My memory is nowhere near perfect, but I
> remember this pretty clearly, since both results surprised me.   I assumed
> Evren would try to convince people of the merits of ICV and would object to
> any other solution, but he didn't.  I assumed lots of people would like
> SPARQL for validation, since it's already widely deployed.  Instead, there
> was agreement that SPARQL-like syntaxes are not suitable for the use cases
> people in the room cared about.
>
> I expect these points of consensus, and the requirements that drove
> them, are what motivated the creation of ShEx.
>
> And that's why the Charter was developed as it was, steering away from SPIN
> and ICV.
>
> What I'm hearing now is that for whatever reasons, the Workshop was
> surprisingly non-representative of the industry, or perhaps was run in a way
> which corrupted the signal.   Maybe several of us somehow misunderstood what
> Evren was saying, or maybe he misunderstood the question being asked.  Maybe
> the SPARQL question was framed incorrectly when discussed.  Maybe the wrong
> people were at the Workshop.    Fortunately, it's not too late to change
> course.
>
> So, with that in mind, would it work to just take out the mentions of
> specific technologies/solutions from the charter?
>
>      -- Sandro
>
>
>
>
> Cheers,
> Kendall
>
> On Friday, July 18, 2014, Sandro Hawke <sandro@w3.org> wrote:
>>
>> On 07/18/2014 04:40 PM, Jerven Bolleman wrote:
>>
>> I completely agree with Kendall.
>>
>> A standard would look at the similarities between Resource Shapes, ICV and
>> SPIN and see if a common syntax can be achieved.
>> What seems to be happening instead is that a 4th independent option is
>> being designed.
>> Which means that the real standard will then need to look into
>> standardising ShEx, Resource Shapes, ICV and SPIN.
>> Giving standard number 5, which is how WGs become inspiration for XKCD
>> and Dilbert comics…
>>
>> ShEx currently reuses practically nothing of the earlier work or existing
>> W3C standards.
>>
>> And a lot is being said about usability but no one recalls the sad joke.
>>
>>    Some people, when confronted with a problem, think
>>    “I know, I'll use regular expressions.”   Now they have two problems.
>>
>> ASCII art is not a requirement any more.
>> Saving bits is a goal of compression algorithms.
>> Code should strive for readability, especially validation code.
>>
>> E.g. this SPARQL pseudo style of using
>> { [] foaf:name xsd:string }
>> XOR
>> { [] foaf:givenName xsd:string }
>>
>> Is a much better idea than
>> { foaf:name xsd:string ;
>>   | foaf:givenName xsd:string }
>> Where we started using the binary OR symbol (|) to mean XOR, which looks
>> rather similar to || or the normal OR people are exposed to.
>>
>> For the rest I saw the UniProt ShEx example and it is not at all
>> representative of what a database like UniProt really needs.
>>
>> Attached to this e-mail is a PDF/poster about how SPIN is actually looked at
>> in the UniProt consortium.
>>
>> All in all I encourage the Charter writers to really look at what
>> is out there being used in the semweb world.
>> And look at standardising that instead of looking to the XML and Regex
>> planets, which we thankfully left behind.
>>
>>
>> Would it work to just take out the mentions of specific
>> technologies/solutions from the charter?
>>
>> (Note that the charter may have changed since you last read it.)
>>
>>       -- Sandro
>>
>>
>> Regards,
>> Jerven
>>
>>
>>
>>
>> On 18 Jul 2014, at 18:24, Kendall Clark <kendall@clarkparsia.com> wrote:
>>
>> On Fri, Jul 18, 2014 at 12:20 PM, Dimitris Kontokostas
>> <kontokostas@informatik.uni-leipzig.de> wrote:
>>
>>
>>
>> Instead of criticizing what ShEx can't do we should all try to see what
>> ShEx should do.
>>
>> Why? Standards bodies should be about standardizing existing systems. This
>> is one thing the W3C has consistently gotten wrong in the semantic web
>> space: too much speculative research done in the guise of standardization.
>>
>> I think we all agree that a compact human syntax (with equivalent RDF
>> representation) that covers common validation cases and SPARQL extensions
>> is something we all want.
>>
>> SPIN, IBM Resource Shapes, and Stardog ICV already provide that. You can't
>> get any more compact human syntax than, say, Manchester OWL syntax for
>> constraints: see http://docs.stardog.com/icv for many *real* examples from
>> shipping code.
>>
>> I too don't like some parts of ShEx but I think it's a good initiative to
>> bootstrap a standard.
>>
>> That isn't how standardization works best.
>>
>> I already raised some issues in the mailing list and have a few more from
>> my experience with RDFUnit - but will raise them later since the maintainers
>> are now too busy replying.
>>
>> Those are all valid, interesting points for ShEx, which is at this point
>> an interesting proof of concept or prototype of an idea. That work should be
>> carried out in an R&D context. W3C Working Groups are not R&D contexts.
>>
>> Cheers,
>> Kendall Clark
>>
>> -------------------------------------------------------------------
>> Jerven Bolleman                        Jerven.Bolleman@isb-sib.ch
>> SIB Swiss Institute of Bioinformatics      Tel: +41 (0)22 379 58 85
>> CMU, rue Michel Servet 1               Fax: +41 (0)22 379 58 58
>> 1211 Geneve 4,
>> Switzerland     www.isb-sib.ch - www.uniprot.org
>> Follow us at https://twitter.com/#!/uniprot
>> -------------------------------------------------------------------
>>
>>
>
Received on Sunday, 20 July 2014 02:56:07 UTC