
Re: Shapes/ShEx or the worrying issue of yet another syntax and lack of validated vision.

From: Eric Prud'hommeaux <eric@w3.org>
Date: Sun, 20 Jul 2014 04:10:43 -0400
To: Evren Sirin <evren@clarkparsia.com>
Cc: Sandro Hawke <sandro@w3.org>, Kendall Clark <kendall@clarkparsia.com>, Jerven Bolleman <jerven.bolleman@isb-sib.ch>, Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de>, Jose Emilio Labra Gayo <jelabra@gmail.com>, "Dam, Jesse van" <jesse.vandam@wur.nl>, "public-rdf-shapes@w3.org" <public-rdf-shapes@w3.org>
Message-ID: <20140720081041.GA30771@w3.org>
* Evren Sirin <evren@clarkparsia.com> [2014-07-19 22:55-0400]
> What I said at the workshop is what is written in our position paper.
> I said we are not obsessed about the syntax of constraints and there
> can even be multiple different syntaxes for the representation of
> constraints. This does not necessarily mean we should come up with a
> new syntax where there are three different deployed solutions (Stardog
> ICV, IBM Resource Shapes, TopQuadrant SPIN). Note that the OWL
> constraints implemented in Stardog have the benefit of being already
> supported by existing tools, they are directly representable in RDF,
> and there is a concise human-friendly representation (Manchester
> syntax). We have many examples showing example constraints in RDF and
> Manchester syntax [1].
> 
> At the workshop, I focused on other points that we think are more
> important: expressivity and semantics. We think the expressivity of
> constraints should be equivalent to SPARQL and the semantics should be
> defined via translation to SPARQL. Defining semantics in terms of
> SPARQL solves the issue of how reasoning interacts with constraints
> since there are SPARQL entailment regimes for RDFS, OWL 2 and RIF.
> The semantics of Stardog ICV is given in terms of a model theory [2]
> but it can alternatively be described via SPARQL translation and that
> is how our implementation works.
> 
> I must also emphasize that having the ability to translate from an
> arbitrary syntax to SPARQL is not enough by itself. As an example, one
> common feature in all three of the solutions mentioned above is the
> ability to associate constraints/shapes with an existing type. If I'd
> like to define a constraint that should be satisfied by all instances
> of Person type, I can do it with any of these systems:
> 
> [ICV] ex:Person rdfs:subClassOf ...
> [ResSh] ex:PersonShape oslc:describes ex:Person ; ...
> [SPIN] ex:Person spin:constraint "ASK {...}"
> 
> In each system a SPARQL query would be generated and every Person
> instance would be validated using this query. With ShEx, the problem
> is kind of reversed and one tries to find the resources that match a
> shape. So I can define a PersonShape in ShEx and a SPARQL query is
> generated but the query is used in a completely different way. As a
> result, every Person instance might not satisfy that shape (a Person
> instance can satisfy a different, irrelevant shape and would be
> considered valid).

One of the features of Resource Shapes is that, while a shape *can* be
attached to a type, it frequently is not. Arthur Ryman spoke of this
[[
constraint language should be independent of any vocabulary or
ontology
]] — <http://www.w3.org/mid/OFF14B15B5.802B33E2-ON85257D0A.004C62E1-85257D0A.005240FE@ca.ibm.com>
and emphasized it in
<http://www.w3.org/mid/OF026C08BD.7F379A54-ON85257D15.00456170-85257D15.0047EB06@ca.ibm.com>
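
To make the distinction concrete, here is a minimal Python sketch of
the two ways of applying the same check. It is only an illustration:
the dict-based node store, the function names and the one-name check
are all hypothetical, and none of it reflects how ICV, Resource
Shapes, SPIN or ShEx are actually implemented.

```python
# Toy node store: node name -> {predicate: [values]} (hypothetical layout).
nodes = {
    "ex:alice": {"rdf:type": ["ex:Person"], "foaf:name": ["Alice"]},
    "ex:bob": {"rdf:type": ["ex:Person"]},  # a Person without a name
    "ex:acme": {"rdf:type": ["ex:Company"], "foaf:name": ["ACME"]},
}


def has_one_name(props):
    """The check itself: exactly one foaf:name value."""
    return len(props.get("foaf:name", [])) == 1


def violations_by_type(nodes, rdf_type, check):
    """Type-attached: every instance of rdf_type must pass the check."""
    return [
        name
        for name, props in nodes.items()
        if rdf_type in props.get("rdf:type", []) and not check(props)
    ]


def nodes_matching(nodes, check):
    """Type-independent: report whichever nodes pass, whatever their type."""
    return [name for name, props in nodes.items() if check(props)]


person_violations = violations_by_type(nodes, "ex:Person", has_one_name)
matching_nodes = nodes_matching(nodes, has_one_name)
```

With this data the type-attached run flags ex:bob, while the
type-independent run simply returns ex:alice and ex:acme and says
nothing about ex:bob.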

ShEx provides a formalization of Resource Shapes' apparent semantics
and adds disjunction, groups and extensibility. Extensibility was a
mandate from the workshop; disjunction and groups were added to
support use cases like

my:UserProfile {
  (foaf:name xsd:string
   | foaf:givenName xsd:string+,
     foaf:familyName xsd:string),
  foaf:mbox IRI
}
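
As a rough reading of that shape, the Python sketch below checks a
node against it by hand. It assumes that "|" is an exclusive choice
between the two name forms (as the quoted discussion suggests) and
that the other branch's properties must be absent. The dict layout,
the IRI class and the function names are hypothetical, not part of
any ShEx implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    """Stand-in for an IRI value, to tell it apart from a plain string."""
    value: str


def count_strings(node, predicate):
    """Number of values for predicate, or None if any is not a string."""
    values = node.get(predicate, [])
    if all(isinstance(v, str) for v in values):
        return len(values)
    return None


def matches_user_profile(node):
    """Check (foaf:name | foaf:givenName+, foaf:familyName), foaf:mbox IRI."""
    name = count_strings(node, "foaf:name")
    given = count_strings(node, "foaf:givenName")
    family = count_strings(node, "foaf:familyName")
    if name is None or given is None or family is None:
        return False
    # Branch 1: exactly one foaf:name and nothing from the other branch.
    branch_name = name == 1 and given == 0 and family == 0
    # Branch 2: one or more given names plus exactly one family name.
    branch_parts = name == 0 and given >= 1 and family == 1
    mbox = node.get("foaf:mbox", [])
    mbox_ok = len(mbox) == 1 and isinstance(mbox[0], IRI)
    return (branch_name or branch_parts) and mbox_ok


alice = {"foaf:name": ["Alice"], "foaf:mbox": [IRI("mailto:alice@example.org")]}
bob = {
    "foaf:givenName": ["Bob", "Robert"],
    "foaf:familyName": ["Smith"],
    "foaf:mbox": [IRI("mailto:bob@example.org")],
}
```

Both alice and bob match, each through a different branch of the
disjunction. A node carrying foaf:name together with foaf:givenName
would match neither branch under these assumptions.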

You can see the added functionality by looking at the "View as:
<Resource Shape>". If the schema doesn't use disjunction or groups,
you'll see Resource Shapes 2.0 (per the Member Submission).

For example, this conjunctive shape <http://tinyurl.com/ngema9c>:
[[
  PREFIX myshapes: <http://myshapes.example/#>
  PREFIX ex: <http://ex.example/#>
  PREFIX foaf: <http://foaf.example/#>
  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
  
  start = myshapes:Issue
  
  myshapes:Issue {
      ex:state (ex:unassigned ex:assigned),
      ex:reportedBy @myshapes:User,
      ex:reportedOn xsd:dateTime
  }
  
  myshapes:User {
      foaf:givenName xsd:string+,
      foaf:familyName xsd:string,
      foaf:mbox IRI
  }
]]
is a representation of this resource shape:
[[
  PREFIX ex:<http://ex.example/#>
  PREFIX foaf:<http://foaf.example/#>
  PREFIX myshapes:<http://myshapes.example/#>
  PREFIX rs:<http://open-services.net/ns/core#>
  PREFIX shex:<http://www.w3.org/2013/ShEx/ns#>
  PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
  <http://myshapes.example/#Issue> a rs:ResourceShape ;
      rs:property [
          rs:name "state" ;
          rs:propertyDefinition ex:state ;
          rs:allowedValue <http://ex.example/#unassigned> , <http://ex.example/#assigned> ;
          rs:occurs rs:Exactly-one ;
      ] ;
      rs:property [
          rs:name "reportedBy" ;
          rs:propertyDefinition ex:reportedBy ;
          rs:valueShape myshapes:User ;
          rs:occurs rs:Exactly-one ;
      ] ;
      rs:property [
          rs:name "reportedOn" ;
          rs:propertyDefinition ex:reportedOn ;
          rs:valueType xsd:dateTime ;
          rs:occurs rs:Exactly-one ;
      ] ;
   .
  <http://myshapes.example/#User> a rs:ResourceShape ;
      rs:property [
          rs:name "givenName" ;
          rs:propertyDefinition foaf:givenName ;
          rs:valueType xsd:string ;
          rs:occurs rs:One-or-many ;
      ] ;
      rs:property [
          rs:name "familyName" ;
          rs:propertyDefinition foaf:familyName ;
          rs:valueType xsd:string ;
          rs:occurs rs:Exactly-one ;
      ] ;
      rs:property [
          rs:name "mbox" ;
          rs:propertyDefinition foaf:mbox ;
          rs:valueType shex:IRI ;
          rs:occurs rs:Exactly-one ;
      ] ;
   .
]]
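
The correspondence between the two listings is mechanical enough to
sketch in a few lines of Python. The mapping for no marker and for "+"
is read off the example above; the entries for "?" and "*" are my
assumption, and the function name and output layout are hypothetical
rather than what the demo actually runs.

```python
# ShEx cardinality marker -> Resource Shapes occurrence value.
OCCURS = {
    "": "rs:Exactly-one",     # as in the example above
    "+": "rs:One-or-many",    # as in the example above
    "?": "rs:Zero-or-one",    # assumption, not shown in the example
    "*": "rs:Zero-or-many",   # assumption, not shown in the example
}


def property_block(predicate, value_type, cardinality=""):
    """Render one rs:property block for a prefixed predicate and datatype."""
    local_name = predicate.split(":", 1)[1]
    lines = [
        "rs:property [",
        f'    rs:name "{local_name}" ;',
        f"    rs:propertyDefinition {predicate} ;",
        f"    rs:valueType {value_type} ;",
        f"    rs:occurs {OCCURS[cardinality]} ;",
        "] ;",
    ]
    return "\n".join(lines)


given_name_block = property_block("foaf:givenName", "xsd:string", "+")
```

Here given_name_block holds the same statements as the givenName
block of the User shape above.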

If you use groups and optionals, the RDF representation uses
another namespace to capture those constructions, e.g.
<http://www.w3.org/2013/ShEx/FancyShExDemo?schemaURL=Examples/Issue-simple-annotated.shex&dataURL=test/Issue-pass-date.ttl>


> As a summary, ignoring the existing solutions that have been in use
> for quite some time and starting from scratch with a new syntax and
> completely new semantics is not the right way to go.

I totally agree. ShEx captures the (apparent) semantics of OSLC
Resource Shapes and Dublin Core Description Set Profiles.


> Best,
> Evren
> 
> [1] http://docs.stardog.com/icv/#sd-ICV-Examples
> [2] http://docs.stardog.com/icv/icv-specification.html
> 
> On Fri, Jul 18, 2014 at 6:39 PM, Sandro Hawke <sandro@w3.org> wrote:
> > On 07/18/2014 06:00 PM, Kendall Clark wrote:
> >
> > Why take out all of them instead of removing the one that's immature?  Near
> > as I can tell ShEx is less than a year old. Does W3 Team really think it
> > should be promoted in place of something like SPIN or ICV, which are 5 or 6
> > years old? That's indefensible.
> >
> >
> > As I recall, there was consensus at the RDF Validation Workshop against
> > using either SPIN or ICV.   My memory is nowhere near perfect, but I
> > remember this pretty clearly, since both results surprised me.   I assumed
> > Evren would try to convince people of the merits of ICV and would object to
> > any other solution, but he didn't.  I assumed lots of people would like
> > SPARQL for validation, since it's already widely deployed.  Instead, there
> > was agreement that SPARQL-like syntaxes are not suitable for the use cases
> > people in the room cared about.
> >
> > I expect these points of consensus, and the requirements that drove
> > them, are what motivated the creation of ShEx.
> >
> > And that's why the Charter was developed as it was, steering away from SPIN
> > and ICV.
> >
> > What I'm hearing now is that for whatever reasons, the Workshop was
> > surprisingly non-representative of the industry, or perhaps was run in a way
> > which corrupted the signal.   Maybe several of us somehow misunderstood what
> > Evren was saying, or maybe he misunderstood the question being asked.  Maybe
> > the SPARQL question was framed incorrectly when discussed.  Maybe the wrong
> > people were at the Workshop.    Fortunately, it's not too late to change
> > course.
> >
> > So, with that in mind, would it work to just take out the mentions of
> > specific technologies/solutions from the charter?
> >
> >      -- Sandro
> >
> >
> >
> >
> > Cheers,
> > Kendall
> >
> > On Friday, July 18, 2014, Sandro Hawke <sandro@w3.org> wrote:
> >>
> >> On 07/18/2014 04:40 PM, Jerven Bolleman wrote:
> >>
> >> I completely agree with Kendall.
> >>
> >> A standard would look at the similarities between Resource Shapes, ICV and
> >> SPIN and see if a common syntax can be achieved.
> >> What seems to be happening instead is that a 4th independent option is
> >> being designed.
> >> Which means that the real standard will then need to look into
> >> standardising ShEx, Resource Shapes, ICV and SPIN.
> >> Giving standard number 5, which is how WG’s become inspiration for XKCD
> >> and Dilbert comics…
> >>
> >> ShEx currently reuses practically nothing of the earlier work or existing
> >> W3C standards.
> >>
> >> And a lot is being said about usability but no one recalls the sad joke.
> >>
> >>    Some people, when confronted with a problem, think
> >>    “I know, I'll use regular expressions.”   Now they have two problems.
> >>
> >> ASCII art is not a requirement any more.
> >> Saving bits is a goal of compression algorithms.
> >> Code should strive for readability, especially validation code.
> >>
> >> E.g. this SPARQL pseudo style of using
> >> { [] foaf:name xsd:string }
> >> XOR
> >> { [] foaf:givenName xsd:string }
> >>
> >> Is a much better idea than
> >> { foaf:name xsd:string ;
> >>   | foaf:givenName xsd:string }
> >> Where we started using the binary OR symbol to mean XOR and that is rather
> >> similar to || or the normal OR people are exposed to.
> >>
> >> For the rest I saw the UniProt ShEx example and it is not at all
> >> representative for what a database like UniProt really needs.
> >>
> >> Attached to this e-mail is a PDF/poster about how SPIN is actually looked at
> >> in the UniProt consortium.
> >>
> >> All in all I really encourage the Charter writers to really look at what
> >> is out there being used in the semweb world.
> >> And look at standardising that instead of looking to the XML and Regex
> >> planets, which we thankfully left behind.
> >>
> >>
> >> Would it work to just take out the mentions of specific
> >> technologies/solutions from the charter?
> >>
> >> (Note that the charter may have changed since you last read it.)
> >>
> >>       -- Sandro
> >>
> >>
> >> Regards,
> >> Jerven
> >>
> >>
> >>
> >>
> >> On 18 Jul 2014, at 18:24, Kendall Clark <kendall@clarkparsia.com> wrote:
> >>
> >> On Fri, Jul 18, 2014 at 12:20 PM, Dimitris Kontokostas
> >> <kontokostas@informatik.uni-leipzig.de> wrote:
> >>
> >>
> >>
> >> Instead of criticizing what ShEx can't do we should all try to see what
> >> ShEx should do.
> >>
> >> Why? Standards bodies should be about standardizing existing systems. This
> >> is one thing the W3C has consistently gotten wrong in the semantic web
> >> space: too much speculative research done in the guise of standardization.
> >>
> >> I think we all agree that a compact human syntax (with equivalent RDF
> >> representation) that covers common validations cases and SPARQL extensions
> >> is something we all want.
> >>
> >> SPIN, IBM Resource Shapes, and Stardog ICV already provide that. You can't
> >> get any more compact human syntax than, say, Manchester OWL syntax for
> >> constraints: see http://docs.stardog.com/icv for many *real* examples from
> >> shipping code.
> >>
> >> I too don't like some parts of ShEx but I think it's a good initiative to
> >> bootstrap a standard.
> >>
> >> That isn't how standardization works best.
> >>
> >> I already raised some issues in the mailing list and have a few more from
> >> my experience with RDFUnit - but will raise them later since the maintainers
> >> are now too busy replying.
> >>
> >> Those are all valid, interesting points for ShEx, which is at this point
> >> an interesting proof of concept or prototype of an idea. That work should be
> >> carried out in an R&D context. W3C Working Groups are not R&D contexts.
> >>
> >> Cheers,
> >> Kendall Clark
> >>
> >> -------------------------------------------------------------------
> >> Jerven Bolleman                        Jerven.Bolleman@isb-sib.ch
> >> SIB Swiss Institute of Bioinformatics      Tel: +41 (0)22 379 58 85
> >> CMU, rue Michel Servet 1               Fax: +41 (0)22 379 58 58
> >> 1211 Geneve 4,
> >> Switzerland     www.isb-sib.ch - www.uniprot.org
> >> Follow us at https://twitter.com/#!/uniprot
> >> -------------------------------------------------------------------
> >>
> >>
> >
> 

-- 
-ericP

office: +1.617.599.3509
mobile: +33.6.80.80.35.59

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

There are subtle nuances encoded in font variation and clever layout
which can only be seen by printing this message on high-clay paper.
Received on Sunday, 20 July 2014 08:10:56 UTC
