How we can all "win" the Classes vs Shapes war from Holger Knublauch on 2015-01-27 (public-data-shapes-wg@w3.org from January 2015)

From: Holger Knublauch <holger@topquadrant.com>
Date: Tue, 27 Jan 2015 10:51:41 +1000
To: RDF Data Shapes Working Group <public-data-shapes-wg@w3.org>
Message-ID: <54C6E11D.30009@topquadrant.com>
The WG still struggles to find common ground on something that is 
entirely a war about terminology. This should be easy to fix. Let's try 
by coming back to the specific language syntax. I really don't want to 
talk about philosophical differences now.

There is a fairly obvious mapping between the proposed terms

Shape = Class := Collection of nodes with shared characteristics
ShEx people may prefer ldom:Shape, but the syntax could also use rdfs:Class

Proposals to associate nodes with a shape are either
ldom:instanceShape or rdf:type

Proposals to define reuse/inheritance of shapes are either
ldom:classShape or rdfs:subClassOf

I looked at the LDOM engine, and it could theoretically support all 
options. This would require duplicating every piece of logic that 
currently uses rdf:type to also use ldom:instanceShape, and every piece 
of logic based on rdfs:subClassOf to also walk up the ldom:classShape 
hierarchy. Then this duplication propagates into every SPARQL query. A 
completely unnecessary nightmare only because of different wording.
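
To make the duplication concrete, here is a minimal Python sketch. It is 
not the LDOM engine's code: the triples-as-tuples graph, the helper names 
(objects, applicable_shapes) and the ex: resources are all assumptions 
made up for illustration. It only shows that supporting both vocabularies 
means every lookup has to consult two linking properties and two 
hierarchy properties instead of one of each.

```python
# Hypothetical in-memory graph: a set of (subject, predicate, object) triples.
RDF_TYPE = "rdf:type"
RDFS_SUBCLASSOF = "rdfs:subClassOf"
LDOM_INSTANCE_SHAPE = "ldom:instanceShape"
LDOM_CLASS_SHAPE = "ldom:classShape"


def objects(graph, subject, predicate):
    """Return all o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def applicable_shapes(graph, node, link_props, parent_props):
    """Collect every class/shape that applies to node, including inherited ones.

    link_props   -- properties that attach a node to a class/shape
    parent_props -- properties that point from a class/shape to its parent
    """
    pending = []
    for prop in link_props:
        pending.extend(objects(graph, node, prop))
    seen = set()
    while pending:
        current = pending.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        for prop in parent_props:
            pending.extend(objects(graph, current, prop))
    return seen


graph = {
    ("ex:alice", RDF_TYPE, "ex:Employee"),
    ("ex:Employee", RDFS_SUBCLASSOF, "ex:Person"),
    ("ex:bob", LDOM_INSTANCE_SHAPE, "ex:EmployeeShape"),
    ("ex:EmployeeShape", LDOM_CLASS_SHAPE, "ex:PersonShape"),
}

# Classes only: one linking property, one hierarchy property.
classes_only = applicable_shapes(
    graph, "ex:alice", [RDF_TYPE], [RDFS_SUBCLASSOF]
)

# Supporting both vocabularies: every lookup must consult both pairs.
both = applicable_shapes(
    graph,
    "ex:bob",
    [RDF_TYPE, LDOM_INSTANCE_SHAPE],
    [RDFS_SUBCLASSOF, LDOM_CLASS_SHAPE],
)
```

The traversal is identical in both calls; only the property lists grow, 
and the same doubling would show up in each SPARQL query that walks these 
links.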

It's technically 100% the same thing.

Every function or feature that takes a shape as a parameter can also 
take an rdfs:Class. Nothing is lost by using rdfs:Class directly, 
especially because it already has inheritance solved.

Given this duplication, there is a strong risk that the W3C consensus 
process forces us to create yet another bloated standard that is the 
union of many similar ideas because nobody is ready to give up their 
individual preference. But the result would be a standard that nobody 
understands and implements because it is too complicated. OTOH, the 
concepts used by LDOM (rdfs:Class, rdf:type, rdfs:subClassOf) all 
perfectly align with current practice and are easy to understand by most 
mainstream developers. We can easily attract a large crowd of JSON 
developers, among others, and increase the size of the linked data community 
by several orders of magnitude. These people don't care about the 
theoretical distinction between Shapes and Classes that not even we seem 
to fully understand.

Looking at ShEx I believe one of the driving forces was the ShExC 
compact syntax, and I can certainly see that some people may find such a 
thing useful. ShExC is a language on its own. It can have many different 
engine implementations; some already exist and did not require LDOM. 
However, ShExC can also be defined as a mapping into the RDF syntax. 
ShEx already has an RDF syntax. All we need to do is make sure that it 
can also be mapped into LDOM RDF triples. We can define an LDOM profile 
[1] for ShExC that includes ldom:minCount/maxCount/valueType, 
OrConstraint etc and then publish a W3C document that formalizes the 
mapping into that profile. The ShExC developers then have their standard 
that they can continue to research. The flexibility of the LDOM 
templating mechanism even means that they can add new language elements 
or syntax extensions without requiring changes to the core language.

The key is that ShExC is its own text syntax anyway, so who cares 
whether it gets mapped to ldom:Shape or rdfs:Class for execution? There 
is no difference for the user, only that we can keep LDOM simple.

People who write papers or books about this language can still say "To 
declare a Shape you write [ShExC snippet] or [ex:Person a rdfs:Class]". 
It's an entirely syntactic detail whether these are called Shape or Class.

The other issue brought forward by Eric is that there are many ways to 
trigger the constraint checking. LDOM suggests using rdf:type to point 
at the shapes that a given node should be evaluated against *by 
default*. But that is easy to override. The API will have an entry point 
that checks whether a given Node fulfills a given Shape (currently 
implemented as ldom:violatesConstraints, maybe renamed to 
ldom:hasShape). Any application can call this function with their own 
protocol. It could be part of an HTTP header or whatever, or an 
application dynamically adds an rdf:type triple to drive the execution 
engine. There are many ways in which this can work, and many of those 
ways are outside of the scope of this WG and better left to Linked Data 
Platform or Hydra, or even local to the ShExC specification.
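
The two ways of triggering a check can be sketched in a few lines of 
Python. This is only an illustration under stated assumptions: the SHAPES 
table, the function names has_shape and validate_by_type, and the ex: 
resources are hypothetical, and the single minimum-count rule stands in 
for whatever constraints a real shape would carry.

```python
RDF_TYPE = "rdf:type"

# Hypothetical shape definitions: shape -> {property: minimum count}.
SHAPES = {
    "ex:Person": {"ex:name": 1},
    "ex:Employee": {"ex:name": 1, "ex:employer": 1},
}


def has_shape(graph, node, shape):
    """Explicit entry point: does node fulfil the given shape?"""
    for prop, min_count in SHAPES[shape].items():
        count = sum(1 for (s, p, _) in graph if s == node and p == prop)
        if count < min_count:
            return False
    return True


def validate_by_type(graph, node):
    """Default trigger: check node against the shapes named by its rdf:type."""
    shapes = [
        o for (s, p, o) in graph
        if s == node and p == RDF_TYPE and o in SHAPES
    ]
    return all(has_shape(graph, node, shape) for shape in shapes)


graph = {
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
}

# Default: rdf:type selects ex:Person, which ex:alice satisfies.
default_ok = validate_by_type(graph, "ex:alice")

# Override: the caller picks the shape itself, whatever rdf:type says.
explicit_ok = has_shape(graph, "ex:alice", "ex:Employee")
```

Here default_ok is true while explicit_ok is false, because ex:alice has 
no ex:employer. How the caller decides which shape to pass (an HTTP 
header, a temporary rdf:type triple, something else) stays outside this 
function.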

Can we please work together to create the right layering of this 
technology stack so that everyone gets what they need, instead of a 
messed-up hybrid solution where everything is just thrown together?

Thanks,
Holger

[1] https://w3c.github.io/data-shapes/data-shapes-primer/#profiles
Received on Tuesday, 27 January 2015 00:52:14 UTC