- From: Holger Knublauch <holger@topquadrant.com>
- Date: Thu, 24 Jul 2014 18:45:45 +1000
- To: public-rdf-shapes@w3.org
- Message-ID: <53D0C7B9.1090103@topquadrant.com>
Olivier,
not sure where your hostility comes from. Jerven has made clear that he
has first-hand experience in teaching semantic technology to novices.
That's a valid target audience, as are advanced ontologists like himself.
Regards,
Holger
On 7/24/14, 6:37 PM, Olivier Rossel wrote:
> Please please, do not decide by yourself that one option or the other
> is user-friendly or readable or solves a real-life problem. Please
> survey that with data definition people from outside our community
> (i.e. our target audience).
> IMHO, the point is to have a good idea of the tediousness vs
> capabilities of all the available options.
>
>
>
> On Thu, Jul 24, 2014 at 10:15 AM, Jerven Bolleman
> <jerven.bolleman@isb-sib.ch> wrote:
>
> Dear All,
>
> I now see that there are two main desires from the community for
> the outcome of this WG process.
> The first is documenting what the data should look like,
> the second is validating that the data is correct.
>
> My first messages were about validating that the data is
> correct; this one is about what the data should look like.
> Some people have expressed the opinion that organizations already
> have a large infrastructure for validation but that
> they need better documentation today.
>
> My opinion is formed in large part by my experience in
> teaching RDF/SPARQL and OWL reasoning to interested novices.
>
> SPIN as it was presented is not nice for the first but is really
> great for the second.
> ICV is ok for the first and is good for the second.
> ShEx, just makes me sad... The readability of regular expressions
> with the verbosity of RDF is not a pleasant combination.
> Resource shapes, I have only glanced at.
>
> With a few examples I am going to try to explain the goals I
> currently think the WG should investigate (and have that
> investigation be part of the Charter) and how SPIN with templates
> can achieve these goals. These examples are just for discussion
> and illustration purposes; they are not a complete proposal and do
> not have an implementation.
>
> A problem with ShEx and ICV as they stand is that they can only
> express hard constraints, which makes documenting the why of these
> constraints hard. SPIN can describe both hard constraints and soft
> constraints/heuristics. For example, let's say we have some data
> about Formula 1 cars. We want to say that all cars have 1 driver and
> 4 or 6 wheels. This is a hard constraint, as shown below in
> SPIN/template and ShEx syntax.
>
>
> @prefix owl: <http://www.w3.org/2002/07/owl#> .
> @prefix sp: <http://spinrdf.org/sp#> .
> @prefix spin: <http://spinrdf.org/spin#> .
> @prefix spl: <http://spinrdf.org/spl#> .
> @prefix formula: <http://example.org/example_ontology_about_formula_one#> .
>
> formula:Car a owl:Class ;
>   spin:constraint [ a spl:Attribute ;
>       spl:predicate formula:driver ;
>       spl:valueType formula:Driver ;
>       spl:count 1 ] ;
>   spin:constraint [ spl:union [ a spl:Attribute ;
>       spl:predicate formula:wheels ;
>       spl:valueType formula:Wheel ;
>       spl:count 4 ],
>     [ a spl:Attribute ;
>       spl:predicate formula:wheels ;
>       spl:valueType formula:Wheel ;
>       spl:count 6 ] ] .
>
> So far this is straightforward, with nothing unusual.
> With some fine-tuning it could be improved, e.g. by removing a few
> redundant triples.
> But it is quite consistent: one driver, 4 or 6 wheels. Here I try
> to do the same in ShEx.
>
> <FormulaOneCarShape> { a formula:Car,
> formula:driver @<DriverShape> ,
> ( formula:wheels @<WheelShape>{4,4} |
> formula:wheels @<WheelShape>{6,6} ) }
> <DriverShape> { a formula:Driver }
> <WheelShape> { a formula:Wheel }
>
> The difference between SPIN and ShEx here is 14 lines versus 9 or 6,
> depending on layout.
> SPIN is more explicit and does not need custom syntax,
> i.e. it's plain RDF. ShEx is more compact but is not compatible in
> any way with existing tools.
> spl:union is not yet an existing SPIN template, but I think it can
> be done.
>
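
To make the hard constraint concrete outside of either syntax, here is a minimal Python sketch of the same rule (exactly one driver, 4 or 6 wheels). The `Car` class and `hard_constraint_violations` function are hypothetical names for illustration only; they are not part of SPIN, ShEx, or any proposal in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Car:
    """Hypothetical in-memory stand-in for a formula:Car instance."""
    drivers: list = field(default_factory=list)
    wheels: list = field(default_factory=list)


def hard_constraint_violations(car):
    """Return human-readable violations of the hard constraint for one car."""
    violations = []
    # Exactly one driver, mirroring spl:count 1 on formula:driver.
    if len(car.drivers) != 1:
        violations.append(
            f"expected exactly 1 driver, found {len(car.drivers)}")
    # Either 4 or 6 wheels, mirroring the union of the two wheel counts.
    if len(car.wheels) not in (4, 6):
        violations.append(
            f"expected 4 or 6 wheels, found {len(car.wheels)}")
    return violations
```

A car with one driver and four wheels yields an empty list; anything else yields one message per broken rule, which is all a hard constraint can report.
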
> However, this example is rather minimal and only deals with
> hard constraints.
> I suggest we extend this with soft constraints/heuristics that
> look like this.
>
> formula:Car
>   spin:constraint [ a heuristics:veryFewHave ;
>       heuristics:commonType :4WheelCar ;
>       heuristics:rareType :6WheelCar ;
>       rdfs:comment "The Tyrrell P34 had 4 front wheels and raced in 1976 and 1977, but it is the only known example" ;
>       rdfs:seeAlso <http://en.wikipedia.org/wiki/Tyrrell_P34> ] .
>
> :4WheelCar rdfs:subClassOf formula:Car ;
>   rdfs:subClassOf [ a owl:Restriction ;
>       owl:onProperty formula:wheels ;
>       owl:cardinality "4"^^xsd:nonNegativeInteger ] .
>
> :6WheelCar rdfs:subClassOf formula:Car ;
>   rdfs:subClassOf [ a owl:Restriction ;
>       owl:onProperty formula:wheels ;
>       owl:cardinality "6"^^xsd:nonNegativeInteger ] .
>
> The idea here is that it allows us to identify the common case and
> the exceptional one, and to document both. A side benefit is that
> heuristics for data quality control can be triggered for them, as
> well as optimizations if, e.g., Java code is generated from these
> expectations. In the example, while Formula One cars can have four
> or six wheels, the six-wheel case is very rare, and if you ever have
> a database/message filled with six-wheel Formula One cars you
> should probably investigate.
>
> You can see that I use OWL here instead of more shapes as OWL is a
> great existing technology to determine the type of an instance
> given knowledge about its properties. OWL anonymous classes will
> also solve the issue of "typeless" constraints, which I expect
> will be very rare. So for most users knowing OWL would not be a
> requirement.
>
> One can imagine an extension to Manchester Syntax that can
> encode this as well as the examples given here.
> But to be honest I would prefer the RDF syntax to be clean and
> straightforward for most uses. When I teach RDF, I always say
> everything can be expressed as triples; sometimes it's verbose and
> awkward, but it always works. Every single time I introduce a new
> syntax I put up a barrier to adoption and understanding. This is
> why I personally do not like OWL Manchester Syntax: it puts
> in place an artificial barrier between data and ontologies and
> divides a community that should be united. In a two-day course I
> spend the first day explaining RDF and SPARQL, and the second day
> reasoning and OWL. On the second day I waste a lot of time when
> using Manchester Syntax and undermine my first day, which is why I
> use TopBraid Composer (free) and its RDF/Turtle views to explain
> owl:Restrictions instead of Protégé.
>
> I think all the heuristic constraints for expressing expected
> data distributions can be SPIN templates,
> e.g. something like this (please excuse syntax/logic errors and typos):
>
> heuristics:veryFewHave rdfs:subClassOf spin:Template ;
>   spin:constraint [ a spl:Argument ;
>       rdfs:comment "The common super type" ;
>       spl:predicate heuristics:commonType ;
>       spl:valueType xsd:anyURI ] ;
>   spin:constraint [ a spl:Argument ;
>       rdfs:comment "The rare type" ;
>       spl:predicate heuristics:rareType ;
>       spl:valueType xsd:anyURI ] ;
>   spin:text """CONSTRUCT {
>       [] a heuristics:HeuristicsViolation ;
>          spin:violationRoot ?this ;
>          spin:violationPath ?predicate ;
>          rdfs:label ?label .
>     } WHERE {
>       BIND(spl:objectCount(rdf:type, ?commonType) AS ?commonCount)
>       BIND(spl:objectCount(rdf:type, ?rareType) AS ?rareCount)
>       FILTER((?rareCount / ?commonCount) > 0.05)
>       BIND(CONCAT("The type ", STR(?rareType), " is more than 5% of ", STR(?commonType)) AS ?label)
>     }""" .
>
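
The 5% check that the template is meant to express can also be sketched outside SPIN. The Python below is only an illustration of the ratio test under stated assumptions: the function name `rare_type_warning` and its `threshold` parameter are hypothetical, and the two counts are assumed to have been obtained already (for example from a SPARQL COUNT query).

```python
def rare_type_warning(common_count, rare_count, threshold=0.05):
    """Return a warning string if the rare type is over-represented, else None.

    common_count: number of instances of the common type (assumed given).
    rare_count:   number of instances of the rare type (assumed given).
    threshold:    maximum tolerated rare/common ratio (hypothetical default).
    """
    if common_count == 0:
        # No common instances at all: any rare instance is suspicious.
        if rare_count > 0:
            return "rare type present but common type absent"
        return None
    ratio = rare_count / common_count
    if ratio > threshold:
        return (f"rare type is {ratio:.1%} of common type, "
                f"more than the expected {threshold:.1%}")
    return None
```

Unlike a hard constraint, the result is a hint to investigate rather than a rejection of the data: a single six-wheeled car among hundreds passes silently, while a dataset dominated by them is flagged.
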
>
> This heuristics ontology/template library of concepts/things for
> validation can of course be implemented using technologies other
> than SPIN. And while these templates should be standardized, they
> are not part of the "UI" for simple documentation and
> validation reasons.
>
> In conclusion, SPIN, in collaboration with its templates and
> reusing the existing OWL standard, is at least as user-friendly as
> ShEx, and it has very good potential to document not just
> constraints but expectations. This shows that we can have both
> simplicity and expressiveness with one standard.
>
> Sincere regards,
> Jerven Bolleman
> -------------------------------------------------------------------
> Jerven Bolleman Jerven.Bolleman@isb-sib.ch
> SIB Swiss Institute of Bioinformatics Tel: +41 (0)22 379 58 85
> CMU, rue Michel Servet 1 Fax: +41 (0)22 379 58 58
> 1211 Geneve 4,
> Switzerland www.isb-sib.ch - www.uniprot.org
> Follow us at https://twitter.com/#!/uniprot
> -------------------------------------------------------------------
>
>
>
Received on Thursday, 24 July 2014 08:46:20 UTC