- From: Holger Knublauch <holger@topquadrant.com>
- Date: Thu, 24 Jul 2014 18:45:45 +1000
- To: public-rdf-shapes@w3.org
- Message-ID: <53D0C7B9.1090103@topquadrant.com>
Olivier, not sure where your hostility comes from. Jerven has made clear
that he has first-hand experience in teaching semantic technology to
novices. That's a valid target audience, as are advanced ontologists
like himself.

Regards,
Holger

On 7/24/14, 6:37 PM, Olivier Rossel wrote:
> Please please, do not decide by yourself that one option or the other
> is user-friendly or readable or solves a real-life problem. Please
> survey that with data definition people from outside our community
> (i.e. our target audience).
> IMHO, the point is to have a good idea of the tediousness vs
> capabilities of all the available options.
>
>
>
> On Thu, Jul 24, 2014 at 10:15 AM, Jerven Bolleman
> <jerven.bolleman@isb-sib.ch> wrote:
>
> Dear All,
>
> I now see that there are two main desires from the community for
> the outcome of this WG process.
> The first is documenting what the data should look like,
> the second is validating that the data is correct.
>
> My first messages were about the validation of data being
> correct, this one is about what the data should look like.
> Some people have expressed the opinion that organizations already
> have a large infrastructure for validation but that
> they need better documentation today.
>
> My opinion is formed in large part by my experience in
> teaching RDF/SPARQL and OWL reasoning to interested novices.
>
> SPIN as it was presented is not nice for the first but is really
> great for the second.
> ICV is ok for the first and is good for the second.
> ShEx just makes me sad... The readability of regular expressions
> with the verbosity of RDF is not a pleasant combination.
> Resource shapes, I have only glanced at.
>
> With a few examples I am going to try to explain the goals I
> currently think the WG should investigate (and have that
> investigation goal be part of the Charter) and how SPIN with templates
> can achieve these goals.
> These examples are just for discussion
> and illustration purposes; they are not a complete proposal and do
> not have an implementation.
>
> A problem with ShEx and ICV as they stand is that they can only express
> hard constraints, which makes documenting the why of these constraints
> hard. SPIN can describe both hard constraints and soft ones
> (heuristics). For example, let's say we have some data about Formula 1
> cars. We want to say that all cars have 1 driver and 4 or 6 wheels.
> This is a hard constraint, as shown below in SPIN/template and ShEx
> syntax.
>
>
> prefix sp: <http://spinrdf.org/sp#>
> prefix spin: <http://spinrdf.org/spin#>
> prefix spl: <http://spinrdf.org/spl#>
> prefix formula: <http://example.org/example_ontology_about_formula_one#>
>
> formula:Car a owl:Class ;
>     spin:constraint [ a spl:Attribute ;
>                       spl:predicate formula:driver ;
>                       spl:valueType formula:Driver ;
>                       spl:count 1 ] ;
>     spin:constraint [ spl:union [ a spl:Attribute ;
>                                   spl:predicate formula:wheels ;
>                                   spl:valueType formula:Wheel ;
>                                   spl:count 4 ],
>                                 [ a spl:Attribute ;
>                                   spl:predicate formula:wheels ;
>                                   spl:valueType formula:Wheel ;
>                                   spl:count 6 ] ] .
>
> So far straightforward and nothing unusual here.
> With some fine tuning this could be improved, i.e. by removing a few
> redundant triples.
> But it is quite consistent: one driver, 4 or 6 wheels. Here I try
> to do the same in ShEx.
>
> <FormulaOneCarShape> { a formula:Car,
>                        formula:driver @<DriverShape> ,
>                        ( formula:wheels @<WheelShape>{4,4} |
>                          formula:wheels @<WheelShape>{6,6} ) }
> <DriverShape> { a formula:Driver }
> <WheelShape> { a formula:Wheel }
>
> The difference between SPIN and ShEx here is 14 lines versus 9 or 6,
> depending on layout.
> SPIN is more explicit and does not need custom syntax,
> i.e. it's plain RDF. ShEx is more compact but is not compatible in
> any way with existing tools.
> spl:union is not yet an existing SPIN template but I think it can
> be done.
>
> However, this example is rather minimal and only deals with
> constraints.
> I suggest we extend this with soft constraints (heuristics) that look
> like this.
>
> formula:Car
>     spin:constraint [ a heuristics:veryFewHave ;
>                       ex:commonType :4WheelCar ;
>                       ex:rareType :6WheelCar ;
>                       rdfs:comment "The Tyrrell P34 had 4 front wheels and raced in 1976 and 1977, but it is the only known example" ;
>                       rdfs:seeAlso <http://en.wikipedia.org/wiki/Tyrrell_P34> ] .
>
> :4WheelCar rdfs:subClassOf formula:Car ;
>     rdfs:subClassOf [ a owl:Restriction ;
>                       owl:onProperty formula:wheels ;
>                       owl:cardinality 4 ] .
>
> :6WheelCar rdfs:subClassOf formula:Car ;
>     rdfs:subClassOf [ a owl:Restriction ;
>                       owl:onProperty formula:wheels ;
>                       owl:cardinality 6 ] .
>
> The idea here is that it allows us to identify the common case and
> the exceptional one, and document those. A side benefit is that
> heuristics for data quality control can be triggered for them, as
> can optimizations if, e.g., Java code is generated from these
> expectations. In the example, while Formula One cars can have four
> or six wheels, the 6-wheel case is very rare, and if you ever have
> a database/message filled with six-wheel Formula One cars you
> should probably investigate.
>
> You can see that I use OWL here instead of more shapes, as OWL is a
> great existing technology to determine the type of an instance
> given knowledge about its properties. OWL anonymous classes will
> also solve the issue of "typeless" constraints, which I expect
> will be very rare. So for most users knowing OWL would not be a
> requirement.
>
> One can imagine an extension to Manchester Syntax that can
> encode this as well as the examples given here.
> But to be honest I would prefer the RDF syntax to be clean and
> straightforward for most uses. When I teach RDF, I always say
> everything can be expressed as triples; sometimes it's verbose and
> awkward but it always works.
> Every single time I introduce a new
> syntax I put up a barrier for adoption and understanding. This is
> why I personally do not like OWL Manchester Syntax: it puts
> in place an artificial barrier between data and ontologies and
> divides a community that should be united. In a two-day course I
> spend the first day explaining RDF
> and SPARQL, and the second day Reasoning and OWL. The second day I
> waste a lot of time when using Manchester Syntax and undermine my
> first day, which is why I use TopBraid Composer (free) and its
> RDF/Turtle views to explain owl:restrictions instead of Protégé.
>
> I think all the heuristics constraints for expressing expected
> data distributions can be spin:templates,
> e.g. something like this (please excuse syntax/logic errors and typos)
>
> heuristics:veryFewHave rdfs:subClassOf spin:Template ;
>     spin:constraint [ a spl:Argument ;
>                       rdfs:comment "The common super type" ;
>                       spl:predicate heuristics:commonType ;
>                       spl:valueType xsd:anyURI ] ;
>     spin:constraint [ a spl:Argument ;
>                       rdfs:comment "The rare type" ;
>                       spl:predicate heuristics:rareType ;
>                       spl:valueType xsd:anyURI ] ;
>     spin:text """CONSTRUCT {
>         [] a heuristics:HeuristicsViolation ;
>            spin:violationRoot ?this ;
>            spin:violationPath ?predicate ;
>            rdfs:label ?label .
>     } WHERE {
>         BIND((spl:objectCount(rdf:type, ?commonType)) AS ?commonCount)
>         BIND((spl:objectCount(rdf:type, ?rareType)) AS ?rareCount)
>         FILTER((?rareCount / ?commonCount) > 0.05)
>         BIND(CONCAT("The type ", str(?rareType), " is more than 5% of ",
>                     str(?commonType)) AS ?label)
>     }""" .
>
>
> This heuristics ontology/template library of concepts/things for
> validation can of course be implemented using other technologies
> than SPIN. And while these templates should be standardized they
> are not part of the "UI" for simple documentation and
> validation reasons.
>
> In conclusion, SPIN, in combination with its templates and
> reusing the existing OWL standard, is at least as user-friendly as
> ShEx, and it has very good potential to document not just
> constraints but expectations, showing that we can have both simple
> and expressive with one standard.
>
> Sincere regards,
> Jerven Bolleman
> -------------------------------------------------------------------
> Jerven Bolleman                        Jerven.Bolleman@isb-sib.ch
> SIB Swiss Institute of Bioinformatics  Tel: +41 (0)22 379 58 85
> CMU, rue Michel Servet 1               Fax: +41 (0)22 379 58 58
> 1211 Geneve 4,
> Switzerland     www.isb-sib.ch - www.uniprot.org
> Follow us at https://twitter.com/#!/uniprot
> -------------------------------------------------------------------
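
For readers who do not use SPIN or ShEx, the distinction Jerven draws between a hard constraint and a soft heuristic can be sketched in plain Python. This is only an illustration under stated assumptions: the dictionary layout and the names `cars`, `hard_violations` and `rare_share_exceeded` are hypothetical and belong to no RDF library, and the 5% threshold is the one named in the label of the quoted template.

```python
from collections import Counter

# Hypothetical in-memory stand-in for RDF data: each car maps property
# names to lists of values. These names are illustrative assumptions.
cars = {
    "car1": {"driver": ["d1"], "wheels": ["w1", "w2", "w3", "w4"]},
    "p34": {"driver": ["d2"], "wheels": ["w%d" % i for i in range(6)]},
    "bad": {"driver": [], "wheels": ["w1", "w2", "w3"]},
}


def hard_violations(car):
    """Hard constraint: exactly 1 driver and either 4 or 6 wheels."""
    problems = []
    if len(car.get("driver", [])) != 1:
        problems.append("expected exactly 1 driver")
    if len(car.get("wheels", [])) not in (4, 6):
        problems.append("expected 4 or 6 wheels")
    return problems


def rare_share_exceeded(cars, threshold=0.05):
    """Soft heuristic: True when six-wheel cars are more than
    `threshold` times as numerous as four-wheel cars."""
    counts = Counter(len(c.get("wheels", [])) for c in cars.values())
    common, rare = counts[4], counts[6]
    if common == 0:
        # No common case to compare against: any rare car is suspicious.
        return rare > 0
    return rare / common > threshold


if __name__ == "__main__":
    for name, car in cars.items():
        print(name, hard_violations(car))
    print("investigate six-wheel share:", rare_share_exceeded(cars))
```

A hard violation rejects an individual record, while the heuristic only flags the dataset as a whole for investigation, which is the documentation role the quoted message argues for.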
Received on Thursday, 24 July 2014 08:46:20 UTC