- From: Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de>
- Date: Thu, 24 Jul 2014 11:59:42 +0300
- To: Jerven Bolleman <jerven.bolleman@isb-sib.ch>
- Cc: "public-rdf-sha." <public-rdf-shapes@w3.org>
- Message-ID: <CA+u4+a1A=M0=hFHGmhzYxidL1SpJZYvmGs_L0coft4MZmbYzwg@mail.gmail.com>
On Thu, Jul 24, 2014 at 11:15 AM, Jerven Bolleman <jerven.bolleman@isb-sib.ch> wrote:

> Dear All,
>
> I now see that there are two main desires from the community for the
> outcome of this WG process.
> The first is documenting what the data should look like,
> the second is validating that the data is correct.
>
> My first messages were about the validation of data being correct; this
> one is about what the data should look like.
> Some people have expressed the opinion that organizations already have a
> large infrastructure for validation but that
> they need better documentation today.
>
> My opinion is formed in large part by my experience in teaching
> RDF/SPARQL and OWL reasoning to interested novices.
>
> SPIN as it was presented is not nice for the first but is really great for
> the second.
> ICV is ok for the first and is good for the second.
> ShEx just makes me sad... The readability of regular expressions with the
> verbosity of RDF is not a pleasant combination.
> Resource shapes, I have only glanced at.
>
> With a few examples I am going to try to explain the goals I currently
> think the WG should investigate (and have that investigation goal be
> part of the Charter) and how SPIN with templates
> can achieve these goals. These examples are just for discussion and
> illustration purposes; they are not a complete proposal and do not have an
> implementation.
>
> A problem with ShEx and ICV as they stand is that they can only express
> hard constraints, and they make documenting the why of these constraints
> hard. SPIN can describe hard constraints and soft/heuristics. For example,
> let's say we have some data about Formula 1 cars. We want to say that all
> cars have 1 driver and 4 or 6 wheels. This is a hard constraint, as shown
> below in SPIN/template and ShEx syntax.
>
> prefix sp: <http://spinrdf.org/sp#>
> prefix spin: <http://spinrdf.org/spin#>
> prefix spl: <http://spinrdf.org/spl#>
> prefix formula: <http://example.org/example_ontology_about_formula_one#>
>
> formula:Car a owl:Class ;
>   spin:constraint [ a spl:Attribute ;
>                     spl:predicate formula:driver ;
>                     spl:valueType formula:Driver ;
>                     spl:count 1 ] ;
>   spin:constraint [ spl:union [ a spl:Attribute ;
>                                 spl:predicate formula:wheels ;
>                                 spl:valueType formula:Wheel ;
>                                 spl:count 4 ],
>                               [ a spl:Attribute ;
>                                 spl:predicate formula:wheels ;
>                                 spl:valueType formula:Wheel ;
>                                 spl:count 6 ] ] .
>
> So far straightforward and nothing unusual here.
> With some fine tuning this could be improved, i.e. by removing a few
> redundant triples.
> But it is quite consistent: one driver, 4 or 6 wheels. Here I try to do
> the same in ShEx.
>
> <FormulaOneCarShape> { a formula:Car,
>                        formula:driver @<DriverShape> ,
>                        ( formula:wheels @<WheelShape>{4,4} |
>                          formula:wheels @<WheelShape>{6,6} ) }
> <DriverShape> { a formula:Driver }
> <WheelShape> { a formula:Wheel }
>
> Difference between ShEx and SPIN here is 14 to 9 or 6 lines depending on
> layout.
> SPIN is more explicit and does not need custom syntax,
> i.e. it's plain RDF. ShEx is more compact but is not compatible in
> any way with existing tools.
> spl:union is not yet an existing SPIN template but I think it can be done.
>
> However, this example is rather minimal and only deals with constraints.
> I suggest we extend this with soft/heuristics that look like this.
>
> formula:Car
>   spin:constraint [ a heuristics:veryFewHave ;
>                     ex:commonType :4WheelCar ;
>                     ex:rareType :6WheelCar ;
>                     rdfs:comment "The Tyrrell P34 had 4 front wheels and raced in 1976 and 1977, but it is the only known example" ;
>                     rdfs:seeAlso <http://en.wikipedia.org/wiki/Tyrrell_P34>
>                   ] .
>
> :4WheelCar rdfs:subClassOf formula:Car ;
>   rdfs:subClassOf [ a owl:Restriction ;
>                     owl:onProperty formula:wheels ;
>                     owl:cardinality 4 ] .
>
> :6WheelCar rdfs:subClassOf formula:Car ;
>   rdfs:subClassOf [ a owl:Restriction ;
>                     owl:onProperty formula:wheels ;
>                     owl:cardinality 6 ] .
>
> The idea here is that it allows us to identify the common case and the
> exceptional, and document those. With side benefits that heuristics for
> data quality control can be triggered for them as well as optimizations if
> e.g. Java code is generated from these Expectations. In the example, while
> Formula One cars can have four or six wheels, the 6 wheel case is very
> rare, and if you ever have a database/message filled with six wheel
> Formula One cars you should probably investigate.

One note here (I'm not going into syntax): if we adopt the severity level
paradigm, this can be easily supported by both ShEx & SPIN with a rule
like this:
Rule X, "cars with six wheels are uncommon" @level warning (or notice)

> You can see that I use OWL here instead of more shapes as OWL is a great
> existing technology to determine the type of an instance given knowledge
> about its properties. OWL anonymous classes will also solve the issue of
> "typeless" constraints, which I expect will be very rare. So for most users
> knowing OWL would not be a requirement.
>
> One can imagine an extension to Manchester Syntax that can encode this
> as well as the examples given here.
> But to be honest I would prefer the RDF syntax to be clean and
> straightforward for most uses. When I teach RDF, I always say everything
> can be expressed as triples; sometimes it's verbose and awkward but it
> always works.
> Every single time I introduce a new syntax I put up a barrier for adoption
> and understanding. This is why I personally do not like OWL Manchester
> Syntax: it puts in place an artificial barrier between data and
> ontologies and divides a community that should be united. In a two day
> course I spend the first day explaining RDF
> and SPARQL, and the second day Reasoning and OWL.
> The second day I waste a lot of time when using Manchester Syntax and
> undermine my first day, which is why I use TopBraid Composer (free) and
> its RDF/Turtle views to explain owl:restrictions instead of Protégé.
>
> I think all the heuristics constraints for expressing expected data
> distributions can be spin:templates,
> e.g. something like this (please excuse syntax/logic errors and typos)
>
> heuristics:veryFewHave rdfs:subClassOf spin:Template ;
>   spin:constraint [ a spl:Argument ;
>                     rdfs:comment "The common super type" ;
>                     spl:predicate heuristics:commonType ;
>                     spl:valueType xsd:anyURI ] ;
>   spin:constraint [ a spl:Argument ;
>                     rdfs:comment "The rare type" ;
>                     spl:predicate heuristics:rareType ;
>                     spl:valueType xsd:anyURI ] ;
>   spin:text """CONSTRUCT {
>       [] a heuristics:HeuristicsViolation ;
>          spin:violationRoot ?this ;
>          spin:violationPath ?predicate ;
>          rdfs:label ?label .
>     } WHERE {
>       BIND((spl:objectCount(rdf:type, ?commonType)) AS ?commonCount)
>       BIND((spl:objectCount(rdf:type, ?rareType)) AS ?rareCount)
>       FILTER((?rareCount / ?commonCount) > 0.05)
>       BIND(CONCAT("The type ", str(?rareType), " is more than 5% of ", str(?commonType)) AS ?label)
>     }""" .
>
> This heuristics ontology/template library of concepts/things for
> validation can of course be implemented using other technologies than
> SPIN. And while these templates should be standardized, they are not part
> of the "UI" for simple documentation and validation reasons.
>
> In conclusion, SPIN, in collaboration with its templates and reusing the
> existing OWL standard, is at least as user friendly as ShEx and it has very
> good potential to document not just constraints but expectations, showing
> that we can have both simple and expressive with one standard.
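
To make the hard versus soft distinction concrete, here is a minimal sketch
in plain Python (deliberately neither SPIN nor ShEx). It treats "1 driver
and 4 or 6 wheels" as a hard constraint reported at an "error" level, and
the "very few have six wheels" expectation as a heuristic reported at a
"warning" level. The in-memory data layout, the function names, and the
level names are all assumptions made up for illustration; only the 5%
threshold is taken from the label in the template quoted above.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical stand-in for the RDF data: car name -> (driver count, wheel count).
CARS = {
    "car_a": (1, 4),
    "car_b": (1, 4),
    "tyrrell_p34": (1, 6),
    "broken_car": (2, 5),
}


@dataclass(frozen=True)
class Finding:
    level: str    # "error" for hard constraints, "warning" for heuristics
    subject: str  # car name, or "dataset" for dataset-wide findings
    message: str


def check_hard_constraints(cars):
    """Hard constraint: every car has exactly 1 driver and 4 or 6 wheels."""
    findings = []
    for name, (drivers, wheels) in sorted(cars.items()):
        if drivers != 1:
            findings.append(Finding("error", name, f"expected 1 driver, found {drivers}"))
        if wheels not in (4, 6):
            findings.append(Finding("error", name, f"expected 4 or 6 wheels, found {wheels}"))
    return findings


def check_very_few_have(cars, common=4, rare=6, threshold=0.05):
    """Soft heuristic: warn when the rare case exceeds `threshold` of the common case."""
    counts = Counter(wheels for _drivers, wheels in cars.values())
    # Compare by multiplication so an empty common class cannot divide by zero.
    if counts[rare] > threshold * counts[common]:
        message = f"{rare}-wheel cars are more than {threshold:.0%} of {common}-wheel cars"
        return [Finding("warning", "dataset", message)]
    return []


def validate(cars):
    """Run the hard constraints first, then the dataset-wide heuristic."""
    return check_hard_constraints(cars) + check_very_few_have(cars)


if __name__ == "__main__":
    for finding in validate(CARS):
        print(f"[{finding.level}] {finding.subject}: {finding.message}")
```

The point of keeping the two checks separate is that a consumer can reject
data on "error" findings while only logging or reviewing "warning" ones,
which is roughly what a severity level attached to a rule would allow.
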
>
> Sincere regards,
> Jerven Bolleman
> -------------------------------------------------------------------
> Jerven Bolleman                        Jerven.Bolleman@isb-sib.ch
> SIB Swiss Institute of Bioinformatics  Tel: +41 (0)22 379 58 85
> CMU, rue Michel Servet 1               Fax: +41 (0)22 379 58 58
> 1211 Geneve 4,
> Switzerland                            www.isb-sib.ch - www.uniprot.org
> Follow us at https://twitter.com/#!/uniprot
> -------------------------------------------------------------------

--
Dimitris Kontokostas
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org
Homepage: http://aksw.org/DimitrisKontokostas
Received on Thursday, 24 July 2014 09:00:41 UTC