- From: Richard Cyganiak <richard@cyganiak.de>
- Date: Thu, 26 Feb 2015 20:07:01 +0000
- To: Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de>
- Cc: "public-data-shapes-wg@w3.org" <public-data-shapes-wg@w3.org>
Thanks Dimitris, that makes sense.

So the proposed requirement could be something like: “Number of violations should be countable when validating an RDF graph”.

I’m not sure if you have contributed a user story already, to back up this requirement? A brief description of what you do with regard to validation in DBpedia?

Best,
Richard

> On 26 Feb 2015, at 19:28, Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de> wrote:
>
> Hi Richard,
>
> There are some cases with the current draft spec where counting is not easy unless we require some conventions
> e.g. for sections 15.1.1 CONSTRUCT-based Constraints [1] and 15.1.2 ASK-based Constraints [2] how would you count the number of violations?
> or when for a resource there are two violation values (sh:value) should the values create separate violations or grouped in the same resource?
>
> I already briefly pointed this to Holger.
> All we need to change is to require the existence of one SPARQL variable (sh:root or sh:this) in every query
>
> Best,
> Dimitris
>
> [1] http://w3c.github.io/data-shapes/data-shapes-core/#sparql-constraints-construct
> [2] http://w3c.github.io/data-shapes/data-shapes-core/#sparql-constraints-ask
>
> On Thu, Feb 26, 2015 at 8:09 PM, Richard Cyganiak <richard@cyganiak.de> wrote:
> Hi Dimitris,
>
> I’m not sure I understand what requirement you’re proposing.
>
> Are you proposing that SHACL should not include detailed violation reporting facilities, because there could be too many reports?
>
> Counting violations seems like something that implementations can do no matter how SHACL is designed, so doesn’t appear to give rise to any particular requirement for the language itself?
>
> Richard
>
> > On 26 Feb 2015, at 14:30, Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de> wrote:
> >
> > Dear all,
> >
> > I proposed the following requirement that derived from UC34
> > https://www.w3.org/2014/data-shapes/wiki/Requirements#Constraint_Violations_Reporting_Details
> >
> > In large databases (such as DBpedia) there can be many thousands of violations and getting the detailed nodes that failed is not practical.
> > In these cases, getting the number of violations per shape / shape facet is more suited. Most of the times all the violations of a shape facet can be amended with a single code/mapping fix
> >
> > In the following example we had ~1M violations related to geo from four constraints & another ~1M violations for images that both got fixed with a single commit in the code
> >
> > http://nl.dbpedia.org/downloads/rdfunit/20141210/
> > *.aggregated* groups constraints with error counts & prevalence
> > *.rlog* displays only 10 violation nodes per constraint
> >
> > --
> > Dimitris Kontokostas
> > Department of Computer Science, University of Leipzig
> > Research Group: http://aksw.org
> > Homepage:http://aksw.org/DimitrisKontokostas
>
> --
> Dimitris Kontokostas
> Department of Computer Science, University of Leipzig
> Research Group: http://aksw.org
> Homepage:http://aksw.org/DimitrisKontokostas
Received on Thursday, 26 February 2015 20:07:33 UTC