Re: Shape inference from Thomas Baker on 2019-03-18 (public-shex@w3.org from March 2019)

From: Thomas Baker <tom@tombaker.org>
Date: Mon, 18 Mar 2019 21:25:43 +0100
To: Daniel Fernández Álvarez <danifdezalvarez@gmail.com>
Cc: "public-shex@w3.org" <public-shex@w3.org>
Message-ID: <20190318202543.GA59244@cicero.speedport.ip>
Hi Daniel,

This sounds like very interesting work, though I wonder if
"inference" is the most appropriate word to convey what
you mean.

In a semantic sense, "inference" involves adding triples
to a graph to make relations between things explicit on
the basis of sub-class definitions, domain and range
definitions, and the like.
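For contrast, here is a minimal sketch of that kind of semantic inference. It does naive forward chaining of three RDFS entailment rules (rdfs2 for rdfs:domain, rdfs3 for rdfs:range, rdfs9 for rdfs:subClassOf) over triples stored as plain string tuples. The vocabulary and data are hypothetical, and a real reasoner would implement the full rule set on top of an RDF library. What matters here is the output: it is the input graph plus new triples.

```python
from typing import Set, Tuple

# Triples are plain (subject, predicate, object) strings using prefixed names.
Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"
RDFS_SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(graph: Set[Triple]) -> Set[Triple]:
    """Naive forward chaining of rules rdfs2, rdfs3 and rdfs9 to a fixpoint."""
    closed = set(graph)
    while True:
        domains = [(p, c) for (p, r, c) in closed if r == RDFS_DOMAIN]
        ranges = [(p, c) for (p, r, c) in closed if r == RDFS_RANGE]
        supers = [(sub, sup) for (sub, r, sup) in closed if r == RDFS_SUBCLASS]
        new: Set[Triple] = set()
        for s, p, o in closed:
            # rdfs2: a predicate's domain types its subject.
            new.update((s, RDF_TYPE, c) for dp, c in domains if dp == p)
            # rdfs3: a predicate's range types its object.
            new.update((o, RDF_TYPE, c) for rp, c in ranges if rp == p)
            # rdfs9: instances of a subclass are instances of its superclass.
            if p == RDF_TYPE:
                new.update((s, RDF_TYPE, sup) for sub, sup in supers if sub == o)
        if new <= closed:  # nothing new was derived: fixpoint reached
            return closed
        closed |= new


# Hypothetical example data: the result is still an RDF graph,
# just with more triples than the input.
example = {
    (":Wizard", RDFS_SUBCLASS, ":Person"),
    (":wand", RDFS_DOMAIN, ":Wizard"),
    (":knows", RDFS_RANGE, ":Person"),
    (":harry", ":wand", ":hollyWand"),
    (":harry", ":knows", ":ron"),
}
inferred = rdfs_closure(example)
for triple in sorted(inferred - example):
    print(triple)
```

Three triples are added: :harry gets typed as :Wizard (domain) and then as :Person (subclass), and :ron gets typed as :Person (range).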

In contrast, you are taking RDF data graphs as input,
then "deriving" or "generating" corresponding shapes --
shapes that are not themselves RDF graphs (well, not
unless they are expressed in ShExR syntax).

The process does seem like a sort of "inference", but the
term is so heavily loaded in common usage towards
_semantic_ inference in the RDF and OWL sense that it
might be best to explicitly draw attention to how you are
using the term differently.

Aside from that quibble: a nice project! :-)

Tom

On Fri, Mar 01, 2019 at 08:31:52PM +0100, Daniel Fernández Álvarez wrote:
> I am Daniel Fernández-Álvarez, a PhD student at the University of Oviedo (Spain). I'd like to share with you a tool for shape inference that I mentioned last week in a call on this topic. I'm developing the tool (in Python) in a public repository, in case you want to check it out or even open an issue.
> 
> Currently, the only way to use this is to clone the repo and execute it locally. But I am actively developing it, and my priorities right now are to make the tool easier to use by:
> - Offering a web service.
> - Wrapping that web service in a web app.
> 
> I've run some experiments using Wikidata content (considering just direct properties). The results can be checked here.
> 
> Here is a brief description of the tool's current features:
> 
> -  The main inputs are an RDF graph and a set of classes selected by the user. The output is a ShEx file containing a shape inferred for each one of those classes.
> 
> - The shape of each class is inferred w.r.t. the outgoing links of its instances. 
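The input/output contract and the "outgoing links" step could look roughly like the following sketch. It assumes the graph is a list of (subject, predicate, object) string tuples with literals written in double quotes. The function name `outgoing_links_by_class` is illustrative and is not the tool's actual code. Skipping the rdf:type triples themselves is also an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"


def outgoing_links_by_class(
    graph: Iterable[Triple], target_classes: Set[str]
) -> Dict[str, Dict[str, List[Tuple[str, str]]]]:
    """Map each target class to {instance: sorted outgoing (predicate, object) pairs}."""
    triples = list(graph)
    instances: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE and o in target_classes:
            instances[o].add(s)

    result: Dict[str, Dict[str, List[Tuple[str, str]]]] = {}
    for cls in target_classes:
        links: Dict[str, List[Tuple[str, str]]] = {i: [] for i in instances[cls]}
        for s, p, o in triples:
            # Only outgoing links (instance in subject position) count;
            # rdf:type itself is skipped here as a simplifying assumption.
            if s in links and p != RDF_TYPE:
                links[s].append((p, o))
        result[cls] = {i: sorted(pairs) for i, pairs in links.items()}
    return result


# Hypothetical input; literals are written with surrounding double quotes.
graph = [
    (":harry", RDF_TYPE, ":Person"),
    (":harry", ":name", '"Harry"'),
    (":harry", ":knows", ":ron"),
    (":ron", RDF_TYPE, ":Person"),
    (":ron", ":name", '"Ron"'),
    (":hedwig", RDF_TYPE, ":Owl"),
    (":hedwig", ":owner", ":harry"),
]
by_class = outgoing_links_by_class(graph, {":Person"})
print(by_class[":Person"][":harry"])
```

The `:hedwig :owner :harry` triple is an incoming link for :harry, so it does not contribute to the :Person shape.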
> 
> - A single triple may justify several candidate constraints for a given shape, each more or less specific about the type of the object and the cardinality. For instance, a triple such as (:Harry :name "Harry") could produce constraints such as:
>      - :name xsd:string  ;
>      - :name Literal ;
>      - :name Literal + ;
>      - :name . * ;
>      - ...
> The algorithm considers most of these possibilities and associates with each constraint a score that reflects the proportion of a class's instances that actually conform to the constraint. This lets us sort the constraints in the final shape by how trustworthy they are.
> Most of these constraints do not appear in the final shape; only the most representative ones do, according to some configuration parameters. The rest are used to provide extra information, via comments, about specific cardinalities or objects, as shown here.
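Here is a sketch of the scoring idea. For one predicate, each candidate pairs a value expression with a ShEx cardinality. Its score is the fraction of a class's instances that would conform to it. The candidate list, the toy literal encoding and the function names are assumptions for illustration, and the tool may generate and score candidates differently.

```python
from typing import List, Tuple

# Value expressions from most to least specific, and ShEx cardinality markers.
VALUE_EXPRS = ["xsd:string", "LITERAL", "IRI", "."]
CARDINALITIES = {"": (1, 1), "?": (0, 1), "+": (1, None), "*": (0, None)}


def matches(obj: str, expr: str) -> bool:
    """Check an object against a value expression (toy lexical encoding)."""
    is_literal = obj.startswith('"')
    if expr == ".":
        return True
    if expr == "LITERAL":
        return is_literal
    if expr == "IRI":
        return not is_literal
    if expr == "xsd:string":
        # Plain literals such as '"Harry"' (no language tag or datatype).
        return is_literal and obj.endswith('"')
    return False


def score(objects_per_instance: List[List[str]], expr: str, card: str) -> float:
    """Fraction of instances whose objects for one predicate conform.

    As with a single ShEx triple constraint (no EXTRA), every object must
    match the value expression and their number must fit the cardinality.
    """
    low, high = CARDINALITIES[card]
    conforming = 0
    for objects in objects_per_instance:
        n = len(objects)
        fits = low <= n and (high is None or n <= high)
        if fits and all(matches(o, expr) for o in objects):
            conforming += 1
    return conforming / len(objects_per_instance)


def ranked_candidates(
    predicate: str, objects_per_instance: List[List[str]]
) -> List[Tuple[str, float]]:
    """Score every candidate constraint and sort by descending score."""
    candidates = [
        (f"{predicate} {expr} {card}".strip(), score(objects_per_instance, expr, card))
        for expr in VALUE_EXPRS
        for card in CARDINALITIES
    ]
    # Stable sort: ties keep generation order, i.e. more specific expressions first.
    return sorted(candidates, key=lambda c: -c[1])


# Hypothetical :name objects for three instances of one class.
name_objects = [['"Harry"'], ['"Ron"', '"Ronald"'], []]
ranking = ranked_candidates(":name", name_objects)
for constraint, s in ranking[:4]:
    print(f"{s:.2f}  {constraint}")
```

The candidate `:name . *` trivially scores 1.0. That is why the sketch breaks ties in favour of more specific candidates such as `:name xsd:string *`.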
> 
> - It interlinks shapes when there are links between instances whose classes have an associated shape. In all other cases, it represents these relations using the IRI keyword.
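Shape interlinkage can be sketched as a per-object decision. The sketch assumes triples are (subject, predicate, object) string tuples with literals in double quotes. Deriving a shape label such as `@<Person>` from the class's local name is hypothetical. When an object has several classes with shapes, the sketch picks the alphabetically first one, which is also an arbitrary assumption.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"


def types_of(graph: Iterable[Triple]) -> Dict[str, Set[str]]:
    """Collect the rdf:type classes declared for each node."""
    types: Dict[str, Set[str]] = {}
    for s, p, o in graph:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    return types


def object_constraint(obj: str, types: Dict[str, Set[str]], shaped: Set[str]) -> str:
    """Reference another shape when the object's class has one, else fall back to IRI."""
    if obj.startswith('"'):
        return "LITERAL"
    linked = sorted(types.get(obj, set()) & shaped)
    if linked:
        # Hypothetical naming: shape label derived from the class's local name.
        return "@<" + linked[0].split(":")[-1] + ">"
    return "IRI"


graph = [
    (":harry", RDF_TYPE, ":Person"),
    (":ron", RDF_TYPE, ":Person"),
    (":hogwarts", RDF_TYPE, ":School"),
    (":harry", ":knows", ":ron"),
    (":harry", ":attends", ":hogwarts"),
]
types = types_of(graph)
shaped = {":Person"}  # classes the user asked shapes for
for s, p, o in graph:
    if p != RDF_TYPE:
        print(p, object_constraint(o, types, shaped), ";")
```

In this example, `:knows` points to the Person shape, while `:attends` falls back to `IRI` because no shape was requested for `:School`.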
> 
> There are several configuration parameters, which I'm starting to document here. They let you do things like ignoring constraints with a low score, ignoring certain triples, producing shapes that are valid for every instance (using Kleene closures in the constraints when needed), or producing shapes that look more reasonable given the scores (even if some instances then do not conform to the shape), and so on.
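The configuration options described above can be sketched as a single selection step over already-scored candidates. The parameter names `min_score`, `all_compliant` and `ignore_predicates` are hypothetical stand-ins, not the tool's documented parameters. "Ignoring certain triples" is approximated here by ignoring whole predicates. The scores in the example are made up.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# (predicate, constraint text, score), listed most specific first per predicate.
Candidate = Tuple[str, str, float]


def select_constraints(
    candidates: Iterable[Candidate],
    min_score: float = 0.5,
    all_compliant: bool = False,
    ignore_predicates: Optional[Set[str]] = None,
) -> List[str]:
    """Pick, per predicate, the most specific candidate whose score passes the bar.

    With all_compliant=True the bar is 1.0, so only constraints every instance
    satisfies survive (typically the looser, Kleene-closure ones such as '*').
    """
    ignore = ignore_predicates or set()
    threshold = 1.0 if all_compliant else min_score
    by_predicate: Dict[str, List[Tuple[str, float]]] = {}
    for pred, text, s in candidates:
        if pred not in ignore:
            by_predicate.setdefault(pred, []).append((text, s))

    selected = []
    for options in by_predicate.values():
        choice = next((text for text, s in options if s >= threshold), None)
        if choice is not None:  # predicates with only low-score candidates are dropped
            selected.append(choice)
    return selected


# Made-up scores, for illustration only.
candidates = [
    (":name", ":name xsd:string", 0.9),
    (":name", ":name xsd:string *", 1.0),
    (":nick", ":nick xsd:string ?", 0.1),
    (":knows", ":knows IRI +", 0.7),
    (":knows", ":knows IRI *", 1.0),
]
print(select_constraints(candidates, min_score=0.8))
print(select_constraints(candidates, all_compliant=True))
```

The threshold trades strictness for readability. A lower `min_score` keeps tighter constraints such as `:knows IRI +` at the cost of some non-conforming instances. `all_compliant` widens every cardinality until all instances pass.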
> 
> Any feedback, questions, suggestions or requests about this would be very welcome =)
> 
> Best regards,
> Dani F.

-- 
Tom Baker <tom@tombaker.org>
Received on Monday, 18 March 2019 20:26:12 UTC