Re: Shape inference from Eric Prud'hommeaux on 2019-03-18 (public-shex@w3.org from March 2019)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Mon, 18 Mar 2019 22:03:12 +0100
To: Thomas Baker <tom@tombaker.org>
Cc: Daniel Fernández Álvarez <danifdezalvarez@gmail.com>, "public-shex@w3.org" <public-shex@w3.org>
Message-ID: <20190318210310.GA4065@w3.org>
On Mon, Mar 18, 2019 at 09:25:43PM +0100, Thomas Baker wrote:
> Hi Daniel,
> 
> This sounds like very interesting work though I wonder if
> "inference" is the most appropriate word to convey what
> you mean.
> 
> In a semantic sense, "inference" involves adding triples
> to a graph to make relations between things explicit on
> the basis of sub-class definitions, domain and range
> definitions, and the like.
> 
> In contrast, you are taking RDF data graphs as input,
> then "deriving" or "generating" corresponding shapes --
> shapes that are not themselves RDF graphs (well, not
> unless they are expressed in ShExR syntax).
> 
> The process does seem like a sort of "inference", but the
> term is so heavily loaded in common usage towards
> _semantic_ inference in the RDF and OWL sense that it
> might be best to explicitly draw attention to how you are
> using the term differently.

It's true that the term is heavily loaded, but I wonder if saying
"shape inference" over and over again will lead folks to figure out
that the phrase means "inference of shapes" (vs. some sort of
inference entailed by shapes). If we can get past our reflexive
interpretations, "shape inference" does seem a nice description of it.


> Aside from that quibble: a nice project! :-)
> 
> Tom
> 
> On Fri, Mar 01, 2019 at 08:31:52PM +0100, Daniel Fernández Álvarez wrote:
> > I am Daniel Fernández-Álvarez, a PhD student at the University of Oviedo (Spain). I'd like to share with you a tool for shape inference that I mentioned in a call about this topic last week. I'm developing this tool in a public repository (Python), in case you want to check it out or open an issue.
> > 
> > Currently, the only way to use this is to clone the repo and execute it locally. But I am actively developing it, and my priorities right now are to make the tool easier to use by:
> > - Offering a web service.
> > - Wrapping that web service in a web app.
> > 
> > I've run some experiments using Wikidata content (considering just direct properties). The results can be checked here.
> > 
> > I briefly describe the tool's (current) features:
> > 
> > -  The main inputs are an RDF graph and a set of classes selected by the user. The output is a ShEx file containing a shape inferred for each one of those classes.
> > 
> > - The shape of each class is inferred w.r.t. the outgoing links of its instances. 
> > 
> > - A single triple can support several candidate constraints for a given shape, which are more or less specific regarding the type of the object and the cardinality. For instance, a triple such as (:Harry :name "Harry") could produce constraints such as
> >      - :name xsd:string  ;
> >      - :name Literal ;
> >      - :name Literal + ;
> >      - :name . * ;
> >      - ...
> > The algorithm considers most of these possibilities and associates with each constraint a score that reflects the proportion of instances of a class that actually conform to the constraint. That lets us sort the constraints in the final shape by how trustworthy they are. 
> > Most of these constraints do not appear in the final shape; only the most representative ones, according to some config params, are kept. The rest are used to provide extra information via comments regarding specific cardinalities or objects, as shown here
> > 
> > - It links shapes to each other when there are links between instances whose classes have an associated shape. In all other cases, it represents these relations using the macro IRI. 
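
Going back to the scoring of candidate constraints described for (:Harry :name "Harry"): a minimal sketch in Python may make the idea concrete. It is not taken from the tool's code. The toy data, the names `conforms` and `score_constraints`, and the choice to count instances with no :name value in the denominator are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy data: instance -> property -> list of object kinds.
# An object kind is a datatype name for literals, or "IRI" for resources.
instances = {
    ":Harry": {":name": ["xsd:string"]},
    ":Sally": {":name": ["xsd:string", "xsd:string"]},
    ":Bob": {":name": ["rdf:langString"]},
    ":Ann": {},
}


def matches_node(kind, node):
    """Does one object kind satisfy a node constraint?"""
    if node == ".":
        return True                # wildcard accepts anything
    if node == "Literal":
        return kind != "IRI"       # any literal, whatever its datatype
    return kind == node            # a specific datatype such as xsd:string


def matches_card(n, card):
    """Does a count of n values satisfy a cardinality?"""
    if card == "":
        return n == 1              # no marker means exactly one
    if card == "+":
        return n >= 1
    return True                    # "*" accepts zero or more


def conforms(props, prop, node, card):
    """Does one instance conform to the constraint (prop, node, card)?"""
    kinds = props.get(prop, [])
    return matches_card(len(kinds), card) and all(
        matches_node(kind, node) for kind in kinds
    )


def score_constraints(instances, prop, nodes, cards=("", "+", "*")):
    """Score each candidate by the proportion of conforming instances."""
    total = len(instances)
    scores = {}
    for node, card in product(nodes, cards):
        hits = sum(conforms(p, prop, node, card) for p in instances.values())
        scores[(prop, node, card)] = hits / total
    # Most trustworthy candidates first.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


scores = score_constraints(instances, ":name", ["xsd:string", "Literal", "."])
for (prop, node, card), score in scores.items():
    print(f"{prop} {node} {card}".strip(), f"(score {score:.2f})")
```

On this toy data, `:name xsd:string` scores 0.25 (only :Harry has exactly one xsd:string name), while `:name . *` scores 1.0 because every instance satisfies it. The more specific a candidate is, the fewer instances tend to conform.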
> > 
> > There are several configuration params that I'm starting to document here. They let you do things like ignore constraints that have a low score, ignore certain triples, and produce either shapes that are valid for every instance (using Kleene closures in the constraints when needed) or shapes that look more reasonable given the scores (even if that means some instances do not comply with the shape).
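
Two of the options mentioned here, dropping low-scoring constraints and choosing cardinalities that every instance satisfies, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `filter_by_score`, `all_compliant_cardinality`, and the threshold value are hypothetical and are not the tool's actual parameters or API.

```python
def filter_by_score(scored, threshold):
    """Keep only the constraints whose score reaches the threshold.

    `scored` maps a constraint (any hashable value) to a score in [0, 1].
    """
    return {c: s for c, s in scored.items() if s >= threshold}


def all_compliant_cardinality(counts):
    """Pick the tightest of "", "?", "+", "*" that every instance satisfies.

    `counts` is a non-empty list holding, for each instance of the class,
    how many values it has for the property in question.
    """
    lowest, highest = min(counts), max(counts)
    if lowest == 1 and highest == 1:
        return ""      # every instance has exactly one value
    if highest <= 1:
        return "?"     # some instances have none, nobody has more than one
    if lowest >= 1:
        return "+"     # everybody has at least one, somebody has several
    return "*"         # Kleene star: the only option that fits all instances


# Hypothetical scores for three candidate constraints on :name.
scored = {
    ":name xsd:string": 0.25,
    ":name Literal +": 0.75,
    ":name . *": 1.0,
}
kept = filter_by_score(scored, threshold=0.5)
print(kept)
print(repr(all_compliant_cardinality([1, 2, 1, 0])))
```

The trade-off mentioned in the message shows up directly. A shape built with `all_compliant_cardinality` never rejects an instance but may end up with loose `*` constraints. A score threshold keeps tighter constraints at the cost of some non-compliant instances.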
> > 
> > Any feedback, questions, suggestions, or requests about this would be very welcome =)
> > 
> > Best regards,
> > Dani F.
> > 
> > 
> > 
> > 
> 
> -- 
> Tom Baker <tom@tombaker.org>
> 

-- 
-eric

office: +1.617.258.5741 32-G528, MIT, Cambridge, MA 02144 USA
mobile: +1.617.599.3509

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Monday, 18 March 2019 21:03:19 UTC