Re: [Patterns] Materialize Inferences (was Re: Triple materialization at publisher level)

On Wed, Apr 7, 2010 at 10:55 AM, Leigh Dodds <leigh.dodds@talis.com> wrote:
> Vasiliy asks an excellent question below about publishing of inferred
> data. This happens to be one of the patterns on my short-list, so I
> thought I'd share a draft definition here to seek comments and develop
> the discussion. But I'm also interested to explore whether a focused
> discussion on this list is a good way to mine for extra patterns. I've
> amended the subject to clarify things. Let me know what you think.


This is indeed a good question, and one whose answer is necessarily a
delicate balance of tradeoffs.

[snip]
> Linked Data can be consumed by a wide variety of different client
> applications and libraries. Not all of these will have ready access to
> an RDFS or OWL reasoner, e.g. Javascript libraries running within a
> browser or mobile devices with limited processing power. How can a
> publisher provide access to data which can be inferred from the
> triples they are publishing?

You might also mention bandwidth here, and the tension for mobile
devices: if RDF is chunked too heavily, they could find themselves
consuming a lot of resources (both bandwidth and CPU), which could
hurt the end-user responsiveness of an app, particularly if Linked
Data retrievals are happening in real time.


> SOLUTION
>
> Publish both the original and inferred (materialized) triples within
> the Linked Data.

Suggest s/and inferred/and some inferred/

This is where the delicate tradeoffs come into play, and where we
would all benefit if there were conventions for documenting the
information needs (eg. SPARQL templates) of consuming apps.

If the raw data tells us that _x is a VeganRestaurant, it is probably
worth also materialising that it is a Restaurant. And probably perhaps
typically maybe also worth saying it is an Eating or Recreational
establishment. How many levels or semi-equivalent classes to mention
here would depend on (a) which popular vocabs have suitable terms, and
(b) what consuming apps there are for these kinds of things, and what
data patterns those apps expect to match. Beyond these mid-level
concepts, we move towards a level of abstraction where it becomes
increasingly unlikely that code and services will care. Knowing that
something is a geo:SpatialThing is not very useful. Knowing also its
geo:lat and geo:long and some display-oriented properties makes it
much more useful. Similarly with dcterms:Agent or foaf:Agent; unless
you have more info, a foaf:Agent could be a bit of software, an animal
(alive or dead), a historical figure, a Group, etc etc. So I think the
decision whether or not to publish the inferred type will be quite
heavily contextual.
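
To make the "how many levels" question concrete, here is a rough
sketch in Python of materialising only a bounded number of superclass
types. Everything in it is an assumption for illustration: the ex:
class names, the toy subclass table (one parent per class, which real
vocabularies do not guarantee), and the max_levels cut-off, which
stands in for the contextual judgement described above.

```python
# Hypothetical toy hierarchy; the ex: names are illustrative only.
SUBCLASS_OF = {
    "ex:VeganRestaurant": "ex:Restaurant",
    "ex:Restaurant": "ex:EatingEstablishment",
    "ex:EatingEstablishment": "geo:SpatialThing",
    "geo:SpatialThing": "owl:Thing",
}


def materialize_types(triples, subclass_of, max_levels=2):
    """Return the input triples plus inferred rdf:type triples,
    climbing at most max_levels steps up the subclass table."""
    result = set(triples)
    for s, p, o in triples:
        if p != "rdf:type":
            continue
        cls = o
        for _ in range(max_levels):
            cls = subclass_of.get(cls)
            if cls is None:  # reached the top of the known hierarchy
                break
            result.add((s, "rdf:type", cls))
    return result


raw = {("_:x", "rdf:type", "ex:VeganRestaurant")}
published = materialize_types(raw, SUBCLASS_OF, max_levels=2)
```

With max_levels=2 the published set says _:x is also an ex:Restaurant
and an ex:EatingEstablishment, but stops short of geo:SpatialThing and
owl:Thing. A publisher would presumably pick the cut-off per class
rather than globally.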

A guideline might be:

"As typing information becomes broader and more inclusive, it also
becomes less informative: to know that something is a 'Thing' is rarely
useful. It is difficult to say whether a class is at a useful level of
specificity without taking into account the other datasets, tools and
services that use it; however, an intuitive grasp of 'mid-level'
concepts often provides useful guidance. In addition, Linked Data apps
have a particular concern for cross-referencing information about
specific things, so it is often useful to include inferred
identifiers (owl:sameAs etc.) based on analysis of properties
(owl:FunctionalProperty, owl:InverseFunctionalProperty etc.)."

Ok that's not very friendly text but hope it might be useful.
Basically "rdf:type owl:Thing" is boring, but "owl:sameAs x:anotherID"
is very useful...
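
As a minimal sketch of the owl:sameAs case: two subjects that share a
value for an inverse functional property can be linked. The Python
below is illustrative only; the x:/y: identifiers and mailbox values
are made up, triples are plain tuples rather than any RDF library's
types, and it emits each link in one direction only even though
owl:sameAs is symmetric.

```python
from collections import defaultdict
from itertools import combinations


def infer_same_as(triples, inverse_functional):
    """Link subjects that share a value for an inverse functional property."""
    subjects_by_value = defaultdict(set)
    for s, p, o in triples:
        if p in inverse_functional:
            subjects_by_value[(p, o)].add(s)

    same_as = set()
    for subjects in subjects_by_value.values():
        # Sort so each pair is emitted once, in a stable order.
        for a, b in combinations(sorted(subjects), 2):
            same_as.add((a, "owl:sameAs", b))
    return same_as


# Hypothetical data: two datasets describing the same mailbox owner.
data = {
    ("x:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("y:a123", "foaf:mbox", "mailto:alice@example.org"),
    ("x:bob", "foaf:mbox", "mailto:bob@example.org"),
}
links = infer_same_as(data, {"foaf:mbox"})
```

Here links holds a single triple connecting x:alice and y:a123; x:bob
shares no mailbox with anyone, so nothing is inferred for it.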

cheers,

Dan

Received on Wednesday, 7 April 2010 12:45:49 UTC