- From: Dan Brickley <danbri@danbri.org>
- Date: Wed, 7 Apr 2010 14:45:14 +0200
- To: Leigh Dodds <leigh.dodds@talis.com>
- Cc: public-lod <public-lod@w3.org>
On Wed, Apr 7, 2010 at 10:55 AM, Leigh Dodds <leigh.dodds@talis.com> wrote:
> Vasiliy asks an excellent question below about publishing of inferred
> data. This happens to be one of the patterns on my short-list, so I
> thought I'd share a draft definition here to seek comments and develop
> the discussion. But I'm also interested to explore whether a focused
> discussion on this list is a good way to mine for extra patterns. I've
> amended the subject to clarify things. Let me know what you think.

This is indeed a good question, and one whose answer is necessarily a delicate balance of tradeoffs.

[snip]

> Linked Data can be consumed by a wide variety of different client
> applications and libraries. Not all of these will have ready access to
> an RDFS or OWL reasoner, e.g. Javascript libraries running within a
> browser or mobile devices with limited processing power. How can a
> publisher provide access to data which can be inferred from the
> triples they are publishing?

You might also mention bandwidth here, and the tension with mobile devices: if RDF is chunked too heavily, they could find themselves consuming a lot of resources (both bandwidth and CPU), which could impact the end-user responsiveness of an app, particularly if Linked Data retrievals are happening in real time.

> SOLUTION
>
> Publish both the original and inferred (materialized) triples within
> the Linked Data.

Suggest s/and inferred/and some inferred/

This is where the delicate tradeoffs come into play, and where we would all benefit if there were conventions for documenting the information needs (e.g. SPARQL templates) of consuming apps. If the raw data tells us that _x is a VeganRestaurant, it is probably worth also materialising that it is a Restaurant. And probably perhaps typically maybe also worth saying it is an Eating or Recreational establishment.
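To make the VeganRestaurant example concrete, here is a rough sketch of materialising only "some" inferred types. Everything in it is an assumption made for illustration: the ex: class names, the single-parent hierarchy, the helper name materialise_types and the max_levels cut-off are all hypothetical, and triples are plain Python tuples rather than objects from any real RDF library.

```python
RDF_TYPE = "rdf:type"

# Hypothetical subclass hierarchy (child -> parent). The ex: names are
# invented for this sketch; real vocabularies may allow several parents.
SUBCLASS_OF = {
    "ex:VeganRestaurant": "ex:Restaurant",
    "ex:Restaurant": "ex:EatingEstablishment",
    "ex:EatingEstablishment": "geo:SpatialThing",
    "geo:SpatialThing": "owl:Thing",
}


def materialise_types(triples, subclass_of, max_levels):
    """Return the original triples plus inferred rdf:type triples.

    For each rdf:type triple, climb at most max_levels steps up the
    subclass hierarchy, so very abstract classes are left out.
    """
    result = set(triples)
    for subject, predicate, obj in triples:
        if predicate != RDF_TYPE:
            continue
        cls = obj
        for _ in range(max_levels):
            cls = subclass_of.get(cls)
            if cls is None:
                break  # reached the top of the known hierarchy
            result.add((subject, RDF_TYPE, cls))
    return result


raw = {("_:x", RDF_TYPE, "ex:VeganRestaurant")}
published = materialise_types(raw, SUBCLASS_OF, max_levels=2)
```

With max_levels=2 the published set gains the Restaurant and EatingEstablishment types but stops short of geo:SpatialThing and owl:Thing. The cut-off value itself is exactly the contextual judgment call in question; a fixed number is only a stand-in for it.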
How many levels or semi-equivalent classes to mention here would depend on (a) which popular vocabs have suitable vocabulary and (b) what consuming apps there are for these kinds of things, and what data patterns those apps expect to match.

Beyond these mid-level concepts, we move towards a level of abstraction where it becomes increasingly unlikely that code and services will care. Knowing that something is a geo:SpatialThing is not very useful. Knowing also its geo:lat and geo:long and some display-oriented properties makes it much more useful. Similarly with dcterms:Agent or foaf:Agent; unless you have more info, a foaf:Agent could be a bit of software, an animal (alive or dead), a historical figure, a Group, etc etc. So I think the decision whether or not to publish the inferred type will be quite heavily contextual.

A guideline might be:

"As typing information becomes broader and more inclusive, it also becomes less informative: to know that something is a 'Thing' is rarely useful. It is difficult to say whether a class is at a useful level of specificity without taking into account other datasets, tools and services that use it; however, an intuitive grasp of 'mid-level' concepts often provides useful guidance. In addition, Linked Data apps have a particular concern for cross-referencing information about specific things; it is therefore often useful to include inferred identifiers (owl:sameAs etc) based on analysis of properties (owl:FunctionalProperty, owl:InverseFunctionalProperty etc)."

Ok, that's not very friendly text, but I hope it might be useful. Basically "rdf:type owl:Thing" is boring, but "owl:sameAs x:anotherID" is very useful...

cheers,

Dan
Received on Wednesday, 7 April 2010 12:45:49 UTC