Reification: was Re: ANNOUNCE: W3C Workshop on Semantic Web for Life Sciences from Dave Reynolds on 2004-08-20 (public-semweb-lifesci@w3.org from August 2004)

From: Dave Reynolds <der@hplb.hpl.hp.com>
Date: Fri, 20 Aug 2004 16:51:34 +0100
To: Eric Jain <Eric.Jain@isb-sib.ch>
Cc: Massimo Marchiori <massimo@w3.org>, public-semweb-lifesci@w3.org
Message-ID: <41261E06.4050204@hplb.hpl.hp.com>

Eric Jain wrote:

>> 3. Interesting: do your use cases absolutely need
>> reification, or is it just a convenience? Is
>> use case 6 (provenance) your only use?
> 
> 
> Consider this example: A protein may occur in one or more organisms. We 
> may need to indicate who observed this protein in a specific organism, 
> and cite a relevant publication etc. This information obviously can't be 
> attached to either the protein or the taxon resource. We could create 
> intermediary resources for connecting proteins to taxa, but this seems 
> unnatural and is impractical, because the same procedure would have to 
> be repeated for many other properties. Also, no application should break 
> because one day we decide to provide some provenance data for something 
> that previously never had any.

A good use case for provenance information.

You could represent this by explicitly reifying the observation rather than 
the RDF statement. For example, have an ObservationEvent class, instances 
of which indicate the protein, the organism, the observer, the citation, etc., 
directly. This could sit alongside rather than replace a direct link 
between protein and organism so that the presence or absence of such 
information does not change the navigation structure.

This doesn't really save anything compared to attaching properties to a 
reified statement, but it is another option.
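
To make the ObservationEvent idea concrete, here is a minimal sketch in 
plain Python, with triples held as tuples. All identifiers (occursIn, 
obs:1, taxon:T1, person:X, pub:Y and the property names) are hypothetical 
and chosen only for illustration; this is not an existing vocabulary.

```python
# Triples as (subject, predicate, object) tuples; every name here is hypothetical.
triples = set()

def add(s, p, o):
    triples.add((s, p, o))

# Direct link between protein and organism. It is the same whether or not
# any provenance has been recorded, so navigation code does not change.
add("protein:P12345", "occursIn", "taxon:T1")

# The observation itself is a resource that points at everything involved.
add("obs:1", "rdf:type", "ObservationEvent")
add("obs:1", "protein", "protein:P12345")
add("obs:1", "organism", "taxon:T1")
add("obs:1", "observer", "person:X")
add("obs:1", "citation", "pub:Y")

def organisms_of(protein):
    """Follow the direct link only; ignores provenance entirely."""
    return sorted(o for s, p, o in triples if s == protein and p == "occursIn")

def observations_of(protein, organism):
    """Find ObservationEvent resources that mention both protein and organism."""
    events = {s for s, p, o in triples
              if p == "rdf:type" and o == "ObservationEvent"}
    return sorted(e for e in events
                  if (e, "protein", protein) in triples
                  and (e, "organism", organism) in triples)
```

An application that only calls organisms_of() keeps working when 
observation events are added later, which is the point about not 
changing the navigation structure.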

> By quads I meant (perhaps misusing the terminology) that when parsing 
> something like
> 
> <rdf:Description rdf:about="P12345">
>   <name rdf:ID="S1">Foo</name>
> </rdf:Description>
> 
> with a statement-by-statement callback mechanism, most parsers will return:
> 
> P12345 name 'Foo'
> S1 rdf:type rdf:Statement
> S1 rdf:subject P12345
> S1 rdf:predicate name
> S1 rdf:object 'Foo'
> 
> Rather than:
> 
> S1: P12345 name 'Foo'
> 
> Which would be much simpler and more efficient to process, in my opinion.

The efficiency does depend on the platform.

In Jena we tried to get some way towards the efficiency and simplicity of 
the latter while still supporting the standard. The stores can (and in the 
RDB case, do) store the four reification statements (the quad) compactly. 
The API lets you optionally hide the reification quads so that your model 
isn't cluttered with the extra statements, yet you can still use the 
reification API to get from a Statement to the resource representing its 
reification.
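
As a rough illustration of the idea (not Jena's actual code or API), the 
sketch below folds the four reification triples into one record per 
statement resource and can hide them from the ordinary statement listing. 
The class and method names (CompactStore, reifications_of, statements) 
are assumptions made up for this example.

```python
RDF_TYPE = "rdf:type"
RDF_STATEMENT = "rdf:Statement"
# Position of each reification property inside a compact record.
PARTS = {"rdf:subject": 0, "rdf:predicate": 1, "rdf:object": 2}

class CompactStore:
    """Hypothetical store: one record per reified statement, not four triples."""

    def __init__(self, hide_reification=True):
        self.hide_reification = hide_reification
        self.triples = set()   # ordinary statements
        self.records = {}      # node -> [subject, predicate, object, is_typed]

    def _record(self, node):
        return self.records.setdefault(node, [None, None, None, False])

    def add(self, s, p, o):
        # Reification triples arrive one at a time from a parser callback
        # and are folded into the record for their statement resource.
        if p in PARTS:
            self._record(s)[PARTS[p]] = o
        elif p == RDF_TYPE and o == RDF_STATEMENT:
            self._record(s)[3] = True
        else:
            self.triples.add((s, p, o))

    def reifications_of(self, triple):
        """Resources whose complete record reifies the given triple."""
        return sorted(n for n, r in self.records.items()
                      if r[3] and tuple(r[:3]) == tuple(triple))

    def statements(self):
        """Ordinary statements, plus expanded reification triples unless hidden."""
        out = set(self.triples)
        if not self.hide_reification:
            for node, rec in self.records.items():
                if rec[3]:
                    out.add((node, RDF_TYPE, RDF_STATEMENT))
                for name, idx in PARTS.items():
                    if rec[idx] is not None:
                        out.add((node, name, rec[idx]))
        return sorted(out)

# The five triples from the parser output quoted above.
PARSED = [
    ("P12345", "name", "Foo"),
    ("S1", "rdf:type", "rdf:Statement"),
    ("S1", "rdf:subject", "P12345"),
    ("S1", "rdf:predicate", "name"),
    ("S1", "rdf:object", "Foo"),
]

store = CompactStore(hide_reification=True)
for t in PARSED:
    store.add(*t)
```

With hiding switched on, statements() lists only the P12345 statement, 
while reifications_of() still leads from that statement to S1.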

This is a complex part of the implementation to maintain and appears to be 
so little used in practice that there is a proposal to deprecate it [1]. 
Perhaps your experience suggests we should consider at least postponing the 
deprecation a little longer in case someone starts to need this functionality.

Dave

[1] http://groups.yahoo.com/group/jena-dev/message/8523

Received on Friday, 20 August 2004 15:52:01 UTC