Re: Advancing translational research with the Semantic Web from Chris Mungall on 2007-05-17 (public-semweb-lifesci@w3.org from May 2007)

From: Chris Mungall <cjm@fruitfly.org>
Date: Thu, 17 May 2007 00:13:19 -0400
To: Pat Hayes <phayes@ihmc.us>
Cc: Phillip Lord <phillip.lord@newcastle.ac.uk>, Eric Jain <Eric.Jain@isb-sib.ch>, public-semweb-lifesci <public-semweb-lifesci@w3.org>
Message-Id: <4DBC3E97-7A51-4968-98DC-C8E46D54810B@fruitfly.org>

On May 16, 2007, at 12:16 PM, Pat Hayes wrote:

> The association is done by the reification using a URI which is  
> intended to identify the triple. However, there is no 'standard'  
> way to associate a URI with an RDF triple. This is exactly the  
> problem that named graphs were proposed as a way to solve. The  
> other is that one rarely wants to assign properties like belief and  
> provenance to a single triple;

This may be generally true for the standard semantic web use case,  
but not necessarily for representations of biological generalisations  
that are generated from biological data. For example, each of the 10  
million statements that comprise the annotations made to the Gene  
Ontology would have its own provenance/evidence (although the  
statements themselves may be type-level assertions and thus take up  
more than one triple).

Provenance is kind of important for science, and it doesn't do it any  
favours to mix provenance at the document and statement levels.

> and saying that you believe/are responsible for a graph, and saying  
> that you believe/are responsible for every triple in the graph,  
> might well not be exactly equivalent. Since one can always treat a  
> single triple as a very small graph when needed, the graph seems to  
> be the best 'unit' to choose.

Horses for courses I guess - having 10m graphs and 10m RDF/XML  
documents would seem a bit excessive to me but if it can express the  
same thing then that's fine with me. But since the word clunky was  
being thrown around w.r.t. RDF-reification, I have to say this seems  
fairly clunky.
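
To make the one-graph-per-statement idea concrete, here is a rough sketch in plain Python, with no RDF library involved. Everything in it is an assumption made for illustration: the `ex:` names, the `ex:evidence` property, the helper functions, and the use of `None` to stand for the default graph. It is not a real GO export format or a named-graph API.

```python
# Quads are (subject, predicate, object, graph_name); None marks the default graph.

def one_graph_per_statement(annotations):
    """Give every annotation its own (hypothetical) named graph and
    attach its evidence to the graph name rather than to the triple."""
    quads = []
    for i, (s, p, o, evidence) in enumerate(annotations):
        g = f"ex:graph{i}"                                # made-up graph name
        quads.append((s, p, o, g))                        # the statement itself
        quads.append((g, "ex:evidence", evidence, None))  # provenance about the graph
    return quads


def graph_names(quads):
    """Collect the distinct named graphs that the quads mention."""
    return {g for (_, _, _, g) in quads if g is not None}


annotations = [
    ("ex:geneA", "ex:annotated_to", "ex:term1", "IDA"),
    ("ex:geneB", "ex:annotated_to", "ex:term1", "IEA"),
    ("ex:geneA", "ex:annotated_to", "ex:term2", "TAS"),
]
quads = one_graph_per_statement(annotations)
```

In this sketch three annotations need three graph names, so the number of graphs grows one-for-one with the number of statements.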

Also, if we identify the named graph with the URI of the document  
this seems to be mixing two separate concerns. There is my trust in  
the sequence of bytes that constitute my RDF document, which is  
dependent on things such as network security, and my trust in the  
statements about reality encoded in that document, which are  
dependent on scientific matters such as experimental evidence.

> I really would suggest the named graphs would be a better  
> underpinning. Unlike reification, they do have a full semantics and  
> a clear deployment model, and they follow in a long tradition of  
> naming document-like semantic entities. And unlike RDF reification,  
> they are not widely loathed, and they are fairly widely supported.

I've never understood why RDF-reification is so loathed. So the  
syntax is ugly - but I think there may be other reasons RDF/XML  
hasn't won any beauty contests.

The lack of semantics seems fine to me (although more could be done to  
clear up some misunderstandings in the documentation) - all I want is  
a way to attach provenance to a statement.

The only support I'd want would be some behind-the-scenes optimising  
away of the fact I have 4n triples when a single 3-ary predicate  
would do (but hey, again, as it's RDF anyway, I need at least 4  
triples for each type-level statement). Though support in other  
syntaxes like SPARQL would be nice, and presumably easy to layer on,  
perhaps in some intermediate representation.
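
As a minimal sketch of the kind of behind-the-scenes folding I mean, again in plain Python: `reify` writes out the four triples of the standard RDF reification vocabulary (rdf:type rdf:Statement, rdf:subject, rdf:predicate, rdf:object), and `collapse` folds them back into one record per statement. The function names, the `ex:` identifiers and the `ex:evidence` property are hypothetical, and a real store would presumably do this quite differently.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PARTS = (RDF + "subject", RDF + "predicate", RDF + "object")


def reify(stmt_id, s, p, o):
    """Return the four reification triples describing one statement."""
    return [
        (stmt_id, RDF + "type", RDF + "Statement"),
        (stmt_id, RDF + "subject", s),
        (stmt_id, RDF + "predicate", p),
        (stmt_id, RDF + "object", o),
    ]


def collapse(triples):
    """Fold reification triples into {stmt_id: (s, p, o)}.
    Triples outside the reification vocabulary (e.g. provenance)
    are returned separately and unchanged."""
    parts, rest = {}, []
    for s, p, o in triples:
        if p in PARTS:
            parts.setdefault(s, {})[p] = o
        elif p == RDF + "type" and o == RDF + "Statement":
            parts.setdefault(s, {})
        else:
            rest.append((s, p, o))
    statements = {
        sid: tuple(d.get(part) for part in PARTS) for sid, d in parts.items()
    }
    return statements, rest


triples = reify("ex:stmt1", "ex:geneA", "ex:annotated_to", "ex:term1")
triples.append(("ex:stmt1", "ex:evidence", "IDA"))  # provenance on the statement
statements, rest = collapse(triples)
```

The provenance triple stays attached to the statement identifier, so a query layer could work with the collapsed records and never see the four triples underneath.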

Received on Thursday, 17 May 2007 04:13:36 UTC