Re: Observations about facts in genomics from Peter Ansell on 2013-03-22 (public-semweb-lifesci@w3.org from March 2013)

From: Peter Ansell <ansell.peter@gmail.com>
Date: Fri, 22 Mar 2013 12:56:03 +1000
To: Alan Ruttenberg <alanruttenberg@gmail.com>
Cc: Jeremy J Carroll <jjc@syapse.com>, Jerven Bolleman <me@jerven.eu>, Graham Klyne <graham.klyne@zoo.ox.ac.uk>, w3c semweb HCLS <public-semweb-lifesci@w3.org>, Pat Hayes <phayes@ihmc.us>
Message-ID: <CAGYFOCSAurikMqv7Y-TTWCCVJrhV=_yiT+nC_+KMQ6f2pNJmbQ@mail.gmail.com>

On 22 March 2013 12:05, Alan Ruttenberg <alanruttenberg@gmail.com> wrote:
> On Wed, Mar 20, 2013 at 3:15 PM, Jeremy J Carroll <jjc@syapse.com> wrote:
>>
>> To me, that seems to lead us back to the earlier discussion (rathole?)
>> about owl:sameAs
>> I tend to a view that there are diminishing returns in terms of levels of
>> indirection here!
>
> As the number of levels of indirection increases, perhaps. But here we are
> talking about 1 level - separating claims from truth.

The question that scientists spend their lives trying to establish is
the one that you seem to think is clearly defined in this statement,
i.e., "separating claims from 'truth'". In some domains, such as
logic/mathematics, "truth" is easy to define, and that seems to be the
basis that the RDF specifications use to justify their semantics.
However, in others, such as life sciences (i.e., the domain of
public-semweb-lifesci), at least some of the best information we have
is approximate, idealised information that may not exactly match
anything at all in reality (e.g., large genome reference assemblies that
are statistically modelled from multiple samples but may not actually
match base for base with any actual DNA strands in the real world).
These approximations are referenced directly by scientists in their
publications without them having to qualify every statement as
referencing a "claim".

I am not sure why you say that there is only one layer of wrapping
needed. I can think of many different situations where someone could
have more than one layer of alternative interpretations that they may
need to accommodate other scientists now and in the future. The 4 or
so layers that the provenance ontology has just for published
documents are worrying enough, and they may not be enough to map the
complexities of genome reference assemblies, as genomics researchers
may have a different "publication" workflow to book publishers.

> 2) I think there's a big difference between what one publishes on the web,
> and what one uses in the privacy of one's home, so to speak. If one is
> publishing on the web, it is good citizenship to respect specifications, and
> to consider the impact of one's assertions on the broader data consumer
> community. That consideration, IMO, is justification enough for the 1 extra
> indirection necessary to not make statements that are too strong.

The specifications seem to be based on premises that practicing
scientists may never accept, i.e., the idea that there is static
scientific "truth" that can be unambiguously and continuously
communicated, and not "challengeable current theories" that can be
either alternatively stated, or gradually or suddenly revoked and
replaced with new best theories. Scientists need to be able to
interpret, contrast, and concurrently utilise past information
directly without having to suddenly wrap up past "truths" inside of
"claims" because they may be out of date with something someone else
has now put into the RDF-sphere. The whole idea that statements could
be "too strong" takes its basis from "static truth" and I cannot
personally accept that we need to represent everything for life
sciences inside of "claims" (or alternatively have everyone create new
URIs for everything they want to talk about) just in case it changes in
the future or someone would find it difficult to deal with the statement
if their application relies on a different structure for their queries
to work.

If someone else has a completely different problem domain that would
find it difficult to deal with direct, "un-framed"/"un-claim-wrapped"
statements from third-parties using a URI because they clash with some
of their statements or assumptions, how would the claim wrapping
practically help them?

Life scientists attempting to use RDF to model their heterogeneous
information aren't trying to make ambiguous statements or reject the
wisdom of the logic/maths backgrounds of the specification authors;
they are just trying to get work done, and it seems that we are being
told that we are bad citizens for having a complex, "un-truthy"
domain.

Cheers,

Peter

Received on Friday, 22 March 2013 02:56:33 UTC