Re: Observations about facts in genomics

On Thu, Mar 21, 2013 at 7:56 PM, Peter Ansell <ansell.peter@gmail.com> wrote:

> On 22 March 2013 12:05, Alan Ruttenberg <alanruttenberg@gmail.com> wrote:
> > On Wed, Mar 20, 2013 at 3:15 PM, Jeremy J Carroll <jjc@syapse.com> wrote:
> >>
> >> To me, that seems to lead us back to the earlier discussion (rathole?)
> >> about owl:sameAs.
> >> I tend toward the view that there are diminishing returns in terms of
> >> levels of indirection here!
> >
> > As the number of levels of indirection increases, perhaps. But here we
> > are talking about one level: separating claims from truth.
>
> The question that scientists spend their lives trying to establish is
> the one that you seem to think is clearly defined in this statement,
> i.e., "separating claims from 'truth'". In some domains, such as
> logic/mathematics, "truth" is easy to define, and that seems to be the
> basis that the RDF specifications use to justify their semantics.
> However, in others, such as the life sciences (i.e., the domain of
> public-semweb-lifesci), at least some of the best information we have
> is approximate, idealized information that may not exactly match
> anything at all in reality (e.g., large genome reference assemblies that
> are statistically modelled from multiple samples but may not actually
> match base for base with any actual DNA strands in the real world).
> These approximations are referenced directly by scientists in their
> publications without them having to qualify every statement as
> referencing a "claim".
>

When they need to say it is a claim they do so, either by referring to the
matter in that way or through signals in the language of their text. In the
other cases there are different consequences for getting things wrong. If
they assert something directly and their result depends on it, then their
result will be called into question.

I am not saying that science presented as fact is infallible. Of course it
is fallible. But when we talk about things as fact, we tend to back that up
with an implicit agreement to revise if the thing presented as fact is
determined to be false. I (and Foundry) take this situation as one where the
ontological commitment is that, should we make such statements and find them
to be wrong, we will fix them. There are lots of cases where that isn't the
commitment, Jeremy's case being one of them, I suspect. And there are
plenty of true things (things that no one would object to) despite this:
that the information is about a DNA sequence, that it is about differences
between humans, that it is about an amino acid change at one place in the
molecule, etc.

Associated with each of the three kinds of logical commitment in those
slides, I wrote what inconsistency means. IMO, if you have some system and
there's no way for you to be wrong in your usage, then it's not worth using.
Please tell me, given your assessment of scientific use of RDF, whether there
is any such use that can be wrong or inconsistent. If you can't, then I'm
guessing we are going to have a continued 'failure to communicate' (an
attempt at humor; culture-specific reference:
http://en.wikipedia.org/wiki/What_we've_got_here_is_(a)_failure_to_communicate
).


> I am not sure why you say that there is only one layer of wrapping
> needed. I can think of many different situations where someone could
> have more than one layer of alternative interpretations that they may
> need to accommodate other scientists now and in the future. The 4 or
> so layers that the provenance ontology has just for published
> documents are worrying enough, and they may not be enough to map the
> complexities of genome reference assemblies, as genomics researchers
> may have a different "publication" workflow to book publishers.
>

Since I am not familiar with the PROV model (I tried to read it through but
got frustrated), please say a little more, and justify why you think these
"layers" need to be represented as levels of indirection rather than as
assertions on a first or second level, such as I have described.


> > 2) I think there's a big difference between what one publishes on the
> > web, and what one uses in the privacy of one's home, so to speak. If one
> > is publishing on the web, it is good citizenship to respect
> > specifications, and to consider the impact of one's assertions on the
> > broader data consumer community. That consideration, IMO, is
> > justification enough for the 1 extra indirection necessary to not make
> > statements that are too strong.
>
> The specifications seem to be based on premises that practicing
> scientists may never accept, i.e., the idea that there is a static
> scientific "truth" that can be unambiguously and continuously
> communicated, rather than "challengeable current theories" that can be
> stated in alternative ways, or gradually or suddenly revoked and
> replaced with new best theories. Scientists need to be able to
> interpret, contrast, and concurrently utilise past information
> directly, without having to suddenly wrap up past "truths" inside of
> "claims" because they may be out of date with something someone else
> has now put into the RDF-sphere. The whole idea that statements could
> be "too strong" takes its basis from "static truth", and I cannot
> personally accept that we need to represent everything for the life
> sciences inside of "claims" (or alternatively have everyone create new
> URIs for everything they want to talk about) just in case it changes in
> future, or in case someone would find it difficult to deal with the
> statement because their application relies on a different structure for
> their queries to work.
>

No. The specification is based on the premise that if you are going to
share information there have to at least be some rules. The rules were
developed by a skilled working group, the semantics were written by an
expert, and the whole survived what can be a rather brutal W3C approval
process. There is room in those semantics to express what you want. People
seem to be annoyed that it takes an extra link, some extra thinking, to do
that. Tough.

> If someone else has a completely different problem domain that would
> find it difficult to deal with direct, "un-framed"/"un-claim-wrapped"
> statements from third-parties using a URI because they clash with some
> of their statements or assumptions, how would the claim wrapping
> practically help them?
>

I think you have this backward. Naive engineers (meaning those who haven't
hung around with the people on this list) will read the spec and have
expectations about how things work, such as that one URI represents one
resource, independent of "context". The idea that it's OK to break the rules
because they are inconvenient is the equivalent of thinking it's OK to be a
vandal. It's your responsibility as an educated engineer to understand and
use the spec you are using in the documented way, or to write a different
one. If you want to talk about specific problems you have with indirection,
let's talk about that. But it is clear that the onus is on you to figure
out a way to use the technology as specified, rather than me to solve your
(at the moment vague and unspecified) usage problems.


> Life scientists attempting to use RDF to model their heterogeneous
> information aren't trying to make ambiguous statements or reject the
> wisdom of the logic/maths backgrounds of the specifications' authors,
> they are just trying to get work done, and it seems that we are being
> told that we are bad citizens for having a complex, "un-truthy"
> domain.
>

If I see a biologist doing mathematics, I'm going to look at whether they
get it right. If they do representation I'm going to expect they do it
right. I look to see that they do the biology right too (as best I can). The
labs hire professionals to do their mass spec. Should we expect less for
data?

Take care,
Alan


>
> Cheers,
>
> Peter
>

Received on Friday, 22 March 2013 05:08:19 UTC