Re: Observations about facts in genomics from Alan Ruttenberg on 2013-03-22 (public-semweb-lifesci@w3.org from March 2013)

From: Alan Ruttenberg <alanruttenberg@gmail.com>
Date: Thu, 21 Mar 2013 19:05:52 -0700
To: Jeremy J Carroll <jjc@syapse.com>
Cc: Jerven Bolleman <me@jerven.eu>, Graham Klyne <graham.klyne@zoo.ox.ac.uk>, w3c semweb HCLS <public-semweb-lifesci@w3.org>, Pat Hayes <phayes@ihmc.us>
Message-ID: <CAFKQJ8msd7oVWihMKOLVZMj7FJHGHUDKQGLvn+EG=J+qh0Tgnw@mail.gmail.com>
On Wed, Mar 20, 2013 at 3:15 PM, Jeremy J Carroll <jjc@syapse.com> wrote:

> To me, that seems to lead us back to the earlier discussion (rathole?)
> about owl:sameAs
>

Don't think so. It is a simple application of the pattern of having
information about something. The statements don't have to be true.

>
> Yes, we can model what we are doing at an arbitrary level of
> sophistication, but no, we may not want to.
>
You need one level of indirection. How much information you supply in
support of the statement being true is up to you.

>
> I tend to a view that there are diminishing returns in terms of levels of
> indirection here!
>
As the number of levels of indirection increases, perhaps. But here we are
talking about 1 level - separating claims from truth.
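
To make that one level concrete, here is a minimal sketch in plain Python, with tuples standing in for triples rather than any RDF library. Every "ex:" name (ex:call1, ex:calledBase, and so on) is made up for illustration and does not come from a published vocabulary. The field values are the first five fields of the VCF line Jeremy quotes further down in this thread.

```python
# Triples are plain (subject, predicate, object) tuples. Every "ex:" name
# below is a hypothetical placeholder, not a term from a real vocabulary.

def direct_assertion(sample, position, base):
    """Too strong: states as fact that the sample has this base."""
    return [(sample, f"ex:baseAt{position}", base)]

def claim_about(call_id, sample, chrom, position, ref, alt):
    """One level of indirection: describe the call, not the genome.

    What is asserted outright is only that a call exists and what it says.
    Whether the sample really has `alt` at that position is left open.
    """
    return [
        (call_id, "rdf:type", "ex:VariantCall"),
        (call_id, "ex:aboutSample", sample),
        (call_id, "ex:chromosome", chrom),
        (call_id, "ex:position", position),
        (call_id, "ex:referenceBase", ref),
        (call_id, "ex:calledBase", alt),
    ]

# First five fields of the VCF line quoted later in this thread.
chrom, pos, _rsid, ref, alt = "chrM 942 rs28579222 A G".split()

strong = direct_assertion("ex:sample1", int(pos), alt)
hedged = claim_about("ex:call1", "ex:sample1", chrom, int(pos), ref, alt)

for triple in hedged:
    print(triple)
```

In the hedged form the sample never appears as the subject of a statement about its bases. How much supporting evidence (read depth, caller confidence) gets attached to ex:call1 is up to the publisher.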

>
> OTOH the motivation of avoiding exotic machinery such as frames is
> well-placed.
>

The exotic machinery is temporal reasoning on frames, something that you
haven't called for here. The frame problem has to do with representing
"situations" - states of the world - as different instances, and then
inferring which things that were true in the situation before this one are
still true in the current situation.

Frames themselves are dead simple data structures that map directly onto
RDF. You use them, implicitly, every time you have a bunch of predicates on
the same subject.
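
A rough sketch of that mapping in plain Python, with tuples standing in for triples and no RDF library involved. The slot names and the "ex:" identifiers are invented for the example.

```python
from collections import defaultdict

def frame_to_triples(subject, frame):
    """Flatten one frame (a dict of slot -> value or list of values)
    into (subject, predicate, object) triples."""
    triples = []
    for slot, value in frame.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            triples.append((subject, slot, v))
    return triples

def triples_to_frames(triples):
    """Group triples by subject: each subject's predicates form a frame."""
    frames = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        frames[s][p].append(o)
    return {s: dict(slots) for s, slots in frames.items()}

# Hypothetical frame; the "ex:" names are placeholders.
call_frame = {
    "rdf:type": "ex:VariantCall",
    "ex:position": 942,
    "ex:calledBase": "G",
}

triples = frame_to_triples("ex:call1", call_frame)
frames = triples_to_frames(triples)
print(frames["ex:call1"])
```

Going back from triples to frames always yields lists of values, since nothing in a set of triples says a predicate is single-valued.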


> And then it depends what is the intended use of the system … if the
> variant calls are expected to go unchallenged almost always, then reporting
> them as 'facts' and being done with it, with some caveat emptor overall,
> may be the best way of presenting the issue - rather than being
> in-your-face about it at every twist and turn
>

That's all fine and good, but the semantics of RDF is what it is and it
seems fairly pointless to argue it should be something else. Specs are only
of utility when people follow the rules, and as far as rules go, RDF has
hardly any. Echoing Pat, below, if you need something better then you need
another spec.

A couple of thoughts, perhaps useful, perhaps not:

1) Slide 3 in
http://www.stateofthesalmon.org/pdfs/SalDAWG%20PDFs/2009/Ruttenberg-realistapproach.pdf
Roughly, this is on the subject of ontological commitment. Each of the
three approaches is straightforward to express in RDF. Doing interesting
reasoning with any of them, or between them, is another matter entirely,
but then nobody expects RDF to do much in the way of interesting inference.

2) I think there's a big difference between what one publishes on the web,
and what one uses in the privacy of one's home, so to speak. If one is
publishing on the web, it is good citizenship to respect specifications,
and to consider the impact of one's assertions on the broader data consumer
community. That consideration, IMO, is justification enough for the 1 extra
indirection necessary to not make statements that are too strong.

If one is using RDF in a private, tactical, situation, then use it however
you wish. The considerations of interoperability, predictability, defined
behavior, truth are subject to one's private sense of utility, and there is
no issue around the impact on others. I realize that the politics of our
society still think there is justification to look behind the curtain at
private behaviors, but we have no need for such in the RDF / OWL world.

Regards,
Alan

ps. +1 Jerven


> Jeremy
>
>
> On Mar 20, 2013, at 3:09 PM, Jerven Bolleman <me@jerven.eu> wrote:
>
> > Hi All,
> >
> > This is fine in RDF, the important thing to separate is the concept of
> > a Chromosome/Patient sequence and a set of observations and hypotheses
> > about that Chromosome sequence.
> >
> > So instead of chromosome M you are really talking about assembly X of
> > a set of reads R mapped via some (variant calling) processes to
> > reference chromosome C that is also really an assembly of a different
> > set of reads. Subtly different and not always made explicit in
> > conversation, but for good RDF representations it should be.
> >
> > In RDF here you need to be careful about what you are identifying. As
> > long as you are correct in what you identified (in this case a
> > variant-called, mapped assembly) instead of what you are discussing in
> > English (a patient's chromosome) you will end up fine. If you do this
> > you don't need anything as exotic as frames etc...
> >
> > Regards,
> > Jerven
> >
> > On Wed, Mar 20, 2013 at 9:23 PM, Graham Klyne <graham.klyne@zoo.ox.ac.uk> wrote:
> >> Hi Jeremy,
> >>
> >>
> >> On 20/03/2013 16:04, Jeremy J Carroll wrote:
> >>> One of the things I am learning about genetic sequencing is this process,
> >>> which is meant to tell you about the patient's DNA, is in fact somewhat
> >>> problematic, resulting in facts which are disputable.
> >>>
> >>
> >> It gets worse... the association between sequence fragments and genes
> >> changes over time as knowledge is improved, I understand in ways that
> >> aren't always reflected in published information.  GMOD/CHADO
> >> (http://gmod.org/wiki/Introduction_to_Chado) keeps all the concepts very
> >> separate to allow for this, but the translation to RDF can get very
> >> convoluted (Al Miles did some work on a mapping, a few years ago).
> >>
> >> I also understand that there's emerging research that shows that non-coding
> >> regions, which were previously thought to be meaningless/irrelevant, do
> >> actually have relevant roles in the overall genetic machinery (something to
> >> do with regulation?).
> >>
> >> One of the many reasons I'd like RDF to have some flexibility to deal with
> >> contexts, or differing worldviews, is to allow representation of evolving
> >> information without having to make explicit all those things that
> >> researchers sometimes don't bother to make explicit (e.g. genes vs proteins,
> >> sequence vs gene, etc.).  And then there's all the stuff we don't yet know to
> >> make explicit. ("frame problem", anyone?)
> >>
> >> #g
> >> --
> >>
> >>
> >>
> >> On 20/03/2013 16:04, Jeremy J Carroll wrote:
> >>>
> >>> Pat Hayes wrote:
> >>>
> >>> "[RDF] is intended for recording data, and most data is pretty mundane
> >>> stuff about which there is not a lot of factual disagreement."
> >>>
> >>> One of the things I am learning about genetic sequencing is this process,
> >>> which is meant to tell you about the patient's DNA, is in fact somewhat
> >>> problematic, resulting in facts which are disputable.
> >>>
> >>> So, a data file that I am trying to get my head around at the moment
> >>> contains a line like:
> >>>
> >>> chrM    942     rs28579222      A       G       .       .       ASP;HD;OTHERKG;RSPOS=942;SAO=0;SF=0;SSR=0;VC=SNV;VP=050000000005000402000100;WGT=1;dbSNPBuildID=125
> >>>
> >>>
> >>> So far, I have understood the first five fields, as saying that in a
> >>> particular position in the DNA (the 942nd base in the mitochondrial DNA, aka
> >>> rs28579222), when one might have expected to see an A a sample had a G.
> >>> But that last part "a sample had a G" is in fact open to doubt … There is
> >>> a complex piece of chemistry, physics and computing that guesses that there
> >>> is a G in that position. It is possible to see some of the less processed
> >>> data that fed into that guess, and to see levels of confidence that the
> >>> different algorithms had with the results; but it is not a slam dunk by any
> >>> means. So, some more skeptical people want to be able to see the 'raw read
> >>> data' prior to the decision that this is a G. Usually one would expect to
> >>> see some of the raw read data agree with the G, and some disagree.
> >>>
> >>>
> >>> Since this assertion (that this position is a G) is made with a few
> >>> million similar assertions, all of which have some element of doubt - it
> >>> would be highly surprising if every single call were correct: yet within the
> >>> logic of RDF we probably end up asserting the truth of the whole graph …
> >>> which leads us onto the dangerous path of ex contradictione quodlibet
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >
> >
> >
> > --
> > Jerven Bolleman
> > me@jerven.eu
>
>
>
Received on Friday, 22 March 2013 02:06:56 UTC