RE: Simple Genomics Use Case from Cline, Melissa on 2003-11-04 (public-semweb-lifesci@w3.org from November 2003)

From: Cline, Melissa <Melissa_Cline@affymetrix.com>
Date: Tue, 4 Nov 2003 15:57:19 -0800
To: "'Brian Gilman'" <gilmanb@mac.com>, "'Eric Neumann'" <ENeumann@BeyondGenomics.com>
Cc: "'public-semweb-lifesci@w3.org'" <public-semweb-lifesci@w3.org>
Message-ID: <53386E0C47E7D41194BB0002B325C997018127E7@NTEX60>
Brian's use case brings up some interesting points.  To follow up, consider
Issue #1:

1.        Currently, both Affymetrix and IMAGE consortium have a database
for mapping probe set and clone information to some unique gene ID
information. It would be easy if the map information could be implemented in
DC project. But Affymetrix may only give DNA sequence information in the
future. 

(okay, this is an issue close to my heart...)

The association between gene and probe set has some ambiguity that's worth
noting here, especially as chip designs become more complex and the one gene
<=> one probe set mapping goes (further) out the window.  Consider the
following cases:

1. The probe set is designed to uniquely interrogate the gene, or more
precisely, some feature within the gene.  The feature is constitutively
expressed.

2. The probe set uniquely interrogates some feature within the gene that is
not constitutively expressed.  So, the gene is expressed if the probe set
hybridizes, but the converse is not true.

3. The probe set interrogates some feature within the gene, but not
uniquely: it hybridizes to any one of a number of paralogs.  This happens.
We encounter situations where we have a choice between a unique probe set
with marginal performance and a strong probe set that's not unique.  When
possible, we tile both.

4. Part of the probe set uniquely interrogates some feature.  For instance,
a shortened form of some exon is discovered after the chip is designed.
Some of the probes in the probe set interrogate the shortened form of the
exon (plus the longer form), while others interrogate the longer form only.
Here, you'd really like to divide the probe set into two "virtual probe
sets" so that each measures a consistent entity.

All four associations are valid.  What would be useful is to provide all the
associations, but qualify them.  Need I mention how cumbersome that would
get without a semantic representation...?  This also illustrates the value
of a DAS-like system, because as new knowledge becomes available (e.g. case
4), that knowledge could be distributed from one central location - instead
of forcing all potential users of the data to repeat the same analysis.
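
As a rough illustration of what "qualified" associations might look like,
here is a minimal Python sketch. The class, field names and IDs are all
hypothetical (nothing here is an Affymetrix format), and the two inference
rules are only one reading of cases 1-4 above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Association:
    """One qualified probe set <=> gene association (hypothetical model)."""
    probe_set: str                # hypothetical probe set ID
    gene: str                     # hypothetical gene ID
    feature: str                  # feature interrogated, e.g. an exon
    unique: bool                  # False if paralogs also hybridize (case 3)
    constitutive: bool            # False if the feature is not always expressed (case 2)
    probes: Tuple[int, ...] = ()  # non-empty for a "virtual probe set" (case 4)

    def signal_implies_gene_expressed(self) -> bool:
        # Hybridization only pins down this gene when the probe set is unique.
        return self.unique

    def no_signal_implies_gene_silent(self) -> bool:
        # Absence of signal is only informative for a constitutive feature.
        return self.constitutive


# The four cases, with made-up identifiers.
case1 = Association("ps_1", "gene_A", "exon 3", unique=True, constitutive=True)
case2 = Association("ps_2", "gene_B", "exon 7", unique=True, constitutive=False)
case3 = Association("ps_3", "gene_C", "exon 2", unique=False, constitutive=True)
case4 = Association("ps_4", "gene_D", "short exon form", unique=True,
                    constitutive=False, probes=(1, 2, 3))
```

A consumer could then filter on the qualifiers instead of assuming that
every association means the same thing.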

Melissa        

 

-----Original Message-----
From: Brian Gilman [mailto:gilmanb@mac.com] 
Sent: Tuesday, November 04, 2003 5:01 AM
To: Eric Neumann
Cc: public-semweb-lifesci@w3.org
Subject: Simple Genomics Use Case 



Hello everyone, 


        Sorry this took so long. My use case is fairly simple; it comes
from a library of use cases that I have sitting here on my hard drive. 


        A gene Id is required for retrieval of gene annotation and
ontology information. The unique gene Id is not available directly from chip
design information, where only IMAGE clone and probe set information is
available for each reporter (clone in spotted array chips, probe set in
oligo array chips). This use case describes the process of obtaining a
unique Id such as GenBank Accession Number, LocusLink Id or Gene Symbol for
the clone or probe set.  


Basic Course 

1.         The actor selects the DNA sequence to retrieve a unique gene ID. 


2.         The actor sets the threshold for the sequence similarity. 


3.         The actor chooses which type of gene identifier to retrieve
(e.g., GenBank Accession Number). 


4.         The actor chooses chip identifier type (e.g., clone id or probe
set id). 


5.         The actor submits the gene identifier search request. 


6.         A list of gene identifiers within the sequence similarity
threshold is presented to the actor and the use case ends. 
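
The filtering part of the basic course could be sketched roughly as follows.
This is only an illustration: the sequence search itself is assumed to have
already produced scored hits, and the names Hit and find_gene_ids, the 0-1
similarity scale and the identifier values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Hit:
    """One candidate gene returned by a (hypothetical) sequence search."""
    similarity: float            # assumed to be a fraction between 0 and 1
    identifiers: Dict[str, str]  # identifier type -> identifier value


def find_gene_ids(hits: List[Hit], threshold: float, id_type: str) -> List[str]:
    """Return identifiers of the requested type for hits at or above
    the similarity threshold, most similar first."""
    kept = [h for h in hits
            if h.similarity >= threshold and id_type in h.identifiers]
    kept.sort(key=lambda h: h.similarity, reverse=True)
    return [h.identifiers[id_type] for h in kept]


# Made-up hits for one reporter sequence.
hits = [
    Hit(0.91, {"genbank": "ACC_A", "symbol": "GENE_A"}),
    Hit(0.99, {"genbank": "ACC_B"}),
    Hit(0.60, {"genbank": "ACC_C", "symbol": "GENE_C"}),
]
genbank_ids = find_gene_ids(hits, threshold=0.9, id_type="genbank")
```

Hits that lack the requested identifier type are simply skipped here; a real
system would need to decide whether to report them instead.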


  


Post Conditions 


1.         A list of genes (gene symbol, GenBank ID, UniGene ID and LocusLink
ID) is presented to the actor. 


2.         The options for retrieval of expression data or expression
pattern are also presented. 


  


Exceptions 


1.        The query is submitted to an external data source, and network
traffic results in a long delay (see issue 2).   


2.        A unique gene ID is not available for the clone or probe set.  The
actor is notified and prompted to input another identifier. 
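
One possible way to handle both exceptions is sketched below. It assumes a
caller-supplied lookup function that raises TimeoutError on a slow external
source and returns None when no gene ID exists; the names lookup_with_retry
and NoGeneIdError, and the retry policy, are hypothetical.

```python
import time
from typing import Callable, Optional


class NoGeneIdError(LookupError):
    """Raised when no unique gene ID exists for the clone or probe set."""


def lookup_with_retry(lookup: Callable[[str], Optional[str]],
                      reporter_id: str,
                      attempts: int = 3,
                      delay_s: float = 0.0) -> str:
    """Call lookup(reporter_id), retrying on TimeoutError.

    Re-raises TimeoutError once all attempts are used up, and raises
    NoGeneIdError so the caller can prompt for another identifier.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            gene_id = lookup(reporter_id)
        except TimeoutError:
            if attempt == attempts - 1:
                raise                # out of retries: let the caller decide
            time.sleep(delay_s)      # wait before resubmitting
            continue
        if gene_id is None:
            raise NoGeneIdError(f"no unique gene ID for {reporter_id!r}")
        return gene_id
    raise AssertionError("unreachable")
```

Whether the retry happens automatically or only after asking the actor is
left open here, as it is in the issues below.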


  


Issues 


1.        Currently, both Affymetrix and IMAGE consortium have a database
for mapping probe set and clone information to some unique gene ID
information. It would be easy if the map information could be implemented in
DC project. But Affymetrix may only give DNA sequence information in the
future. 


2.        If the query is submitted during a busy time, should the request
"time out" and be resubmitted at a later time? 


3.        If the request is resubmitted, should that be done automatically
by the system, with the results then emailed to the actor? 


4.        If presented with a list of IDs, how do you know who the authority
is for each ID (LSID issue)? 

-- 

Brian Gilman 

President Panther Informatics Inc. 

9 Acadia Park #2 

Somerville, MA 02143 

Phone: 617-591-1722 

Cell: 617-335-8276 

AIM: gilmanb1
Received on Tuesday, 4 November 2003 19:02:53 UTC