RE: Simple Genomics Use Case

Melissa,
 
The message you sent could be viewed as a proof-of-concept proposal. As
most of the data for microarray features is already available, it
seems that if we could define a minimal annotation semantic set (MASS)
for microarrays, we could construct a prototype RDF server model. If you
think this is possible, perhaps we should try to develop a MASS as a
first step. Does this seem reasonable?
 
Eric
 
 


	-----Original Message-----
	From: Cline, Melissa [mailto:Melissa_Cline@affymetrix.com] 
	Sent: Tuesday, November 04, 2003 6:57 PM
	To: Brian Gilman; Eric Neumann
	Cc: 'public-semweb-lifesci@w3.org'
	Subject: RE: Simple Genomics Use Case 
	
	
	Brian's use case brings up some interesting points.  To follow
up, consider Issue #1:

	1.        Currently, both Affymetrix and the IMAGE consortium have a
database for mapping probe set and clone information to some unique gene
ID information. It would be easy if the map information could be
implemented in the DC project. However, Affymetrix may provide only DNA
sequence information in the future. 

	(okay, this is an issue close to my heart...)

	The association between gene and probe set has some ambiguity
that's worth noting here, especially as chip designs become more complex
and the one gene <=> one probe set mapping goes (further) out the
window.  Consider the following cases:

	1. The probe set is designed to uniquely interrogate the gene,
or more precisely, some feature within the gene.  The feature is
constitutively expressed.

	2. The probe set uniquely interrogates some feature within the
gene that is not constitutively expressed.  So, the gene is expressed if
the probe set hybridizes, but the converse is not true.

	3. The probe set interrogates some feature within the gene, but
not uniquely: it hybridizes to any one of a number of paralogs.  This
happens.  We encounter situations where we have a choice between a
unique probe set with marginal performance and a strong probe set that's
not unique.  When possible, we tile both.

	4. Part of the probe set uniquely interrogates some feature.
For instance, a shortened form of some exon is discovered after the chip
is designed.  Some of the probes in the probe set interrogate the
shortened form of the exon (plus the longer form), while others
interrogate the longer form only.  Here, you'd really like to divide the
probe set into two "virtual probe sets" so that each measures a
consistent entity.
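	Case 4 can be made concrete with a small sketch, assuming each probe has already been annotated with the set of transcript forms it is expected to hybridize to (the probe and form names are invented). Grouping probes whose target sets are identical yields the "virtual probe sets"; this is an illustration only, not a description of any existing chip analysis pipeline.

```python
from collections import defaultdict

# Hypothetical annotation: each probe in one probe set mapped to the set of
# transcript forms it is expected to hybridize to (names are invented).
PROBE_TARGETS = {
    "probe_1": frozenset({"exon_long", "exon_short"}),
    "probe_2": frozenset({"exon_long", "exon_short"}),
    "probe_3": frozenset({"exon_long"}),
    "probe_4": frozenset({"exon_long"}),
}


def split_into_virtual_probe_sets(probe_targets):
    """Group probes that interrogate exactly the same set of forms.

    Each group is one "virtual probe set" whose probes all measure the
    same entity, so their signals can be summarized together.
    """
    groups = defaultdict(list)
    for probe_id, targets in probe_targets.items():
        groups[targets].append(probe_id)
    return {targets: sorted(ids) for targets, ids in groups.items()}


for targets, ids in split_into_virtual_probe_sets(PROBE_TARGETS).items():
    print(sorted(targets), ids)
# ['exon_long', 'exon_short'] ['probe_1', 'probe_2']
# ['exon_long'] ['probe_3', 'probe_4']
```

Keying on a frozenset means two probes land in the same virtual probe set only when their annotated targets match exactly, which is the "consistent entity" requirement.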

	All four associations are valid.  What would be useful is to
provide all the associations, but qualify them.  Need I mention how
cumbersome that would get without a semantic representation...?  This
also illustrates the value of a DAS-like system, because as new
knowledge becomes available (e.g. case 4), that knowledge could be
distributed from one central location - instead of forcing all potential
users of the data to repeat the same analysis.
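	One way to "provide all the associations, but qualify them" is to make each probe set/gene association a node of its own that carries a qualifier naming which of the four cases applies. The sketch below assumes a hypothetical vocabulary under example.org (none of these URIs is an existing ontology) and writes N-Triples by hand, so it needs no RDF library.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical namespaces; placeholders only, not an existing vocabulary.
VOCAB = "http://example.org/mass#"
PROBE_SET_NS = "http://example.org/probeset/"
GENE_NS = "http://example.org/gene/"


class Qualifier(Enum):
    """One value per association case described above."""
    UNIQUE_CONSTITUTIVE = "uniqueConstitutiveFeature"         # case 1
    UNIQUE_NOT_CONSTITUTIVE = "uniqueNonConstitutiveFeature"  # case 2
    SHARED_WITH_PARALOGS = "sharedWithParalogs"               # case 3
    PARTIAL_PROBE_SET = "partialProbeSet"                     # case 4


@dataclass(frozen=True)
class Association:
    probe_set: str
    gene: str
    qualifier: Qualifier


def to_ntriples(associations):
    """Serialize each association as a blank node carrying its qualifier,
    so one probe set can link to several genes without losing context."""
    lines = []
    for i, a in enumerate(associations, start=1):
        node = f"_:assoc{i}"
        lines.append(f"{node} <{VOCAB}probeSet> <{PROBE_SET_NS}{a.probe_set}> .")
        lines.append(f"{node} <{VOCAB}gene> <{GENE_NS}{a.gene}> .")
        lines.append(f"{node} <{VOCAB}qualifier> <{VOCAB}{a.qualifier.value}> .")
    return "\n".join(lines)


# A probe set that cross-hybridizes to two paralogs (case 3); IDs are invented.
example = [
    Association("PS_001", "GENE_A", Qualifier.SHARED_WITH_PARALOGS),
    Association("PS_001", "GENE_B", Qualifier.SHARED_WITH_PARALOGS),
]
print(to_ntriples(example))
```

Because the qualifier hangs off the association rather than the probe set, new knowledge (for example a case 4 discovery) can be published as additional triples without rewriting existing ones.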

	Melissa        

	 

		-----Original Message-----
		From: Brian Gilman [mailto:gilmanb@mac.com] 
		Sent: Tuesday, November 04, 2003 5:01 AM
		To: Eric Neumann
		Cc: public-semweb-lifesci@w3.org
		Subject: Simple Genomics Use Case 
		
		

		Hello everyone, 


		        Sorry this took so long. My use case is fairly
simple; it comes from a library of use cases that I have sitting here
on my hard drive. 


		        A gene ID is required for retrieval of gene
annotation and ontology information. A unique gene ID is not available
directly from chip design information, where only IMAGE clone and probe
set information is available for each reporter (clone in spotted array
chips, probe set in oligo array chips). This use case describes the
process of obtaining a unique ID, such as a GenBank Accession Number,
LocusLink ID or Gene Symbol, for the clone or probe set.  


		Basic Course 

		1.         The actor selects the DNA sequence to retrieve a unique
gene ID. 


		2.         The actor sets the threshold for the sequence
similarity. 


		3.         The actor chooses which type of gene
identifier to retrieve (e.g., GenBank Accession Number). 


		4.         The actor chooses chip identifier type (e.g.,
clone id or probe set id). 


		5.         The actor submits the gene identifier search
request. 


		6.         A list of gene identifiers within the sequence
similarity threshold is presented to the actor, and the use case ends. 
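		The Basic Course can be sketched as a single lookup function. This is a minimal sketch under several assumptions: the chip design and reference tables are invented in-memory stand-ins for the Affymetrix/IMAGE mapping databases and an external gene database, and difflib's SequenceMatcher ratio stands in for a real sequence-similarity tool such as BLAST. The returned identifiers correspond to the list presented in step 6.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

# Hypothetical chip design data: (chip identifier type, reporter id) -> sequence.
CHIP_DESIGN = {
    ("probe_set", "PS_001"): "ATGGCTAGCTAGGCTTACG",
    ("clone", "CLONE_001"): "TTGACCGATCGATCGGATC",
}


@dataclass
class GeneRecord:
    ids: dict  # identifier type -> value, e.g. "genbank", "symbol"
    sequence: str


# Hypothetical reference data standing in for an external gene database.
REFERENCE = [
    GeneRecord({"genbank": "GB_0001", "symbol": "GENE_A"}, "ATGGCTAGCTAGGCTTACG"),
    GeneRecord({"genbank": "GB_0002", "symbol": "GENE_B"}, "TTGACCGATCGATCGGATC"),
]


def find_gene_ids(chip_id_type, reporter_id, threshold, id_type):
    """Steps 1-6: look up the reporter's sequence, compare it with each
    reference sequence, and return the requested identifier type for every
    record whose similarity meets the threshold, best match first."""
    query = CHIP_DESIGN[(chip_id_type, reporter_id)]
    hits = []
    for record in REFERENCE:
        # difflib's ratio (0.0-1.0) is only a stand-in for an alignment score.
        score = SequenceMatcher(None, query, record.sequence).ratio()
        if score >= threshold and id_type in record.ids:
            hits.append((score, record.ids[id_type]))
    hits.sort(reverse=True)
    return [gene_id for _, gene_id in hits]


print(find_gene_ids("probe_set", "PS_001", 0.9, "genbank"))  # ['GB_0001']
```

An empty result is the situation described as exception 2 below: no identifier of the requested type passes the threshold.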


		  


		Post Conditions 


		1.         A list of genes (gene symbol, GenBank ID,
UniGene ID and LocusLink ID) is presented to the actor. 


		2.         The options for retrieval of expression data
or expression pattern are also presented. 


		  


		Exceptions 


		1.        The query is submitted to an external data
source, and network traffic results in a long delay (see issue 2).   


		2.        A unique gene ID is not available for the
clone or probe set.  The actor is notified and prompted to input another
identifier.
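		A sketch of how a client might classify the two exceptions, assuming the external lookup is a callable that raises TimeoutError when the source is slow. The function name run_query, the Outcome values and the in-memory resubmission queue are all hypothetical; queuing timed-out requests is only one possible answer to issues 2 and 3.

```python
from collections import deque
from enum import Enum


class Outcome(Enum):
    FOUND = "found"
    DELAYED = "delayed"              # exception 1: external source too slow
    NO_UNIQUE_ID = "no_unique_id"    # exception 2: prompt for another identifier


# Timed-out requests held for later resubmission (see issues 2 and 3).
resubmit_queue = deque()


def run_query(query, reporter_id):
    """Call a lookup function and map its result onto the use case exceptions.

    `query` is any callable that takes a reporter id, returns a list of gene
    ids, and raises TimeoutError when the remote source does not answer in time.
    """
    try:
        gene_ids = query(reporter_id)
    except TimeoutError:
        resubmit_queue.append(reporter_id)
        return Outcome.DELAYED, []
    if not gene_ids:
        return Outcome.NO_UNIQUE_ID, []
    return Outcome.FOUND, gene_ids


def overloaded_source(reporter_id):
    """Stand-in for a remote source that is currently too busy to answer."""
    raise TimeoutError(f"lookup for {reporter_id} timed out")


print(run_query(overloaded_source, "PS_001"))
print(run_query(lambda rid: [], "PS_002"))
```

Returning a status instead of blocking keeps the user interface responsive; a scheduler could later drain resubmit_queue and notify the actor, for example by email.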


		  


		Issues 


		1.        Currently, both Affymetrix and the IMAGE
consortium have a database for mapping probe set and clone information
to some unique gene ID information. It would be easy if the map
information could be implemented in the DC project. However, Affymetrix
may provide only DNA sequence information in the future. 


		2.        If the query is submitted during a busy period,
should the request time out and be resubmitted at a later time? 


		3.        If the request is resubmitted, should it be
done automatically by the system, with the results then emailed to the actor? 


		4.        If presented with a list of IDs, how do you know who
the authority is for each ID? (LSID issue) 
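		For issue 4, LSIDs carry their authority inside the identifier itself: an LSID is a URN of the form urn:lsid:authority:namespace:object, optionally followed by :revision. A minimal parser that groups a result list by authority might look like this; the example identifiers and authorities are invented.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text):
    """Split urn:lsid:authority:namespace:object[:revision] into its parts."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


def group_by_authority(identifiers):
    """Answer "who is the authority?" for each ID in a result list."""
    groups = defaultdict(list)
    for identifier in identifiers:
        groups[parse_lsid(identifier).authority].append(identifier)
    return dict(groups)


# Invented identifiers from two hypothetical authorities.
ids = [
    "urn:lsid:genes.example.org:genbank:GB_0001",
    "urn:lsid:genes.example.org:genbank:GB_0002:2",
    "urn:lsid:arrays.example.com:probeset:PS_001",
]
print(group_by_authority(ids))
```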

		-- 

		Brian Gilman 

		President Panther Informatics Inc. 

		9 Acadia Park #2 

		Somerville, MA 02143 

		Phone: 617-591-1722 

		Cell: 617-335-8276 

		AIM: gilmanb1  

		 

		
		 

		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
		    Eric K. Neumann PhD 
		    VP Strategic Informatics, 
		    Head of Knowledge Research 

		   Beyond Genomics 
		    Drug Discovery through Systems Biology 

		    40 Bear Hill Road 
		    Waltham, MA 
		     tel: 781-434-0222 
		     fax: 781-895-1119 
		     www.beyondgenomics.com  

Received on Monday, 17 November 2003 10:08:38 UTC