- From: Eric Neumann <ENeumann@BeyondGenomics.com>
- Date: Mon, 17 Nov 2003 10:07:42 -0500
- To: "Cline, Melissa" <Melissa_Cline@affymetrix.com>
- Cc: <public-semweb-lifesci@w3.org>
- Message-ID: <FC5C355B8AE9F2499A5220CCEDC34756CF8735@bgmail.lifescience.com>
Melissa,

The message you sent could be viewed as a proof-of-concept proposal. Since most of the data for microarray features is already accessible, it seems that if we could define a minimal annotation semantic set (MASS) for microarrays, we could construct a prototype RDF server model. If you think this is possible, perhaps we should try to develop a MASS as a first step. Does this seem reasonable?

Eric

-----Original Message-----
From: Cline, Melissa [mailto:Melissa_Cline@affymetrix.com]
Sent: Tuesday, November 04, 2003 6:57 PM
To: Brian Gilman; Eric Neumann
Cc: 'public-semweb-lifesci@w3.org'
Subject: RE: Simple Genomics Use Case

Brian's use case brings up some interesting points. To follow up, consider Issue #1:

> 1. Currently, both Affymetrix and the IMAGE consortium have a database for mapping probe set and clone information to some unique gene ID information. It would be easy if the map information could be implemented in DC project. But Affymetrix may only give DNA sequence information in the future.

(okay, this is an issue close to my heart...)

The association between gene and probe set has some ambiguity that's worth noting here, especially as chip designs become more complex and the one gene <=> one probe set mapping goes (further) out the window. Consider the following cases:

1. The probe set is designed to uniquely interrogate the gene, or more precisely, some feature within the gene. The feature is constitutively expressed.
2. The probe set uniquely interrogates some feature within the gene that is not constitutively expressed. So, the gene is expressed if the probe set hybridizes, but the converse is not true.
3. The probe set interrogates some feature within the gene, but not uniquely: it hybridizes to any one of a number of paralogs. This happens. We encounter situations where we have a choice between a unique probe set with marginal performance and a strong probe set that's not unique. When possible, we tile both.
4. Part of the probe set uniquely interrogates some feature. For instance, a shortened form of some exon is discovered after the chip is designed. Some of the probes in the probe set interrogate the shortened form of the exon (plus the longer form), while others interrogate the longer form only. Here, you'd really like to divide the probe set into two "virtual probe sets" so that each measures a consistent entity.

All four associations are valid. What would be useful is to provide all the associations, but qualify them. Need I mention how cumbersome that would get without a semantic representation...?

This also illustrates the value of a DAS-like system, because as new knowledge becomes available (e.g. case 4), that knowledge could be distributed from one central location, instead of forcing all potential users of the data to repeat the same analysis.

Melissa

-----Original Message-----
From: Brian Gilman [mailto:gilmanb@mac.com]
Sent: Tuesday, November 04, 2003 5:01 AM
To: Eric Neumann
Cc: public-semweb-lifesci@w3.org
Subject: Simple Genomics Use Case

Hello everyone,

Sorry this took so long. My use case is fairly simple; it comes from a library of use cases that I have sitting here on my hard drive.

A gene ID is required to retrieve gene annotation and ontology information. The unique gene ID is not available directly from chip design information, where only IMAGE clone and probe set information is available for each reporter (clone in spotted array chips, probe set in oligo array chips). This use case describes the process of obtaining a unique ID such as a GenBank Accession Number, LocusLink ID, or Gene Symbol for the clone or probe set.

Basic Course

1. The actor selects the DNA sequence to retrieve a unique gene ID.
2. The actor sets the threshold for the sequence similarity.
3. The actor chooses which type of gene identifier to retrieve (e.g., GenBank Accession Number).
4. The actor chooses the chip identifier type (e.g., clone ID or probe set ID).
5. The actor submits the gene identifier search request.
6. A list of gene identifiers within the sequence similarity threshold is presented to the actor and the use case ends.

Post Conditions

1. A list of genes (gene symbol, GenBank ID, UniGene ID, and LocusLink ID) is presented to the actor.
2. The options for retrieval of expression data or expression pattern are also presented.

Exceptions

1. The query is submitted to an external data source, and network traffic results in a long delay (see issue 2).
2. A unique gene ID is not available for the clone or probe set. The actor is notified and prompted to input another identifier.

Issues

1. Currently, both Affymetrix and the IMAGE consortium have a database for mapping probe set and clone information to some unique gene ID information. It would be easy if the map information could be implemented in DC project. But Affymetrix may only give DNA sequence information in the future.
2. If the query is submitted during a busy time, should the request "time out" and be resubmitted at a later time?
3. If the request is resubmitted, should it be done automatically by the system, with the results then emailed to the actor?
4. If presented with a list of IDs, how do you know who the authority is for each ID (the LSID issue)?

--
Brian Gilman
President
Panther Informatics Inc.
9 Acadia Park #2
Somerville, MA 02143
Phone: 617-591-1722
Cell: 617-335-8276
AIM: gilmanb1

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Eric K. Neumann PhD
VP Strategic Informatics, Head of Knowledge Research
Beyond Genomics
Drug Discovery through Systems Biology
40 Bear Hill Road
Waltham, MA
tel: 781-434-0222
fax: 781-895-1119
www.beyondgenomics.com
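
Illustrative note on the thread above: Melissa's suggestion to "provide all the associations, but qualify them" can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions, not anything proposed in the thread itself. The qualifier labels, the predicate names (`probeSet`, `gene`, `qualifier`), and the identifiers in the usage example are all hypothetical, and the triples are plain tuples rather than output from a real RDF library.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List, Tuple


class Qualifier(Enum):
    """Hypothetical labels for the four probe-set-to-gene cases in the thread."""
    UNIQUE_CONSTITUTIVE = "unique_constitutive"  # case 1
    UNIQUE_CONDITIONAL = "unique_conditional"    # case 2
    NON_UNIQUE = "non_unique"                    # case 3: may hit paralogs
    PARTIAL = "partial"                          # case 4: only some probes apply


@dataclass(frozen=True)
class Association:
    """One qualified link between a probe set and a gene (IDs are placeholders)."""
    probe_set: str
    gene: str
    qualifier: Qualifier


def to_triples(assocs: List[Association]) -> List[Tuple[str, str, str]]:
    """Flatten each association into three (subject, predicate, object) tuples.

    The predicate names are assumptions made for this sketch only.
    """
    triples = []
    for i, a in enumerate(assocs):
        node = f"_:assoc{i}"  # blank-node style label for the association itself
        triples.append((node, "probeSet", a.probe_set))
        triples.append((node, "gene", a.gene))
        triples.append((node, "qualifier", a.qualifier.value))
    return triples


def genes_for(
    assocs: List[Association],
    probe_set: str,
    allowed: FrozenSet[Qualifier] = frozenset(Qualifier),
) -> List[str]:
    """Return the sorted genes linked to a probe set, filtered by qualifier."""
    return sorted(
        {a.gene for a in assocs
         if a.probe_set == probe_set and a.qualifier in allowed}
    )
```

A caller could, for example, build a list such as `[Association("ps_2", "GENE_B", Qualifier.NON_UNIQUE), Association("ps_2", "GENE_C", Qualifier.NON_UNIQUE)]` and ask `genes_for(...)` for only the unique associations, which would return an empty list for `ps_2`. Keeping the qualifier on the association, rather than on the probe set or the gene, is what lets all four cases coexist for the same probe set.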
Received on Monday, 17 November 2003 10:08:38 UTC