Re: Spatial queries against GENSAT or ABA

From: William Bug <William.Bug@DrexelMed.edu>
Date: Mon, 5 Mar 2007 00:26:46 -0500
Message-Id: <97F50139-D7D7-44DD-85B8-9A0C363FDA2A@DrexelMed.edu>
Cc: Maryann Martone <maryann@ncmir.ucsd.edu>, Alan Ruttenberg <alanruttenberg@gmail.com>, June Kinoshita <junekino@media.mit.edu>, Donald Doherty <donald.doherty@brainstage.com>, Gwen Wong <wonglabow@verizon.net>, W3C HCLSIG hcls <public-semweb-lifesci@w3.org>, "Robert W. Williams" <rwilliam@nb.utmem.edu>, zaslavsk@sdsc.edu
To: kc28 <kei.cheung@yale.edu>
Hi Kei,

What a great email.

I think you summarize beautifully how a lot of us feel about the  
pregnant possibilities of bringing these newer, dynamic techniques  
and the emerging mountains of valuable data together in ways that can  
effectively provide new insight.  The geo tagging of epidemiological  
data is a PERFECT example - really a guidepost for what we can  
expect, when the synergy works.

For anatomically based data sets, as Maryann has said, we are right  
on the cusp.

I wouldn't want what I've been passing along re: the ABA or GENSAT data  
to give folks the impression that I don't believe these are critical and  
valuable assets that are poised to provide much more insight through  
further mining.

It is just tough to think through how to bring them to effective use  
in this demo.  The temptation is great to try to provide some twist  
that can help support deeper mining based on location in the brain.   
I think we're NOT QUITE there yet - but I could be wrong.

My sense for some time now is ABA has been tremendously successful in  
their effort not only to automate a high-throughput process for  
creating very high quality histology, but also in being able to  
provide the most practical automated, informatic pipeline for  
analyzing this spatially-based data set that the current constraints  
would allow.  In fact, their means of placing the transcript  
quantification within the re-constructed 3D atlas brain region  
volumes (VOIs) is truly impressive (this is what one can view with  
their Brain Explorer tool - and it's thoroughly described in their  
informatics method supplement to the Jan 2007 Nature article).

Their systematic coverage of the brain in terms of the staining will  
likely be sufficient for many of the questions we'll want to ask  
(essentially 1 micron resolution within an imaged section and ~200  
microns across sections [56 sections over 11.2 mm coronally AND 20  
sections over 4 mm sagittally]).
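
As a quick check on the sampling figures above, here is a minimal sketch of the arithmetic. The function name is purely illustrative, and the 1 micron in-plane figure is taken as stated in the text.

```python
def section_spacing_microns(extent_mm: float, n_sections: int) -> float:
    """Average spacing between sections, in microns, over a given extent."""
    return extent_mm * 1000.0 / n_sections


# Figures quoted in the message: 56 coronal sections over 11.2 mm,
# 20 sagittal sections over 4 mm.
coronal_um = section_spacing_microns(11.2, 56)
sagittal_um = section_spacing_microns(4.0, 20)

# In-plane resolution is quoted as roughly 1 micron, so the sampling is
# far coarser across sections than within a section.
IN_PLANE_UM = 1.0
anisotropy = coronal_um / IN_PLANE_UM

print(f"coronal spacing  ~{coronal_um:.0f} microns")
print(f"sagittal spacing ~{sagittal_um:.0f} microns")
print(f"across-section vs in-plane ratio ~{anisotropy:.0f}x")
```

Both series come out at roughly 200 microns between sections, which is why any spatial query is much better constrained within a section than across sections.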

I believe the current coordinate registration is just a little bit  
short of providing spatial query capability at the level of  
resolution required to map one gene to highly specific brain  
regions.  This is one of the reasons they provide the higher level  
classification for the finer resolution regions.  Promoting the  
quantification they have done (again with a registration error in the  
100 - 300 micron range) up to these higher level regions at least  
allows us all to have confidence in the numbers that were  
automatically generated.

Manual annotation can provide correction for the higher-rez, fine  
grained mapping of transcript patterns & intensity, but that's a lot  
of work (~60 sections * 4000 [for the coronally sectioned brains] -  
~20 sections * 20,000 [for the sagittally sectioned brains]).  That's  
not to say it couldn't be done if everyone in the community were  
contributing, but it's a daunting task - and if it is to be mineable,  
the manual annotations would have to be structured to a  
significant extent.
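
To put rough numbers on that workload, a small sketch follows. The section and series counts are the approximate figures quoted above. The per-image annotation time is a hypothetical parameter that does not come from the message, so treat the hours as an order-of-magnitude illustration only.

```python
def images_to_annotate(sections_per_brain: int, n_brains: int) -> int:
    """Number of section images in one set of brains."""
    return sections_per_brain * n_brains


def person_hours(n_images: int, minutes_per_image: float) -> float:
    """Effort estimate; minutes_per_image is an assumed value, not a measured one."""
    return n_images * minutes_per_image / 60.0


coronal_images = images_to_annotate(60, 4000)     # coronally sectioned brains
sagittal_images = images_to_annotate(20, 20000)   # sagittally sectioned brains

ASSUMED_MINUTES_PER_IMAGE = 5  # hypothetical, for illustration only

for label, n in (("coronal", coronal_images), ("sagittal", sagittal_images)):
    hours = person_hours(n, ASSUMED_MINUTES_PER_IMAGE)
    print(f"{label}: {n} images, ~{hours:.0f} person-hours at the assumed rate")
```

Even at a few minutes per image, either set runs to tens of thousands of person-hours, which is the sense in which the task is daunting without broad community contribution.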

What we can expect in the next year or so, I believe, is a higher  
resolution automatic registration to the atlas that is more precise  
and repeatable.  Then it will become much more plausible to slice-n- 
dice ABA data in search of correlations based on the overlay of  
spatial maps with genetic networks, pathways, and neuronal  
connectivity in ways I truly believe will push our basic  
understanding of the molecular, cellular, and circuitry basis of  
mammalian brain function a leap forward.  Further mining of the ABA  
will be one of the best ways to increase our scope of nuanced,  
integrated understanding across the entire brain in a spatially aware  
manner.

Cheers,
Bill

On Mar 4, 2007, at 10:46 PM, kc28 wrote:

> Hi Bill, Maryann,
>
> Thanks for your great response to the GIS queries of the brain. I  
> absolutely agree it's not a trivial issue to create canonical  
> coordinates that will work for different types/instances of brains,  
> while there is only one earth! Bill, you're also right on  
> target for my question about correlation between mRNA expression  
> and protein expression.
>
> The life science significance and use case of Google Map and Google  
> Earth has been described in Nature (http://www.nature.com/nature/ 
> journal/v439/n7072/full/439006a.html).  Can we do brain data  
> "mashup" using Google Map (and possibly Google Earth)? We might be  
> a step closer, although as Bill indicated there are still  
> challenges ahead. It sounds like different groups are actively  
> working on different brain visualization tools with GIS querying  
> capabilities. That's exciting! The potential of combining GIS and  
> semantic web can be huge!
>
> Don't forget literature and scientific publishing! Tools like  
> Connotea and Google Earth can interoperate. A bunch of papers  
> describing genetic/epidemiologic research studies based on  
> different human populations can be bookmarked using Connotea and  
> tagged with geographical coordinates (geo tag). Google Earth can  
> then recognize such geo-tags and display the geographical  
> distribution of such populations with different satellite images  
> superimposed.
>
> I need to dream more about the data/tool mashup.
>
> Good night,
>
> -Kei
>
> William Bug wrote:
>
>> Given the UNIQUE work BIRN, your lab, and Ilya have done in  
>> applying GIS techniques to this problem of creating a SPATIAL- 
>> QUERY capable brain atlasing system (the SMART Atlas), it would be  
>> wonderful if Ilya could vet the scenarios as I outline them  
>> below.  This is my best understanding of what is required, but it  
>> may be very 3D-biased because of the work we do at Drexel.  To my  
>> mind, the problem is the same, only the strategies for solving it  
>> are a little bit different - in some ways more tractable in 2D -  
>> though the answers may come with more constraints.
>>
>> I did pass some emails around to the SMART Atlas folks early last  
>> week in order to get their feedback on Alan's work on the Google  
>> Maps Javascript API and backend Perl code to support caching  
>> images.  The Google Maps API is one that has come up endlessly in  
>> these atlasing discussions, and it's nice to see just how it can  
>> be made useful - what it can and cannot do in this application space.
>>
>> As Maryann states - and I've stated several times - there is  
>> ONGOING work on several projects seeking to provide this  
>> functionality applied to the ABA and GENSAT gene expression image  
>> data repositories.  None of it - that I'm aware of - would be  
>> ready for use by the first week in May - or really at least a  
>> month before - to test.  I do think there are other low-hanging  
>> fruit, tractable opportunities in the time frame of the HCLS  
>> IG demo for which SemWebTech is specifically suited, and Alan is  
>> converging on several of them.
>>
>> Cheers,
>> Bill
>>
>>
>> On Mar 4, 2007, at 1:44 PM, Maryann Martone wrote:
>>
>>> This is exactly what BIRN has been working on through the Smart  
>>> Atlas project and now MBAT.  The inverse query is also true:   
>>> What genes are expressed here?  As Bill indicated, there are  
>>> several spatially normalized atlas projects (ABA, GenePaint) that  
>>> can do that.  We've been working on spatial normalization of some  
>>> of the GENSAT images, although we haven't gotten very far.  More  
>>> importantly, BIRN has been working on exchange of coordinate  
>>> systems so that different atlases can talk to each other.
>>>
>>> I think that's why Bill has been trying to get everyone together  
>>> on this. I've added Ilya Zaslavsky, our GIS expert, to this list.
>>>
>>> Maryann Martone, Ph. D.
>>> Professor-in-Residence
>>> Dept of Neuroscience
>>> University of California, San Diego
>>> San Diego CA  92093-0446
>>> 858 822 0745 (T)
>>> 858 822 0828 (F)
>>>
>>>
>>> On Sat, 3 Mar 2007, William Bug wrote:
>>>
>>>> Hi Kei,
>>>>
>>>> You are right on target re: use of a coordinate-based, spatial  
>>>> query system to resolve the relatively simple query: "In which  
>>>> brain regions is GENE X expressed?"
>>>>
>>>> This is the whole goal of several major neuroinformatics  
>>>> projects currently underway which are designed to use either 2D  
>>>> or 3D digital brain atlases to make such a query possible.   
>>>> Several of those efforts are associated with the BIRN project.   
>>>> In fact several such projects working on inbred mouse strain  
>>>> atlases have been striving to function synergistically within a  
>>>> single system (the Mouse BIRN Atlasing Tool or MBAT)  
>>>> specifically to support such a query. ABA is not currently  
>>>> available to query within MBAT, because it's not registered to  
>>>> the primary atlas being used in MBAT right now.  This work may  
>>>> eventually get done, but it won't be ready for the demo.
>>>>
>>>> The absolute pre-requisites for resolving such a query are:
>>>> 1) you must have a set of canonical brain images (2D) or a true  
>>>> voxel based canonical brain (3D) - "ATLASES" - that include  
>>>> expert-assisted brain region segmentation.
>>>> 2) these canonical pixel-based brain images (2D) or voxel based  
>>>> images (3D) must be situated within a defined coordinate space.
>>>> 3) the segmented brain regions must be deterministically placed  
>>>> within the same coordinate space.
>>>> 4) the images containing the gene expression patterns must be  
>>>> segmented (manually, semi-automatically, or automatically) to  
>>>> provide defined geometries for the expression patterns.
>>>> 5) the images containing the gene expression patterns must be  
>>>> registered to the canonical atlas data and coordinate space  
>>>> (whether 2D or 3D).
>>>>
>>>> With these conditions met, you could then present a user with a  
>>>> nice 3D visualization of the atlas (or even just the list of  
>>>> brain region IDs or preferred labels) and/or a list of gene  
>>>> names/IDs and let them ask both of the following questions:
>>>> a) In which brain regions is GENE X expressed?
>>>> b) For which genes does BRAIN REGION X contain defined expression  
>>>> values beyond some baseline?
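
Once the five prerequisites above are met, both questions reduce to overlap tests in the shared coordinate space. The following is a minimal 2D sketch under that assumption. The region names, gene names, pixel coordinates, and the 10% baseline are invented toy data, not values from any real atlas or from ABA/GENSAT.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]  # (x, y) in the shared atlas coordinate space

# Prerequisites 1-3: segmented atlas regions placed in one coordinate space.
# Toy data only.
ATLAS_REGIONS: Dict[str, Set[Pixel]] = {
    "STRIATUM": {(x, y) for x in range(0, 4) for y in range(0, 4)},
    "CORTEX": {(x, y) for x in range(4, 8) for y in range(0, 4)},
}

# Prerequisites 4-5: expression patterns segmented and registered to the atlas.
EXPRESSION: Dict[str, Set[Pixel]] = {
    "GENE_A": {(1, 1), (2, 1), (2, 2), (3, 3)},
    "GENE_B": {(5, 1), (6, 2), (3, 0)},
}


def overlap_fraction(gene: str, region: str) -> float:
    """Fraction of the region's pixels covered by the gene's expression."""
    region_px = ATLAS_REGIONS[region]
    return len(region_px & EXPRESSION[gene]) / len(region_px)


def regions_expressing(gene: str, baseline: float = 0.1) -> List[str]:
    """Query (a): in which brain regions is the gene expressed?"""
    return sorted(r for r in ATLAS_REGIONS if overlap_fraction(gene, r) > baseline)


def genes_expressed_in(region: str, baseline: float = 0.1) -> List[str]:
    """Query (b): which genes exceed the baseline in this region?"""
    return sorted(g for g in EXPRESSION if overlap_fraction(g, region) > baseline)


print(regions_expressing("GENE_A"))
print(genes_expressed_in("CORTEX"))
```

A real system would use registered image volumes and expression intensities rather than pixel sets, and, as discussed elsewhere in this thread, registration error limits how far such an overlap test can be trusted.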
>>>>
>>>> Right now, GENSAT is not registered to an atlas, so there is no  
>>>> coordinate frame to support resolving such a query.  They have  
>>>> manually curated many of the gene-specific images with both  
>>>> brain regions and cell types, so you can pose that query and get  
>>>> an answer based on the curation they have had the resources to  
>>>> do so far, but there is no way to place it in a GIS context (2D  
>>>> or 3D), since none of their info is YET linked to a canonical  
>>>> coordinate space (several projects are working on this very issue).
>>>>
>>>> ABA has aligned to a 2D mouse brain atlas (F&P C57Bl/6 adult  
>>>> brain atlas). In doing so, the 2D brain region segmentations on  
>>>> each of the images in the F&P mouse atlas can be super-imposed  
>>>> on the registered images from any of the 20,000+ brains.  The  
>>>> problem is the current registration has a moderate error  
>>>> associated with it, so that answering that query  
>>>> programmatically is problematic and often not very informative.   
>>>> The following can be done:
>>>> - along the coronal sectioning axis, give me the plate numbers  
>>>> for all the images in the atlas that contain a slice through the  
>>>> STRIATUM
>>>> - for ABA brain stained for GENE X, give me all the sections  
>>>> that have been roughly aligned to that set of F&P atlas images.
>>>>
>>>> From there the alignment is so coarse at this point, you could  
>>>> only use the atlas plates and location of the STRIATUM to help  
>>>> guide a qualitative assessment of whether there appears to be  
>>>> any staining in the STRIATUM.
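
The two lookups described above can be sketched as a pair of table joins. The plate numbers, region labels, and section identifiers below are invented for illustration. The real F&P plate contents and ABA alignment tables would have to come from the atlas and from ABA.

```python
from typing import Dict, List, Set

# Atlas plate number -> brain regions that appear on that plate (toy data).
ATLAS_PLATES: Dict[int, Set[str]] = {
    20: {"CORTEX"},
    21: {"CORTEX", "STRIATUM"},
    22: {"CORTEX", "STRIATUM"},
    23: {"CORTEX", "HIPPOCAMPUS"},
}

# For one gene's brain: section id -> atlas plate it was roughly aligned to.
SECTION_TO_PLATE: Dict[str, int] = {
    "s01": 20, "s02": 21, "s03": 22, "s04": 22, "s05": 23,
}


def plates_containing(region: str) -> List[int]:
    """Plate numbers whose segmentation includes a slice through the region."""
    return sorted(p for p, regions in ATLAS_PLATES.items() if region in regions)


def sections_for_region(region: str) -> List[str]:
    """Sections of this brain aligned to any plate that contains the region."""
    plates = set(plates_containing(region))
    return sorted(s for s, p in SECTION_TO_PLATE.items() if p in plates)


print(plates_containing("STRIATUM"))
print(sections_for_region("STRIATUM"))
```

This only narrows down which sections to inspect. Deciding whether staining actually falls inside the region is still a qualitative, by-eye judgement at the current alignment accuracy.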
>>>>
>>>> In fact, via this route, many contributors to GeneNetwork.org  
>>>> have actually linked the probe sets in their microarray QTL  
>>>> database to staining patterns in ABA.  In other words, if  
>>>> through their system, you uncover via QTL a locus or collection  
>>>> of SNPs associated with altered expression of a given gene - say  
>>>> Dopamine Receptor, type D2 (DRD2) - you might find someone has  
>>>> added an ABA or GENSAT annotation for DRD2 using the  
>>>> GeneNetwork.org GeneWiki.
>>>> 1) Go to www.genenetwork.org:
>>>> http://www.genenetwork.org/search3.html
>>>> 2) Enter 'DRD2' in the 'ANY' box searching against the default  
>>>> settings for other fields - & hit 'Search'
>>>> 3) Click on the single result entry
>>>> 4) In the record for DRD2, click on the GeneWiki button near the  
>>>> top of the page
>>>> 5) This will bring up a listing of all the annotations in  
>>>> GeneNetwork for DRD2 including qualitative annotations that  
>>>> someone did for the ABA DRD2 brain.
>>>>
>>>> If you want to see ALL of the genes for which ABA or GENSAT  
>>>> GeneWiki entries exist, just go back to step '1', enter wiki=ABA  
>>>> or wiki=GENSAT respectively in one of the 'ANY' boxes, and hit  
>>>> 'Search'.  Then pick up at step '3' above.
>>>>
>>>> Were we able to SCRAPE this, then you would have annotation for  
>>>> ABA that is roughly equivalent to that which exists for GENSAT -  
>>>> ONLY - it probably doesn't cover the ABA very thoroughly  
>>>> (using the generic 'wiki=aba' brings up 948 probe sets - or ~5%  
>>>> of ABA - pretty remarkable, actually, given it's a manual  
>>>> effort), and these GeneWiki annotations are mostly in free-text  
>>>> right now and are not done to a controlled vocabulary or  
>>>> classification scheme. :-(
>>>>
>>>> When the registration to the atlas improves to say the 50 - 100  
>>>> micron range, then the flood-gates will open, and all 20,000  
>>>> brains in ABA each staining for a particular gene will be able  
>>>> to automatically provide relatively solid answers to  these  
>>>> straight-forward questions related to where in the brain is Gene  
>>>> X expressed - and which genes does Brain Region Y show marked  
>>>> expression of.  Even here, however, there will be continued room  
>>>> for nuance in defining the ABA staining patterns - AND - there  
>>>> will be a need eventually to add the time dimension to all  
>>>> these queries (e.g., "When is Gene X expressed in Brain Region  
>>>> Y?").
>>>>
>>>> Because the ABA has created multi-resolution versions of their  
>>>> brain images (both the Nissl stains for cell bodies and the  
>>>> pseudo-colored ISH images for a given gene), it is possible to  
>>>> use the very nice Google Maps API GUI Alan created to select a  
>>>> given 1 of the 20,000 ABA brains and simply Zoom & Pan on the  
>>>> actual pixel image data.  However, there is no straight-forward  
>>>> way to use it to pose and answer SPATIAL queries.
>>>>
>>>> What MIGHT be possible - based on the alignment they have done  
>>>> and the information provided in that brain region ontology Excel  
>>>> file Alan has - is to say, for the 'DRD2' brain, filter the  
>>>> sagittal image series to create a subset including only those  
>>>> images aligned to an F&P atlas image which contains a section  
>>>> through Brain Region X (say 'STRIATUM').  This way, if through  
>>>> some SPARQL query you pulled up a relation between DRD2 and  
>>>> STRIATUM, you'd be able to present a user with a very nice, low- 
>>>> tech interface to quickly pan&zoom on the median section of that  
>>>> 'STRIATUM'-filtered series to look at the staining pattern.  You  
>>>> could add a navigation control to go back-n-forth through the  
>>>> series for the DRD2 brain, so they could get a pretty good sense  
>>>> in 3D where DRD2 expression is in the striatum.  You might also  
>>>> go to BAMS or CoCoMac (BAMS is better in this instance since  
>>>> it's rodent focused - whereas CoCoMac is primate focused) to  
>>>> automatically determine what regions connect to (is_afferent_to)  
>>>> and what regions are connected to (is_efferent_to) the  
>>>> STRIATUM.  You could then bring up another HTML frame that gives  
>>>> you a view of the DRD2 subset series for those brain regions, too.
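
A minimal sketch of the navigation logic for such an interface follows. It assumes the region-filtered, ordered section list has already been produced, and that connectivity has been exported as simple (source, target) pairs. The section ids and projection pairs are illustrative stand-ins, not actual ABA or BAMS data, and the function names are hypothetical.

```python
from typing import List, Tuple


def median_section(series: List[str]) -> str:
    """Middle element of an ordered, non-empty, region-filtered series."""
    return series[(len(series) - 1) // 2]


def step(series: List[str], current: str, delta: int) -> str:
    """Move back or forth through the series, clamped at both ends."""
    i = series.index(current) + delta
    return series[max(0, min(i, len(series) - 1))]


# Illustrative (source, target) projection pairs standing in for a BAMS export.
PROJECTIONS: List[Tuple[str, str]] = [
    ("CORTEX", "STRIATUM"),
    ("THALAMUS", "STRIATUM"),
    ("STRIATUM", "PALLIDUM"),
]


def regions_projecting_to(region: str) -> List[str]:
    """Regions with a recorded projection into the given region."""
    return sorted(src for src, dst in PROJECTIONS if dst == region)


def regions_receiving_from(region: str) -> List[str]:
    """Regions that the given region is recorded as projecting to."""
    return sorted(dst for src, dst in PROJECTIONS if src == region)


striatum_series = ["s02", "s03", "s04"]   # e.g. the 'STRIATUM'-filtered series
start = median_section(striatum_series)
print(start, step(striatum_series, start, 1), step(striatum_series, start, -1))
print(regions_projecting_to("STRIATUM"), regions_receiving_from("STRIATUM"))
```

The same filtered-series view could then be opened for each connected region, which is the second HTML frame suggested above.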
>>>>
>>>> THAT WOULD ACTUALLY BE A VERY NICE INTERFACE - and is probably  
>>>> quite tractable for the demo - if this sounds like a useful  
>>>> feature to provide.
>>>>
>>>> Running atlas-based SPATIAL queries against GENSAT and ABA is a  
>>>> very much sought after goal both for the curators of those  
>>>> repositories and for the neuroscience community at large, but we  
>>>> are not there yet.
>>>>
>>>>
>>>>
>>>> I'm not certain I understand what you are asking re: highly  
>>>> expressed genes that correlate with high levels of ADDL or  
>>>> Abeta.  I could see how you might be able to use GENSAT (which  
>>>> has a 'staining intensity' annotation field) to ask whether  
>>>> genes associated with high levels of specific ADDL species or  
>>>> with plaque deposition are expressed at high levels in the  
>>>> GENSAT data set - and if so - where are they expressed in the  
>>>> brain - and at what developmental time.  Given the sparse nature  
>>>> of the GENSAT data set, this would not be a comprehensive answer  
>>>> to the question, but it could prove very interesting. I'm  
>>>> certain June, Gwen, or Elisabeth could help us identify genes  
>>>> whose expression correlates with high levels of ADDL species  
>>>> (most interesting question given current AD research) or with  
>>>> other APP related macromolecules or plaques.  I'm not certain  
>>>> how you'd ask the same question of ABA, given there are not  
>>>> systematic annotations on staining intensity or pattern - though  
>>>> some of this has been done (see below).
>>>>
>>>> Cheers,
>>>> Bill
>>>>
>>>> On Mar 3, 2007, at 8:01 PM, kc28 wrote:
>>>>
>>>>> Alan et al.,
>>>>> In addition to mapping to brain regions, what seems to be also  
>>>>> missing is some kind of brain coordinates. I thought one major  
>>>>> advantage of using Google Map is the ability to issue GIS-like  
>>>>> queries. With this type of query, one can potentially query  
>>>>> something like finding expressed genes for a given brain region  
>>>>> and its neighbouring/adjacent regions.
>>>>> While we are talking about gene expression, what seems to be  
>>>>> also logical to consider is whether some highly expressed genes  
>>>>> correlate with high abundance of pathological proteins (e.g.,  
>>>>> amyloid beta). Any take from neuroscientists?
>>>>> -Kei
>>>>> Alan Ruttenberg wrote:
>>>>>
>>>>>> On Mar 2, 2007, at 1:56 PM, Kei Cheung wrote:
>>>>>>
>>>>>>> By reading the AD/PD use case, one of the questions has to do  
>>>>>>> with what genes are expressed in what regions of the brain  
>>>>>>> (if such gene expressions are localized to certain brain  
>>>>>>> regions). I wonder whether what Alan's currently working on can  
>>>>>>> help address this type of question (even though the kind of gene  
>>>>>>> expression data is for the mouse -- perhaps we can find  
>>>>>>> homologous genes for human). Also, I'd encourage people to  
>>>>>>> take a look at Bill Bug's Wiki page:
>>>>>>
>>>>>> What I can do is add an orthology mapping. Probably from  
>>>>>> orthogene.
>>>>>> I can also scrape the Allen site for the following query they  
>>>>>> provide:
>>>>>> Brain Region (see list below), Expression-level (low/high),  
>>>>>> Expression-density (low/high), expression pattern  
>>>>>> (clustered/not clustered) => gene set
>>>>>> So this would be 16x2x2x2 = 128 different gene sets.
>>>>>> There is also their "Fine structure search":
>>>>>> Fine structure annotation lists contain genes that have high  
>>>>>> specificity expression in particular brain regions or nuclei.
>>>>>> They provide these gene lists for a set of structures listed  
>>>>>> below  (fine structures).
>>>>>> This can lead us to a particular image, though I don't have a  
>>>>>> way yet  to identify which portion of the image corresponds to  
>>>>>> a particular  region or structure.
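
The 16x2x2x2 count above can be checked by enumerating the parameter combinations a scraper would need to request. This is only a sketch of the enumeration. The region names are placeholders for the 16-region list referred to above, the dictionary keys are hypothetical, and nothing here reflects the Allen site's actual request format.

```python
from itertools import product

# Placeholders for the 16 brain regions offered by the query form.
REGIONS = [f"region_{i:02d}" for i in range(1, 17)]
LEVELS = ["low", "high"]                   # expression level
DENSITIES = ["low", "high"]                # expression density
PATTERNS = ["clustered", "not clustered"]  # expression pattern

# One entry per distinct query; each query would yield one gene set.
queries = [
    {"region": r, "level": lv, "density": d, "pattern": p}
    for r, lv, d, p in product(REGIONS, LEVELS, DENSITIES, PATTERNS)
]

print(len(queries))
print(queries[0])
```

Each of these combinations corresponds to one gene set, so a scraper would need to issue this many requests and record which gene set came back for which combination.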
>>>>>
>>>>
>>>> Bill Bug
>>>> Senior Research Analyst/Ontological Engineer
>>>>
>>>> Laboratory for Bioimaging  & Anatomical Informatics
>>>> www.neuroterrain.org
>>>> Department of Neurobiology & Anatomy
>>>> Drexel University College of Medicine
>>>> 2900 Queen Lane
>>>> Philadelphia, PA    19129
>>>> 215 991 8430 (ph)
>>>> 610 457 0443 (mobile)
>>>> 215 843 9367 (fax)
>>>>
>>>>
>>>> Please Note: I now have a new email - William.Bug@DrexelMed.edu
>>>>
>>>>
>>>>
>>>>
>>
>

Received on Monday, 5 March 2007 05:27:57 UTC
