Re: Size estimates of current LS space from Carole Goble on 2006-07-31 (public-semweb-lifesci@w3.org from July 2006)

From: Carole Goble <carole@cs.man.ac.uk>
Date: Mon, 31 Jul 2006 15:35:13 +0100
To: Eric Neumann <eneumann@teranode.com>
CC: public-semweb-lifesci hcls <public-semweb-lifesci@w3.org>
Message-ID: <44CE1521.603@cs.man.ac.uk>

Eric

strewth.
hmm....not sure, and on the grounds you should ask people who know if 
you don't know, I just emailed Rolf Apweiler, Ewan Birney and Arek 
Kasprzyk to see if they would hazard a guess for at least the public 
databases.

I'll also ask our local proteomics and microarray people about their 
local scale-ups.

on another note
We use LSIDs in Taverna and one workflow - say for a gene alert and 
annotation protocol - will produce in the order of thousands of data 
objects, all with an LSID :-)  And we run the workflows over and over 
and over again. If you labelled every microarray probe with an LSID you're
gonna get 50K+ in one array......

Carole

>
>
> As per today's Telcon, does any person with genomics knowledge (that 
> includes you too Carole) have estimates for the following numbers:
>
> 1. How many bio-molecular and organism-anatomical-functional entities 
> and records (broad sense) are currently accessible through the web 
> (excluding LIMS entities, such as samples, for now)?
>
> 2. Does this number grow substantially when it is allowed to include 
> every variant of protein, gene, etc. per species (i.e., not instances 
> of real molecules or organisms)?
>
>
> I think these would be quite useful for other W3C members to be aware 
> of, since some proposed mechanisms would require their global indexing...
>
> Eric
>
>

Received on Monday, 31 July 2006 14:35:26 UTC