Re: Using SEER Data

From: Alan Ruttenberg <alanruttenberg@gmail.com>
Date: Fri, 5 Oct 2007 08:36:40 -0400
Message-Id: <51CFB9F4-F461-4CE3-8F00-38593CE6CAB9@gmail.com>
Cc: public-semweb-lifesci hcls <public-semweb-lifesci@w3.org>
To: Matt Williams <matthew.williams@cancer.org.uk>

I was thinking about the following scenario:

Local triple store with SEER data <l>
Demo store <nc>

FROM <l>
FROM <nc>
WHERE {... }

(which might be a nice use case for Eric's federation stuff)

Or integrating it into a local installation of the demo.
But I agree it is suboptimal.
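
To make the two-store scenario above concrete, here is a rough sketch in Python of how such a query and its request URL might be assembled. Everything specific in it is an assumption for illustration: the graph IRIs standing in for <l> and <nc>, the endpoint address, and the catch-all ?s ?p ?o pattern (the real WHERE clause is left open above). Listing several FROM clauses makes the default graph the merge of those graphs; whether a given store will actually fetch a remote graph depends on the implementation.

```python
from urllib.parse import urlencode

# Hypothetical IRIs, standing in for <l> and <nc> in the scenario above.
LOCAL_SEER = "http://localhost/seer"
DEMO = "http://example.org/demo"


def build_query(graphs, pattern, variables="*", limit=None):
    """Return a SELECT query whose default graph is the merge of `graphs`."""
    lines = [f"SELECT {variables}"]
    # One FROM clause per graph that should contribute to the default graph.
    lines += [f"FROM <{g}>" for g in graphs]
    lines.append(f"WHERE {{ {pattern} }}")
    if limit is not None:
        lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


def request_url(endpoint, query):
    """Encode the query as the `query` parameter of a SPARQL protocol GET."""
    return endpoint + "?" + urlencode({"query": query})


# Placeholder pattern only; the actual graph pattern is not specified.
query = build_query([LOCAL_SEER, DEMO], "?s ?p ?o", limit=10)
print(query)
# Hypothetical endpoint address for a local store.
print(request_url("http://localhost:8890/sparql", query))
```

Nothing here touches the network, so it can be tried without a store running; sending the URL with any HTTP client is the remaining step.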

Speaking of statistical analysis, we need an R SPARQL interface.  
Anyone up for writing one?
There are a few SQL packages at http://lib.stat.cmu.edu/R/CRAN/src/ 
SPARQL should be easier because it can be built off of
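
One piece of such an interface, in whatever language it ends up, is turning a result set into columns that a statistics package can consume. A minimal sketch follows, in Python rather than R, assuming the endpoint returns the SPARQL JSON results format. The function name and the sample document are invented for illustration; the sample is not SEER data.

```python
import json


def bindings_to_columns(results_json):
    """Convert a SPARQL JSON result document into {variable: [values...]}.

    Unbound variables become None so that all columns have equal length,
    which is what a data-frame constructor would expect.
    """
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    columns = {v: [] for v in variables}
    for row in doc["results"]["bindings"]:
        for v in variables:
            cell = row.get(v)
            # Values are kept as strings; datatype conversion is left out.
            columns[v].append(cell["value"] if cell is not None else None)
    return columns


# Invented sample result, for illustration only.
SAMPLE = """
{"head": {"vars": ["site", "count"]},
 "results": {"bindings": [
   {"site": {"type": "literal", "value": "lung"},
    "count": {"type": "literal", "value": "12"}},
   {"site": {"type": "literal", "value": "colon"}}
 ]}}
"""

columns = bindings_to_columns(SAMPLE)
print(columns)
```

An R version would presumably do the same thing and hand the columns to data.frame; converting typed literals to numbers is the obvious next step and is omitted here.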


On Oct 5, 2007, at 8:14 AM, Matt Williams wrote:

> Being able to do it, and do something useful with it would be good,  
> and might act as a good demonstrator. Again, I think the crucial  
> question is what it is *linking to* that gives it the added value:  
> I doubt that anyone would choose to do simple statistical analysis  
> on the data set in rdf (although I would be glad to be shot down).  
> Therefore if someone knows something we could link it with, I'd be  
> interested. I have done something similar (in non-rdf) linking SEER  
> with genomic data, but it's not big enough to make use of this.
> It may also be that a clear demonstration of the utility of this  
> might encourage them to relax the licensing restrictions.
> I have an idea for a different data set which I will send as a  
> separate email.
> Matt
> Alan Ruttenberg wrote:
>> [cc changed to public-semweb-lifesci]
>> We could distribute a script that does the conversion to RDF so  
>> that individuals who wanted to use it could still get it  
>> themselves and put it into a local store.
>> There are two possible benefits of working with the data:
>> 1) Learning something from it
>> 2) Adding it to the pool of rdf that is in the demo
>> We can still perhaps benefit from 1), even if 2) is not possible -  
>> but you tell us whether you think that is of value...
>> -Alan
>> On Oct 5, 2007, at 4:26 AM, Matt Williams wrote:
>>> I've had a very quick look at this. It might be salutary to read  
>>> some parts of the data-user agreement.
>>> 1. You will not use nor permit others to use the data in any way  
>>> other than for statistical reporting and analysis for research  
>>> purposes. The SEER Program must be notified if it is discovered  
>>> that there has been any other use of the data.
>>> <snip>
>>> 3. You will not attempt to link nor permit others to link the  
>>> data with individually identified records in another data base.
>>> <snip>
>>> 6. You will not release nor permit others to release the data in  
>>> full or in part to any person except with the written approval of  
>>> the SEER Program. In particular, all members of the research team  
>>> who have access to the data must have signed data-use agreements.
>>> <snip>
>>> 7. You will use appropriate safeguards to prevent use or  
>>> disclosure of the information other than as provided for by this  
>>> data-use agreement. If accessing the data from a centralized  
>>> location on a time sharing computer system or LAN with SEER*Stat  
>>> or another statistical package, you will not share your logon  
>>> name and password with any other individuals. You will also not  
>>> allow any other individuals to use your computer account after  
>>> you have logged on with your logon name and password.
>>> I don't know to what extent this therefore causes problems with  
>>> the idea of sharing the data; while it can still be copied into  
>>> an rdf format, doing so and then keeping it on a local server  
>>> seems (mostly) pointless.
>>> -- 
>>> http://acl.icnet.uk/~mw
>>> http://adhominem.blogsome.com/
>>> +44 (0)7834 899570
Received on Friday, 5 October 2007 12:36:53 UTC