Re: VCF and RDF, at Clinical Pharmacogenomics TF, Wed Apr 3rd

This looks really helpful.

I suspect that the key questions remain about how to use the data effectively.

FALDO seems at first glance to be fairly simplistic compared with the model implicit in VCF.
Maybe FALDO is actually more useful, concentrating on the important stuff ...

thanks


Jeremy J Carroll
Principal Architect
Syapse, Inc.



On Apr 1, 2013, at 12:40 PM, Chris Mungall <cjmungall@lbl.gov> wrote:

> 
> Apologies if this has been covered already; I haven't been following the whole discussion.
> 
> Genome variant data is just a subset of genome data. My understanding is that the semweb BioHackathon group looked at a variety of different kinds of genomic data and came up with FALDO[1]. This model looks pretty good to me, and importantly there is a converter from GFF3[2,3]. Of all the commonly used genome feature formats out there, GFF3 is by far the best at encouraging provision of relevant metadata using standard ontologies/terminologies.
> 
> VCF is convertible to GVF[4,5], which is a subset of GFF3 with additional recommended metadata. It's supported by Ensembl, dbGaP and others, and the 1000 Genomes data is available in GVF[6].
> 
> As GFF3 is convertible to RDF/OWL that uses FALDO and SO, it follows that GVF is too (though the converter may need tweaking to take advantage of the additional GVF metadata).
> 
> I just wanted to make sure you were aware of all this previous work before reinventing anything.
> 
> [1] https://github.com/JervenBolleman/FALDO
> [2] http://www.sequenceontology.org/gff3.shtml
> [3] https://code.google.com/p/gff3-to-owl/
> [4] http://www.ncbi.nlm.nih.gov/pubmed/20796305 - A standard variation file format for human genome sequences - Reese et al.
> [5] http://www.sequenceontology.org/resources/gvf.html
> [6] ftp://ftp.ensembl.org/pub/current_variation/gvf/homo_sapiens/
> 
> On Apr 1, 2013, at 10:59 AM, Jeremy J Carroll wrote:
> 
>> Hi Kingsley,
>> 
>> I wasn't going to, but since you ask:
>> 
>> http://www.slideshare.net/JeremyJCarroll/vcf-and-rdf
>> 
>> or
>> 
>> http://lists.w3.org/Archives/Public/www-archive/2013Apr/att-0002/W3C-JJC-LifeSci.pdf
>> 
>> 
>> Jeremy J Carroll
>> Principal Architect
>> Syapse, Inc.
>> 
>> 
>> 
>> On Apr 1, 2013, at 10:13 AM, Kingsley Idehen <kidehen@openlinksw.com> wrote:
>> 
>>> On 4/1/13 1:05 PM, Jeremy J Carroll wrote:
>>>> Hi
>>>> 
>>>> I am hoping to present the work I am currently doing on VCF and RDF at the Clinical Pharmacogenomics TF telecon on Wednesday.
>>>> 
>>>> My presentation should cover:
>>>> 
>>>> - business background, Syapse Discovery
>>>> - some background on VCF as a knowledge representation format
>>>> - and some initial results on mapping 1000 genomes into RDF
>>>> 
>>>> I will circulate slides shortly
>>>> 
>>>> 
>>>> Jeremy J Carroll
>>>> Principal Architect
>>>> Syapse, Inc.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> Hopefully you'll publish to Slideshare?
>>> 
>>> -- 
>>> 
>>> Regards,
>>> 
>>> Kingsley Idehen
>>> Founder & CEO
>>> OpenLink Software
>>> Company Web: http://www.openlinksw.com
>>> Personal Weblog: http://www.openlinksw.com/blog/~kidehen
>>> Twitter/Identi.ca handle: @kidehen
>>> Google+ Profile: https://plus.google.com/112399767740508618350/about
>>> LinkedIn Profile: http://www.linkedin.com/in/kidehen
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
> 

Received on Monday, 1 April 2013 21:02:46 UTC