Re: VCF and RDF, at Clinical Pharmacogenomics TF, Wed Apr 3rd

Apologies if this has been covered already; I haven't been following the whole discussion.

Genome variant data is just a subset of genome data. My understanding is that the semweb BioHackathon group looked at a variety of different kinds of genomic data and came up with FALDO[1]. This model looks pretty good to me, and importantly there is a converter from GFF3[2,3]. Of all the commonly used genome feature formats out there, GFF3 is by far the best at encouraging provision of relevant metadata using standard ontologies/terminologies.

VCF is convertible to GVF[4,5], which is a subset of GFF3 with additional recommended metadata. It's supported by Ensembl, dbGaP and others, and the 1000 Genomes data is available in GVF[6].

As GFF3 is convertible to RDF/OWL that uses FALDO and SO, it follows that GVF is too (though the converter may need tweaking to take advantage of the additional GVF metadata).
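
To make that last step a bit more concrete, here is a rough Python sketch of how a single GFF3/GVF feature line might be turned into Turtle using FALDO positions and an SO type. This is not the output of the gff3-to-owl converter[3]; it is only my guess at the general shape. The example.org base IRI, the function names and the sample line are all made up for illustration, and begin/end are copied straight from the GFF3 start/end columns without any strand-specific reordering.

```python
# Sketch: one GFF3/GVF feature line -> Turtle using FALDO and SO.
# Assumptions (not from any existing converter): the BASE IRI, the
# function names, and the one-entry SO_TERMS lookup table.

FALDO = "http://biohackathon.org/resource/faldo#"
# Sequence Ontology IRIs keyed by the GFF3 "type" column (SNV only here).
SO_TERMS = {"SNV": "http://purl.obolibrary.org/obo/SO_0001483"}
# Hypothetical base IRI for the resources we mint.
BASE = "http://example.org/"

STRAND_CLASSES = {
    "+": "faldo:ForwardStrandPosition",
    "-": "faldo:ReverseStrandPosition",
}


def parse_feature(line):
    """Split a 9-column GFF3/GVF feature line into a dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(cols))
    seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
    # Column 9 holds semicolon-separated tag=value pairs.
    attributes = dict(p.split("=", 1) for p in attrs.split(";") if p)
    return {
        "seqid": seqid,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


def position(feature, coord):
    """Render one FALDO position as a Turtle blank node."""
    classes = ["faldo:ExactPosition"]
    if feature["strand"] in STRAND_CLASSES:
        classes.append(STRAND_CLASSES[feature["strand"]])
    return "[ a {} ; faldo:position {} ; faldo:reference <{}seq/{}> ]".format(
        ", ".join(classes), coord, BASE, feature["seqid"]
    )


def to_turtle(feature):
    """Render the feature as Turtle; raises KeyError for unmapped types."""
    so_iri = SO_TERMS[feature["type"]]
    subject = "<{}feature/{}>".format(BASE, feature["attributes"]["ID"])
    return "\n".join([
        "@prefix faldo: <{}> .".format(FALDO),
        "",
        "{} a <{}> ;".format(subject, so_iri),
        "    faldo:location [ a faldo:Region ;",
        "        faldo:begin {} ;".format(position(feature, feature["start"])),
        "        faldo:end {} ] .".format(position(feature, feature["end"])),
    ])


# Made-up GVF-style line, for illustration only.
SAMPLE = "chr1\texample\tSNV\t12345\t12345\t.\t+\t.\tID=var1;Variant_seq=A;Reference_seq=G"

if __name__ == "__main__":
    print(to_turtle(parse_feature(SAMPLE)))
```

The GVF-specific attributes (Variant_seq, Reference_seq and so on) are parsed here but not emitted, which is exactly the kind of tweaking a real converter would need.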

I just wanted to make sure you were aware of all this previous work before reinventing anything.

[1] https://github.com/JervenBolleman/FALDO
[2] http://www.sequenceontology.org/gff3.shtml
[3] https://code.google.com/p/gff3-to-owl/
[4] http://www.ncbi.nlm.nih.gov/pubmed/20796305 - A standard variation file format for human genome sequences - Reese et al.
[5] http://www.sequenceontology.org/resources/gvf.html
[6] ftp://ftp.ensembl.org/pub/current_variation/gvf/homo_sapiens/

On Apr 1, 2013, at 10:59 AM, Jeremy J Carroll wrote:

> Hi Kingsley,
> 
> I wasn't going to but since you ask:
> 
> http://www.slideshare.net/JeremyJCarroll/vcf-and-rdf
> 
> or
> 
> http://lists.w3.org/Archives/Public/www-archive/2013Apr/att-0002/W3C-JJC-LifeSci.pdf
> 
> 
> Jeremy J Carroll
> Principal Architect
> Syapse, Inc.
> 
> 
> 
> On Apr 1, 2013, at 10:13 AM, Kingsley Idehen <kidehen@openlinksw.com> wrote:
> 
>> On 4/1/13 1:05 PM, Jeremy J Carroll wrote:
>>> Hi
>>> 
>>> I am hoping to present the work I am currently doing on VCF and RDF at the Clinical Pharmacogenomics TF telecon on Wednesday.
>>> 
>>> My presentation should cover:
>>> 
>>> - business background, Syapse Discovery
>>> - some background on VCF as a knowledge representation format
>>> - and some initial results on mapping 1000 genomes into RDF
>>> 
>>> I will circulate slides shortly
>>> 
>>> 
>>> Jeremy J Carroll
>>> Principal Architect
>>> Syapse, Inc.
>>> 
>> 
>> Hopefully you'll publish to Slideshare?
>> 
>> -- 
>> 
>> Regards,
>> 
>> Kingsley Idehen	
>> Founder & CEO
>> OpenLink Software
>> Company Web: http://www.openlinksw.com
>> Personal Weblog: http://www.openlinksw.com/blog/~kidehen
>> Twitter/Identi.ca handle: @kidehen
>> Google+ Profile: https://plus.google.com/112399767740508618350/about
>> LinkedIn Profile: http://www.linkedin.com/in/kidehen
>> 
> 
> 

Received on Monday, 1 April 2013 19:41:04 UTC