- From: Chris Mungall <cjmungall@lbl.gov>
- Date: Mon, 1 Apr 2013 12:40:32 -0700
- To: Jeremy J Carroll <jjc@syapse.com>
- Cc: Kingsley Idehen <kidehen@openlinksw.com>, HCLS hcls <public-semweb-lifesci@w3.org>
Apologies if this has been covered already; I haven't been following the whole discussion.

Genome variant data is just a subset of genome data. My understanding is that the semweb BioHackathon group looked at a variety of different kinds of genomic data and came up with FALDO [1]. This model looks pretty good to me, and, importantly, there is a converter from GFF3 [2,3].

Of all the commonly used genome feature formats out there, GFF3 is by far the best at encouraging provision of relevant metadata using standard ontologies/terminologies. VCF is convertible to GVF [4,5], which is a subset of GFF3 with additional recommended metadata. GVF is supported by Ensembl, dbGaP and others, and the 1000 Genomes data is available in GVF [6].

As GFF3 is convertible to RDF/OWL that uses FALDO and SO, it follows that GVF is too (though the converter may need tweaking to take advantage of the additional GVF metadata).

I just wanted to make sure you were aware of all this previous work before reinventing anything.

[1] https://github.com/JervenBolleman/FALDO
[2] http://www.sequenceontology.org/gff3.shtml
[3] https://code.google.com/p/gff3-to-owl/
[4] Reese et al., "A standard variation file format for human genome sequences", http://www.ncbi.nlm.nih.gov/pubmed/20796305
[5] http://www.sequenceontology.org/resources/gvf.html
[6] ftp://ftp.ensembl.org/pub/current_variation/gvf/homo_sapiens/

On Apr 1, 2013, at 10:59 AM, Jeremy J Carroll wrote:

> Hi Kingsley,
>
> I wasn't going to but since you ask:
>
> http://www.slideshare.net/JeremyJCarroll/vcf-and-rdf
>
> or
>
> http://lists.w3.org/Archives/Public/www-archive/2013Apr/att-0002/W3C-JJC-LifeSci.pdf
>
> Jeremy J Carroll
> Principal Architect
> Syapse, Inc.
>
> On Apr 1, 2013, at 10:13 AM, Kingsley Idehen <kidehen@openlinksw.com> wrote:
>
>> On 4/1/13 1:05 PM, Jeremy J Carroll wrote:
>>> Hi
>>>
>>> I am hoping to present the work I am currently doing on VCF and RDF at the Clinical Pharmacogenomics TF telecom on Wednesday.
>>>
>>> My presentation should cover:
>>>
>>> - business background, Syapse Discovery
>>> - some background on VCF as a knowledge representation format
>>> - and some initial results on mapping 1000 genomes into RDF
>>>
>>> I will circulate slides shortly.
>>>
>>> Jeremy J Carroll
>>> Principal Architect
>>> Syapse, Inc.
>>
>> Hopefully you'll publish to Slideshare?
>>
>> --
>>
>> Regards,
>>
>> Kingsley Idehen
>> Founder & CEO
>> OpenLink Software
>> Company Web: http://www.openlinksw.com
>> Personal Weblog: http://www.openlinksw.com/blog/~kidehen
>> Twitter/Identi.ca handle: @kidehen
>> Google+ Profile: https://plus.google.com/112399767740508618350/about
>> LinkedIn Profile: http://www.linkedin.com/in/kidehen
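As a rough illustration of the GFF3/GVF to RDF route described in the message, here is a minimal Python sketch that turns one GVF (GFF3-style) feature line into FALDO-style N-Triples typed with a Sequence Ontology class. It is not the gff3-to-owl converter's actual method. The base URI `EX`, the tiny `SO_IDS` lookup table, the URI layout for regions and positions, and the function name `gvf_line_to_ntriples` are all hypothetical. Only the `ID` attribute from column 9 is used, and GFF3 percent-escapes are not decoded.

```python
# Minimal sketch: one GVF/GFF3 feature line -> FALDO-style N-Triples.
# Assumptions (hypothetical, for illustration only): the EX base URI,
# the tiny SO_IDS lookup table, and the URI layout for regions/positions.
# GFF3 percent-escapes in column 9 are NOT decoded here.

FALDO = "http://biohackathon.org/resource/faldo#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
SO = "http://purl.obolibrary.org/obo/SO_"
EX = "http://example.org/"  # hypothetical base URI for generated nodes

# Hypothetical subset of Sequence Ontology term names -> SO accession numbers.
SO_IDS = {"SNV": "0001483", "deletion": "0000159", "insertion": "0000667"}

# GFF3 strand column -> FALDO position subclass ('.' and '?' get none here).
STRAND_CLASS = {"+": "ForwardStrandPosition", "-": "ReverseStrandPosition"}


def iri(uri: str) -> str:
    return f"<{uri}>"


def int_literal(n: int) -> str:
    return f'"{n}"^^<{XSD_INTEGER}>'


def gvf_line_to_ntriples(line: str) -> list[str]:
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("GFF3/GVF feature lines have 9 tab-separated columns")
    seqid, _source, so_type, start, end, _score, strand, _phase, attrs = cols
    # Column 9 holds key=value pairs separated by ';'; only ID is used below.
    attributes = dict(kv.split("=", 1) for kv in attrs.split(";") if kv)

    feature = EX + "feature/" + attributes["ID"]
    region = feature + "/region"
    reference = EX + "sequence/" + seqid

    triples = [
        (iri(feature), iri(RDF_TYPE), iri(SO + SO_IDS[so_type])),
        (iri(feature), iri(FALDO + "location"), iri(region)),
        (iri(region), iri(RDF_TYPE), iri(FALDO + "Region")),
    ]
    # GFF3 coordinates are 1-based and inclusive; they are copied unchanged.
    # Assumption: start/end map straight to faldo:begin/faldo:end. FALDO's
    # convention for reverse-strand features may require swapping them, so
    # check the FALDO docs before relying on '-' strand output.
    for prop, coord in (("begin", start), ("end", end)):
        node = region + "/" + prop
        triples.append((iri(region), iri(FALDO + prop), iri(node)))
        triples.append((iri(node), iri(RDF_TYPE), iri(FALDO + "ExactPosition")))
        triples.append((iri(node), iri(FALDO + "position"), int_literal(int(coord))))
        triples.append((iri(node), iri(FALDO + "reference"), iri(reference)))
        if strand in STRAND_CLASS:
            triples.append((iri(node), iri(RDF_TYPE), iri(FALDO + STRAND_CLASS[strand])))
    return [" ".join(t) + " ." for t in triples]


if __name__ == "__main__":
    # Made-up example line, not taken from any real GVF file.
    example = "chr1\texample\tSNV\t100\t100\t.\t+\t.\tID=var1;Reference_seq=A;Variant_seq=G"
    for triple in gvf_line_to_ntriples(example):
        print(triple)
```

The GVF-specific attributes such as `Reference_seq` and `Variant_seq` are simply ignored here. Carrying them into RDF is the kind of "tweaking" the message anticipates for a converter that fully exploits the additional GVF metadata.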
Received on Monday, 1 April 2013 19:41:04 UTC