Re: Observations about facts in genomics

Is this issue wholly addressed by having a URI for the reference? Or is there some subtlety that I am missing here?

That is, I would expect each minor version (patch release) of a reference genome to have its own URI, distinct from the URIs of the other minor versions of the same major version. Am I naive?

I have noticed reference declarations that fall short of my ideal … e.g.

##reference=GRCh37.p5

##reference="hg19"

##reference=GRCh37

##reference=GRCh37
##reference=file:///humgen/1kg/reference/human_g1k_v37.fasta
##reference=file:///humgen/gsa-hpprojects/GATK/data/ucsc.hg19/ucsc.hg19.fasta

##reference=file:///humgen/gsa-hpprojects/GATK/data/ucsc.hg19/ucsc.hg19.fasta

My take is that the quoted form "hg19" is a bug, and that the line should read

##reference=hg19

and that somewhere I need some heuristics that convert these into URIs (and that convert file:/// URIs into something more useful).
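
One possible shape for such heuristics, as a minimal Python sketch. The base URI (http://example.org/assembly/), the alias table, and the function names are all hypothetical placeholders, and treating hg19 as an alias of GRCh37 is itself an assumption that glosses over any differences between the two.

```python
import re

# Hypothetical base URI; no real registry is implied.
BASE_URI = "http://example.org/assembly/"

# Assumed alias table: UCSC-style name mapped to the GRC major version.
ALIASES = {"hg19": "GRCh37"}

# Matches names such as GRCh37, GRCh37.p5 or hg19.
NAME_RE = re.compile(r"(GRCh\d+(?:\.p\d+)?|hg\d+)")


def reference_value(header_line):
    """Return the value of a ##reference header line, or None if it is not one."""
    prefix = "##reference="
    line = header_line.strip()
    if not line.startswith(prefix):
        return None
    # Tolerate the quoted form, e.g. ##reference="hg19".
    return line[len(prefix):].strip().strip('"')


def assembly_uri(value):
    """Guess an assembly URI from a reference value, or return None."""
    # For file:/// URIs only the file name is inspected; the local path
    # carries no meaning outside the machine that wrote the file.
    if value.startswith("file://"):
        candidate = value.rsplit("/", 1)[-1]
    else:
        candidate = value
    match = NAME_RE.search(candidate)
    if match is None:
        return None
    name = match.group(1)
    return BASE_URI + ALIASES.get(name, name)
```

Under these rules GRCh37.p5 and GRCh37 get distinct URIs, the ucsc.hg19.fasta path resolves through its file name, and human_g1k_v37.fasta stays unresolved (None) because nothing in its name matches a known pattern.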

Any hints as to how to interpret these would be welcome.

Jeremy


On Mar 20, 2013, at 3:19 PM, Joachim Baran <joachim.baran@gmail.com> wrote:

> Hello,
> 
> On 20 March 2013 18:09, Jerven Bolleman <me@jerven.eu> wrote:
> > So instead of chromosome M you are really talking about assembly X of
> > a set of reads R mapped via some (variant calling) processes to
> > reference chromosome C that is also really an assembly of a different
> > set of reads.
>   Just to add to Jerven's comment: even when referring to a reference assembly, it is best to add "Which version?".
> 
>   Even when talking about reference genome assemblies, you have multiple versions (including "patches"). Additionally, when interpreting the genomes, you will also get different results from various institutes (genes from UCSC are not the same as Ensembl).
> 
>   I think my point here is that chromosomes (or anything, really) have provenance that needs to be explicitly denoted.
> 
> Best,
> Joachim
> 

Received on Wednesday, 20 March 2013 22:39:37 UTC