Tuesday, March 15, 2016


As I have my partially-sequenced genome over at 23andMe, I decided to do my bit for genetic research by signing up with DNA.Land.

"DNA.Land is a place where you can learn more about your genome while enabling scientists like us to make new discoveries for the benefit of humanity. The website is not-for-profit and run by the Erlich and Pickrell labs affiliated with Columbia University and the New York Genome Center.

"The purpose of DNA.Land is to enable you to learn more about your DNA and allow you the autonomy to share your data to facilitate important scientific research at the forefront of genome sciences and medicine. Our goal is to help members interpret their data and connect potential participants with research studies.


"After you provide consent, your participation would consist of creating your personal profile and securely uploading your genetic data to DNA.Land. We will use the most cutting edge genetic tools to analyze your data and return your results regarding ancestry, relatives, and different traits.

"As we want to learn about the genetic basis of different traits, we will ask you to fill out surveys relating to your (or your family’s) ancestry and health. You will also have the option to automatically contribute data from your social media profiles for new types of analysis, so we can learn about traits that are dynamic and more difficult to measure, such as social preferences.

"Your profile will also display a badge that summarizes your various contributions to DNA.Land. You can tweet this badge, share it on Facebook, or sew it on your old scout uniform.

"There are no costs associated with taking part in DNA.Land and you will not be compensated for participating."
Where is the value-add over, say, the 23andMe health and ancestry reports? DNA.Land 'imputes' the missing parts of your genome based on "whole genome sequencing data used to create a dictionary of genomic 'text' (known as haplotypes)."

So how does this help?
"Uploaded genotype files (e.g. from 23andMe) contain between 500,000 to 1 million SNPs. DNA.Land's imputation pipeline imputes (i.e. infers the value of) an additional 38 million SNPs."
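I guess this exploits linkage disequilibrium: alleles at nearby sites tend to be inherited together, so the SNPs that were actually typed help predict the untyped ones lying between them.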
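To make that guess concrete, here is a toy sketch of haplotype-based imputation as I understand the general idea. It is not DNA.Land's actual pipeline, which I haven't seen; the reference panel, the `impute` function and the '?' convention for untyped sites are all made up for illustration.

```python
# Toy reference panel: each string is one haplotype, i.e. the alleles
# found together at five adjacent SNP sites. Made-up data, not real SNPs.
REFERENCE_HAPLOTYPES = [
    "ACGTA",
    "ACGTA",
    "GTGCA",
    "GTACC",
]


def impute(observed, panel=REFERENCE_HAPLOTYPES):
    """Fill in '?' sites using panel haplotypes that agree at the typed sites."""
    # Keep only the haplotypes consistent with every allele we actually measured.
    matches = [
        hap for hap in panel
        if all(o == "?" or o == a for o, a in zip(observed, hap))
    ]
    if not matches:
        return observed  # nothing in the panel fits, so leave the gaps alone

    filled = []
    for i, allele in enumerate(observed):
        if allele != "?":
            filled.append(allele)
        else:
            # Take the most common allele at this site among matching haplotypes
            # (ties broken alphabetically so the result is deterministic).
            candidates = [hap[i] for hap in matches]
            filled.append(max(sorted(set(candidates)), key=candidates.count))
    return "".join(filled)


print(impute("A?G?A"))  # typed at sites 1, 3 and 5; sites 2 and 4 are inferred
```

Real imputation software works with probabilities over millions of sites and two chromosome copies per person, so this only shows why typed neighbours carry information about untyped sites.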

After many hours of crunching, DNA.Land will deliver an imputed VCF file (about half a gigabyte). They attempt to explain how to interpret this ("Understanding VCF values") - but it will plainly take more work on my part to figure it out. The raw file - in all its immensity - seems pretty opaque; probably best to run the whole thing through Promethease.
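For anyone who does want to peek inside, a VCF data line is tab-separated, with fixed leading columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT) followed by one column per sample. Below is a minimal sketch of decoding the genotype on one such line. The example line is invented, `parse_vcf_line` is my own hypothetical helper, and DNA.Land's file may well carry extra FORMAT fields that I ignore here.

```python
def parse_vcf_line(line):
    """Decode one single-sample VCF data line into a small dict."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, snp_id, ref, alt = fields[:5]

    # FORMAT (column 9) names the colon-separated values in the sample column.
    format_keys = fields[8].split(":")
    sample = dict(zip(format_keys, fields[9].split(":")))

    # GT holds allele indices: 0 is REF, 1 is the first ALT, and so on.
    # '|' means the two alleles are phased, '/' means they are not.
    gt = sample["GT"]
    phased = "|" in gt
    alleles = [ref] + alt.split(",")
    indices = gt.split("|" if phased else "/")
    genotype = [alleles[int(i)] if i != "." else "." for i in indices]

    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": snp_id,
        "genotype": genotype,
        "phased": phased,
    }


# Invented example line, not taken from my own file.
example = "1\t12345\trsEXAMPLE\tG\tA\t.\tPASS\t.\tGT\t0|1"
print(parse_vcf_line(example))
```

Header lines in a VCF start with '#', so a loop over the real file would skip those before calling something like this.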

Here's what SNPedia says about DNA.Land with reference to Promethease.

And here's how to run the imputed results through Promethease. I haven't tried it yet, as the file isn't available, but I think you just use the "Upload Raw data" button.

Costs $10 for the enhanced ('imputed') report.

So I'm currently waiting for DNA.Land's computers to finish crunching my 23andMe raw data file.


Update: (Tuesday 3.22 pm): Currently uploading 417 MB of dnl13394_inl.imputed.vcf to Promethease via ADSL. A long wait ahead.


Update: (Tuesday 7.29 pm): OK, so all done.

I have the Promethease report downloaded, all 8 MB of it. Time enough to browse it tomorrow.

I rather wish that Promethease had told me it doesn't accept .bcf files (the binary version of .vcf) before I uploaded half a gigabyte's worth - how hard could it be to check a file extension?

It did, however, take .vcf so it all worked in the end.
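The sort of pre-upload check I had in mind is only a few lines. This is a sketch of the idea, not anything Promethease actually runs; `check_upload` and the accepted list are my own assumptions, based solely on what I observed (.vcf worked, .bcf didn't).

```python
from pathlib import Path

# Assumption: built only from my own experience above, not from any
# official list of formats that Promethease supports.
ACCEPTED_EXTENSIONS = {".vcf"}


def check_upload(filename):
    """Return (ok, message) based on the file extension, before any bytes are sent."""
    ext = Path(filename).suffix.lower()
    if ext in ACCEPTED_EXTENSIONS:
        return True, "OK to upload"
    return False, f"'{ext}' files are not accepted; please upload one of {sorted(ACCEPTED_EXTENSIONS)}"


print(check_upload("dnl13394_inl.imputed.vcf"))
print(check_upload("dnl13394_inl.imputed.bcf"))
```

It only looks at the final extension, so a compressed name such as .vcf.gz would be rejected as '.gz'; whether that should be allowed is something I don't know.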
