Wading Through Treacle: Promethease


Wednesday, August 02, 2017

My Personal Genome Project report has now arrived


I discovered the Personal Genome Project in September 2014 and immediately tried to sign up. They weren't taking new volunteers.

In December 2015 (15 months later) I finally did succeed in registering, but no-one was very interested in taking a spit-sample.

In May 2016, my sample collection kit arrived and I duly spat for science. I returned my sample for sequencing, at which point they stored it .. and nothing whatsoever happened.

It is now August 2017. Fifteen further months have passed and this morning I received an email from the PGP. I have been sequenced!

Here is my report (PDF) - I waive all privacy concerns.

---

My genome will now be released for research. Was there anything interesting in the report? No. Was it different in any important respect from what I already received from 23andMe? No.

As usual, the report is mostly centred around SNPs. Unfortunately most interesting phenotypic traits are polygenic, the full connections with genomic variation yet to be unravelled. The SNPs - taken individually - simply adjust your odds ratio up or down for various conditions. Example: some of my SNPs elevate my odds for baldness; others lower it.

Insofar as the science centres around connecting my personal genome with my own phenotype characteristics, research should now focus on the latter.

I look forward to the first request for the promised punch biopsy.

Tuesday, March 15, 2016

DNA.Land

As I have my partially-sequenced genome over at 23andMe, I decided to do my bit for genetic research by signing up with DNA.Land.

Purpose

"DNA.Land is a place where you can learn more about your genome while enabling scientists like us to make new discoveries for the benefit of humanity. The website is not-for-profit and run by the Erlich and Pickrell labs affiliated with Columbia University and the New York Genome Center.

"The purpose of DNA.Land is to enable you to learn more about your DNA and allow you the autonomy to share your data to facilitate important scientific research at the forefront of genome sciences and medicine. Our goal is to help members interpret their data and connect potential participants with research studies.

Procedures

"After you provide consent, your participation would consist of creating your personal profile and securely uploading your genetic data to DNA.Land. We will use the most cutting edge genetic tools to analyze your data and return your results regarding ancestry, relatives, and different traits.

"As we want to learn about the genetic basis of different traits, we will ask you to fill out surveys relating to your (or your family’s) ancestry and health. You will also have the option to automatically contribute data from your social media profiles for new types of analysis, so we can learn about traits that are dynamic and more difficult to measure, such as social preferences.

"Your profile will also display a badge that summarizes your various contributions to DNA.Land. You can tweet this badge, share it on Facebook, or sew it on your old scout uniform.

"There are no costs associated with taking part in DNA.Land and you will not be compensated for participating."

Where is the value-add over, say, the 23andMe health and ancestry reports? DNA.Land 'impute' missing parts of your genome based on "whole genome sequencing data used to create a dictionary of genomic 'text' (known as haplotypes)."

So how does this help?

"Uploaded genotype files (e.g. from 23andMe) contain between 500,000 to 1 million SNPs. DNA.Land's imputation pipeline imputes (i.e. infers the value of) an additional 38 million SNPs."

I guess this exploits linkage disequilibrium.
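
To make the idea concrete, here is a toy sketch of imputation from a reference panel. It is only an illustration of the principle (nearby SNPs tend to be inherited together, so typed sites predict untyped ones); the panel, the `impute` function and the '?' convention are all made up for this example and bear no relation to DNA.Land's actual pipeline.

```python
from collections import Counter

# Hypothetical reference panel: each string is one haplotype, one letter per SNP site.
REFERENCE_HAPLOTYPES = [
    "ACGTA",
    "ACGTA",
    "GTGCA",
    "GTACC",
]


def impute(observed: str, panel=REFERENCE_HAPLOTYPES) -> str:
    """Fill '?' sites using panel haplotypes that agree at every typed site."""
    matches = [
        hap for hap in panel
        if all(o == "?" or o == h for o, h in zip(observed, hap))
    ]
    if not matches:
        return observed  # nothing in the panel is consistent; leave the gaps alone
    result = []
    for i, allele in enumerate(observed):
        if allele != "?":
            result.append(allele)  # typed site: keep the measured allele
        else:
            # Untyped site: majority vote among the matching haplotypes.
            result.append(Counter(h[i] for h in matches).most_common(1)[0][0])
    return "".join(result)


# Three typed sites, two untyped ones to be filled in from the panel.
print(impute("A?G?A"))
```

Real imputation software uses statistical models over very large panels rather than exact matching, but the flavour is the same.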

After many hours of crunching, DNA.Land will deliver an imputed VCF file (about half a Gigabyte). They attempt to explain how to interpret this ("Understanding VCF values") - but it will plainly take more work on my part to figure it out. The raw detail file - in all its immensity - seems pretty opaque; probably best to run the whole thing through Promethease.

Here's what SNPedia says about DNA.Land with reference to Promethease.

And here's how to run the imputed results through Promethease. I haven't tried it yet, as the file isn't available; I think you just use the "Upload Raw data" button.

Costs $10 for the enhanced ('imputed') report.

So I'm currently waiting for DNA.Land's computers to finish crunching my 23andMe raw data file.

---

Update: (Tuesday 3.22 pm): Promethease currently uploading 417 MB of dnl13394_inl.imputed.vcf via ADSL. A long wait ahead.

---

Update: (Tuesday 7.29 pm): OK, so all done.

I have the Promethease report downloaded, all 8 MB of it. Time enough to browse it tomorrow.

I rather wish that Promethease had told me it doesn't accept .bcf files (the binary version of .vcf) before I uploaded half a Gigabyte's worth - how hard could that be, to check a file extension?

It did, however, take .vcf so it all worked in the end.
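
For what it's worth, the kind of pre-upload check I had in mind is only a few lines. This is a sketch of my own, not anything Promethease actually runs: the function name is invented, and the only fact it relies on is that a text VCF file must begin with a `##fileformat=VCF...` header line.

```python
from pathlib import Path


def looks_like_text_vcf(path: str) -> bool:
    """Cheap pre-upload check: a .vcf extension plus a VCF header on the first line."""
    p = Path(path)
    if p.suffix.lower() != ".vcf":
        return False  # e.g. .bcf, the binary encoding, is rejected here
    with p.open("rb") as handle:
        first_line = handle.readline()
    # The VCF specification requires the fileformat line to come first.
    return first_line.startswith(b"##fileformat=VCF")
```

Calling `looks_like_text_vcf` on the file name before starting a half-Gigabyte transfer would have saved me the wait.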

Sunday, June 07, 2015

Sequencing Dad

How much can you tell just from someone's genes?

Here's a thought experiment. Suppose you could copy and recreate a person's genome pretty much exactly, give or take a few mutational errors. You could implant this genome-copy into an egg cell, bring the resulting foetus to term and allow it to grow into a full adult. How similar would the copy-person be to the original?

Of course, this is not a thought experiment. I'm simply describing the case of twins separated at birth. And how alike are these twins? Well, in appearance, personality, health and intelligence they are as alike as ... twins.

If we understood the human genome in all its SNP and copy-number variants, we wouldn't need to recreate a human copy, we would simply read off the attributes of the person with that genome. Police profilers can already do this for many traits (I mentioned recently facial reconstruction from DNA samples). In the future we will be able to know and do much, much more.

It's simplest if you donate a sample of your DNA to a gene-sequencing company. My mother and I have sent our saliva kits to 23andMe (here's my report) and even with the restricted coverage currently offered there's lots to learn.

My father died in 2009, before the age of consumer genomics. If we had his DNA we would know quite a lot about his health and personal traits right now; and of course in the future we would know so much more. Personality, health and intelligence would be my interests.

Forensic techniques get better all the time. We have some of his clothing and other personal possessions; there must be traces of his DNA. If we wait a while till the prices come down, I think there is an excellent chance we will be able to find samples of his DNA and sequence them. Our family genetic history will take a further step forwards.

I wrote a little piece for sciencefiction.com about this a while back.

Wednesday, October 08, 2014

Best introduction to human genomics

Puzzled by your 23andMe/Promethease results? Having trouble figuring out what SNPs are and what they're good, or bad, for? Time to learn some human genetics and genomics.

"Human Genetics and Genomics" (4th Edition) by Bruce R. Korf and Mira B. Irons takes a bird's-eye view of the field, from the basics of molecular biology through the technologies of genome analysis to the medical implications of DNA and chromosome variation. It's aimed at medical students, so there are case studies to keep it real.

Chapter 1 starts us off with a tour round human DNA. We look at how it's structured, how it's replicated in cells and how DNA is transcribed into RNA and then translated into proteins. We take a quick look at epigenetics (the way some genes can be chemically silenced - switched off).

Chapter 2 looks at genetic variation. This covers single nucleotide polymorphisms (SNPs), DNA repair mechanisms, gene duplication and its role in evolution. PCR (Polymerase Chain Reaction) as used for forensic analysis of DNA samples amongst other things is also described.

Chapter 3 is 'Patterns of Inheritance'. Here you'll learn how to take a family history looking for dominant or recessive patterns of Mendelian inheritance. We also look at sex-linked inheritance (X or Y chromosome location of the gene in question), mosaicism and genomic imprinting.

Chapter 4 describes the Human Genome Project and the history of attempts to find out where on the human chromosome set a gene of interest (often disease-causing) actually resides. You'll meet some common terms such as linkage disequilibrium, which is carefully explained. The working example in this chapter is cystic fibrosis.

Chapter 5 discusses 'Multifactorial Inheritance'. Most 'quantitative traits' such as height, personality and intelligence are under the control of hundreds or thousands of alleles of small effect. The same is true of many diseases. So in this chapter we learn about heritability, additive and threshold models of multi-allele effects and that very latest thing: genome-wide association studies (GWAS) which are shedding light on .. almost everything.

Chapter 6 raises its gaze to the organisation of genes into chromosomes. Sadly, this is another level where errors can occur (e.g. chromosome 21 trisomy leading to Down Syndrome) and there are plenty of more subtle things which can go wrong (deletions, duplications, inversions, rings, translocations). Characteristic diseases and syndromes duly follow.

Chapter 7 looks at Population Genetics. We learn about the Hardy-Weinberg equation (very clear explanation) and how it is used to work out the carrier frequency of a recessive disease in various populations. The working example here is Sickle Cell Anaemia.
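
As a quick illustration of that carrier-frequency calculation: under Hardy-Weinberg, if the recessive allele has frequency q then affected individuals occur at frequency q² and carriers at 2pq, where p = 1 - q. The sketch below is mine, not the book's, and the incidence figure is an assumed round number chosen purely for the arithmetic.

```python
import math


def carrier_frequency(disease_incidence: float) -> float:
    """Carrier frequency 2pq for an autosomal recessive condition under Hardy-Weinberg."""
    q = math.sqrt(disease_incidence)  # affected individuals occur at frequency q^2
    p = 1.0 - q
    return 2.0 * p * q


# Assumed incidence of 1 in 2,500 births (illustrative only, not a figure from the book).
freq = carrier_frequency(1 / 2500)
print(f"carrier frequency = {freq:.4f}")
```

The striking point is how common carriers are compared with the disease itself: an incidence of 1 in 2,500 implies that roughly 4% of the population carries one copy.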

Chapter 8 focuses on Cancer Genetics. Cancer is a genetic disease, emerging from a cascade of errors in those genes which regulate cell development and division. We now have a causal narrative of the mechanisms behind many cancers, as this chapter explains in good detail.

Chapter 9 looks at chromosome translocation with specific application to Down Syndrome.

The remaining chapters (10-17) are shorter and look in detail at:

Molecular diagnosis of conditions based on genetic testing
Newborn screening (e.g. for PKU)
Developmental genetics (CHARGE syndrome is the example)
Carrier screening (Tay-Sachs in an Ashkenazi context is the example in this chapter)
Genetic risk assessment (companies like 23andMe are discussed in some detail)
Genetic testing for risks of cancer (BRCA1 and 2 is the example)
The genetics of drug response - Pharmacogenetics (example: malignant hyperthermia)
Emerging treatments for genetic disorders, such as gene therapy.

This book was published in 2013 so it's pretty much up-to-date. If you're not a medical student with good recall, have Google/Wikipedia next to you as you read it: most terms are explained - and then you come to something like Epistasis!

Wednesday, September 24, 2014

"Still Alice" by Lisa Genova

Here's the Amazon description.

"Still Alice is a compelling debut novel about a 50-year-old woman's sudden descent into early onset Alzheimer's disease, written by first-time author Lisa Genova, who holds a Ph. D in neuroscience from Harvard University.

"Alice Howland, happily married with three grown children and a house on the Cape, is a celebrated Harvard professor at the height of her career when she notices a forgetfulness creeping into her life. As confusion starts to cloud her thinking and her memory begins to fail her, she receives a devastating diagnosis: early onset Alzheimer's disease. Fiercely independent, Alice struggles to maintain her lifestyle and live in the moment, even as her sense of self is being stripped away. In turns heartbreaking, inspiring and terrifying, Still Alice captures in remarkable detail what's it's like to literally lose your mind..."

Midway through reading this novel to Clare, I got to the bit where Alice gets a genetic test done for early-onset (or familial) Alzheimer's. The mutated genes in question are APP, PS1 and PS2:

"Familial Alzheimer disease is caused by a mutation in one of at least 3 genes: presenilin 1, presenilin 2 and amyloid precursor protein (APP). Other gene mutations are in study."

Naturally, I scampered across to my 23andMe/Promethease health results to check out my genome. Now, 23andMe don't screen for SNPs on those genes (I can see why!) but they do look at a late-onset Alzheimer genotype - it's the ε4 variant of the APOE gene.

"Although 40-65% of AD patients have at least one copy of the ε4 allele, ApoE4 is not a determinant of the disease - at least a third of patients with AD are ApoE4 negative and some ApoE4 homozygotes never develop the disease. Yet those with two ε4 alleles have up to 20 times the risk of developing AD."

I don't have it.

---

Note: how do two SNPs determine three APOE variants? (From 23andMe).

Variant: rs429358 + rs7412 (these are the two SNPs within the APOE gene)

ε2 = T + T

ε3 = T + C

ε4 = C + C

I'm ε3/ε3 - on the chromosome 19 pair (TT + CC). It's surprising how blasé one can be about all this once it's clear that you don't have the 'bad' allele ...
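
The lookup in the note above can be written out as a small sketch. The haplotype table is exactly the one listed; the function name and the idea of trying both ways of pairing the alleles are my own illustration, on the assumption that only these three haplotypes need be considered (raw genotype data is unphased, so the pairing has to be inferred).

```python
from itertools import product

# Haplotype table from the note above: (rs429358 allele, rs7412 allele) -> APOE variant.
APOE_HAPLOTYPES = {("T", "T"): "ε2", ("T", "C"): "ε3", ("C", "C"): "ε4"}


def apoe_genotype(rs429358: str, rs7412: str) -> list[str]:
    """Return the APOE variant pairs consistent with two unphased genotypes such as 'TT' and 'CC'."""
    candidates = set()
    # Try every way of assigning the alleles of each SNP to the two chromosomes.
    for a, b in product(range(2), repeat=2):
        hap1 = (rs429358[a], rs7412[b])
        hap2 = (rs429358[1 - a], rs7412[1 - b])
        if hap1 in APOE_HAPLOTYPES and hap2 in APOE_HAPLOTYPES:
            pair = sorted([APOE_HAPLOTYPES[hap1], APOE_HAPLOTYPES[hap2]])
            candidates.add("/".join(pair))
    return sorted(candidates)


print(apoe_genotype("TT", "CC"))  # my own genotype calls at the two SNPs
```

A carrier of one ε4 allele would show up as, for example, `apoe_genotype("CT", "CC")`.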

Friday, August 15, 2014

Why bother with 23andMe?

The genetics company 23andMe is no longer permitted (under a current US FDA ruling) to provide you with health advice based on sequencing your own DNA sample. In practice this is not an issue, since you simply go to Promethease and upload your 23andMe raw data (as described here). For some reason this is not forbidden.

How useful are these reports? To be honest, I believe they're of bounded interest, and here's why.

"A number of commercial firms offer targeted or extensive genotyping to anyone who wants to submit a saliva specimen and pay a fee. Some of the reasons suggested for doing this include identification of ancestral background, relationship certification and most commonly, detection of genetic susceptibilities to disease.

"The latter are almost entirely based on GWAS that have associated specific single nucleotide polymorphisms (SNP) with an increased (or decreased) likelihood of developing a particular common disease. In almost all such GWAS-based analyses, the association with disease is highly statistically significant but of remarkably little predictive value. In other words, the relative risks of developing a disease based on having one of these markers is typically in the range of 1.1–1.4.

"Moreover, virtually no research has been done to examine the clinical utility of being identified as having one of these risk markers. For example, is someone with the 9p21-linked SNP that has no known biologic function but is associated with a slightly greater risk of developing an atherosclerosis-related condition more likely to alter their lifestyle, change their diet, or stop smoking? "

In addition, there are many other sources of error in the genetic code which can have profound medical implications - but are not SNPs - such as (from the same report):

"Translocation results from an exchange of parts of two chromosomes.

"Deletion is loss of chromosomal material.

"Duplication is the presence of two or more copies of the same region of a given chromosome. The redundancy may occur in the same chromosome or in a nonhomologous chromosome. In the latter case, a translocation will also have occurred."

So most health reports tell you that you have some SNPs which increase or decrease your susceptibility to this or that condition, but in the current state of the research it's not known how many other SNPs or distinct genetic modifications could also affect the likelihood of acquiring it. Early days indeed!
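
To put the quoted relative risks of 1.1 to 1.4 in perspective, here is a small worked example. The 5% baseline lifetime risk is an assumption of mine, picked only to make the arithmetic visible; it does not come from the report.

```python
def absolute_risk(baseline_risk: float, relative_risk: float) -> float:
    """Absolute risk for a marker carrier, given the baseline risk and the relative risk."""
    return baseline_risk * relative_risk


baseline = 0.05  # assumed baseline lifetime risk, for illustration only
for rr in (1.1, 1.4):
    risk = absolute_risk(baseline, rr)
    print(f"relative risk {rr}: {baseline:.1%} -> {risk:.1%}")
```

Even at the top of the range, the carrier's risk moves by only a couple of percentage points, which is why such markers say so little about any one individual.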

One of the things 23andMe ask you is whether you permit them to keep your sample for ten years (I guess in liquid nitrogen or something). I imagine that in a decade the cost of a complete genome sequencing will have come down to something affordable and 23andMe will then be able to offer a much more sophisticated analysis/diagnostic service based on your complete genome (FDA willing!).

Assuming we will also by then understand a lot more about how the genome ties in to phenotypic traits such as health, intelligence, personality, appearance, sports potential and so on, the report in 2024 might be quite informative, and the one in 2034 even more so.

One generation down-track from now, everyone will have their genome sequenced for health reasons, and moreover we'll understand it. One of our remote descendants may well be interested in their ancestors, bewailing the fact that they never got genotyped.

But wait!

---

Steve Hsu has more to say here:

"... given sufficient phenotype|genotype data, genomic prediction of traits such as cognitive ability will be possible. If, for example, 0.6 or 0.7 of total population variance is captured by the predictor, the accuracy will be roughly plus or minus half a standard deviation (e.g., a few cm of height, or 8 IQ points). The required sample size to extract a model of this accuracy is probably on the order of a million individuals. As genotyping costs continue to decline, it seems likely that we will reach this threshold within five years for easily acquired phenotypes like height (self-reported height is reasonably accurate), and perhaps within the next decade for more difficult phenotypes such as cognitive ability. At the time of this writing SNP genotyping costs are below $50 USD per individual, meaning that a single super-wealthy benefactor could independently fund a crash program for less than $100 million.

"Once predictive models are available, they can be used in reproductive applications, ranging from embryo selection (choosing which IVF zygote to implant) to active genetic editing (e.g., using powerful new CRISPR techniques). In the former case, parents choosing between 10 or so zygotes could improve their expected phenotype value by a population standard deviation. For typical parents, choosing the best out of 10 might mean the difference between a child who struggles in school, versus one who is able to complete a good college degree. Zygote genotyping from single cell extraction is already technically well developed, so the last remaining capability required for embryo selection is complex phenotype prediction. The cost of these procedures would be less than tuition at many private kindergartens, and of course the consequences will extend over a lifetime and beyond.

"The corresponding ethical issues are complex and deserve serious attention in what may be a relatively short interval before these capabilities become a reality. Each society will decide for itself where to draw the line on human genetic engineering, but we can expect a diversity of perspectives. Almost certainly, some countries will allow genetic engineering, thereby opening the door for global elites who can afford to travel for access to reproductive technology. As with most technologies, the rich and powerful will be the first beneficiaries. Eventually, though, I believe many countries will not only legalize human genetic engineering, but even make it a (voluntary) part of their national healthcare systems. The alternative would be inequality of a kind never before experienced in human history."

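The numbers in the first quoted paragraph can be checked with a little arithmetic. If a predictor captures a fraction R² of the variance of a trait, the standard deviation of what is left over is sqrt(1 - R²) times the trait's own standard deviation. The sketch below is my own working, assuming the conventional IQ standard deviation of 15 points; the function name is invented for the example.

```python
import math


def residual_sd(variance_explained: float, trait_sd: float = 1.0) -> float:
    """Standard deviation of the prediction error when a predictor captures this share of variance."""
    return trait_sd * math.sqrt(1.0 - variance_explained)


for r2 in (0.6, 0.7):
    in_sd_units = residual_sd(r2)
    in_iq_points = residual_sd(r2, trait_sd=15.0)  # assumes an IQ standard deviation of 15
    print(f"R^2 = {r2}: {in_sd_units:.2f} SD, or {in_iq_points:.1f} IQ points")

# Genotyping a million individuals at the quoted $50 each.
print(f"cost = ${1_000_000 * 50:,}")
```

That gives a little over half a standard deviation in both cases, and a genotyping bill comfortably inside the $100 million quoted.
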
Monday, August 11, 2014

Promethease from 23andMe

Take a look at this image (it’s from Population Genetics, 2nd Edition by John H. Gillespie, p. 3).

A gene - showing coding nucleotides and SNPs

It’s the reference allele (gene) which codes for the alcohol dehydrogenase enzyme in a particular species of fruit fly. Fruit flies need this enzyme to handle the alcohol in rotting fruit. There are 768 coding bases in this gene (a coding base, or nucleotide, is one of the four constituents of DNA commonly abbreviated to A, G, C and T).

Grouped in threes, the nucleotides code for amino acids: so at position 578 the sequence AAG codes for lysine. Change it to ACG and you get threonine instead. This change of a single base is called a SNP – a ‘Single Nucleotide Polymorphism’, pronounced ‘snip’. Two coding sequences in a population which differ in one or more SNPs are called alleles. Note that the string of amino acids listed in the image, when assembled together, constitutes the alcohol dehydrogenase enzyme, in variants depending upon which SNPs are present.
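
Here is a tiny sketch of how a single-base change alters the amino acid. In the standard genetic code AAG is lysine and ACG is threonine, so the lysine/threonine swap corresponds to a change in the middle base of the codon. The table below holds only the handful of codons needed for the illustration (the full code has 64), and the function names are my own.

```python
# A few entries from the standard genetic code; the full table has 64 codons.
CODON_TABLE = {"AAA": "Lys", "AAG": "Lys", "AAC": "Asn", "ACG": "Thr"}


def translate(dna: str) -> list[str]:
    """Translate a coding sequence three bases at a time; unknown codons become '???'."""
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return [CODON_TABLE.get(dna[i:i + 3], "???") for i in range(0, usable, 3)]


def point_mutate(dna: str, index: int, base: str) -> str:
    """Return a copy of the sequence with one base substituted (a single SNP)."""
    return dna[:index] + base + dna[index + 1:]


reference = "AAG"
variant = point_mutate(reference, 1, "C")
print(reference, translate(reference), "->", variant, translate(variant))
```

Note that not every SNP changes the protein: AAG to AAA, for instance, still codes for lysine.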

SNPs are happening all the time in DNA due to mutations, e.g. copy errors of various kinds. Mostly they so mess up the gene that it can’t function properly, the organism dies without reproducing and thus natural selection ‘purifies’ the genome. In some places, though, a SNP merely alters the function slightly and creates variation between individuals. Most traits such as height, intelligence and personality are under the control of many different alleles ‘of small effect’ so just looking at one SNP won’t tell you too much.

The human genome contains roughly 20,000 protein-coding genes, spread across some three billion base pairs on 23 chromosome pairs. One of these pairs is the sex-determining chromosome pair: XX (you’re a girl) or XY (you’re a boy). In each chromosome pair you get one of the chromosomes from your father and the other from your mother. However, each of these chromosomes has itself been randomised from the two corresponding chromosomes in each of your parents through a process called recombination (except for the Y chromosome, which has no matching partner to recombine with along most of its length and is thus handed on from father to son largely unchanged).

Humans have at least ten million SNPs – the number increases with research. Many studies have looked at people with medical conditions and tried to work out if they have some specific SNPs which non-sufferers lack. When you have a genome analysis – as with 23andMe – they put your sample through a chip (made by a company called Illumina) which knows about a million SNPs reflecting those currently considered important and significant by the research community.

Here’s some background on human genome SNPs. Each unique SNP has a reference number (such as rs1234 - a tutorial SNP) assigned by the NCBI’s dbSNP database, and there’s a wiki called SNPedia which centralises what’s known about each one. Here is what the 23andMe raw data looks like (just the first few entries!)

# This data file generated by 23andMe at: Tue Apr 23 09:13:29 2013
#
# Below is a text version of your data. Fields are TAB-separated
# Each line corresponds to a single SNP. For each SNP, we provide its identifier
# (an rsid or an internal id), its location on the reference human genome, and the
# genotype call oriented with respect to the plus strand on the human reference sequence.
# We are using reference human assembly build 37 (also known as Annotation Release 104).
# Note that it is possible that data downloaded at different times may be different due to ongoing
# improvements in our ability to call genotypes. More information about these changes can be found at:
# https://www.23andme.com/you/download/revisions/
#
# More information on reference human assembly build 37 (aka Annotation Release 104):
# http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=9606
#
# rsid chromosome position genotype
rs4477212 1 82154 AA
rs3094315 1 752566 AA
rs3131972 1 752721 GG
rs12124819 1 776546 AA
rs11240777 1 798959 GG
rs6681049 1 800007 CC
rs4970383 1 838555 AC
rs4475691 1 846808 CT
rs7537756 1 854250 AG
rs13302982 1 861808 GG
rs1110052 1 873558 GT
rs2272756 1 882033 AG
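
If you fancy poking at the file yourself, it is easy to read. The sketch below is my own illustration (the `SnpCall` type and `parse_raw_data` function are invented names): it skips the `#` comment lines, splits each remaining line on whitespace into the four columns named in the header, and then picks out the heterozygous calls.

```python
from typing import Iterable, Iterator, NamedTuple


class SnpCall(NamedTuple):
    rsid: str
    chromosome: str
    position: int
    genotype: str


def parse_raw_data(lines: Iterable[str]) -> Iterator[SnpCall]:
    """Yield one SnpCall per data line, skipping blank lines and '#' comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rsid, chromosome, position, genotype = line.split()
        yield SnpCall(rsid, chromosome, int(position), genotype)


# A few lines copied from the excerpt above.
sample = """# rsid chromosome position genotype
rs4477212 1 82154 AA
rs4970383 1 838555 AC
rs4475691 1 846808 CT
"""
calls = list(parse_raw_data(sample.splitlines()))
# Heterozygous calls are those where the two alleles differ.
heterozygous = [c.rsid for c in calls if len(set(c.genotype)) > 1]
print(heterozygous)
```

To run it over a whole download, pass an open file handle to `parse_raw_data` instead of the sample text.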

So now we come to Promethease. This is a self-service program which links your raw data to the current scientific literature. For $5 you get a report which tells you what is known about the unique set of SNPs which define you (at least as far as 23andMe presently go - some way short of a full genome analysis which is still too expensive).

Promethease has a reputation as being difficult to use; it is not. Here’s the YouTube video which I watched and then knew exactly what to do. It was no problem at all.

And here’s the report I got back (zipped, 40 MB). It’s basically not too hard to interpret and the help links are good. I found Medical Conditions particularly informative once I looked at the help link to understand the graphics.

As I expected, there are few surprises. I was pleased to be in the 12% where exercise actually loses you weight. And of the SNPs currently known to be associated with Autism, I have only a few. Testicular cancer – not so good.

Do you want to know more?