Friday, November 23, 2018

Understanding Polygenic Scores (PGS)

In the years to come, it will be very important to understand the concept of your polygenic score for traits such as height, weight, intelligence, personality and many others. In chapter 12 of his book "Blueprint: How DNA Makes Us Who We Are", Robert Plomin gives a gentle introduction to the PGS concept, which I excerpt here.

---

"Because polygenic scores are the basis for the DNA revolution in psychology, it is essential to understand what they are. A polygenic score is like any composite score that psychologists routinely use to create scales from items, such as those on a personality questionnaire. The goal of a polygenic score is to provide a single genetic index to predict a trait, whether schizophrenia, well-being or intelligence.

To get a concrete understanding of a polygenic score, consider a personality trait like shyness. A questionnaire to assess shyness includes multiple items in order to tap into different facets of shyness. For example, a typical shyness questionnaire will have items about how anxious you are in social situations and how much you avoid these situations, for example, going to a party, meeting strangers and speaking up at a meeting. You might be asked to respond using a three-point scale (0 = not at all, 1 = sometimes, 2 = a lot).

A shyness score is created by adding these items, taking care to ‘reverse’ items as needed so that a high score means a high degree of shyness. If our shyness measure had ten items scored 0, 1 and 2, total scores could vary from 0 to 20. Simply adding the items like this treats each item as if it is equally useful, but all items are not equally useful. For this reason, items are often added after they are weighted by some criterion of their usefulness at capturing the construct of shyness.
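To make the scoring rule concrete, here is a minimal Python sketch (not from the book) of a ten-item composite with reverse-keyed items and optional weights. The answers, the reverse-keyed flags and the function names are hypothetical.

```python
# Minimal sketch of a questionnaire composite score. The ten answers,
# the reverse-keyed flags and any weights are hypothetical examples.
MAX_ITEM = 2  # items are rated 0 = not at all, 1 = sometimes, 2 = a lot


def item_score(answer, reverse):
    """Return one item's contribution, flipping reverse-keyed items (0 <-> 2)."""
    if answer not in (0, 1, 2):
        raise ValueError("answers must be 0, 1 or 2")
    return MAX_ITEM - answer if reverse else answer


def shyness_score(answers, reverse_keyed, weights=None):
    """Add the item scores; with weights, each item counts in proportion to its weight."""
    if weights is None:
        weights = [1.0] * len(answers)  # unweighted: every item counts equally
    if not len(answers) == len(reverse_keyed) == len(weights):
        raise ValueError("answers, reverse_keyed and weights must have equal length")
    return sum(w * item_score(a, r) for a, r, w in zip(answers, reverse_keyed, weights))


answers = [2, 1, 0, 2, 1, 1, 0, 2, 1, 0]
reverse_keyed = [False, False, True, False, False, True, False, False, False, True]
print(shyness_score(answers, reverse_keyed))  # 14.0 out of a possible 20
```

Passing a list of weights instead of the default turns the same function into the weighted version of the scale.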

This is exactly how polygenic scores are created, except that, instead of items on a questionnaire, we add up SNP genotypes. Like the three-point rating scale for shyness, SNP genotypes are scored as 0, 1 or 2, indicating the number of ‘increasing’ alleles, as in the example of the FTO SNP [a polymorphism implicated in weight gain].

In the same way that we can add up alleles for one SNP to create a genotypic score, we can also add up alleles for many SNPs to create a polygenic score, just as we add questionnaire items to create a shyness score.

The results from genome-wide association studies are used to select SNPs and to assign weights to each SNP. For example, in the GWA analysis of weight, the FTO SNP accounts for much more variance than other SNPs, so it should count for much more in a polygenic score for weight.

The following table shows how one individual’s polygenic score is created from ten SNPs. For the first SNP, this individual’s genotype is AT. For this SNP, the T allele happens to be the increasing allele that is positively associated with the trait. So, the individual’s genotypic score for this SNP is 1 because the genotype has only one increasing T allele.

Across the ten SNPs, the individual has a total of nine increasing alleles for the trait out of a possible score of 20. So, this individual would have a polygenic score just below the population average score of 10 for this trait.
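The allele counting described above can be sketched in a few lines of Python. This is an illustration, not Plomin's table: the genotypes and increasing alleles below are hypothetical, chosen only so that the first SNP matches the AT example and the total comes to 9 out of a possible 20.

```python
# Minimal sketch: counting 'increasing' alleles. The genotypes and the
# choice of increasing allele for each SNP are hypothetical, not the book's table.


def genotypic_score(genotype, increasing_allele):
    """Copies (0, 1 or 2) of the trait-increasing allele in a genotype like 'AT'."""
    if len(genotype) != 2:
        raise ValueError("expected a two-letter genotype such as 'AT'")
    return genotype.count(increasing_allele)


def unweighted_pgs(genotypes, increasing_alleles):
    """Unweighted polygenic score: total increasing alleles across all SNPs."""
    return sum(genotypic_score(g, a) for g, a in zip(genotypes, increasing_alleles))


genotypes = ["AT", "CC", "GG", "AG", "TT", "CT", "AA", "GC", "AT", "CC"]
increasing = ["T", "C", "A", "G", "T", "C", "G", "C", "A", "G"]

print(genotypic_score("AT", "T"))             # 1, as for the first SNP in the text
print(unweighted_pgs(genotypes, increasing))  # 9 out of a possible 2 * 10 = 20
```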

This score merely adds the number of increasing alleles, which works reasonably well as a polygenic score.



However, we can increase its precision by weighting the genotypic score for each SNP by how much the SNP correlates with the trait. The correlation between each SNP and the trait is taken from the GWA analysis. If one SNP correlates five times more with the trait than another SNP (such as SNP 1 versus SNP 10), it should count for five times as much in the polygenic score.

The weighted genotypic scores in the last column of the table are the product of the genotypic score for each SNP and the correlation with the trait. The sum of these weighted genotypic scores for the ten SNPs is 0.023.
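A weighted score simply multiplies each genotypic score by its SNP's correlation with the trait before summing. The sketch below assumes hypothetical weights for three SNPs; in practice they would come from a GWA study's results.

```python
# Minimal sketch of a weighted polygenic score. The weights stand in for
# per-SNP correlations from a GWA study; these values are hypothetical.


def weighted_pgs(genotypic_scores, weights):
    """Sum of (copies of the increasing allele) x (SNP-trait correlation)."""
    if len(genotypic_scores) != len(weights):
        raise ValueError("need one weight per SNP")
    return sum(s * w for s, w in zip(genotypic_scores, weights))


# SNP 1 correlates five times more strongly with the trait than SNP 3,
# so one copy of its increasing allele counts five times as much.
scores = [1, 2, 1]
weights = [0.010, 0.004, 0.002]
print(weighted_pgs(scores, weights))  # approximately 0.02
```

Because the SNP 1 weight is five times the SNP 3 weight, one increasing allele at SNP 1 adds five times as much to the total.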

This number isn’t as interpretable as the unweighted genotypic score of 9, which is just the sum of the ‘increasing’ alleles. However, both the unweighted polygenic score of 9 and the weighted score of 0.023 can be expressed simply as a percentile in the population. For this individual, both types would indicate a polygenic score just below average.
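Turning either kind of score into a percentile only needs a comparison sample. In this sketch the comparison sample is simulated under a hypothetical simplifying assumption (ten independent SNPs, each with an increasing-allele frequency of 0.5); a real comparison would use scores computed for an actual reference population.

```python
import random

# Minimal sketch: express a score as a percentile of a comparison sample.
# The reference sample is simulated under a hypothetical assumption: ten
# independent SNPs whose increasing allele has a frequency of 0.5.


def percentile_rank(score, reference):
    """Percentage of the reference below `score`, counting ties as half."""
    below = sum(r < score for r in reference)
    ties = sum(r == score for r in reference)
    return 100 * (below + 0.5 * ties) / len(reference)


def simulate_reference(n_people=20_000, n_snps=10, freq=0.5, seed=42):
    """Unweighted scores: increasing alleles counted over 2 * n_snps allele copies."""
    rng = random.Random(seed)
    return [sum(rng.random() < freq for _ in range(2 * n_snps)) for _ in range(n_people)]


reference = simulate_reference()
print(round(percentile_rank(9, reference)))   # about 33: just below average
print(round(percentile_rank(10, reference)))  # about 50: the average score
```

The same `percentile_rank` function works unchanged for weighted scores, provided the reference sample is scored with the same weights.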

How many SNPs should go into a polygenic score? Initially, polygenic scores were created using only the genome-wide significant ‘hits’ from a GWA study. For weight, ninety-seven independent SNPs reached genome-wide significance. Creating a polygenic score from these top ninety-seven SNPs explains 1.2 per cent of the variance in weight in independent samples. This is only slightly better than the prediction from the FTO SNP by itself, which explains 0.7 per cent of the variance.

Using only genome-wide significant hits is like demanding that each item in our shyness scale predicts significantly on its own. We don’t do this for other psychological scores because it is unrealistic to expect each item to stand on its own. The goal is to have a composite scale that is as useful as possible.

A better idea is to do what we do when we create other psychological scores: keep adding items as long as they add to the reliability and validity of the composite in independent samples. For polygenic scores, the key criterion is prediction. The new approach to polygenic scores is to keep adding SNPs as long as they add to the predictive power of the polygenic score in independent samples.

This is the strategy that has paid off in the last two years in producing powerful polygenic scores for psychological traits. Some false positives will be included in the polygenic score, but that is acceptable as long as the signal increases relative to the noise, in the sense that the polygenic score predicts more variance. ...
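The strategy of adding SNPs while prediction in an independent sample keeps improving can be illustrated with a small simulation. This is a simplified sketch under hypothetical assumptions (25 causal SNPs among 150, additive effects, normal noise); published polygenic scores rely on more elaborate methods, such as p-value thresholding with linkage-disequilibrium clumping or Bayesian shrinkage, which this does not reproduce.

```python
import random

# Simplified sketch of "keep adding SNPs while prediction in an independent
# sample improves". Architecture, sample sizes and noise are hypothetical.


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def simulate(n_people, effects, rng, freq=0.5, noise_sd=2.0):
    """Genotypes (0/1/2 per SNP) and a trait = additive genetic value + noise."""
    genotypes, trait = [], []
    for _ in range(n_people):
        g = [(rng.random() < freq) + (rng.random() < freq) for _ in effects]
        genotypes.append(g)
        trait.append(sum(b * x for b, x in zip(effects, g)) + rng.gauss(0, noise_sd))
    return genotypes, trait


def prediction_curve(n_snps=150, n_causal=25, n_people=1500, seed=1):
    rng = random.Random(seed)
    effects = [0.2 if j < n_causal else 0.0 for j in range(n_snps)]
    disc_g, disc_y = simulate(n_people, effects, rng)  # discovery ("GWA") sample
    val_g, val_y = simulate(n_people, effects, rng)    # independent sample

    # GWA step: each SNP's weight is its correlation with the trait.
    weights = [pearson_r([g[j] for g in disc_g], disc_y) for j in range(n_snps)]
    order = sorted(range(n_snps), key=lambda j: abs(weights[j]), reverse=True)

    # Add SNPs from strongest to weakest association, recording the variance
    # explained (r squared) in the independent sample after each addition.
    scores = [0.0] * n_people
    r2 = []
    for j in order:
        for i, g in enumerate(val_g):
            scores[i] += weights[j] * g[j]
        r2.append(pearson_r(scores, val_y) ** 2)
    best_k = r2.index(max(r2)) + 1  # number of top SNPs giving the best prediction
    return best_k, r2


best_k, r2 = prediction_curve()
print(best_k, round(r2[best_k - 1], 3))
```

Because the weights are estimated with error, the best-performing score may include some null SNPs (false positives) alongside the causal ones; what decides how many SNPs to keep is the variance explained in the independent sample.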

To interpret polygenic scores, it is important to keep in mind that they are always distributed like a bell-shaped curve, that is, a normal distribution. This bell-shaped curve is dictated by the fundamental law of probability, the central limit theorem, which is the basis for all statistics.




The normal distribution is found when many random events contribute to a phenomenon, like flipping a coin and counting the number of times the coin comes up heads. If you flip a coin ten times, you could get no heads or ten heads in a row, but most of the time the total number of heads will be between four and seven. If you do this many times, you will get a perfectly normal bell-shaped distribution, peaking at five, which will be the average number of heads. Flipping coins and counting heads is exactly analogous to counting the numbers of ‘increasing’ alleles from SNPs to construct polygenic scores for many individuals.
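The coin-flipping analogy is easy to check. Strictly, the number of heads follows a binomial distribution, which the bell curve approximates increasingly well as the number of flips grows. This sketch computes the exact probabilities with `math.comb` and also simulates many repeated runs of ten flips.

```python
import random
from collections import Counter
from math import comb


def exact_heads(n_flips=10):
    """P(k heads) for a fair coin: C(n, k) / 2**n (a binomial distribution)."""
    return {k: comb(n_flips, k) / 2 ** n_flips for k in range(n_flips + 1)}


def simulated_heads(n_flips=10, n_trials=100_000, seed=0):
    """Repeat 'flip n coins and count heads' many times; tally the counts."""
    rng = random.Random(seed)
    return Counter(sum(rng.random() < 0.5 for _ in range(n_flips)) for _ in range(n_trials))


exact = exact_heads()
counts = simulated_heads()
print(sum(exact[k] for k in range(4, 8)))  # P(4 to 7 heads) = 792/1024, about 0.77
for k in range(11):
    print(f"{k:2d} {'#' * (counts[k] // 1000)}")  # text histogram, peaks at 5
```

Setting `n_flips` to twice the number of SNPs mimics counting increasing alleles, under the idealised assumption that every increasing allele has a frequency of 0.5 and the SNPs are independent.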

I will describe all my polygenic scores in terms of percentiles in the normal distribution. That is, to what extent is my polygenic score above or below the average polygenic score in the comparison sample, the 50th percentile?

It turns out that my polygenic score for height is at the 90th percentile. So, based on my DNA alone, knowing nothing else about me, you could predict that I am tall. And, in fact, I am 6 feet 5 inches. Of course, you could easily see that I am tall if you saw me, but with DNA you could tell that I am tall without even looking at me.

Most importantly, you could have predicted when I was born that I would be tall. Unlike any other predictors, polygenic scores are just as predictive from birth as from any other age because inherited DNA sequence does not change during life. In contrast, height at birth scarcely predicts adult height.

The predictive power of polygenic scores is greater than that of any other predictor, even the height of the individuals’ parents. Another advantage of polygenic scores over family resemblance is that parental height provides only a family-wide prediction that is the same for any child born to those parents.

In contrast, polygenic scores provide a prediction specific to each individual. In other words, my polygenic scores at birth would have predicted that I would be taller than expected on the basis of the average height of my parents.

Before looking at my other polygenic scores, one other general point needs to be highlighted about predicting individuals. My actual height is at the 99th percentile but my polygenic score is at the 90th percentile. Are polygenic scores sufficiently accurate for prediction?

For example, in TEDS [Twins Early Development Study - 1994 onwards], the polygenic score for height predicts 15 percent of the variance in actual height in these young adults. But 15 per cent is a long way from 100 per cent.

In fact, polygenic scores can never predict 100 per cent of the variance of any trait, because the ceiling for prediction is heritability. For height, heritability is 80 percent, but for psychological traits heritability is 50 percent, which means that polygenic score prediction is always going to be way south of perfect.
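The heritability ceiling can be put in numbers. If a score can explain at most the heritable share of the variance, h², its correlation with the trait can be at most the square root of h². The sketch below applies this to the height figures quoted above; the function name is illustrative.

```python
from math import sqrt


def heritability_gap(pgs_r2, h2):
    """Compare a polygenic score's variance explained with the heritability ceiling."""
    if not 0 <= pgs_r2 <= h2 <= 1:
        raise ValueError("expected 0 <= pgs_r2 <= h2 <= 1")
    return {
        "share_of_heritability_captured": pgs_r2 / h2,
        "correlation_now": sqrt(pgs_r2),
        "correlation_ceiling": sqrt(h2),
    }


# Figures from the text: the TEDS height score explains 15 per cent of the
# variance, and the heritability of height is 80 per cent.
print(heritability_gap(0.15, 0.80))
# share about 0.19, correlation_now about 0.39, correlation_ceiling about 0.89
```

On these figures the height score captures about 19 per cent of the heritable variance (0.15 / 0.80), leaving roughly 81 per cent unaccounted for: the gap referred to as missing heritability. For a psychological trait with a heritability of 50 per cent, the ceiling on the correlation is about 0.71.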




The big question is the extent to which polygenic scores will be able to predict all the heritable variance of traits. This gap is called missing heritability, and is described in the Notes section at the end of this book. "

--- [end of text extract] ---

Plomin's scatter plot looks rather messy, but that appearance hides the clustering that becomes clear when you consider each decile separately.



As mentioned in the legend above, the vertical lines indicate the 95% confidence intervals. They clearly illustrate the linear trend. I also suspect that their width reflects the limited sample sizes available for genetic studies (20,000 pairs of twins in TEDS).
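A decile plot like the one discussed here can be produced by sorting people by their polygenic score, splitting them into ten equal groups and computing the mean trait value with a 95% confidence interval in each group. The sketch below uses simulated, hypothetical data (a score explaining about 15% of the variance) and a normal-approximation interval, not Plomin's TEDS data.

```python
import random
from statistics import mean, stdev


def decile_summary(pgs, trait):
    """Mean trait and approximate 95% confidence interval in each PGS decile."""
    ranked = sorted(zip(pgs, trait))  # order people by polygenic score
    n = len(ranked)
    rows = []
    for d in range(10):
        group = [t for _, t in ranked[d * n // 10:(d + 1) * n // 10]]
        m = mean(group)
        half_width = 1.96 * stdev(group) / len(group) ** 0.5  # normal approximation
        rows.append((d + 1, m, m - half_width, m + half_width))
    return rows


# Hypothetical data: a standardised score explaining about 15% of trait variance.
rng = random.Random(7)
pgs = [rng.gauss(0, 1) for _ in range(5000)]
trait = [0.15 ** 0.5 * p + 0.85 ** 0.5 * rng.gauss(0, 1) for p in pgs]

for decile, m, lo, hi in decile_summary(pgs, trait):
    print(f"decile {decile:2d}: mean {m:+.2f} (95% CI {lo:+.2f} to {hi:+.2f})")
```

The half-width of each interval scales with 1 divided by the square root of the group size, so quadrupling the sample roughly halves the interval widths.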

For the top and bottom deciles, the following chart shows the extent of overlap.


Yet at the extremes the differences are very large. This is a general truth as regards traits whose values are normally distributed.
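One standard way to see why modest average differences become large at the extremes is to compare the upper tails of two normal curves. The sketch below assumes two groups with equal standard deviations whose means differ by a hypothetical 0.5 SD.

```python
from math import erfc, sqrt


def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * erfc(z / sqrt(2))


def tail_ratio(shift, threshold):
    """How many times more often a group whose mean is `shift` SD higher exceeds
    `threshold` (in SD units of the lower group), assuming equal SDs."""
    return upper_tail(threshold - shift) / upper_tail(threshold)


def overlap(shift):
    """Overlapping coefficient of two equal-SD normal curves `shift` SD apart."""
    return 2 * upper_tail(shift / 2)


# Hypothetical half-SD difference between two groups' means.
print(round(overlap(0.5), 2))          # 0.8: the two curves largely overlap
print(round(tail_ratio(0.5, 0.0), 2))  # 1.38 times as many above the lower mean
print(round(tail_ratio(0.5, 3.0), 1))  # 4.6 times as many beyond +3 SD
```

Around 80% of the two curves overlap, yet beyond 3 SD members of the higher group are about 4.6 times as frequent.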

Plomin finishes this section by emphasising yet again that, owing to the current lack of power (too few SNPs identified) and the ceiling set by heritability, a PGS prediction is just that: a prediction with an error distribution around it. It is not deterministic.

I would comment that the error bars may be wide right now, but as sample sizes get larger and non-additive effects are factored in, they can be made considerably smaller.

This is a small excerpt from an excellent book, by the way, which I reviewed here.
