Wading Through Treacle: Estimating IQ from genotype

Friday, November 07, 2014

Estimating IQ from genotype

This post is a simple back-of-the-envelope calculation based on Davide Piffer's paper, as discussed in my earlier post.

First a quick reminder about opinion polls and sampling.

Opinion Polls

We assume a large population of interest and we sample n individuals (often 1,000) with a yes-no question. Something like "Are you going to vote for the Labour Party in the forthcoming election?" We want to know how likely it is that the population as a whole votes in the same proportions as found in our survey. Suppose p is the fraction of the sample who tell us they will vote yes (example: 0.32).

This is just the same as throwing a biased coin (Heads with probability 0.32) a thousand times and seeing how many Heads we actually get. Clearly on average we'll get 320 Heads [the expected count is np]. Of more interest, however, is how much that count varies if we took sample after sample (or coin-throwing exercise after coin-throwing exercise), which is measured by its standard deviation. We would like to know the upper and lower bounds on the number of 'yes' respondents we would get in, say, 95% of the samples we took, corresponding to +/- 1.96 standard deviations. We can be pretty confident that those bounds would play out in real life (nineteen times out of twenty).

The standard deviation of a binomial distribution, which is what we have here, is √(npq) where n is the size of the sample (example, 1,000), p is the probability of the 'yes' outcome (example: 0.32) and q is the probability of the 'no' outcome (0.68 = 1-p).
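As a quick sanity check on that formula, here is a minimal Python sketch for the running example (n = 1,000, p = 0.32). The function names are my own for illustration. It computes √(npq) and then sums the exact binomial probabilities to see how often the count really lands within 1.96 standard deviations of the mean.

```python
import math

def binomial_sd(n, p):
    """Standard deviation of the number of 'yes' answers: sqrt(n*p*q)."""
    return math.sqrt(n * p * (1 - p))

def binomial_pmf(n, k, p):
    """Exact probability of k 'yes' answers out of n (via logs to avoid overflow)."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_coeff + k * math.log(p) + (n - k) * math.log(1 - p))

def coverage(n, p, half_width):
    """Probability that the count lands within mean +/- half_width."""
    mean = n * p
    return sum(binomial_pmf(n, k, p)
               for k in range(n + 1)
               if abs(k - mean) <= half_width)

n, p = 1000, 0.32
sd = binomial_sd(n, p)
print(f"standard deviation = {sd:.2f}")   # about 14.75
print(f"P(count within 1.96 sd of {n * p:.0f}) = {coverage(n, p, 1.96 * sd):.3f}")
```

The coverage comes out close to, though not exactly, 95%, because the count can only take whole-number values.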

The 95% confidence interval around the mean np is +/- 1.96 standard deviations - which we approximate here to 2. We also set p and q to 0.5, as this is where √(pq) takes its largest value and so gives the widest (most cautious) interval.

Plugging the numbers in, we get the 95% confidence interval as: +/- 2 x √(0.5 x 0.5 x n) = +/- √n.

In our running example with 1,000 people sampled (√1000 is around 32), this tells us that the interval 320 +/- 32 will contain the number of 'yes' answers at least 95% of the time. We sometimes prefer to have the results as a proportion, usually written as a percentage, in which case we divide everything by n.

The mean number of 'yes' voters here is 320/1,000 = np/n = p (0.32 or 32%).

The 95% confidence interval here is +/- 32/1,000 = √n/n = 1/√n, or about 0.032 (usually described as +/- 3%).

Note that if we had sampled just 100 voters, we would have a 95% confidence interval of +/- 1/√100 = +/- 10%. We're already losing quite a bit of predictive power.

Asking just 10 people, the 95% confidence interval is +/- 1/√10 = 0.32, or roughly +/- 30%. So if three people said they'd vote 'yes', across repeated surveys that number could dip as low as zero or rise as high as six. Pretty much worthless for forecasting the election.
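The 1/√n rule of thumb is easy to tabulate. The sketch below (the function name is just illustrative) uses the same worst-case assumption as above, p = q = 0.5 with 1.96 rounded to 2, and prints the margin for the three sample sizes discussed.

```python
import math

def margin_of_error(n):
    """Worst-case 95% margin as a proportion: 2*sqrt(0.5*0.5*n)/n = 1/sqrt(n)."""
    return 1 / math.sqrt(n)

for n in (1000, 100, 10):
    print(f"n = {n:>4}: +/- {margin_of_error(n):.1%}")

# n = 1000: +/- 3.2%
# n =  100: +/- 10.0%
# n =   10: +/- 31.6%
```

Note how slowly the margin shrinks: a hundred times more respondents only buys a tenfold improvement.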

---

To apply this to IQ I'm going to use the data in Davide Piffer's paper, as discussed in my earlier post - to which you may need to refer.

Looking at my own results I had 16 alleles to play with, of which 7 were 'good for intelligence'. So this is an opinion poll where I was able to survey only 16 people. Duh!

My computed allele frequency was 7/16 = 44% against a European average of 35.5%, so I'm 8.5 percentage points up from the average.

Looking at the Chinese/Japanese figures we see an allele frequency score of 39.1% (a difference of 3.6 percentage points from the European mean) which corresponds to an IQ difference of 5 IQ points from Europeans. I'm going to assume a linear relation - an additive model.

To convert a difference in mean allele frequency (in percentage points) to an IQ difference we multiply by 5/3.6, which is approximately 1.4. So the estimate of my IQ is 8.5 * 1.4 = 12 points above the European average of 100. In my incorrigible vanity I'd like to believe that 112 is rather on the low side! What is the 95% confidence interval for this calculation?
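Here is the same conversion as a short Python sketch. The constants are the figures quoted in this post; the function names are my own, and the linear (additive) relation is the assumption stated above, not an established result.

```python
EUROPEAN_FREQ = 0.355    # European mean allele frequency (from the post)
EAST_ASIAN_FREQ = 0.391  # Chinese/Japanese mean allele frequency (from the post)
IQ_GAP = 5.0             # IQ points assigned to that frequency gap (from the post)

def iq_points_per_percentage_point():
    """Slope of the assumed linear model: 5 IQ points per 3.6 percentage points."""
    return IQ_GAP / ((EAST_ASIAN_FREQ - EUROPEAN_FREQ) * 100)

def estimate_iq(good_alleles, total_alleles, baseline_iq=100.0):
    """Point estimate of IQ from the fraction of 'good' alleles."""
    freq = good_alleles / total_alleles
    diff_pp = (freq - EUROPEAN_FREQ) * 100   # difference in percentage points
    return baseline_iq + diff_pp * iq_points_per_percentage_point()

print(round(iq_points_per_percentage_point(), 2))  # 1.39
print(round(estimate_iq(7, 16), 1))                # 111.5
```

Without the intermediate rounding (7/16 is 43.75% rather than 44%, and 5/3.6 is 1.39 rather than 1.4) the estimate is about 111.5, which rounds to the same 112.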

Since n = 16, and following the path described above, the 95% confidence interval is +/- 1/√16 = +/- 25%.

Those are the allele frequency limits, so my true allele frequency (across those hundreds or thousands of 'good alleles driving IQ') is probably in the range 44% +/- 25%, or [19%, 69%]. To change these limits into IQ scores, multiply the confidence interval of +/- 25 percentage points by 1.4, giving +/- 35 IQ points.

We may be 95% confident that my IQ is in the range [77, 147].
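The interval can be reproduced in a few lines of Python. This is only a sketch of the arithmetic above: the function name is illustrative, and the inputs are the rounded figures used in this post (a point estimate of 112, 16 alleles, and 1.4 IQ points per percentage point).

```python
import math

def iq_interval(point_estimate, n_alleles, iq_per_pp):
    """95% interval for IQ using the worst-case 1/sqrt(n) margin on allele frequency."""
    margin_pp = 100 / math.sqrt(n_alleles)   # margin in percentage points
    margin_iq = margin_pp * iq_per_pp        # same margin converted to IQ points
    return point_estimate - margin_iq, point_estimate + margin_iq

low, high = iq_interval(112, 16, 1.4)
print(f"[{low:.0f}, {high:.0f}]")   # [77, 147]
```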

So I guess we can be 95% confident that I'm neither extremely educationally subnormal nor Albert Einstein!

The take-home message is that we need hundreds of alleles to give us a big enough sample to get the error bounds down. The IQ concordance of twins brought up together is around 0.86, so non-genetic factors will still prevent us getting all the way.
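To put a rough number on "hundreds", the 1/√n rule can be run backwards: pick the IQ margin you want and solve for n. The sketch below does that under the same assumptions as the rest of this post (worst-case p = 0.5, the linear 5/3.6 conversion); the function name is made up for illustration.

```python
import math

IQ_PER_PP = 5 / 3.6   # IQ points per percentage point of allele frequency (from the post)

def alleles_needed(target_iq_margin, iq_per_pp=IQ_PER_PP):
    """Smallest n whose 1/sqrt(n) margin, converted to IQ points, meets the target."""
    margin_as_fraction = target_iq_margin / iq_per_pp / 100
    return math.ceil(1 / margin_as_fraction ** 2)

for target in (35, 10, 5):
    print(f"+/- {target} IQ points needs about {alleles_needed(target)} alleles")

# +/- 35 IQ points needs about 16 alleles
# +/- 10 IQ points needs about 193 alleles
# +/- 5 IQ points needs about 772 alleles
```

On this crude model, a margin of +/- 10 points needs roughly 200 alleles and +/- 5 points closer to 800.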

BTW we're just a few years from getting to that 'hundreds of IQ-affecting alleles' point, so although this is a fun exercise, reality will be along soon enough.