Estimating Imputed R-squares

A common way to measure the imputation $r^2$ is to calculate the variance of the imputed allele probabilities and divide it by the variance the alleles would have if they were perfectly imputed. The alleles are perfectly imputed if every imputed probability equals 0 or 1 for all $i$.

The variance of the alleles when they are perfectly imputed is $q(1-q)$, where $q$ is the alternate allele frequency. Given only the imputation data, we do not know what $q$ is in the general population. However, we can estimate it using the dosage values for each subject:

$$\hat q = \sum_{i = 1}^{N}\frac{d_i}{2N}$$

where the dosage is calculated as

$$d_i = \Pr(g_i = 1) + 2\Pr(g_i = 2)$$

Another problem with the dosage data is that we don’t have the probabilities for each allele. Instead we have $\Pr(g_i=0)$, $\Pr(g_i=1)$, and $\Pr(g_i=2)$. If we assume that a subject’s two allelic probabilities, $q_1$ and $q_2$, are independently imputed, we know the following:

$$q_1(1-q_2) + (1-q_1)q_2 = \Pr(g=1)$$

and

$$q_1 q_2 = \Pr(g=2)$$

Adding twice the second equation to the first gives $q_1 + q_2 = d$, so $q_1$ and $q_2$ are the two roots of $x^2 - dx + \Pr(g=2) = 0$:

$$q_1 = \frac{d - \sqrt{d^2 - 4\Pr(g = 2)}}{2}$$

$$q_2 = \frac{d + \sqrt{d^2 - 4\Pr(g = 2)}}{2}$$

There can be some problems using the above equations. Sometimes the value inside the radical is negative. This can be caused by round-off error. If the value is negative and close to zero, it can be set to zero.
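The per-subject step can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `allelic_probabilities`, the tolerance `tol`, and the choice to raise an error when the value under the radical is clearly negative are all hypothetical and are not taken from minimac, Impute 2, or any package.

```python
import math


def allelic_probabilities(p1, p2, tol=1e-6):
    """Split genotype probabilities into two allelic probabilities.

    p1 = Pr(g = 1) and p2 = Pr(g = 2) for one subject.
    Solves q1 + q2 = d and q1 * q2 = p2, where d = p1 + 2 * p2.
    `tol` (an assumed threshold) decides what counts as round-off.
    """
    d = p1 + 2.0 * p2                 # dosage, between 0 and 2
    disc = d * d - 4.0 * p2           # value inside the radical
    if disc < 0.0:
        if disc < -tol:
            # Assumption: a clearly negative value is treated as an error,
            # since it means the independence assumption does not hold.
            raise ValueError("genotype probabilities are inconsistent "
                             "with independently imputed alleles")
        disc = 0.0                    # small negative value: round-off
    root = math.sqrt(disc)
    return (d - root) / 2.0, (d + root) / 2.0


# Example: alleles imputed with q1 = 0.2 and q2 = 0.7 give
# Pr(g = 1) = 0.2 * 0.3 + 0.8 * 0.7 = 0.62 and Pr(g = 2) = 0.14.
q1, q2 = allelic_probabilities(0.62, 0.14)
print(q1, q2)  # approximately 0.2 and 0.7
```

The smaller root is always returned first, which matches the convention of assigning the minus sign to the first allele.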

Note: The documentation for minimac and Impute 2 indicates that the two alleles are imputed independently.

Since each subject has two alleles, we can let $q_1$ to $q_N$ represent the first allele of each subject and $q_{N+1}$ to $q_{2N}$ represent the second allele. Given this, we can calculate all the $q$’s as follows:

$$q_i = \left\{\begin{array}{ll}
\frac{d_i - \sqrt{d_i^2 - 4\Pr(g_i = 2)}}{2} & \; 0<i\leq N\\
\frac{d_{i-N} + \sqrt{d_{i-N}^2 - 4\Pr(g_{i-N} = 2)}}{2} & \; N<i\leq 2N
\end{array}\right.$$

Once the $q$’s have been calculated, the imputation $r^2$ can be estimated as follows:

$$\hat r^2 = \frac{\sum_{i = 1}^{2N}\frac{(q_i - \hat q)^2}{2N}}{\hat q(1 - \hat q)}$$
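As a rough end-to-end sketch, the estimate can be computed from each subject's genotype probabilities. Everything named here is an assumption made for illustration: the function `imputation_r2`, the input layout (one `(Pr(g=1), Pr(g=2))` pair per subject), the round-off tolerance `tol`, and the decision to return NaN when the estimated allele frequency is 0 or 1.

```python
import math


def imputation_r2(probs, tol=1e-6):
    """Estimate the imputation r^2 for one variant.

    probs: list of (Pr(g = 1), Pr(g = 2)) pairs, one per subject.
    tol:   assumed threshold below which a negative value under the
           radical is treated as round-off and set to zero.
    """
    n = len(probs)
    dosages = [p1 + 2.0 * p2 for p1, p2 in probs]
    q_hat = sum(dosages) / (2.0 * n)          # estimated allele frequency
    if q_hat <= 0.0 or q_hat >= 1.0:
        return float("nan")                   # assumption: undefined here

    sum_sq = 0.0
    for (p1, p2), d in zip(probs, dosages):
        disc = d * d - 4.0 * p2               # value inside the radical
        if disc < 0.0:
            if disc < -tol:
                raise ValueError("probabilities inconsistent with "
                                 "independently imputed alleles")
            disc = 0.0
        root = math.sqrt(disc)
        # The two allelic probabilities for this subject.
        for q in ((d - root) / 2.0, (d + root) / 2.0):
            sum_sq += (q - q_hat) ** 2

    return (sum_sq / (2.0 * n)) / (q_hat * (1.0 - q_hat))


# Perfectly imputed genotypes 0, 1, 2, 1: every probability is 0 or 1.
perfect = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
print(imputation_r2(perfect))        # 1.0

# Every allele imputed with probability 0.5: no information.
uninformative = [(0.5, 0.25)] * 4
print(imputation_r2(uninformative))  # 0.0
```

The two examples bracket the range of the estimate: perfectly imputed alleles reproduce the full variance and give 1, while alleles that all sit at the mean frequency contribute no variance and give 0.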