## Computing principal components

Use the `-kinship` option to compute a relatedness matrix, the `-UDUT` option to eigendecompose it, and the `-PCs` option to output PCs. A complete example combining these options outputs the first 20 PCs to the file `PCs.csv`, in addition to the estimated kinship matrix and its eigendecomposition. The following sections show the use of these options in more detail.

The `-kinship` option can be used to estimate a kinship matrix. It outputs pairwise kinship values to a file (here `kinship.csv`), stored in a 'long' format with columns holding the first sample id, the second sample id, the number of pairwise non-missing genotypes, and the estimated kinship value. (Only the upper triangle of this matrix is output.)
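The long format described above can be turned back into a square matrix with a few lines of Python. This is a minimal sketch, not part of QCTOOL: the function name `long_to_matrix` and the example values are hypothetical, rows are assumed to arrive already parsed in the column order given above, and whether the diagonal is included in the output file is an assumption.

```python
import numpy as np

def long_to_matrix(rows):
    """Build a symmetric kinship matrix from long-format rows.

    Each row is (sample_id_1, sample_id_2, n_nonmissing, kinship), following
    the column order described in the text. Only one triangle is needed.
    """
    # Assign each sample id a matrix index in order of first appearance.
    ids = []
    index = {}
    for id1, id2, _, _ in rows:
        for sample in (id1, id2):
            if sample not in index:
                index[sample] = len(ids)
                ids.append(sample)

    # Entries that never appear in the input stay NaN.
    K = np.full((len(ids), len(ids)), np.nan)
    for id1, id2, _, value in rows:
        i, j = index[id1], index[id2]
        K[i, j] = value
        K[j, i] = value  # mirror the upper triangle into the lower one
    return ids, K

# Hypothetical values for three samples (assumes the diagonal is present).
rows = [
    ("s1", "s1", 1000, 1.02),
    ("s1", "s2", 998, 0.01),
    ("s1", "s3", 1000, -0.02),
    ("s2", "s2", 999, 0.97),
    ("s2", "s3", 997, 0.25),
    ("s3", "s3", 1000, 1.01),
]
ids, K = long_to_matrix(rows)
```

The third column (the number of pairwise non-missing genotypes) is ignored here; it could be used to filter out pairs whose kinship estimate rests on too few genotypes.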

More precisely, suppose $X$ is the $L \times N$ matrix of genotypes, with variants indexed by row. Let $f_i$ be an estimate of the frequency of the $i$th variant. We write $Z$ for the matrix $X$ after centring and rescaling each row based on the allele frequency,

$$Z_{i\cdot} = \frac{X_{i\cdot} - \mathrm{mean}(X_{i\cdot})}{\sqrt{2 f_i (1 - f_i)}}$$

QCTOOL estimates the kinship matrix as $\frac{1}{L} Z^t Z$. In forming $Z$, QCTOOL uses a posterior estimate of allele frequency $f_i$ under a $\mathrm{Beta}(2,2)$ prior, i.e. $f_i = (1 + N_b)/(2 + 2N)$, where $N_b$ is the count of 'b' alleles in the data. This can be understood as implicitly adding a single haplotype of each allelic type to the data before computing the frequency, which in turn ensures that the frequency estimate is never 0 or 1.
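The estimator above can be illustrated with a short NumPy sketch. It is not QCTOOL's implementation: the function name `estimate_kinship` is hypothetical, genotypes are assumed to be diploid counts (0, 1 or 2) of the 'b' allele, and missing data is assumed absent (QCTOOL itself tracks pairwise non-missing counts, which this sketch does not).

```python
import numpy as np

def estimate_kinship(X):
    """Estimate kinship from an L x N genotype matrix X (variants in rows).

    Assumes diploid genotypes coded as 'b' allele counts and no missing data.
    Returns the N x N kinship estimate and the per-variant frequencies.
    """
    X = np.asarray(X, dtype=float)
    L, N = X.shape

    # Regularised frequency: one extra haplotype of each allelic type.
    n_b = X.sum(axis=1)
    f = (1.0 + n_b) / (2.0 + 2.0 * N)

    # Centre each row, then rescale by sqrt(2 f (1 - f)).
    centred = X - X.mean(axis=1, keepdims=True)
    Z = centred / np.sqrt(2.0 * f * (1.0 - f))[:, np.newaxis]

    return (Z.T @ Z) / L, f

# Three variants, four samples; the second variant is monomorphic.
X = [
    [0, 1, 2, 1],
    [0, 0, 0, 0],
    [2, 2, 1, 0],
]
K, f = estimate_kinship(X)
```

Because the frequency estimate is kept away from 0 and 1, the monomorphic variant contributes a row of zeros to $Z$ rather than causing a division by zero.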

The `-UDUT` option can be used to compute a UDUT decomposition (i.e. an eigendecomposition) of the computed kinship matrix. The output is an $N \times (N+1)$ matrix in which the first column holds the diagonal elements of $D$, i.e. the eigenvalues, and the following $N$ columns are the right eigenvectors (i.e. the columns of $U$). To also output principal components (PCs), add the `-PCs` option. Its argument is the number of PCs to output.

**Note**: the PCs computed are simply rescaled entries of the right eigenvectors; they are computed as $\mathrm{PC}_i = \sqrt{1/L} \times U_{\cdot i} D^{-1/2}$. This scaling ensures the PCs do not grow with the number of variants.
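The decomposition, its $N \times (N+1)$ layout and the rescaling in the note above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than QCTOOL's code: the function names, the example kinship matrix and the variant count are hypothetical, ordering eigenvalues from largest to smallest is an assumption, and $D^{-1/2}$ is read here as the $i$th eigenvalue raised to the power $-1/2$, which is one interpretation of the formula as written.

```python
import numpy as np

def udut(K):
    """Eigendecompose a symmetric matrix K = U D U^t.

    Returns the eigenvalues (largest first, an assumed ordering) and the
    matching eigenvectors as the columns of U.
    """
    eigenvalues, U = np.linalg.eigh(K)  # eigh returns ascending eigenvalues
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], U[:, order]

def udut_table(eigenvalues, U):
    """Arrange the result as N x (N+1): eigenvalues first, then columns of U."""
    return np.column_stack([eigenvalues, U])

def rescaled_pcs(eigenvalues, U, n_variants, k):
    """Apply the rescaling from the note, reading D as the i-th eigenvalue."""
    if np.any(eigenvalues[:k] <= 0):
        raise ValueError("the first k eigenvalues must be positive")
    scale = np.sqrt(1.0 / n_variants) * eigenvalues[:k] ** -0.5
    return U[:, :k] * scale  # scales column i by the i-th factor

# Hypothetical kinship matrix for three samples, from 1000 variants.
K = np.array([
    [1.0, 0.2, -0.1],
    [0.2, 0.9, 0.3],
    [-0.1, 0.3, 1.1],
])
eigenvalues, U = udut(K)
table = udut_table(eigenvalues, U)
pcs = rescaled_pcs(eigenvalues, U, n_variants=1000, k=2)
```

Only the leading `k` eigenvalues are used in the rescaling, so near-zero trailing eigenvalues (which arise when rows of the genotype matrix are centred) do not cause numerical problems.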

**Note:** PCs are output to the file specified by `-osample`. Depending on the command line, other values might also be output to this file. For example, if you specify both `-sample-stats` and `-PCs`, the output file will contain both per-sample summary statistics and PCs. See the page on summary statistic file formats for more information on the format of the output.

In some contexts it may be preferable to load a previously computed kinship matrix rather than to compute a new one. This can be achieved with the `-load-kinship` option.

Loadings can be computed with the `-loadings` option. The `-PCs` option can again be used to adjust how many loadings are computed.

**Note**: you should ensure that the same set of variants is used to compute loadings as was used in constructing the kinship matrix.

`-project-onto` option: