Predicting sequence variant deleteriousness in genomes of livestock species

Abstract

Illuminating the functional part of the genome of livestock species has the potential to facilitate precision breeding and to accelerate improvements. Identifying functional and potentially deleterious mutations can provide breeders with crucial information to tackle inbreeding depression or to increase the overall health and welfare of their populations. By performing genome-wide association studies (GWAS), the genome can be interrogated for mutations that co-occur with a phenotype of interest. However, every GWAS delivers a large number of potentially functionally important single nucleotide polymorphisms (SNPs). The exact effect of each of these SNPs is often not known, especially for SNPs in noncoding sequences. Investigating each candidate SNP in detail is laborious and, given the sheer number of variants, ultimately infeasible. Thus, there is a strong need for approaches to select the most promising SNP candidates. The prioritization of variants, in particular SNPs, has seen major developments in recent years, which have led to several discoveries and insights into heritable diseases of humans. For livestock and other non-human species, this development is lagging behind, despite the great economic value of these species. A major contributing factor to the deficit in prioritization tools for non-human species is a lack of genomic annotations. In this thesis, we translated one of the currently popular SNP prioritization tools, CADD (Combined Annotation-Dependent Depletion), to mouse (mCADD) and performed an experiment in which we simulated a decrease in the number of available genomic annotations. The results showed that following the CADD approach to predict the putative deleteriousness of SNPs is meaningful in a non-human species, even when fewer genomic annotations are available than for human. This motivated us to build CADD-like SNP prioritization tools for livestock species, in particular for pig (pCADD) and chicken (chCADD).
We validated the pig prioritization tool on a set of well-known functional pig variants. Further, we showed how functional and non-functional parts of the pig genome are scored differently by pCADD. In collaboration with the breeding industry, we built upon the pCADD scores and implemented them in a pipeline to identify likely causal variants in GWAS. To this end, we took SNPs that were found significant in GWAS based on SNP-array data and searched whole-genome sequence data for variants with high pCADD scores that are in linkage disequilibrium with these high GWAS-scoring SNPs. These pCADD-identified SNPs are thus likely (causal) functional candidates for the phenotypes tested. We also identified several expression quantitative trait loci (eQTL) variants, SNPs that explain observed differences in gene expression, which we were able to validate using RNA-seq data. This demonstrated the power of this new tool and its usefulness in identifying novel, functional variants. For chicken, we used chCADD to interrogate highly conserved elements in the chicken genome. Here we found that, despite being highly conserved, not all parts of these elements might be functionally active: chCADD differentiates between regions within each conserved element that are predicted to be functionally different. Taken together, the results presented in this thesis demonstrate that SNP prioritization can successfully be done in non-human species, which can greatly assist breeders and animal geneticists in their work to illuminate the functional genome.