C. Groß | TU Delft Repository

Accelerated discovery of functional genomic variation in pigs

Journal article (2021) - Martijn F.L. Derks, Christian Groß, Marcos S. Lopes, Marcel J.T. Reinders, Mirte Bosse, Arne B. Gjuvsland, Dick de Ridder, Hendrik-Jan Megens, Martien A.M. Groenen

The genotype-phenotype link is a major research topic in the life sciences but remains highly complex to disentangle. Part of the complexity arises from the number of genes contributing to the observed phenotype. Despite the vast increase of molecular data, pinpointing the causal variant underlying a phenotype of interest is still challenging. In this study, we present an approach to map causal variation and molecular pathways underlying important phenotypes in pigs. We prioritize variation by utilizing and integrating predicted variant impact scores (pCADD), functional genomic information, and associated phenotypes in other mammalian species. We demonstrate the efficacy of our approach by reporting known and novel causal variants, of which many affect non-coding sequences. Our approach allows the disentangling of the biology behind important phenotypes by accelerating the discovery of novel causal variants and molecular mechanisms affecting important phenotypes in pigs. This information on molecular mechanisms could be applicable in other mammalian species, including humans. ...

Prioritizing sequence variants in conserved non-coding elements in the chicken genome using chCADD

Journal article (2020) - Christian Groß, Chiara Bortoluzzi, Dick de Ridder, Hendrik-Jan Megens, Martien A.M. Groenen, Marcel Reinders, Mirte Bosse

The availability of genomes for many species has advanced our understanding of the non-protein-coding fraction of the genome. Comparative genomics has proven itself to be an invaluable approach for the systematic, genome-wide identification of conserved non-protein-coding elements (CNEs). However, for many non-mammalian model species, including chicken, our capability to interpret the functional importance of variants overlapping CNEs has been limited by current genomic annotations, which rely on a single information type (e.g. conservation). Here, we studied CNEs in chicken using a combination of population genomics and comparative genomics. To investigate the functional importance of variants found in CNEs, we developed a ch(icken) Combined Annotation-Dependent Depletion (chCADD) model, a variant effect prediction tool first introduced for humans and later for mouse and pig. We show that 73 Mb of the chicken genome has been conserved across more than 280 million years of vertebrate evolution. The vast majority of the conserved elements are in non-protein-coding regions, which display SNP densities and allele frequency distributions characteristic of genomic regions constrained by purifying selection. By annotating SNPs with the chCADD score, we are able to pinpoint specific subregions of the CNEs that are of higher functional importance, as supported by the association of SNPs in these subregions with known disease genes in humans, mice, and rats. Taken together, our findings indicate that CNEs harbor variants of functional significance that should be the object of further investigation along with protein-coding mutations. We therefore anticipate chCADD to be of great use to the scientific community and breeding companies in future functional studies in chicken. ...

Predicting sequence variant deleteriousness in genomes of livestock species

Doctoral thesis (2020) - C. Groß, M.J.T. Reinders, D. de Ridder

Illuminating the functional part of the genome of livestock species has the potential to facilitate precision breeding and to accelerate improvements. Identifying functional and potentially deleterious mutations can provide breeders with crucial information to tackle inbreeding depression or to increase the overall health of their populations and animal welfare. By performing genome-wide association studies (GWAS), the genome can be interrogated for mutations that co-occur with a phenotype of interest. However, every GWAS delivers a large number of potentially functionally important single nucleotide polymorphisms (SNPs). The exact effect of each of these SNPs is often not known, especially for SNPs in noncoding sequences. Investigating each candidate SNP in detail is laborious and, eventually, infeasible, given the sheer number of variants. Thus, there is a strong need for approaches to select the most promising SNP candidates. The prioritization of variants, in particular SNPs, has seen major developments in recent years, which have led to several discoveries and insights into heritable diseases of humans. For livestock and other non-human species, despite their great economic value, this development is lagging behind. A major contributing factor to the deficit in prioritization tools for non-human species is a lack of genomic annotations. In this thesis, we translated one of the currently popular SNP prioritization tools, CADD (Combined Annotation-Dependent Depletion), to mouse (mCADD) and performed an experiment in which we simulated a decrease in the number of available genomic annotations. These results showed that following the CADD approach to predict the putative deleteriousness of SNPs is meaningful in a non-human species, even when fewer genomic annotations are available than for the human case. This motivated us to build various CADD-like SNP prioritization tools for livestock species, in particular for pig (pCADD) and chicken (chCADD).
We validated the pig prioritization tool on a set of well-known functional pig variants. Further, we showed how functional and non-functional parts of the pig genome are scored differently by pCADD. In collaboration with the breeding industry, we built upon the pCADD scores and implemented them in a pipeline to identify likely causal variants in GWAS. To this end, we utilized SNPs that were found significant in GWAS based on SNP-array data and found variants with high pCADD scores in whole genome sequence data that are in linkage disequilibrium with high GWAS-scoring SNPs. Thus, these pCADD-identified SNPs are likely (causal) functional candidates for the phenotypes tested. We also identified several expression quantitative trait loci (eQTL) variants, SNPs that explain observed differences in gene expression, which we were able to validate using RNA-seq data. This demonstrated the power of this new tool and its usefulness in identifying novel, functional variants. For chicken, we used chCADD to interrogate highly conserved elements in the chicken genome. Here we found that, despite being highly conserved, not all parts of these elements might be functionally active. chCADD differentiates between regions within each conserved element that are predicted to be functionally different. Taken together, the results presented in this thesis demonstrate that SNP prioritization can successfully be done in non-human species, which can greatly assist breeders and animal geneticists in their work to illuminate the functional genome. ...
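
The GWAS follow-up step described in the abstract above (take array SNPs that are significant in GWAS, collect whole-genome-sequence variants in linkage disequilibrium with them, and favour those with high pCADD scores) can be illustrated with a minimal sketch. This is not the thesis pipeline: the `Candidate` record, the `prioritize` function, the r² cut-off of 0.7, the `top_n` limit, and all example values are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one whole-genome-sequence (WGS) variant."""
    variant_id: str  # made-up identifier of the WGS variant
    lead_snp: str    # GWAS-significant array SNP the variant is linked to
    r2: float        # linkage disequilibrium (r^2) with that lead SNP
    pcadd: float     # pCADD score of the WGS variant


def prioritize(candidates, min_r2=0.7, top_n=3):
    """Per lead SNP, keep variants with r^2 >= min_r2, ranked by pCADD (highest first).

    min_r2 and top_n are illustrative defaults, not values from the thesis.
    """
    by_lead = {}
    for c in candidates:
        if c.r2 >= min_r2:
            by_lead.setdefault(c.lead_snp, []).append(c)
    return {
        lead: sorted(group, key=lambda c: c.pcadd, reverse=True)[:top_n]
        for lead, group in by_lead.items()
    }


# Made-up example data: not real variants or real pCADD scores.
example = [
    Candidate("var_a", "lead_1", 0.95, 12.1),
    Candidate("var_b", "lead_1", 0.80, 24.7),
    Candidate("var_c", "lead_1", 0.40, 30.2),  # high score but weak LD: filtered out
    Candidate("var_d", "lead_2", 0.90, 8.3),
]

ranked = prioritize(example)
for lead, hits in ranked.items():
    print(lead, [(c.variant_id, c.pcadd) for c in hits])
```

Filtering on LD before ranking keeps a high-scoring variant that is unlinked to the association signal from displacing linked candidates; in practice the threshold would depend on the population and marker density.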

pCADD

SNV prioritisation in Sus scrofa

Journal article (2020) - Christian Groß, Martijn Derks, Hendrik Jan Megens, Mirte Bosse, Martien A.M. Groenen, Marcel Reinders, Dick De Ridder

Background: In animal breeding, identification of causative genetic variants is of major importance and high economic value. Usually, the number of candidate variants exceeds the number of variants that can be validated. One way of prioritizing probable candidates is by evaluating their potential to have a deleterious effect, e.g. by predicting their consequence. Because variants that do not cause an amino-acid substitution are experimentally difficult to evaluate, other prioritization methods are needed. For human genomes, the prediction of deleterious genomic variants has taken a step forward with the introduction of the combined annotation dependent depletion (CADD) method. In theory, this approach can be applied to any species. Here, we present pCADD (p for pig), a model to score single nucleotide variants (SNVs) in pig genomes. Results: To evaluate whether pCADD captures sites with biological meaning, we used transcripts from miRNAs and introns, sequences from genes that are specific for a particular tissue, and the different sites of codons, to test how well pCADD scores differentiate between functional and non-functional elements. Furthermore, we conducted an assessment of examples of non-coding and coding SNVs that are causal for changes in phenotypes. Our results show that pCADD scores discriminate between functional and non-functional sequences and prioritize functional SNVs, and that pCADD is able to score the different positions in a codon relative to their redundancy. Taken together, these results indicate that based on pCADD scores, regions with biological relevance can be identified and distinguished according to their rate of adaptation. Conclusions: We present the ability of pCADD to prioritize SNVs in the pig genome with respect to their putative deleteriousness, in accordance with the biological significance of the region in which they are located.
We created scores for all possible SNVs, coding and non-coding, for all autosomes and the X chromosome of the pig reference sequence Sscrofa11.1, proposing a toolbox to prioritize variants and evaluate sequences to highlight new sites of interest to explain biological functions that are relevant to animal breeding. ...

A survey of functional genomic variation in domesticated chickens

Journal article (2018) - Martijn F.L. Derks, Hendrik-Jan Megens, Martien A.M. Groenen, Mirte Bosse, Jeroen Visscher, Katrijn Peeters, Marco C.A.M. Bink, Addie Vereijken, Christian Gross, Dick de Ridder, Marcel Reinders

Background: Deleterious genetic variation can increase in frequency as a result of mutations, genetic drift, and genetic hitchhiking. Although individual effects are often small, the cumulative effect of deleterious genetic variation can impact population fitness substantially. In this study, we examined the genome of commercial purebred chicken lines for deleterious and functional variations, combining genotype and whole‑genome sequence data.
Results: We analysed over 22,000 animals that were genotyped on a 60 K SNP chip from four purebred lines (two white egg and two brown egg layer lines) and two crossbred lines. We identified 79 haplotypes that showed a significant deficit in homozygous carriers. This deficit was assumed to stem from haplotypes that potentially harbour lethal recessive variations. To identify potentially deleterious mutations, a catalogue of over 10 million variants was derived from 250 whole‑genome sequenced animals from three purebred white‑egg layer lines. Out of 4219 putative deleterious variants, 152 mutations were identified that likely induce embryonic lethality in the homozygous state. Inferred deleterious variation showed evidence of purifying selection, and deleterious alleles were generally overrepresented in regions of low recombination. Finally, we found evidence that mutations inferred to be evolutionarily intolerant likely have positive effects in commercial chicken populations.
Conclusions: We present a comprehensive genomic perspective on deleterious and functional genetic variation in egg layer breeding lines, which are under intensive selection and characterized by a small effective population size. We show that deleterious variation is subject to purifying selection and that there is a positive relationship between recombination rate and purging efficiency. In addition, multiple putative functional coding variants were discovered in selective sweep regions, which are likely under positive selection. Together, this study provides a unique molecular perspective on functional and deleterious variation in commercial egg‑laying chickens, which can enhance current genomic breeding practices to lower the frequency of undesirable variants in the population.

...
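
The idea of a "significant deficit in homozygous carriers" in the abstract above can be illustrated with a small worked example. The sketch below is not the method of the study: it assumes Hardy-Weinberg proportions for the expected number of homozygotes and a one-sided binomial test, and the function names `binom_cdf` and `homozygote_deficit` and all numbers are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); terms are computed in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    total = 0.0
    for i in range(0, min(k, n) + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)  # guard against rounding slightly above 1


def homozygote_deficit(n_animals, haplotype_freq, observed_hom):
    """Return (expected homozygotes, one-sided p-value for observing so few).

    Assumption: under Hardy-Weinberg proportions, each animal is homozygous
    for the haplotype with probability haplotype_freq ** 2.
    """
    p_hom = haplotype_freq ** 2
    expected = n_animals * p_hom
    p_value = binom_cdf(observed_hom, n_animals, p_hom)
    return expected, p_value


# Hypothetical numbers: 5000 genotyped animals, haplotype frequency 5 %,
# and no homozygous carriers observed.
expected, p_value = homozygote_deficit(5000, 0.05, 0)
print(f"expected homozygotes: {expected:.1f}, deficit p-value: {p_value:.2e}")
```

A haplotype for which many homozygotes are expected but none are seen is a candidate carrier of a recessive lethal variant. A real analysis would also need to correct for testing many haplotypes and could derive the expectation from parental genotypes instead of population frequencies.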

Predicting variant deleteriousness in non-human species

Applying the CADD approach in mouse

Journal article (2018) - Christian Groß, Dick de Ridder, Marcel Reinders

Background: Predicting the deleteriousness of observed genomic variants has taken a step forward with the introduction of the Combined Annotation Dependent Depletion (CADD) approach, which trains a classifier on the wealth of available human genomic information. This raises the question of whether it can be done with less data for non-human species. Here, we investigate the prerequisites to construct a CADD-based model for a non-human species. Results: Performance of the mouse model is competitive with that of the human CADD model and better than established methods like PhastCons conservation scores and SIFT. As in the human case, performance varies for different genomic regions and is best for coding regions. We also show the benefits of generating a species-specific model over lifting variants to a different species or applying a generic model. With fewer genomic annotations, performance on the test set as well as on the three validation sets is still good. Conclusions: It is feasible to construct species-specific CADD models even when annotations such as epigenetic markers are not available. The minimal requirement for these models is the availability of a set of genomes of closely related species that can be used to infer an ancestor genome and substitution rates for the data generation. ...