Automatic gene function prediction in the 2020’s

Review (2020)
Author(s)

Stavros Makrodimitris (Keygene N.V., TU Delft - Pattern Recognition and Bioinformatics)

Roeland C.H.J. Van Ham (Keygene N.V., TU Delft - Pattern Recognition and Bioinformatics)

Marcel J.T. Reinders (TU Delft - Pattern Recognition and Bioinformatics, Leiden University Medical Center)

DOI (final published version): https://doi.org/10.3390/genes11111264
Publication year: 2020
Language: English
Issue number: 11
Volume number: 11
Article number: 1264
Pages (from-to): 1-18
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

New DNA and protein sequences are currently being generated far faster than their functions can be discovered experimentally, which underscores the need for accurate Automatic Function Prediction (AFP) methods. AFP has been an active and growing research field for decades and has made considerable progress in that time, but the problem is far from solved. In this paper, we describe the challenges that the AFP field must still overcome to increase its applicability. The challenges we consider are how to: (1) include condition-specific functional annotations, (2) predict functions for non-model species, (3) include new informative data sources, (4) deal with the biases of Gene Ontology (GO) annotations, and (5) fully exploit the GO to obtain performance gains. We also provide recommendations for addressing these challenges by adapting (1) the way we represent proteins and genes, (2) the way we represent gene functions, and (3) the algorithms that predict function from gene. Together, these show that AFP remains a vibrant research area that can benefit from continuing advances in machine learning. With these advances, AFP in the 2020s can again take a large step forward and reinforce the power of computational biology.