AMES, Iowa – When computer scientists face a major challenge, they often compete with one another to see who can come up with the best solution. The inclination to compete is hardwired into the culture, said Iddo Friedberg, an associate professor of veterinary microbiology and preventive medicine at Iowa State University and chair of the bioinformatics and computational biology graduate program.
So when Friedberg saw that the ability of scientists to sequence the genomes of organisms far surpassed their ability to determine the function of individual genes, it felt natural to put together a formal competition to see who could write the best software to predict gene function. Friedberg and a multi-institutional team created CAFA, or the Critical Assessment of Function Annotation. CAFA is an international competition held every three years that evaluates dozens of software programs designed to help biologists assign function to the countless genes they come across while sifting through the genomes of organisms.
The competition acts as a proof of concept for new computational approaches for predicting gene function, but it also advances science’s understanding of genomics. Friedberg and his collaborators published the findings of their most recent competition in the scientific journal Genome Biology.
Understanding the functions of genes
Scientists have sequenced the genomes of a multitude of organisms, from humans to corn to flies. Those genomes contain all the genetic material of an organism, but they don’t tell scientists anything about the functions of individual genes. Scientists have to employ other means to figure that out, usually through experiments.
“Genomes are like trying to read a book in a language you only partially understand,” Friedberg said. “Genome sequencing – identifying the letters and words – is relatively easy, but understanding the actual meaning of the words is more difficult.”
He said biology depends increasingly on computational science, and predictive computer programs can be a powerful tool in figuring out what genes do. The CAFA competition aims to provide an objective, side-by-side comparison of the various approaches computational biologists are using, so other biologists studying genomes can make informed decisions about what tools might work best for their research, said Naihui Zhou, a graduate student in Friedberg’s lab who oversaw much of the most recent competition.
“Biologists need software that predicts outputs, and they want to use the best prediction method available,” Zhou said. “The competition shows biologists what kind of software is out there and how it performs.”
The most recent competition included 144 entries from 68 teams, most from universities and some from private firms. To judge the programs fairly, Friedberg’s team works with biologists who have experimentally identified the functions of genes but haven’t yet publicly disseminated their results. The entrants run their predictive software on the genomes to see how accurately the programs predict the functions of the genes in question. The programs are then evaluated across a wide range of metrics and standards, Zhou said.
In addition to providing software designers with a proving ground for their products, the competition advances scientists’ understanding of genomics, Friedberg said. The latest competition asked the entrants to look for genes that affect long-term memory in multiple organisms, including flies and humans. The competition uncovered the functions of more than 20 new genes, which could have implications for understanding human memory and neurodegenerative disorders.
The competition also fosters a sense of community and healthy rivalry among the software developers studying genomics, Friedberg said.
“This is a naturally competitive community, so having this competition helps to foster a community where students talk with faculty and with each other,” he said. “Sometimes those conversations spark disagreement, but it keeps the community moving forward.”