Researchers use AI to develop artificial proteins

Scientists and engineers have long sought to harness the power of proteins by designing artificial ones that can perform new tasks, such as treating disease, capturing carbon, or harvesting energy.
Jeff Rowe

We’ve become accustomed to hearing about AI being used to diagnose diseases and identify potential new medicines, to name just two of the uses to which the new technology has been put in healthcare.

But reproduce one of the biological building blocks of life?

According to news out of the University of Chicago, a team led by researchers in the Pritzker School of Molecular Engineering (PME) has developed an AI process that can review protein information culled from genome databases and construct artificial proteins that they say rival those found in nature.

“We have all wondered how a simple process like evolution can lead to such a high-performance material as a protein,” said Rama Ranganathan, Joseph Regenstein Professor in the Department of Biochemistry and Molecular Biology, Pritzker Molecular Engineering, and the College. “We found that genome data contains enormous amounts of information about the basic rules of protein structure and function, and now we’ve been able to bottle nature’s rules to create proteins ourselves.”

The results of the project were published July 24 in the journal Science.

Proteins are made up of hundreds or thousands of amino acids, and these amino acid sequences specify the protein’s structure and function. But understanding just how to build these sequences to create novel proteins has been challenging. Past work has resulted in methods that can specify structure, but function has been more elusive.

What Ranganathan and his collaborators realized over the past 15 years is that genome databases, which are growing exponentially, contain enormous amounts of information about the basic rules of protein structure and function.

“We generally assume that to build something, you have to first deeply understand how it works,” Ranganathan said. “But if you have enough data examples, you can use deep learning methods to learn the rules of design, even as you are understanding how it works or why it’s built that way.”

His group developed mathematical models based on this data and then began using machine-learning methods to reveal new information about proteins’ basic design rules. They created synthetic genes to encode the designed proteins, cloned them into bacteria, and watched as the bacteria then made the synthetic proteins using their normal cellular machinery. They found that the artificial proteins had the same catalytic function as natural versions of chorismate mutase, the enzyme the designs were modeled on.
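To make the idea of learning design rules from sequence data concrete, here is a minimal Python sketch. It is not the researchers' method, which the article does not describe in detail. It assumes a hypothetical toy alignment of made-up sequences, estimates how often each amino acid appears at each position, and samples new sequences from those frequencies. The names `fit_site_frequencies` and `sample_sequence` are illustrative, and the sketch ignores any dependence between positions, which a realistic model would likely need to capture.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def fit_site_frequencies(alignment, pseudocount=1.0):
    """Estimate per-position amino acid probabilities from aligned sequences.

    Assumes every sequence has the same length and uses only the 20
    standard letters (gaps are not handled in this toy version).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in alignment)
        # The pseudocount keeps unseen amino acids at a small nonzero probability.
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


def sample_sequence(profile, rng):
    """Draw one artificial sequence, choosing each position independently."""
    return "".join(
        rng.choices(AMINO_ACIDS, weights=[site[aa] for aa in AMINO_ACIDS])[0]
        for site in profile
    )


# Hypothetical toy alignment: made-up sequences, not real enzyme data.
toy_alignment = ["MKVLA", "MKILA", "MRVLG", "MKVLA"]
profile = fit_site_frequencies(toy_alignment)
rng = random.Random(0)  # fixed seed so the sketch is repeatable
designs = [sample_sequence(profile, rng) for _ in range(3)]
print(designs)
```

In a real project the sampled sequences would be only a starting point. As the article describes, candidates still have to be synthesized and tested in living cells to see whether they function.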

Because the design rules are relatively simple, the number of artificial proteins that researchers could potentially create with them is extremely large.

“The constraints are much smaller than we ever imagined they would be,” Ranganathan said. “There is a simplicity in nature’s design rules, and we believe similar approaches could help us search for models for design in other complex systems in biology, like ecosystems or the brain.”

Though AI revealed the design rules, Ranganathan and his collaborators still don’t fully understand why the models work. Next they will work to understand just how the models arrive at their results.

“There is much more work to be done,” he said.