Machine Learning Applications

Metabolite and reaction inference based on enzyme specificities


In biotechnology, much effort is spent on altering metabolism, mainly of industrially relevant microorganisms such as bacteria, yeasts and fungi. In most cases, the aim is to increase the yield of an existing product or to introduce and optimize a pathway to a new product.

To be able to perform such metabolic engineering, one needs a full description of the metabolism of the species of interest: to select desired functions (enzymes) needed to introduce a new pathway, to unravel metabolic regulation, to find bottlenecks in metabolism, and to reveal undesired bypasses. Missing functions or ‘gaps’ in this metabolic network description make metabolic engineering difficult; but even when the main pathways are known, missing bypasses or cross-links may pose problems. It is therefore essential to have a full overview of all possible metabolic reactions in the cell.

A number of researchers have explored the idea of predicting metabolic reactions based on an analysis of the basic biochemical transformations performed by enzymes. Several such systems have been developed, particularly for the biotransformation of xenobiotics (substances foreign to a biological system). These tools consist mainly of manually supplied reaction rules and depend heavily on the user to select feasible pathways from those predicted.

We developed a novel system for metabolite and reaction inference based on enzyme specificities (MaRIboES). MaRIboES employs structural and stereochemical similarity measures and molecular fingerprints to generalize enzymatic reactions based on data available in the Braunschweig enzyme database (BRENDA).
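To make the idea of fingerprint-based similarity concrete, the following is a minimal sketch, not the MaRIboES implementation. It assumes fingerprints are represented as sets of "on" bit positions and uses the Tanimoto coefficient, a common choice for comparing molecular fingerprints; the text above does not state which similarity measure MaRIboES actually uses. The function names, substrate names and bit positions are all hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

Fingerprint = FrozenSet[int]  # assumed representation: indices of bits set to 1


def tanimoto(fp_a: Fingerprint, fp_b: Fingerprint) -> float:
    """Tanimoto coefficient: shared on-bits divided by on-bits in either fingerprint."""
    if not fp_a and not fp_b:
        return 1.0  # convention chosen here: two empty fingerprints are identical
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def most_similar(query: Fingerprint,
                 known: Dict[str, Fingerprint]) -> Tuple[str, float]:
    """Return the known substrate closest to the query, with its similarity score."""
    return max(((name, tanimoto(query, fp)) for name, fp in known.items()),
               key=lambda pair: pair[1])


# Toy data: hypothetical bit positions, not derived from real molecules.
known_substrates = {
    "substrate_A": frozenset({1, 4, 7, 9}),
    "substrate_B": frozenset({2, 3, 8}),
}
candidate = frozenset({1, 4, 7, 12})

best_name, best_score = most_similar(candidate, known_substrates)
print(best_name, best_score)
```

In a scheme of this kind, a candidate compound that scores highly against the known substrates of an enzyme would be proposed as a possible additional substrate of that enzyme; where to set the cutoff is a design choice that the text above does not specify.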

Leave-one-out cross-validation shows that 80% of known reactions are predicted well. Application to the yeast glycolytic and pentose phosphate pathways predicts a large number of known and new reactions, often leading to the formation of novel compounds. Initial validation in the lab led to the discovery of two novel enzymatic activities of decarboxylases, providing clues for understanding pathways leading to the formation of higher-order alcohols in the fermentation of wine, beer and spirits.
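The leave-one-out procedure can be sketched as follows. This is an illustration under stated assumptions rather than the evaluation actually used: each known reaction is reduced to a hypothetical (enzyme, substrate fingerprint) pair, and a held-out reaction counts as "predicted" if another known substrate of the same enzyme has a Tanimoto similarity of at least an arbitrary threshold of 0.5. The predictor, the threshold and the toy data are all assumptions made for the example.

```python
from typing import FrozenSet, List, Sequence, Tuple

Reaction = Tuple[str, FrozenSet[int]]  # hypothetical: (enzyme label, substrate fingerprint)


def tanimoto(fp_a: FrozenSet[int], fp_b: FrozenSet[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 1.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def predicts(held_out: Reaction, training: Sequence[Reaction],
             threshold: float = 0.5) -> bool:
    """Toy predictor: is the held-out substrate similar enough to any
    remaining known substrate of the same enzyme?"""
    enzyme, fingerprint = held_out
    return any(other_enzyme == enzyme and tanimoto(fingerprint, other_fp) >= threshold
               for other_enzyme, other_fp in training)


def leave_one_out_recall(reactions: List[Reaction], threshold: float = 0.5) -> float:
    """Hold out each reaction in turn and report the fraction recovered from the rest."""
    if not reactions:
        raise ValueError("need at least one reaction")
    hits = 0
    for i, held_out in enumerate(reactions):
        training = reactions[:i] + reactions[i + 1:]  # every reaction except the i-th
        if predicts(held_out, training, threshold):
            hits += 1
    return hits / len(reactions)


# Toy data: hypothetical enzymes and bit positions, not real measurements.
toy_reactions = [
    ("enzyme_X", frozenset({1, 2, 3, 4})),
    ("enzyme_X", frozenset({1, 2, 3, 5})),
    ("enzyme_X", frozenset({7, 8, 9})),
    ("enzyme_Y", frozenset({1, 2, 3, 4})),
]
recall = leave_one_out_recall(toy_reactions)
print(recall)
```

In this toy set, a reaction whose substrate resembles no other substrate of the same enzyme, or whose enzyme has only one known reaction, cannot be recovered once it is held out. That limitation is built into any leave-one-out evaluation of a similarity-based predictor.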

Matlab and C++ code is freely available at