Studying Biology with Systems Engineering Principles
I recently co-authored a paper in the IEEE Open Journal of Systems Engineering with Dr. Gerald Fudge at Texas A&M on the intersection of biology and engineering. The paper does two things: 1) it lays out a methodology based on systems engineering for biologists, and 2) it illustrates the usefulness of that methodology with a case study of glycolysis.
The project was inspired a couple of years back when I read Uri Alon’s An Introduction to Systems Biology. The book made me realize that biologists could benefit from the same engineering approaches used to build iPhones, and that these approaches could help uncover the intricate designs in life.
As a biologist, I’ve often wondered how best to integrate engineering ideas into biology research. There are many possible ways. One way engineering can help the everyday biologist is by providing a robust methodology for studying living systems. A great illustration is the paper “Can a Biologist Fix a Radio?” (Lazebnik 2002). The punchline is that a handyman can fix a radio but a biologist probably can’t. This has nothing to do with IQ and everything to do with methodology.
Current practice in biology does not include a formal methodology for reverse engineering a system. Instead, biologists are taught the scientific method, which is very useful for rigorously testing hypotheses. They also learn reductionist, bottom-up processes of interrogation. Neither of these is a methodology for understanding and interrogating a complex system as a whole. Having identified this gap, Dr. Fudge, a long-time engineer, and I teamed up to adapt the proven systems engineering methodology to discovery in living organisms.
Proven in What Way?
I used the word “proven” because systems engineering has built amazing technology, from rockets to iPhones, and has a track record of developing complex systems. The standard systems engineering process goes something like this. Engineers meet with stakeholders and receive a rough outline of requirements (verbal or written details about what the product should do). This outline is eventually formalized into a set of specific requirements, which are often modeled using a systems engineering tool. More specific models are then developed, leading to a series of refinements. Then construction begins: smaller parts are built first, followed by the assembly of subsystems. Throughout this build phase, testing is ongoing, and everything is checked against the requirements and the initial system model. Eventually the finished product meets the stakeholders’ expectations. Or that is the goal, anyway.
Dr. Fudge and I adapted this methodology for biology. We call it model-based reverse systems engineering (MBRSE). “Model-based,” because it uses a system model as a map to keep track of relationships between objects and processes. “Reverse,” because biology starts from an existing system, the organism, and works backward to understand and predict how it functions. Engineering, by contrast, works forward from requirements to a new product. “Systems,” because the approach uses requirements and modeling to tie components into a system-level design, illustrating how the whole is more than the sum of its parts.
Starting with Literature Mining
As in traditional biology, our approach begins with observations, gathered through literature mining. However, these observations are guided by classic systems engineering questions: (1) What requirements is this system meeting? (2) What are its interfaces? (3) What are the associated derived requirements? (4) Based on these derived requirements, what predictions can we make at the system, subsystem, or component level? From observations, our methodology shifts quickly into a more traditional systems engineering approach. We infer requirements from observations and build a system model (in our case, using OPCloud).
Building a system model starts with qualitative conceptual modeling and can be followed by more specific computational modeling. Conceptual modeling, to my surprise, is highly accessible to biologists; it is more like drawing a map than like quantitative modeling. Yet it serves as a critical foundation for quantitative modeling, because it fixes the relationships between objects and processes in a formal language. This also allows errors to be caught early. Because developing the system model and requirements is a methodical process, it often exposes key knowledge gaps. Once the model and requirements are in place, one can make predictions, test them experimentally, and update the model and requirements based on the observed results. The process is iterative. Its goal is a list of requirements and a system model that accurately reflect the biological system or organism.
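A methodical model can expose knowledge gaps, and a small example makes this concrete. The following is a minimal sketch under a drastically simplified representation. Requirements form a tree of derived requirements, and each model element (object or process) lists the requirements it satisfies. This is not OPCloud’s format or any OPCloud API. The class names, requirement IDs, and the `unannotated_protein_X` element are hypothetical illustrations. The check flags two kinds of gaps:

- elements that trace to no requirement;
- lowest-level derived requirements that no element satisfies.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Requirement:
    rid: str
    text: str
    parent: Optional[str] = None  # ID of the requirement this one is derived from


@dataclass
class Element:
    name: str
    kind: str  # "object" or "process"
    satisfies: List[str] = field(default_factory=list)  # IDs of requirements this element traces to


def knowledge_gaps(reqs: List[Requirement], elements: List[Element]) -> Tuple[List[str], List[str]]:
    """Return (elements linked to no known requirement, leaf requirements no element satisfies)."""
    req_ids = {r.rid for r in reqs}
    has_children = {r.parent for r in reqs if r.parent is not None}
    satisfied = {rid for e in elements for rid in e.satisfies}
    unlinked = [e.name for e in elements if not req_ids.intersection(e.satisfies)]
    # Only leaf (lowest-level) requirements are checked; parents are covered through their children.
    unmet = [r.rid for r in reqs if r.rid not in has_children and r.rid not in satisfied]
    return unlinked, unmet


# Hypothetical toy model, for illustration only.
reqs = [
    Requirement("R1", "Supply usable energy to the cell"),
    Requirement("R1.1", "Convert glucose to pyruvate with a net ATP gain", parent="R1"),
    Requirement("R1.2", "Supply biosynthetic precursors", parent="R1"),
    Requirement("R1.3", "Adjust glycolytic flux to match demand", parent="R1"),
]
elements = [
    Element("hexokinase", "object", ["R1.1"]),
    Element("glycolysis", "process", ["R1.1", "R1.2"]),
    Element("unannotated_protein_X", "object"),
]

unlinked, unmet = knowledge_gaps(reqs, elements)
print(unlinked)  # ['unannotated_protein_X']
print(unmet)     # ['R1.3']
```

In this toy example, the unlinked protein and the unmet flux-control requirement are the kinds of gaps that would prompt new literature searches or experiments. A real system model would carry far richer relationships.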
A Case Study of Glycolysis
In our paper, to illustrate the utility of our approach, we use glycolysis as a case study. Glycolysis is reasonably well understood and is familiar to many non-biologists since most high school biology courses teach the basics of this crucial metabolic pathway.
Explaining Similarities and Differences in Glycolysis with Systems Engineering
Before we discuss similarities and differences in glycolysis across different types of organisms, it’s important to define a term: topology. Topology refers to the overall metabolic sequence, i.e., the ordering of the pathway steps that lead from, say, glucose to pyruvate, together with the intermediates produced along the way. Glycolysis shows both remarkable similarities and remarkable differences among different types of organisms (Rivas, Becerra, and Lazcano 2018). On the one hand, most organisms catabolize glucose through one of two topologies, the Embden-Meyerhof-Parnas (EMP) or the Entner-Doudoroff (ED) pathway. On the other hand, while the topology is conserved, the DNA sequences of the enzymes used in the pathway are not. The high degree of topological similarity across organisms led many to assume that the uniformity resulted from common ancestry. It also led them to expect a common-ancestry pattern in the genetic sequences of the enzymes. But this hypothesis overlooked reasons for topological similarity rooted in system requirements. As we write in our paper:
Traditionally, uniformity has been attributed as an artifact of common descent, meaning uniformity resulted from a historical relationship between all living organisms and does not have functional importance. However, in systems engineering, uniformity at a low level in a system design is often an optimized solution to upper-level requirements. We therefore propose that the striking similarity in the topology and metabolites of glycolysis across organisms is driven by a requirement for compatibility between organism energy interfaces, aiming to maximize efficiency at the ecosystem level.
Fudge and Reeves 2024
Ecosystem requirements shape the design of organisms. Organism design in turn shapes the requirements for metabolic design, which ultimately constrain the structure of lower-level subsystems like glycolysis. This is because higher-level system needs determine the architecture of the subsystems below them. For glycolysis, we propose that the need for ecosystem efficiency and optimized energy catabolism best explains the uniformity of the glycolytic topology. The evidence supporting this hypothesis is growing.

First, ecosystem efficiency requires some level of biomass commonality. Commonality lets complex molecules be reused with maximal thermodynamic efficiency and minimal break-down and rebuilding. It also helps minimize waste buildup, since shared waste products simplify the maintenance of ecosystem homeostasis.

Second, the glycolytic pathway is recognized as optimized for a number of key metabolic constraints, further supporting its uniformity across species.
Ebenhöh and Heinrich [40] showed that the glycolysis architecture with a preparatory phase followed by a payoff phase is highly efficient based on kinetic and thermodynamic analysis. Similarly, Court et al. [41] discovered that the payoff phase has a maximally efficient throughput rate. In 2010, Noor et al. [42] demonstrated that the 10 step glycolytic pathway is minimally complex, with all glycolytic intermediates essential either for building biomass or for ATP production. In fact, it turns out that glycolysis is Pareto-optimized to maximize efficiency while serving multiple, often competing, purposes. Ng et al. [43] published their analysis in 2019 by analyzing over 10000 possible routes between glucose and pyruvate to show that the two primary glycolysis variant pathways are Pareto-optimized to balance ATP production against biomass production while simultaneously minimizing protein synthesis cost.
Fudge and Reeves 2024
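The Pareto-optimality claim can be stated precisely: a route is Pareto-optimal if no other route is at least as good on every objective and strictly better on at least one. The sketch below assumes three objectives taken from the quoted analysis:

- maximize ATP yield;
- maximize biomass-precursor output;
- minimize protein cost.

It filters a set of routes down to its Pareto front. The route names and scores are made-up placeholders, not data from Ng et al. or from our paper.

```python
from typing import Dict, List, Tuple

# Objectives per route: (ATP yield, biomass-precursor output, protein cost).
# Higher is better for the first two; lower is better for protein cost.
Route = Tuple[float, float, float]


def dominates(a: Route, b: Route) -> bool:
    """True if route a is at least as good as b on every objective and strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[0] > b[0] or a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(routes: Dict[str, Route]) -> List[str]:
    """Names of routes that no other route dominates."""
    return [
        name
        for name, score in routes.items()
        if not any(dominates(other, score) for other_name, other in routes.items() if other_name != name)
    ]


# Placeholder scores in arbitrary units, not measured values.
routes = {
    "route_A": (2.0, 1.0, 10.0),
    "route_B": (1.0, 2.0, 8.0),
    "route_C": (1.0, 1.0, 12.0),  # dominated by route_A
    "route_D": (2.0, 1.0, 11.0),  # same yields as route_A but costlier, so dominated
}
print(pareto_front(routes))  # ['route_A', 'route_B']
```

The front can hold more than one route. Neither route_A nor route_B beats the other on every objective, which mirrors the idea that two glycolytic variants can both be optimal while trading off different purposes.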
In contrast, differences in glycolytic enzyme and transporter sequences among organisms appear to stem from lower-level subsystem requirements and constraints. These are expected to reflect more organism-specific differences. In our paper, we discuss the example of mammalian glucose transporters, which have 14 subtypes, only four of which are well characterized (Thorens and Mueckler 2010). Each of these four plays a distinct role in system-level glucose control in mammals, so the differences among them are explainable by their tissue-adapted roles. Similarly, differences among the glycolytic enzymes themselves correlate poorly with ancestry. They have led researchers to abandon the earlier assumption that the pathway had a single evolutionary origin (Rivas, Becerra, and Lazcano 2018). Instead, evidence continues to accumulate that differences in glycolytic enzymes between organisms play functional roles. These roles are tied to the unique subsystem environments in which the enzymes operate.
The Warburg Effect and Cancer Research
Using our systems engineering approach, we also generated a hypothesis for the Warburg effect, a well-documented phenomenon in many cancer types. Briefly, the Warburg effect is the preferential use of glucose by cancer cells through upregulation (i.e., increased use) of glycolysis, even in the presence of oxygen. It is often thought to be a deleterious byproduct of cancer, but our paper proposes a new perspective. We hypothesize that the Warburg effect is a normal system response to local injury or to other temporary situations that require rapid tissue growth, such as certain early developmental stages. On this view, cancer occurs when the signal to turn off rapid tissue growth fails. The downstream effect is a continued signal for upregulated glycolysis, hence the Warburg effect. From our paper:
Under certain (currently unknown) conditions, the feedback control loop for injury response can be broken, resulting in an under-controlled or completely uncontrolled response. In other words, we hypothesize a cellular level failure in the control system that upregulates cellular processes for division including glycolysis such that the rate of glycolysis is unconstrained at the cellular level. Note that all four proposed functions of the Warburg effect, plus its ability to support cellular metabolism if the oxygen supply is interrupted due to local loss of normal blood flow, are beneficial for tissue repair after an injury where 1) there might be reduced oxygen, 2) faster cell division and local ATP energy supply is needed, and 3) more biomass is required. A similar situation can occur during early organism development when tissue growth is more rapid than in the adult stage, and in which the blood supply is developing simultaneously.
Fudge and Reeves 2024
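A toy discrete-time simulation can illustrate the broken-feedback hypothesis. This is a minimal sketch built on two assumptions: a single growth signal upregulates glycolysis, and a negative feedback loop relaxes that signal toward the remaining tissue deficit. It is not a model from our paper. Every parameter (baseline rate, repair rate, feedback gain) is hypothetical and in arbitrary units. With the feedback intact, the glycolytic rate returns toward baseline as repair completes. With the feedback gain set to zero, the rate stays elevated even after the deficit is gone.

```python
from typing import List


def simulate(feedback_gain: float, steps: int = 50) -> List[float]:
    """Toy model: glycolytic rate over time after an injury (arbitrary units)."""
    baseline = 1.0     # hypothetical resting glycolytic rate
    repair_rate = 0.1  # hypothetical fraction of the signal converted into repair per step
    deficit = 1.0      # tissue deficit created by the injury
    signal = 1.0       # growth signal switched on by the injury
    rates = []
    for _ in range(steps):
        # Glycolysis is upregulated in proportion to the growth signal.
        rates.append(baseline * (1.0 + signal))
        # Growth repairs the tissue, shrinking the deficit (never below zero).
        deficit = max(0.0, deficit - repair_rate * signal)
        # Negative feedback: the signal relaxes toward the remaining deficit.
        # A gain of 0 models a broken loop in which the signal never turns off.
        signal += feedback_gain * (deficit - signal)
    return rates


intact = simulate(feedback_gain=0.5)
broken = simulate(feedback_gain=0.0)
print(f"intact loop, final rate: {intact[-1]:.3f}")  # close to the baseline of 1.0
print(f"broken loop, final rate: {broken[-1]:.3f}")  # prints 2.000
```

Setting `feedback_gain` between these extremes slows the return to baseline but does not prevent it. This is one way to picture an “under-controlled” rather than a completely uncontrolled response.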
To our surprise, our literature search turned up little on the Warburg effect as a critical part of injury repair. One exception is Vander Heiden and colleagues, who suggested that the increased cellular division rate associated with the Warburg effect can be beneficial in tissue repair as well as in immune responses (Vander Heiden, Cantley, and Thompson 2009). We think this could be a very important area for investigation. Our hypothesis could be verified or falsified by research on the feedback mechanisms in the control system that sets the rate of glycolysis upregulation.
A Useful Design-Based Tool
Engineering is a design-driven field, born from the creativity of intelligent human agents, and many tools developed in the field have applications in biology. For example, the MBRSE approach addresses a key challenge facing biology: many biological objects and processes are not linked to system-level requirements. Without these connections, a gap opens between the structure of components and how they fit into the system’s function.

On a personal note, one aspect of system modeling that I find particularly appealing is its use of formal relationships and structured language. Once you’re familiar with the tool, it becomes much easier to identify connections between subsystems or constraints, even when looking at a different system model. This is a major advantage over the inconsistent, free-form diagrams in biology research papers, which tend to differ from one paper to the next.

Another benefit of systems modeling is that it organizes information from research papers in a structured, graphical manner. No matter how brilliant a researcher is, it’s impossible to keep track of information from thousands of papers, but a system model can. It’s remarkable that while these modeling tools are standard in engineering, they are largely absent from biological training.
Our reverse systems engineering approach is motivated by some key observations:
Biological systems look as if they are designed; for example, Francis Crick cautioned biologists about using evolutionary ideas to guide research, because biological systems look designed even though he believed they evolved (Campana 2000). Even Richard Dawkins admitted in The God Delusion, “The illusion of design is so powerful that one has to keep reminding oneself that it is indeed an illusion.”
Biological systems have much in common with human engineered systems (Csete and Doyle 2002); and
Biological systems exhibit features such as modularity, robustness, and design re-use (Alon 2003) that are traditionally associated with good top-down engineering practices.
These observations suggest that, from a pragmatic perspective, the best way to reverse engineer biological systems is to treat them as if they were the result of a top-down, requirements-driven systems engineering process.
It is good news, then, that design-based tools and hypotheses play an increasingly prominent role in biology, offering a clear, coherent path to understanding biological complexity. From this understanding, more than a few deeper philosophical questions arise.
References
Alon, U. 2003. “Biological Networks: The Tinkerer as an Engineer.” Science 301 (5641): 1866–67.
Campana, Joey. 2000. “The Design Isomorph and Isomorphic Complexity.” Nature Reviews Molecular Cell Biology, 149–53.
Csete, Marie E., and John C. Doyle. 2002. “Reverse Engineering of Biological Complexity.” Science 295 (5560): 1664–69.
Fudge, Gerald L., and Emily Brown Reeves. 2024. “A Model-Based Reverse System Engineering Methodology for Analyzing Complex Biological Systems with a Case Study in Glycolysis.” IEEE Open Journal of Systems Engineering 2:119–34.
Lazebnik, Yuri. 2002. “Can a Biologist Fix a Radio? Or, What I Learned While Studying Apoptosis.” Cancer Cell 2 (3): 179–82.
Rivas, Mario, Arturo Becerra, and Antonio Lazcano. 2018. “On the Early Evolution of Catabolic Pathways: A Comparative Genomics Approach. I. The Cases of Glucose, Ribose, and the Nucleobases Catabolic Routes.” Journal of Molecular Evolution 86 (1): 27–46.
Thorens, Bernard, and Mike Mueckler. 2010. “Glucose Transporters in the 21st Century.” American Journal of Physiology. Endocrinology and Metabolism 298 (2): E141–45.
Vander Heiden, Matthew G., Lewis C. Cantley, and Craig B. Thompson. 2009. “Understanding the Warburg Effect: The Metabolic Requirements of Cell Proliferation.” Science 324 (5930): 1029–33.