Thursday, 31 October 2024

On reverse engineering JEHOVAH'S tech

Studying Biology with Systems Engineering Principles


In the IEEE Open Journal of Systems Engineering, I recently co-authored a paper with Dr. Gerald Fudge at Texas A&M on the intersection of biology and engineering. Our paper does two things: 1) It lays out a methodology based on systems engineering for biologists. 2) It illustrates the usefulness of the methodology with a case study of glycolysis. 

The project was inspired a couple of years ago, when I read Uri Alon’s An Introduction to Systems Biology and realized that biologists could benefit from the same engineering approaches used to build iPhones. These approaches could help uncover the intricate designs in life.

As a biologist, I’ve often wondered about the best way to integrate engineering ideas into biology research. While there are many methods, one way engineering can assist the everyday biologist is by providing a robust methodology for approaching the study of living systems. A great illustration is the paper “Can a Biologist Fix a Radio?” Its punchline is that a handyman can fix a radio, but a biologist probably can’t, and this has nothing to do with IQ and everything to do with methodology. (Lazebnik 2002)

Current practice in biology does not involve a formal methodology for reverse engineering a system. Instead, biologists are taught the scientific method, which is very useful for rigorously testing hypotheses, along with a reductionistic, bottom-up process of interrogation. Different from both of these is a methodology for understanding and interrogating a complex system as a whole. Having identified this gap, Dr. Fudge, a long-time engineer, and I teamed up to adapt the proven systems engineering methodology to enhance discovery in living organisms.

Proven in What Way?

I used the word “proven” because systems engineering has built amazing technology, from rockets to iPhones. It has a track record of being able to develop complex systems. The standard systems engineering process goes something like this. Engineers meet with stakeholders and are given a rough outline of requirements (verbal or written details about what the product should do). This is eventually formalized into a set of specific requirements and then often modeled using a systems engineering tool. More specific models are then developed, from which a variety of refinements result. Then construction begins. Construction of the smaller parts happens first, followed by the assembly of subsystems. Throughout this build phase, testing is ongoing, and all is compared with the list of requirements and the initial systems model. Eventually a completed product is produced, meeting the stakeholders’ expectations. Or that is the goal, anyway.

Dr. Fudge and I adapted this methodology for biology. We call it model-based reverse systems engineering (MBRSE). “Model-based,” because it utilizes a system model as a map to keep track of relationships between objects and processes. “Reverse,” because the goal of biology is to understand and predict how organisms function. “Systems,” because this approach utilizes requirements and modeling to tie components into a system-level design, illustrating how the whole is more than the sum of its parts.

Starting with Literature Mining

Our approach, like traditional biology, begins with observations, gathered here through literature mining. However, these observations are guided by classic systems engineering questions: (1) What requirements is this system meeting? (2) What are its interfaces? (3) What are the associated derived requirements? (4) What predictions can we make, whether at the system, subsystem, or component level, based on these derived requirements? From observations, our methodology shifts quickly into a more traditional systems engineering approach, where we infer requirements from the observations and build a system model (in our case using OPCloud). Building a system model starts with qualitative conceptual modeling and can be followed by more specific computational modeling.

Conceptual modeling, to my surprise, is highly accessible to biologists. It is more like creating a map than like quantitative modeling, yet it serves as a critical foundation for quantitative modeling because it fixes the relationships between objects and processes in a formal language, which also allows errors to be identified early. Developing the system model and requirements is a methodical process that often exposes key knowledge gaps. From there, one can make predictions, test and validate them experimentally, and update the model and requirements based on the observed results. The process is iterative, and its goal is a list of requirements and a systems model that accurately reflect the biological system or organism.
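To make the loop concrete, here is a minimal sketch in Python of how the infer-predict-test-update cycle might be bookkept in code. The class names and checks are purely illustrative assumptions, not anything taken from the paper or from OPCloud.

```python
from dataclasses import dataclass, field

@dataclass
class Requirement:
    """An inferred system-level requirement plus the observations supporting it."""
    statement: str
    evidence: list[str] = field(default_factory=list)
    validated: bool = False

@dataclass
class SystemModel:
    """A conceptual model: named objects, processes, and links between them."""
    objects: set[str] = field(default_factory=set)
    processes: set[str] = field(default_factory=set)
    links: list[tuple[str, str]] = field(default_factory=list)  # (process, object) pairs

    def knowledge_gaps(self) -> list[tuple[str, str]]:
        """Links that mention an object or process we have not yet characterized."""
        known = self.objects | self.processes
        return [(p, o) for p, o in self.links if p not in known or o not in known]

def mbrse_iteration(requirements, model, run_experiment):
    """One pass of the predict -> test -> update loop described above.

    `run_experiment` stands in for the wet-lab work: given a prediction, it
    returns True or False depending on whether the observation supports it.
    """
    for req in requirements:
        prediction = f"If '{req.statement}' holds, we expect a measurable effect."
        req.validated = run_experiment(prediction)
        if not req.validated:
            req.statement += " [revise: prediction not supported]"
    return requirements, model, model.knowledge_gaps()
```

In practice the “experiment” is wet-lab work and the model lives in a tool like OPCloud, but the bookkeeping role of the loop is the same.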

A Case Study of Glycolysis

In our paper, to illustrate the utility of our approach, we use glycolysis as a case study. Glycolysis is reasonably well understood and is familiar to many non-biologists since most high school biology courses teach the basics of this crucial metabolic pathway.

Similarities and Differences in Glycolysis Through a Systems Engineering Lens

Before we talk about similarities and differences in glycolysis across different types of organisms, it’s important to define a term: topology. Topology refers to the overall metabolic sequence, i.e., the ordering of the pathway steps leading from, say, glucose to ATP, together with the intermediates produced along the way. It has been noted that glycolysis shows both remarkable similarities among different types of organisms (for example, most organisms use one of two topological patterns for catabolism of glucose, commonly the EMP (Embden-Meyerhof-Parnas) or ED (Entner-Doudoroff) topology) and remarkable differences (while the topology is conserved, the DNA sequences of the enzymes used in the pathway are not). (Rivas, Becerra, and Lazcano 2018) The high degree of similarity in the topology of the pathway across organisms led many to assume that the uniformity resulted from common ancestry, and also to expect a common-ancestry pattern in the genetic sequences of the enzymes. But this hypothesis overlooked system-requirement-based reasons for topological similarity. As we write in our paper:

Traditionally, uniformity has been attributed as an artifact of common descent, meaning uniformity resulted from a historical relationship between all living organisms and does not have functional importance. However, in systems engineering, uniformity at a low level in a system design is often an optimized solution to upper-level requirements. We therefore propose that the striking similarity in the topology and metabolites of glycolysis across organisms is driven by a requirement for compatibility between organism energy interfaces, aiming to maximize efficiency at the ecosystem level.

Fudge and Reeves 2024

Ecosystem requirements shape the design of organisms, which in turn shapes the requirements of metabolic design, ultimately constraining the structure of lower subsystems like glycolysis. This is because higher-level system needs determine the architecture of the subsystems below them. For glycolysis, the hypothesis that ecosystem efficiency and optimized energy catabolism drive the uniformity of the glycolytic topology has growing evidentiary support, for two reasons. First, ecosystem efficiency requires some level of biomass commonality to maximize thermodynamic efficiency in reusing complex molecules, minimizing how much biomolecules must be broken down and rebuilt. This also helps minimize waste buildup, since shared waste products simplify the maintenance of ecosystem homeostasis. Second, the glycolytic pathway is recognized as optimized with respect to a number of key metabolic constraints, which further explains its uniformity across species.

Ebenhöh and Heinrich [40] showed that the glycolysis architecture with a preparatory phase followed by a payoff phase is highly efficient based on kinetic and thermodynamic analysis. Similarly, Court et al. [41] discovered that the payoff phase has a maximally efficient throughput rate. In 2010, Noor et al. [42] demonstrated that the 10-step glycolytic pathway is minimally complex, with all glycolytic intermediates essential either for building biomass or for ATP production. In fact, it turns out that glycolysis is Pareto-optimized to maximize efficiency while serving multiple, often competing, purposes. Ng et al. [43] published their analysis in 2019 by analyzing over 10,000 possible routes between glucose and pyruvate to show that the two primary glycolysis variant pathways are Pareto-optimized to balance ATP production against biomass production while simultaneously minimizing protein synthesis cost.

Fudge and Reeves 2024
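To illustrate what “Pareto-optimized” means in this trade-off setting, here is a toy calculation in Python. The variant names and numbers are invented for illustration only (they are not from Ng et al. or from our paper); the point is simply how one identifies pathway variants that no alternative beats on every objective at once.

```python
# Toy Pareto-front calculation over hypothetical pathway variants.
# Objectives: maximize ATP yield, maximize biomass precursors, minimize protein cost.
# All numbers below are made up for illustration.
variants = {
    "EMP-like":  (2.0, 6, 1.0),   # (ATP yield, biomass precursors, protein cost)
    "ED-like":   (1.0, 6, 0.6),
    "variant-A": (1.5, 4, 1.2),
    "variant-B": (0.5, 5, 0.8),
}

def dominates(a, b):
    """True if variant a is at least as good as b on every objective and strictly better on one."""
    atp_a, bio_a, cost_a = a
    atp_b, bio_b, cost_b = b
    no_worse = atp_a >= atp_b and bio_a >= bio_b and cost_a <= cost_b
    better = atp_a > atp_b or bio_a > bio_b or cost_a < cost_b
    return no_worse and better

pareto_front = [
    name for name, v in variants.items()
    if not any(dominates(other, v) for other_name, other in variants.items() if other_name != name)
]
print(pareto_front)  # ['EMP-like', 'ED-like']: neither can improve on one objective without losing on another
```

The real analysis explores a far larger space of candidate routes, but the dominance logic that defines a Pareto-optimal variant is the same.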

In contrast, the differences in glycolytic enzyme or transporter sequences among organisms seem to be due to lower-subsystem design requirements and constraints, which are expected to reflect more organism-specific differences. In our paper, we discuss the example of mammalian glucose transporters, which have 14 subtypes, only four of which are well characterized. (Thorens and Mueckler 2010) Each of the four plays a unique role in system-level glucose control within the mammalian system, so differences in glucose transporters are explainable by their tissue-adapted roles. Similarly, differences between the glycolytic enzymes themselves are poorly correlated with ancestry, which has led to complete dismissal of the previous assumption that the pathway had a single evolutionary origin. (Rivas, Becerra, and Lazcano 2018) Instead, evidence continues to accumulate that glycolytic enzyme differences between organisms play functional roles suited to the unique subsystem environments in which they are placed.

The Warburg Effect and Cancer Research

Using our systems engineering approach, we also generated a hypothesis for the Warburg effect, a well-understood phenomenon in many cancer types. Briefly, the Warburg effect is the preferential use of glucose by cancer cells via upregulation (i.e., increased use) of glycolysis even in the presence of oxygen. This is often thought to be a deleterious byproduct of cancer, but our paper proposes a new perspective. Our hypothesis is that the Warburg effect is a normal system response to local organism injury or to other temporary situations that require rapid tissue growth, such as certain early developmental stages. Cancer occurs when the signal to turn off rapid tissue growth fails; the downstream effect is a continued signal for upregulated glycolysis, hence the Warburg effect. From our paper:

Under certain (currently unknown) conditions, the feedback control loop for injury response can be broken, resulting in an under-controlled or completely uncontrolled response. In other words, we hypothesize a cellular level failure in the control system that upregulates cellular processes for division including glycolysis such that the rate of glycolysis is unconstrained at the cellular level. Note that all four proposed functions of the Warburg effect, plus its ability to support cellular metabolism if the oxygen supply is interrupted due to local loss of normal blood flow, are beneficial for tissue repair after an injury where 1) there might be reduced oxygen, 2) faster cell division and local ATP energy supply is needed, and 3) more biomass is required. A similar situation can occur during early organism development when tissue growth is more rapid than in the adult stage, and in which the blood supply is developing simultaneously.

Fudge and Reeves 2024
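One way to picture the hypothesized failure is as a stuck “on” switch in a simple feedback loop. The sketch below is a deliberately crude discrete-time model with made-up parameters, assumed purely for illustration rather than taken from the paper; it shows how a glycolysis level that normally relaxes back to baseline once repair completes stays elevated if the shutdown signal never arrives.

```python
def simulate_injury_response(steps=60, repair_done_at=20, off_switch_works=True):
    """Toy discrete-time model of glycolysis upregulation after a local injury.

    Illustrative only: 'glycolysis' is a dimensionless activity level driven up
    by a growth/repair signal and relaxing toward baseline once the signal stops.
    """
    glycolysis = 1.0        # baseline activity
    growth_signal = 1.0     # injury turns the repair program on
    history = []
    for t in range(steps):
        if t >= repair_done_at and off_switch_works:
            growth_signal = 0.0                   # normal case: repair complete, signal off
        glycolysis += 0.5 * growth_signal         # upregulation while the signal is on
        glycolysis -= 0.2 * (glycolysis - 1.0)    # relaxation toward baseline
        history.append(round(glycolysis, 2))
    return history

normal = simulate_injury_response(off_switch_works=True)         # settles back near 1.0
warburg_like = simulate_injury_response(off_switch_works=False)  # stays elevated (~3.5)
print(normal[-1], warburg_like[-1])
```

The feedback-mechanism experiments suggested in the next paragraph are what would distinguish behavior like the first run from behavior like the second.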

To our surprise, our literature search turned up little on the Warburg effect as a critical part of injury repair. An exception was Vander Heiden et al., who suggested that the increased cellular division rate associated with the Warburg effect can be beneficial in tissue repair as well as in immune responses. (Vander Heiden, Cantley, and Thompson 2009) We propose that this could be a very important area for investigation. Research that focuses on feedback mechanisms in the control system responsible for the rate of glycolysis upregulation should be able to verify or falsify our hypothesis.

A Useful Design-Based Tool

Engineering is a design-driven field, born from the creativity of intelligent human agents. Many tools developed in the field have applications in biology. For example, the MBRSE approach overcomes a key challenge facing biology: many biological objects and processes are not linked to system-level requirements. Without these connections, a divide opens up between the structure of components and how they fit into the system’s function.

On a personal note, one aspect of system modeling that I find particularly appealing is its use of formal relationships and structured language. Once you’re familiar with the tool, it becomes much easier to identify connections between subsystems or constraints, even when looking at a different system model. This offers a major advantage over the inconsistent, often free-form diagrams found in biology research papers, where each tends to differ from the next.

Another benefit of systems modeling is that it organizes information from research papers in a structured, graphical manner. No matter how brilliant a researcher is, it’s impossible to keep track of information from thousands of papers; a systems model can. It’s remarkable that while these modeling tools are standard in engineering, they are largely absent from biological training, despite the clear benefit they offer in overcoming the inconsistencies of biological diagrams.
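As an illustration of what formal relationships buy you, here is a minimal sketch of encoding objects, processes, and typed links so that simple consistency checks can run automatically. This is an assumed toy data structure for illustration, not OPCloud’s or OPM’s actual representation.

```python
from dataclasses import dataclass

# Toy object-process-style graph; not the OPCloud/OPM representation.
ALLOWED_RELATIONS = {"consumes", "yields", "requires", "regulates"}

@dataclass(frozen=True)
class Link:
    process: str
    relation: str
    thing: str      # an object: metabolite, enzyme, compartment, ...

model = [
    Link("glycolysis", "consumes", "glucose"),
    Link("glycolysis", "yields", "pyruvate"),
    Link("glycolysis", "yields", "ATP"),
    Link("glycolysis", "requires", "hexokinase"),
    Link("injury response", "regulates", "glycolysis rate"),
]

def check_model(links):
    """Two simple early-error checks: unknown relation types, and processes that
    yield outputs without consuming any inputs (a hint of a missing link)."""
    bad_relations = [l for l in links if l.relation not in ALLOWED_RELATIONS]
    processes = {l.process for l in links}
    no_inputs = [p for p in processes
                 if any(l.process == p and l.relation == "yields" for l in links)
                 and not any(l.process == p and l.relation == "consumes" for l in links)]
    return bad_relations, no_inputs

print(check_model(model))  # ([], []) here; a missing 'consumes' link would surface in the second list
```

The specific checks matter less than the fact that, once relationships are written in a fixed vocabulary, such checks can be run at all.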

Our reverse systems engineering approach is motivated by some key observations: 

1. Biological systems look as if they are designed. Francis Crick, for example, cautioned biologists about letting evolutionary ideas guide research, because biological systems look designed even though he believed they evolved (Campana 2000). Even Richard Dawkins admitted in The God Delusion, “The illusion of design is so powerful that one has to keep reminding oneself that it is indeed an illusion.”
2. Biological systems have much in common with human-engineered systems (Csete and Doyle 2002).
3. Biological systems exhibit features such as modularity, robustness, and design re-use (Alon 2003) that are traditionally associated with good top-down engineering practices.

These observations suggest that, from a pragmatic perspective, the best approach to reverse engineering biological systems is to treat them as if they are the result of a top-down, requirements-driven systems engineering process.

It is good news, then, that design-based tools and hypotheses play an increasingly prominent role in biology, offering a clear, coherent path to understanding biological complexity. From this understanding, more than a few deeper philosophical questions arise.

References

Alon, U. 2003. “Biological Networks: The Tinkerer as an Engineer.” Science 301 (5641): 1866–67.
Campana, Joey. 2000. “The Design Isomorph and Isomorphic Complexity.” Nature Reviews Molecular Cell Biology, 149–53.
Csete, Marie E., and John C. Doyle. 2002. “Reverse Engineering of Biological Complexity.” Science 295 (5560): 1664–69.
Fudge, Gerald L., and Emily Brown Reeves. 2024. “A Model-Based Reverse System Engineering Methodology for Analyzing Complex Biological Systems with a Case Study in Glycolysis.” IEEE Open Journal of Systems Engineering 2: 119–34.
Lazebnik, Yuri. 2002. “Can a Biologist Fix a Radio? — Or, What I Learned While Studying Apoptosis.” Cancer Cell 2 (3): 179–82.
Rivas, Mario, Arturo Becerra, and Antonio Lazcano. 2018. “On the Early Evolution of Catabolic Pathways: A Comparative Genomics Approach. I. The Cases of Glucose, Ribose, and the Nucleobases Catabolic Routes.” Journal of Molecular Evolution 86 (1): 27–46.
Thorens, Bernard, and Mike Mueckler. 2010. “Glucose Transporters in the 21st Century.” American Journal of Physiology. Endocrinology and Metabolism 298 (2): E141–45.
Vander Heiden, Matthew G., Lewis C. Cantley, and Craig B. Thompson. 2009. “Understanding the Warburg Effect: The Metabolic Requirements of Cell Proliferation.” Science 324 (5930): 1029–33.

ID has always been mainstream

 Using AI to Discover Intelligent Design


Human senses are excellent design detectors, but sometimes they need a little help. In a recent case, AI tools were applied to aerial photographs of the Nazca plain in Peru. The algorithms, trained on known geoglyphs, were able to select hundreds of candidate sites with figures too faint for the human eye. Many of them, on closer inspection, turned out to indeed contain patterns on the ground indicative of purposeful manipulation by indigenous tribes that lived in the area long ago. 

Here is a case where humans used their intelligent design to create intelligently designed “machine intelligences” capable of detecting intelligent design. Even so, the scientists needed their innate design-detection abilities to follow up on the AI results and validate the potential detections. AI is a tool, not a thinker. As a tool, it offers new powers to archaeology, and this is one example of intelligent design in action in science.

The Nazca Pampa is designated a World Heritage Site by UNESCO because of its immense geoglyphs, averaging 90 m in length. The well-known ones, consisting of lines, geometric figures, and images of animals, were rediscovered in the early 20th century and have fascinated scientists and laypeople alike. UNESCO describes what makes them unique:

They are located in the desert plains of the basin river of Rio Grande de Nasca, the archaeological site covers an area of approximately 75,358.47 Ha where for nearly 2,000 uninterrupted years, the region’s ancient inhabitants drew on the arid ground a great variety of thousands of large scale zoomorphic and anthropomorphic figures and lines or sweeps with outstanding geometric precision, transforming the vast land into a highly symbolic, ritual and social cultural landscape that remains until today. They represent a remarkable manifestation of a common religion and social homogeneity that lasted a considerable period of time.

They are the most outstanding group of geoglyphs anywhere in the world and are unmatched in its extent, magnitude, quantity, size, diversity and ancient tradition to any similar work in the world. The concentration and juxtaposition of the lines, as well as their cultural continuity, demonstrate that this was an important and long-lasting activity, lasting approximately one thousand years.

Based on pottery fragments, the geoglyphs are dated from at least 100 BC to possibly as late as the 15th century. The spellings (Nasca vs. Nazca) appear to be interchangeable. Mysteries remain about the purpose of the geoglyphs, and various theories are debated. One thing is indisputable: they were designed by intelligent minds. The people made considerable effort to modify the landscape for whatever purposes drove them. But that’s OK; ID theory can detect design without knowing the identity of the designer(s) or why they did their work. ID’s job is done when the Design Filter has ruled out chance and natural law to conclude that something is the product of a designing intelligence. Discerning the purposes of designs like these is left in the capable hands of anthropologists, historians, and archaeologists, who may find themselves puzzled by some of the discoveries, like the “knife-wielding killer whale” figure.

The New AI-Directed Discoveries

New detections of Nazca geoglyphs have continued slowly through the years. A team of Japanese, European, and American researchers, Sakai et al., publishing in PNAS, boasts that AI has accelerated the pace of new discoveries:

The rate of discovery of new figurative Nazca geoglyphs has been historically on the order of 1.5 a year (from 1940s to 2000s). It has accelerated due to the availability of remotely sensed high-resolution imagery to 18.7/y from 2004 to 2020. Our current work represents another 16-fold acceleration (303 new figurative geoglyphs during the 2022/23 season of field work) using big geospatial data technologies and data mining with the aid of AI. Thus, AI may be at the brink of ushering in a revolution in archaeological discoveries like the revolution aerial imaging has had on the field.

The Nazca geoglyphs can be classified as line-type (carved into the ground) or relief-type (made by aligning stones above ground). They can also be distinguished by subject matter and size. Sakai et al. surveyed the entire Nazca Pampa (629 km²), then subdivided aerial photographs with 10-cm resolution into grids. They trained their AI model on 406 relief-type glyphs and gave the AI some puzzles to solve:

To leverage the limited number of known relief-type geoglyphs, and to render the training robust, data augmentation is paramount. Hand-labeled outlines of known geoglyphs serve to pick 10 random crops from within each of the known geoglyphs. These are also randomly rotated, horizontally flipped, and color jittered. Similarly, 25 negative training images are randomly cropped from the area surrounding each known geoglyph. We set the ratio of positive to negative training images to 10:25 for a reasonable balance between precision and recall.
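That augmentation recipe maps naturally onto standard image-augmentation tooling. The snippet below is a hypothetical reconstruction using torchvision, not the authors’ actual pipeline; the crop size, rotation range, and jitter strengths are assumed placeholders.

```python
from torchvision import transforms

# Hypothetical reconstruction of the augmentation described above (not the authors' code).
# For each hand-labeled geoglyph: 10 random positive crops; 25 negative crops from the
# surrounding area; every crop randomly rotated, horizontally flipped, and color jittered.
augment = transforms.Compose([
    transforms.RandomCrop(224),                              # crop size is a placeholder
    transforms.RandomRotation(degrees=180),                  # "randomly rotated"
    transforms.RandomHorizontalFlip(p=0.5),                  # "horizontally flipped"
    transforms.ColorJitter(brightness=0.2, contrast=0.2),    # "color jittered"
    transforms.ToTensor(),
])

def make_training_crops(geoglyph_image, background_image):
    """Return (positive, negative) crops at the paper's stated 10:25 ratio."""
    positives = [augment(geoglyph_image) for _ in range(10)]
    negatives = [augment(background_image) for _ in range(25)]
    return positives, negatives
```

Any standard augmentation library would serve equally well; the point is that the stated 10:25 positive-to-negative recipe is a conventional way of stretching a small set of 406 labeled examples into a robust training set.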

This method yielded 1,309 hotspots of likely geoglyphs, which the scientists classed as Rank I, II, or III from most to least likely. “Of the 303 newly discovered figurative geoglyphs,” the paper says, “178 were individually suggested by the AI and 125 were not individually AI-suggested.” It still required 2,640 labor hours of follow-up on foot and with drones to validate the AI selections. Nevertheless, this effort represented a quantum leap in design detection of glyphs with such low contrast they were barely visible to the unaided human eye.

New Scientist included photos of some of the new geoglyphs outlined for clarity. The new ones tend to be smaller and located near trails rather than larger roads, leading the scientists to surmise that they were intended for viewing by local groups instead of for community-wide religious rituals. Reporter Jeremy Hsu wrote about the need for human intelligence to corroborate the selections made by AI:

The researchers followed up on the AI suggestions and discovered a total of 303 figurative geoglyphs during field surveys in 2022 and 2023. Of these figures, 178 geoglyphs were individually identified by the AI. Another 66 were not directly pinpointed, but the researchers found them within a group of geoglyphs the AI had highlighted.

“The AI-based analysis of remote sensing data is a major step forward, since a complete map of the geoglyphs of the Nazca region is still not available,” says Karsten Lambers at Leiden University in the Netherlands. But he also cautioned that “even this new, powerful technology is more likely to find the better visible geoglyphs — the low hanging fruits — than the more difficult ones that are likely still out there”.

The authors believe that many more geoglyphs remain to be discovered in the area. Now that design has been concluded, we may understandably wonder what the people had in mind when they made these figures:

Line-type geoglyphs predominantly depict wildlife-related motifs (e.g., wild animals and plants). Most relief-type geoglyphs (81.6%) depict human motifs or motifs of things modified by humans (33.8% humanoids, 32.9% decapitated heads, and 14.9% domesticated camelids). These do not appear in the line-type figurative geoglyphs at all. Decapitated heads are sometimes depicted alone, while humanoids are repeatedly depicted with decapitated heads and together with domesticated camelids. Examples of both are shown as Insets to Fig. 5. Wild animals, which dominate the line-type geoglyphs, represent only 6.9% (47 geoglyphs) of the relief-type geoglyphs. These include bird, cat, snake, monkey, fox, killer whale, and fish.

Again, though, figuring out the meaning of the designs is not ID’s job. ID is equally valid at detecting evil designs and good designs. Future archaeologists might well have trouble understanding 21st-century graffiti if they happened upon a destroyed U.S. city without written records or history. But thanks to the Design Filter, determining whether contemporary “art” was designed or not would be a straightforward project.

Rise, fall, repeat?