Protein Designers Explore Sequence Space
The twenty major amino acids used in life as we know it can be assembled in countless ways. What portion of that vast sequence space is functional? This question has had a long history among Darwin skeptics because the answer contributes to probability calculations for assessing the explanatory power of chance vs design for the origin of life.
Historical Background
The Wistar Institute symposium in 1966 has often been cited by ID advocates as a death knell for hopes that functional proteins would spontaneously arise by chance. Around this time in the late 1960s, about a decade after Francis Crick had proposed his famous “sequence hypothesis” for DNA and proteins, my father James F. Coppedge recognized the informational character of biomolecules. Working on a graduate degree in chemistry at UCLA, he attempted to estimate the “usable” portion of sequence space by analogy with useful combinations of letters in English words and sentences. He tested the analogy by searching through tens of thousands of random letters. In his 1973 book,1 he applied his rough estimate of useful text strings arising by random selections to argue for the extreme improbability of arriving at a single usable protein by chance, even granting a world-sized primordial soup of plentiful amino acids combining under ideal conditions at fantastically rapid rates.
In their 1984 book The Mystery of Life’s Origin (updated in 2020), Thaxton, Bradley and Olsen wrote about the formidable challenge of overcoming “configurational entropy” in sequence space. Douglas Axe, in his book Undeniable (2016), described biochemistry experiments he performed to determine the limits of functionality by seeing how far a well-studied protein could be altered and still perform. His calculations, along with my father’s memorable “amoeba analogy” from his book (ch. 7), led to an episode in the Illustra Media film Origin (excerpted in their shorter video First Life).
William Dembski and Stephen Meyer have also discussed at length the informational nature of protein sequences and the probabilistic resources for accounting for them by chance in their books.2 Studies like these have all agreed that functional proteins occupy an infinitesimal fraction of sequence space, like a vanishingly small box in the corner of a sheet of graph paper.
The New Explorers
The arrival of AI tools such as AlphaFold that can predict protein folds for computer-generated polypeptides has opened up new ways to explore functional portions of sequence space outside of biology.3 In a fascinating News Feature in Nature on October 15, Ewen Callaway described international contests to find new proteins. Promises of lucrative prizes are motivating explorers from around the world to join “protein-design competitions [that] aim to sift out the functional from the fantastical.” Notice the key word design:
Contests have driven key scientific advances in the past, particularly for the field of protein-structure prediction. This latest crop of competitions is drawing people from around the world into the related field of protein design by lowering the barrier to entry. It could also quicken the pace of validation and standards development and perhaps help to foster community. “It will push the field forward and test methods more quickly,” says Noelia Ferruz Capapey, a computational biologist at the Centre for Genomic Regulation in Barcelona, Spain.
The tournaments bypass the stodgy method of grant application, peer review and publication, speeding discovery and stimulating involvement. Callaway describes half a dozen competitions generating tens of thousands of candidate sequences, even from “people with no professional experience in biology” using their gaming computers at home.
Englert says that the high-quality entries from people who aren’t established researchers reminds him of the garage-tinkering origins of Apple, Microsoft and other tech giants. “It would have taken them two years of studying and joining a lab to get to the point where they can get started. Here they can do it over a weekend.” He imagines a future in which freelance protein designers vie for bounties set by companies, academic labs and others seeking a custom molecule.
Is This Evolution?
These contests are goal-directed with specific criteria, such as “looking for proteins capable of attaching to a growth hormone receptor called EGFR that is overactive in many cancers.” Another contest “tasked entrants with re-engineering an existing protein — a plant-virus enzyme used widely in protein purification — to make the molecule more efficient.”
Efforts at this kind of “directed evolution” have been around for a long time in labs. As Dembski explains in No Free Lunch and The Design Inference (2nd ed.), these “evolutionary algorithms” are not random searches comparable to natural selection, in which the organism must survive each mutation, but intelligently guided, goal-directed projects. In the contests described by Callaway, success for the contestants is judged by a sequence’s match to a foreordained goal: it must fold, and it must bind to a specified molecule. A contestant may attempt random searches in sequence space but has the intelligence to determine whether a sequence meets the criteria.4 Even if the contestant does not know in advance what approach will be successful, he or she can perform an intelligently guided “search for a search,” as if looking through a pile of treasure maps to identify which is best for locating a treasure.
It is misleading, therefore, to call a contest “Evolved 2024” or to name a new AI biology startup “EvolutionaryScale.” These have nothing to do with Darwinian evolution. This type of equivocation confuses the public. It resembles Darwin’s own blunder in comparing natural selection to artificial selection, a fallacy he maintained all his life.5
Intelligence Far Surpasses the Reach of Chance
The capabilities of intelligence over chance are profound. My father calculated that on average it would take chance 1,500 years (“If a person could draw and record one coin every five seconds day and night”) to arrange coins numbered one to ten in order, something an eight-year-old child could do in a few moments (p. 51). From there, he calculated how long it would take to expect success by chance at arranging the phrase “The Theory of Evolution” from a set containing lower- and uppercase letters and a space. The probability was 1 in 4.5 x 10^39. Envisioning a machine attempting this project that could perform a billion draws per second at the speed of light, he concluded that the time required to expect one success would be 28 trillion times the assumed age of the earth. Then he compared it to the capabilities of a child:
So chance requires twenty-eight trillion times the age of the earth to write merely the phrase: “The Theory of Evolution,” drawing from a set of small letters and capitals as described, drawing at the speed of light, a billion draws per second! Only once in that time could the letters be expected in proper order.
Again, a child can do this, using sight and intelligence, in a few minutes at most. Mind makes the difference in the two methods. Chance really “doesn’t have a chance” when compared with the intelligent purpose of even a child.
If chance had to rely on earthquakes and wind to do the job, it would never happen.6
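The arithmetic behind these figures can be roughly checked with a short sketch. It rests on assumptions not spelled out in the text above: the coins are drawn with replacement, the letter set has 53 symbols (26 lowercase, 26 uppercase and a space), the expected number of draws is taken as simply the number of possible arrangements, and the assumed age of the earth is 5 billion years (a guess that lands near the quoted multiple). The function and variable names are illustrative only.

```python
# Rough check of the quoted figures. The 53-symbol alphabet, the shortcut
# "expected draws = number of arrangements", and the 5-billion-year age of
# the earth are assumptions made for this sketch, not values from the book.

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def arrangements(symbols: int, length: int) -> int:
    """Number of equally likely ordered strings of the given length."""
    return symbols ** length


def years_of_drawing(trials: int, trials_per_second: float) -> float:
    """Years needed to perform `trials` draws at a constant rate."""
    return trials / trials_per_second / SECONDS_PER_YEAR


# Ten numbered coins, drawn with replacement, one draw every five seconds.
coin_trials = arrangements(10, 10)
coin_years = years_of_drawing(coin_trials, 1 / 5)

# The phrase is 23 characters drawn from 26 + 26 letters plus a space.
phrase = "The Theory of Evolution"
phrase_trials = arrangements(26 + 26 + 1, len(phrase))
phrase_years = years_of_drawing(phrase_trials, 1e9)

ASSUMED_EARTH_AGE_YEARS = 5e9  # assumption; the text does not state the figure
earth_ages = phrase_years / ASSUMED_EARTH_AGE_YEARS

if __name__ == "__main__":
    print(f"coins: 1 in {coin_trials:.1e}, about {coin_years:,.0f} years")
    print(f"phrase: 1 in {phrase_trials:.2e}, about {earth_ages:.1e} earth ages")
```

Under these assumptions the results come out close to the quoted numbers: on the order of 1,500 years for the coins, about 4.5 x 10^39 arrangements for the phrase, and tens of trillions of earth ages at a billion draws per second. A different assumed age of the earth would shift the last figure proportionally.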
While we can hope for revolutionary insights from the contests to find new proteins, they will come about by intelligent design, not by evolution.
Notes
1. Coppedge, Dr. James F., Evolution: Possible or Impossible? (Zondervan, 1973). This book was one of the few pre-ID Movement publications to use the phrase “intelligent design.” After eight printings, his popular and influential book went out of print, but he self-published it through 2002. I have the remaining stock of copies for those interested. A digitized version is available at this link: http://crev.epoi
2. Dembski, The Design Revolution (2004), ch. 9-10; The Design Inference (2nd ed., 2023). Meyer, Signature in the Cell (2009), ch. 8-10.
3. To make exploration of sequence space somewhat tractable, one must assume that only the canonical amino acids are used, that they are all already left-handed, and that they join solely with peptide bonds at the proper linkages. Chance, of course, wouldn’t care about those details.
4. Success depends on context. One of the longest meaningful alphabet sequences my father detected was “AGMCAP,” an imaginative stretch, but potentially useful in some contexts (p. 104). Protein sequences are even more demanding since they must fold and perform a useful function in three dimensions within a cell.
5. Robert Shedinger, Darwin’s Bluff (2024), pp. 71-78, 171-172, 199-200.
6. This is not an exaggerated claim. Dr. A. E. Wilder-Smith debunked the old Huxley analogy of a million monkeys typing Shakespeare given enough time with the observation that biochemical reactions are reversible. The monkey-typewriter analogy depends on assuming that the letters stay on the page. If they fall off soon after they are typed, a Shakespeare sonnet will never emerge. In biochemistry, peptide bonds fall apart in water. A growing random chain, therefore, would not survive for long even in the best of real-world conditions, nor would any progress in the meaningful alphabet string survive the next quake or gust of wind.