Specified Complexity and a Tale of Ten Malibus
Yesterday in my series on specified complexity, I promised to show how all this works with an example of cars driving along a road. The example, illustrating what a given value of specified complexity means, is adapted from section 3.6 of the second edition of The Design Inference, from which I quote extensively. Suppose you witness ten brand new Chevy Malibus drive past you on a public road in immediate, uninterrupted succession. The question that crosses your mind is this: Did this succession of ten brand new Chevy Malibus happen by chance?
Your first reaction might be to think that this event is a publicity stunt by a local Chevy dealership. In that case, the succession would be due to design rather than to chance. But you don’t want to jump to that conclusion too quickly. Perhaps it is just a lucky coincidence. But if so, how would you know? Perhaps the coincidence is so improbable that no one should expect to observe it as happening by chance. In that case, it’s not just unlikely that you would observe this coincidence by chance; it’s unlikely that anyone would. How, then, do you determine whether this succession of identical cars could reasonably have resulted by chance?
Obviously, you will need to know how many opportunities exist to observe this event. It’s estimated that in 2019 there were 1.4 billion motor vehicles on the road worldwide. That would include trucks, but to keep things simple let’s assume all of them are cars. Although these cars will appear on many different types of roads, some with traffic so sparse that ten cars in immediate succession would almost never happen, to say nothing of ten cars having the same late make and model, let’s give chance every opportunity to succeed by assuming that all these cars are arranged in one giant succession of 1.4 billion cars arranged bumper to bumper.
But it’s not enough to look at one static arrangement of all these 1.4 billion cars. Cars are in motion and continually rearranging themselves. Let’s therefore assume that the cars completely reshuffle themselves every minute, and that we might have the opportunity to see the succession of ten Malibus at any time across a hundred years. In that case, there would be no more than 74 quadrillion opportunities for ten brand new Chevy Malibus to line up in immediate, uninterrupted succession.
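The 74 quadrillion figure is simple arithmetic. The Python sketch below reproduces it under the stated assumptions: 1.4 billion cars in one bumper-to-bumper line, a complete reshuffle every minute, and 100 years of 365 days each (leap days ignored). It counts every starting position at which a run of ten consecutive cars could begin. The variable names are illustrative only.

```python
# Upper bound on the opportunities to see a run of ten given cars.
# Assumptions: 1.4 billion cars in one line, a full reshuffle every minute,
# 100 years of 365 days each (leap days ignored).
CARS = 1_400_000_000
RUN_LENGTH = 10
MINUTES_PER_YEAR = 365 * 24 * 60
YEARS = 100

reshuffles = MINUTES_PER_YEAR * YEARS   # 52,560,000 distinct arrangements
windows = CARS - RUN_LENGTH + 1         # start positions for a run of ten
opportunities = reshuffles * windows

print(f"{opportunities:,} opportunities "
      f"(about {opportunities / 1e15:.1f} quadrillion)")
```

This comes to about 73.6 quadrillion, which rounds to the 74 quadrillion quoted above.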
So, how improbable is this event given these 1.4 billion cars and their repeated reshuffling? To answer this question requires knowing how many makes and models of cars are on the road and their relative proportions (let’s leave aside how different makes are distributed geographically, which is also relevant, but introduces needless complications for the purpose of this illustration). If, per impossibile, all cars in the world were brand new Chevy Malibus, there would be no coincidence to explain. In that case, all 1.4 billion cars would be identical, and getting ten of them in a row would be an event of probability 1 regardless of reshuffling.
But Clearly, Nothing Like That Is the Case
Go to Cars.com, and using its car-locator widget you’ll find 30 popular makes and over 60 “other” makes of vehicles. Under the make of Chevrolet, there are over 80 models (not counting variations of models; there are five such variations under the model Malibu). Such numbers help to assess whether the event in question happened by chance. Clearly, the event is specified in that it answers to the short description “ten new Chevy Malibus in a row.” For the sake of argument, let’s assume that achieving that event by chance is going to be highly improbable given all the other cars on the road and given any reasonable assumptions about their chance distribution.
But there’s more work to do in this example to eliminate chance. No doubt, it would be remarkable to see ten new Chevy Malibus drive past you in immediate, uninterrupted succession. But what if you saw ten new red Chevy Malibus in a row drive past you? That would be even more striking, since the cars would now also share the same color. Or what about simply ten new Chevies in a row? That would be less striking. But note how the description lengths covary with the probabilities: “ten new red Chevy Malibus in a row” has a longer description length than “ten new Chevy Malibus in a row,” but it corresponds to an event of smaller probability than the latter. Conversely, “ten new Chevies in a row” has shorter description length than “ten new Chevy Malibus in a row,” but it corresponds to an event of larger probability than the latter.
What we find in examples like this is a tradeoff between description length and probability of the event described (a tradeoff that specified complexity models). In a chance elimination argument, we want to see short description length combined with small probability (implying a larger value of specified complexity). But typically these play off against each other. “Ten new red Chevy Malibus in a row” corresponds to an event of smaller probability than “ten new Chevy Malibus in a row,” but its description length is slightly longer. Which event seems less readily ascribable to chance (or, we might say, worthier of a design inference)? A quick intuitive assessment suggests that the probability decrease outweighs the increase in description length, and so we’d be more inclined to eliminate chance if we saw ten new red Chevy Malibus in a row as opposed to ten of any color.
The lesson here is that probability and description length are in tension, so that as one goes up the other tends to go down, and that to eliminate chance both must be suitably low. We see this tension by contrasting “ten new Chevy Malibus in a row” with “ten new Chevies in a row,” and even more clearly with simply “ten Chevies in a row.” The latter has a shorter description length but also a much higher probability. Intuitively, it is less worthy of a design inference because the increase in probability so outweighs the decrease in description length. Indeed, ten Chevies of any model in a row by chance doesn’t seem farfetched given the sheer number of Chevies on the road, certainly in the United States.
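To make this tradeoff concrete, here is a toy Python calculation. Every number in it is a hypothetical placeholder, not real traffic or market data: the per-car shares of Chevrolets, new cars, Malibus, and red cars are made up, and cars are assumed to be drawn independently. Description length is crudely approximated as 10 bits per word (as if each word came from a fixed 1,024-word vocabulary) rather than as true Kolmogorov information. Specified complexity is then taken as Shannon information minus description length.

```python
import math

# Crude, hypothetical stand-in for description length: each word is treated
# as drawn from a fixed 1,024-word vocabulary, so it costs log2(1024) = 10 bits.
BITS_PER_WORD = 10

def description_bits(description: str) -> int:
    return BITS_PER_WORD * len(description.split())

def shannon_bits(p_per_car: float, run_length: int = 10) -> float:
    # -log2 of the probability that a given run of cars all match,
    # assuming each car matches independently with probability p_per_car.
    return -run_length * math.log2(p_per_car)

def specified_complexity(description: str, p_per_car: float) -> float:
    return shannon_bits(p_per_car) - description_bits(description)

# Hypothetical per-car shares, NOT real market data.
CHEVY, NEW, MALIBU_GIVEN_CHEVY, RED = 0.1, 0.05, 0.1, 0.2

cases = {
    "ten Chevies in a row": CHEVY,
    "ten new Chevies in a row": CHEVY * NEW,
    "ten new Chevy Malibus in a row": CHEVY * NEW * MALIBU_GIVEN_CHEVY,
    "ten new red Chevy Malibus in a row": CHEVY * NEW * MALIBU_GIVEN_CHEVY * RED,
}

for desc, p in cases.items():
    print(f"{desc:36} Shannon={shannon_bits(p):7.2f}  "
          f"length={description_bits(desc):3}  SC={specified_complexity(desc, p):7.2f}")
```

With these made-up inputs the four descriptions come out in the order the text suggests: adding “red” costs 10 bits of description but buys about 23 bits of improbability, while dropping “Malibus” or “new” saves 10 bits but gives back considerably more in probability. Plain “ten Chevies in a row” even ends up with negative specified complexity.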
But There’s More
Why focus simply on Chevy Malibus? What if the make and model varied, so that the cars in succession were Honda Accords or Porsche Carreras or whatever? And what if the number of cars in succession varied, so it wasn’t just 10 but also 9 or 20 or whatever? Such questions underscore the different ways of specifying a succession of identical cars. Any such succession would have been salient if you witnessed it. Any such succession would constitute a specification if the description length were short enough. And any such succession could figure into a chance elimination argument if both the description length and the probability were low enough. A full-fledged chance-elimination argument in such circumstances would then factor in all relevant low-probability, low-description-length events, weighing probability against description length for each, since where one is larger the other tends to be smaller.
All of this can, as we by now realize, be recast in information-theoretic terms. Thus, a probability decrease corresponds to a Shannon information increase, and a description length increase corresponds to a Kolmogorov information increase. Specified complexity, as their difference, now has the following property (we assume, as turns out to be reasonable, that some fine points from theoretical computer science, such as the Kraft inequality, are approximately applicable): if the specified complexity of an event is greater than or equal to n bits, then the grand event consisting of all events with at least that level of specified complexity has probability less than or equal to 2^(–n). This is a powerful result and it provides a conceptually clean way to use specified complexity to eliminate chance and infer design.
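Under the stated assumptions, the bound follows from a short standard argument. Write SC(x) = –log2 P(x) – K(x) for the specified complexity of an event x, with P its probability under the chance hypothesis and K(x) its description length in bits, and assume (as an idealization of the “approximately applicable” caveat above) that the description lengths satisfy the Kraft inequality exactly. The derivation then runs:

```latex
\begin{align*}
\mathrm{SC}(x) &= -\log_2 P(x) - K(x), \qquad \sum_{x} 2^{-K(x)} \le 1 \\
\mathrm{SC}(x) \ge n &\iff P(x) \le 2^{-n}\, 2^{-K(x)} \\
P\bigl(\{x : \mathrm{SC}(x) \ge n\}\bigr)
  &= \sum_{x:\ \mathrm{SC}(x) \ge n} P(x)
  \;\le\; 2^{-n} \sum_{x} 2^{-K(x)}
  \;\le\; 2^{-n}
\end{align*}
```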
Essentially, what specified complexity does is consider an archer with a number of arrows in his quiver and a number of targets of varying size on a wall, and ask what the probability is that any one of these arrows will by chance land on one of these targets. The arrows in the quiver correspond to complexity, the targets to specifications. Raising the number 2 to the negative of the specified complexity as an exponent then gives an upper bound on the grand probability that any of these arrows will hit any of these targets by chance.
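The archer picture can be checked on a toy example. The Python sketch below is hypothetical throughout: the outcomes are all 12-bit strings under a uniform chance hypothesis, four “targets” (all zeros, all ones, and the two alternating strings) are given short description lengths, and every other string is assigned 14 bits, a choice made so that the lengths satisfy the Kraft inequality. For each threshold n it adds up the chance probability of landing on any outcome with specified complexity at least n and confirms that this never exceeds 2^(–n).

```python
import math
from itertools import product

N = 12  # toy outcomes: all 12-bit strings
outcomes = ["".join(bits) for bits in product("01", repeat=N)]

# Chance hypothesis (hypothetical): every string equally likely.
prob = {x: 2.0 ** -N for x in outcomes}

# Hypothetical description lengths in bits: four short "targets",
# 14 bits for everything else.
targets = {"0" * N: 2, "1" * N: 2, "01" * (N // 2): 3, "10" * (N // 2): 3}
desc_bits = {x: targets.get(x, 14) for x in outcomes}

# Kraft inequality: the sum of 2^(-length) must not exceed 1.
kraft_sum = sum(2.0 ** -k for k in desc_bits.values())
assert kraft_sum <= 1.0

def specified_complexity(x: str) -> float:
    return -math.log2(prob[x]) - desc_bits[x]

for n in range(-2, 12):
    # The "grand event": every outcome with specified complexity >= n.
    grand = sum(prob[x] for x in outcomes if specified_complexity(x) >= n)
    print(f"n = {n:3d}: P(SC >= n) = {grand:.6f} <= 2^-n = {2.0 ** -n:.6f}")
    assert grand <= 2.0 ** -n
```

Here the targets play the role of specifications: only the four short-description strings reach positive specified complexity, and their combined chance probability stays under the 2^(–n) bound at every threshold.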
Conclusion
Formally, the specified complexity of an event is the difference between its Shannon information and its Kolmogorov information. Informally, the specified complexity of an event is a combination of two properties, namely, that the event has small probability and that it has a description of short length. In the formal approach to specified complexity, we speak of algorithmic specified complexity. In the informal approach, we speak of intuitive specified complexity. But typically it will be clear from context which sense of the term “specified complexity” is intended.
In this series, we’ve defined and motivated algorithmic specified complexity. But we have not provided actual calculations of it. For calculations of algorithmic specified complexity as applied to real-world examples, I refer readers to sections 6.8 and 7.6 in the second edition of The Design Inference. Section 6.8 looks at general examples whereas section 7.6 looks at biological examples. In each of these sections, my co-author Winston Ewert and I examine examples where specified complexity is low, not leading to a design inference, and also where it is high, leading to a design inference.
For instance, in section 6.8 we take the so-called “Mars face,” a naturally occurring structure on Mars that looks like a face, and contrast it with the faces on Mount Rushmore. We argue that the specified complexity of the Mars face is too small to justify a design inference but that the specified complexity of the faces on Mount Rushmore is indeed large enough to justify a design inference.
Similarly, in section 7.6, we take the binding of proteins to ATP, as in the work of Anthony Keefe and Jack Szostak, and contrast it with the formation of protein folds in beta-lactamase, as in the work of Douglas Axe. We argue that the specified complexity of random ATP binding is close to 0. In fact, we calculate a negative value of the specified complexity, namely, –4. On the other hand, for the evolvability of a beta-lactamase fold, we calculate a specified complexity of 215, which corresponds to a probability of 2^(–215), or roughly a probability of 1 in 10^65.
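Converting a specified complexity in bits into a probability bound is a one-line calculation. This Python sketch applies the 2^(–SC) bound to the two values quoted above. Capping the bound at 1 is a convenience added here, because a negative value such as –4 yields 2^4 = 16, which says nothing about a probability.

```python
import math

def chance_bound(sc_bits: float) -> float:
    """Upper bound 2^(-SC) on the chance probability, capped at 1
    (the cap is added here because probabilities cannot exceed 1)."""
    return min(1.0, 2.0 ** -sc_bits)

for label, sc in [("ATP binding", -4), ("beta-lactamase fold", 215)]:
    bound = chance_bound(sc)
    print(f"{label}: SC = {sc} bits, bound = {bound:.3g} "
          f"(log10 = {math.log10(bound):.2f})")
```

For 215 bits the bound is about 1.9 × 10^(–65) (log10 ≈ –64.72), consistent with “roughly 1 in 10^65”; for –4 bits the bound is simply 1, which is why such a value cannot support a design inference.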
With all these numbers, we estimate a Shannon information and a Kolmogorov information and then calculate a difference. The validity of these estimates and the degree to which they can be refined can be disputed. But the underlying formalism of specified complexity is rock solid. The details of that formalism and its applications go beyond a series titled “Specified Complexity Made Simple.” Those details can all be found in the second edition of The Design Inference.
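Kolmogorov information cannot be computed exactly, so any such calculation has to estimate it. One common stand-in, not necessarily the method used in The Design Inference, is the length of a compressed encoding, which bounds Kolmogorov information from above up to an additive constant. The Python sketch below uses the standard-library zlib compressor for that estimate, counting 8 bits per compressed byte, and a hypothetical chance hypothesis of uniformly random capital letters for the Shannon information.

```python
import math
import random
import string
import zlib

def shannon_bits_uniform(text: str, alphabet_size: int = 26) -> float:
    # Hypothetical chance hypothesis: each letter uniform over A-Z.
    return len(text) * math.log2(alphabet_size)

def compressed_bits(text: str) -> int:
    # Compressed length as an upper-bound estimate of description length.
    return 8 * len(zlib.compress(text.encode("ascii"), 9))

def asc_estimate(text: str) -> float:
    return shannon_bits_uniform(text) - compressed_bits(text)

rng = random.Random(0)
repetitive = "A" * 1000
random_text = "".join(rng.choice(string.ascii_uppercase) for _ in range(1000))

print("repetitive:", round(asc_estimate(repetitive), 1))
print("random:    ", round(asc_estimate(random_text), 1))
```

The repetitive string comes out with thousands of bits of estimated specified complexity, while the random string lands near or below zero, since a general-purpose compressor cannot shrink it much below its Shannon information. Because compressed length overstates description length, estimates of this kind tend to err on the low side.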