Is Genome Grammar Just a Figure of Speech?
Evolution News & Views
It is uncontroversial that texts written by human beings are intelligently designed, unless they are the ramblings of a maniac. Human language is characterized by syntax (rules of word order) and grammar (rules of spelling, part of speech, case, person, number, voice, tense, mood, etc.). But that's not enough. One can follow the rules and make up nonsense sentences, such as "Banana voices improve inspirational aardvark submarines." For effective communication, language must also make sense. That's semantics: the communication of meaningful ideas.
Does DNA meet these conditions? How proper is it to discuss the DNA code as a language? We should point out that software meets the requirements of language. It, too, has rules of syntax, grammar, and semantics, even though humans do not speak it in conversation. The sender and receiver do not have to be sentient beings; software can drive actions on spacecraft, for example. We must also expand the range of languages beyond the English of this paragraph to include all languages, past and present, used by humans, plus all forms of coded information (including Morse code, sign language, mathematics, and even the rope-knot language of the Incas). Rules can vary widely between languages, but they all have in common the purposeful communication of meaningful information, regardless of the substrate or carrier of the symbols used.
Analogies can enlighten, but they can also mislead. We in intelligent design circles are prone to assume the comparison between DNA and language, but we must beware of drawing the analogy too tightly, even though most scientists unhesitatingly speak of the "genetic code" and the "language of life." Dubious comparisons have been made between brains and computers, as Dr. Michael Egnor has pointed out. Sometimes the differences are more significant than the similarities.
With these caveats in mind, we can examine the analogy between language and genomics. In a Commentary article in the Proceedings of the National Academy of Sciences, Scott Barolo of the University of Michigan Medical School refers several times to the grammar and syntax of DNA. He's not trying to prove genes are analogous to language; the comparison just comes naturally as he discusses the roles of enhancers in gene expression.
Every cell's genome contains two main classes of functional DNA. The best understood type of DNA sequence, which was also the first to be discovered, is that which encodes RNA and protein products via the near-universal "genetic code". A more mysterious but equally important class of functional DNA is cis-regulatory sequence, which does not have a physical product but, instead, encodes the conditions under which a particular RNA will be produced. Cis-regulatory DNA sequences are the primary (although not the only) determinant of gene expression: Not only the rate of RNA production but also the timing, spatial patterning, and environmental control of every gene's activity are largely controlled by these DNA sequences, which are usually in the general vicinity of the gene they regulate. These DNA segments (often called enhancers) have no enzymatic activity on their own, but act as scaffolds for large complexes of proteins and RNAs that directly control the activity of a gene's promoter, sometimes over distances of 1 million base pairs or more. Transcription factors (TFs), key protein regulators of gene expression, bind DNA in a sequence-specific manner, which means that the nature of the complex assembled at a given enhancer at a given time depends on its DNA sequence (cis information), in conjunction with the set of TFs present and active in the cell at that time (trans information). The chromatin state of regulatory DNA is also a very important factor, but local chromatin states are also specified, indirectly, by DNA sequence, because most chromatin-modifying enzymes are recruited to the genome by sequence-specific factors. [Emphasis added.]
So far, Barolo has only compared DNA to codes where the specific sequence matters. Next, he adds references to grammar and syntax:
In part, the DNA sequence of an enhancer encodes a pattern of gene expression through the combinations of sequence-specific regulators that bind to it (and the non-DNA-binding factors they recruit, in turn). However, cis-regulatory DNA sequences do more than merely determine which combinations of TFs will be recruited to an enhancer. The linear arrangement of binding sites (sometimes called "grammar" or "syntax") can play an important role in controlling enhancer output, especially by setting thresholds for gene activation.
By putting "grammar" and "syntax" in quotes, Barolo is acknowledging similarity but not necessarily identity.
In texts, we don't see alphabet letters swooping down and binding onto existing words or paragraphs to give them more "expression" or regulate what parts are readable. That's one important difference. In software, however, routines can be switched on or off depending on environmental factors. Also, in routing, tags can be attached to packets of information that permit or block their transmission, or determine their destinations. Those tags are very sequence-specific. One wrong character can disrupt function.
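The routing comparison can be made concrete with a minimal sketch. It assumes a made-up routing table; the tag names and destinations are hypothetical and stand in for any scheme where a label must match exactly before a message is passed along.

```python
# Hypothetical routing table: tag -> destination.
# The tags and destinations are illustrative, not taken from any real protocol.
ROUTES = {
    "TAG-ALPHA": "navigation",
    "TAG-BRAVO": "telemetry",
}


def route(packet_tag):
    """Return the destination for an exactly matching tag, else 'dropped'."""
    return ROUTES.get(packet_tag, "dropped")


print(route("TAG-ALPHA"))  # exact match: delivered to "navigation"
print(route("TAG-ALPHB"))  # one wrong character: "dropped"
```

The lookup has no notion of "close enough." A tag that differs by a single character is simply unrecognized, which is the kind of sequence specificity the paragraph above describes.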
Barolo turns his attention to another paper in PNAS by Farley et al., who "investigate how the binding affinity and the spatial arrangement of TF binding sites within enhancers interact to encode precise patterns of gene expression in developing embryos." The scientists are basically trying to learn a foreign language by watching what certain words or paragraphs do. If they see how one sequence (a binding site) contributes to an enhancer's output, they might be able to predict the behavior of other binding sites. The "spatial arrangement" refers to placement of binding sites within the sequence, analogous to word order in text, or to start-stop codes for switches in software routines. The analogy may not be exact, he suggests. We just don't know enough yet.
Taking the rules of syntax into account may provide an essential source of information for predicting enhancer outputs. However, the importance of syntax/grammar for enhancer function is still debated; what has been needed is more experimental data.
While we still don't fully understand the rules of DNA, the assumption of syntax and grammar appears to be a fruitful heuristic. One only has to observe how accurately the DNA software runs!
In their new report, Farley et al. propose an interesting idea: Because current methods of enhancer prediction focus on high-affinity TF binding sites, perhaps they are best at identifying those enhancers that are the least subject to constraints on binding site syntax. If true, this approach would tend to paint a biased picture of the constraints that shape enhancer sequences. Now that a more complete view of the relationship between DNA sequence and enhancer function is slowly emerging (Fig. 1), future predictive methods may be more effective at computing gene expression patterns from genomic DNA sequences. After all, that is a task that cells perform every day, with astonishing reliability and precision.
Let's take stock of the analogy between genomics and language.
- We know in human language and in software that "astonishing reliability and precision" depend strongly on sequence specificity. Unclear writing reduces information. Sloppy coding crashes or reduces functionality.
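- The placement of binding sites within the cis-regulatory sequence (the DNA in the vicinity of the gene it regulates) matters, as does the set of trans-acting factors (the TFs present and active in the cell) that bind them. Similar constraints obtain for language and software code: word order matters in most languages.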
- TFs must match their binding sites. In languages, subjects and verbs must agree. In codes, arguments for routines must follow the rules for length, character type, and order.
- Barolo's title, "How to tune an enhancer," refers to the ability of binding sites and transcription factors to affect "not only the rate of RNA production but also the timing, spatial patterning, and environmental control of every gene's activity." One might compare this to keywords in internet searches, tags on social media messages, or prefixes and suffixes in certain languages that affect the meaning.
- DNA can be translated from the DNA code into the protein code. This resembles translators (whether human or machine) that can convert English into Chinese. Both have to share the same language convention.
- DNA uses molecules that symbolize or represent something else. Similarly, languages use symbols whose shapes do not necessarily represent what they mean.
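The points about translation and sequence specificity can be illustrated with a small sketch. It uses a handful of entries from the standard genetic code; the two DNA strings are made-up examples, and the function is a simplification that reads a coding strand directly rather than modeling transcription.

```python
# Partial codon table from the standard genetic code.
# Only the codons needed for the example sequences are included.
CODON_TABLE = {
    "ATG": "Met",  # also the usual start codon
    "GAA": "Glu",
    "GTA": "Val",
    "AAA": "Lys",
    "TAA": "Stop",
}


def translate(dna):
    """Read the sequence three letters at a time until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "Stop":
            break
        protein.append(residue)
    return protein


original = "ATGGAAAAATAA"  # hypothetical example sequence
mutated = "ATGGTAAAATAA"   # same sequence with one letter changed (GAA -> GTA)

print(translate(original))  # ['Met', 'Glu', 'Lys']
print(translate(mutated))   # ['Met', 'Val', 'Lys']
```

The codon table plays the role of the shared convention: the triplet GAA has no chemical resemblance to glutamate, yet it reliably stands for it, and a single changed letter yields a different product.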
An analogy is more useful the more factors it shares with its referent, and the fewer it doesn't share, especially when the shared factors are the most significant ones. Human language may lack the 3-D shape and activity of DNA "letters" but shares the most important characteristic: reliable communication of information for a functional result. It also shares requirements for specificity and rules of operation. If anything, the comparison is from the lesser to the greater: if human language is designed, how much more the "astonishing reliability and precision" of the genome?