New Book on "Junk DNA" Surveys the Functions of Non-Coding DNA
Casey Luskin
What Discovery Institute biologist Jonathan Wells calls the "myth of junk DNA," long a favorite with advocates of unguided evolution, isn't yet quite dead and buried. You still see it invoked in popular science media. Last month in The New York Times Magazine, Carl Zimmer defended the notion that our genome is mostly garbage, earning cheers from evolutionary advocates like PZ Myers and Laurence Moran, the latter hailing Zimmer as the "best science journalist on the planet." We devoted some attention to Zimmer's article (see here and here). But for many scientists, "junk DNA" is an idea that is increasingly untenable in light of the empirical data.
A new book from Columbia University Press, Junk DNA: A Journey Through the Dark Matter of the Genome, by virologist Nessa Carey provides a detailed review of the vast evidence being uncovered showing function for "junk DNA." She explains that junk DNA was initially "dismissed" by biologists because it was thought that if it didn't code for proteins, it didn't do anything:
For years, scientists had no explanation for why so much of our DNA doesn't code for proteins. These non-coding parts were dismissed with the term "junk DNA." But gradually this position has begun to look less tenable, for a whole host of reasons. (p. 2)
Of course, we must conclude that by dismissing non-coding "junk" DNA, these same evolutionary scientists hindered research into its function.
Now Carey gives no indication that she's an ID proponent and in fact she adopts many standard evolutionary viewpoints within her book. But note how, in making her case that we ought to suspect non-coding DNA has function, she employs a curious analogy. She draws a comparison to a car factory -- something that obviously is intelligently designed:
Perhaps the most fundamental reason for the shift in emphasis is the sheer volume of junk DNA that our cells contain. One of the biggest shocks when the human genome sequence was completed in 2001 was the discovery that over 98 per cent of the DNA in a human is junk. It doesn't code for any proteins. ...
...Let's imagine we visit a car factory, perhaps for something high-end like a Ferrari. We would be pretty surprised if for every two people who were building a shiny red sports car, there were another 98 who were sitting around doing nothing. This would be ridiculous, so why would it be reasonable in our genomes? ...
...A much more likely scenario in our car factory would be that for every two people assembling a car, there are 98 others, doing all the things that keeps a business moving. Raising finances, keeping accounts, publicizing the product, processing the pensions, cleaning the toilets, selling the cars etc. This is probably a much better model for the role of junk in our genome. We can think of proteins as the final end points required for life, but they will never be properly produced and coordinated without the junk. Two people can build a car, but they can't maintain a company selling it, and certainly can't turn it into a powerful and financially successful brand. Similarly, there's no point having 98 people mopping the floors and staffing the showrooms if there's nothing to sell. The whole organization only works when all the components are in place. And so it is with our genomes. (p. 3)
Don't miss that last line: "The whole organization only works when all the components are in place. And so it is with our genomes." Doesn't that sound exactly like irreducible complexity? So here we have a biologist, unaffiliated with the intelligent-design community, arguing that junk DNA must be functional because it's like a car factory where all the components are needed in order for the entire system to function. Critics might claim that ID has had no impact on biological thinking, but the evidence shows otherwise.
Before moving on, I must make a point about terminology. At the beginning of the book, Carey notes that when she uses the term "junk DNA," she does so to describe any stretch of non-coding DNA, and doesn't necessarily mean something that has no function. "Anything that doesn't code for a protein will be described as junk, as it originally was in the old days (second half of the twentieth century)," she writes. Thus, she isn't affirmatively claiming that any given piece of "junk DNA" lacks function. Rather, she's referring to a type of DNA that was once thought to be functionless "junk" and is now thought likely to have function.
Carey, in any event, goes on to explain that, far from being irrelevant, "junk DNA" is now thought to be running the whole show:
The other shock from the sequencing of the human genome was the realisation that the extraordinary complexities of human anatomy, physiology, intelligence and behaviour cannot be explained by referring to the classical model of genes. In terms of numbers of genes that code for proteins, humans contain pretty much the same quantity (around 20,000) as simple microscopic worms. Even more remarkably, most of the genes in the worms have directly equivalent genes in humans.
As researchers deepened their analyses of what differentiates humans from other organisms at the DNA level, it became apparent that genes could not provide the explanation. In fact, only one genetic factor generally scaled with complexity. The only genomic features that increased in number as animals became more complicated were the regions of junk DNA. The more sophisticated an organism, the higher the percentage of junk DNA it contains. Only now are scientists really exploring the controversial idea that junk DNA may hold the key to evolutionary complexity. (p. 4)
Of course, by "evolutionary complexity," Carey means genomic functionality. She spends the bulk of the book reviewing the numerous discoveries of function for non-coding "junk" DNA. Just a few of them include:
Structural roles such as packaging chromosomes and preventing DNA "from unravelling and becoming damaged," acting as "anchor points when chromosomes are shared equally between different daughter cells and during cell division," and serving as "insulation regions, restricting gene expression to specific regions of chromosomes."
Regulating gene expression: "Thousands and thousands of regions of junk DNA are suspected to regulate networks of gene expression."
Introns are extremely important:
The bits of gobbledygook between the parts of a gene that code for amino acids were originally considered to be nothing but nonsense or rubbish. They were referred to as junk or garbage DNA, and pretty much dismissed as irrelevant. ... But we now know that they can have a very big impact. (pp. 17-18)
Preventing mutations by separating out gene-coding DNA.
Controlling telomere length, which can serve as a "molecular clock" that helps control aging.
Forming the loci for centromeres.
Inactivating one of the two X chromosomes in females.
Producing long non-coding RNAs that regulate Hox genes or brain development, or that serve as attachment points for histone-modifying enzymes, helping to turn genes on and off.
Serving as promoters or enhancers for genes, or as imprinting control elements for "the expression of the protein-coding genes."
Producing RNA which acts "as a kind of scaffold, directing the activity of proteins to particular regions of the genome."
Producing RNAs that fold into three-dimensional shapes and, much like enzymes, perform functions inside cells, such as changing the shapes of other molecules or helping to build ribosomes. As she notes: "We've actually known about these peculiar RNA molecules for decades, making it yet more surprising that we have maintained such a protein-centric vision of our genomic landscape." (p. 146)
Serving as tRNA genes which produce tRNA molecules. These genes can also serve as insulators or spacers to stop transcription from spreading from gene to gene.
Contributing to the development of the fingers and face, influencing eye, skin, and hair color, and affecting obesity.
Enabling gene splicing and helping to form spliceosomes.
Producing small RNAs which also affect gene expression.
The ENCODE Project as "Evolutionary Battleground"
Carey rightly calls the ENCODE project (Encyclopedia of DNA Elements) an "evolutionary battleground." That's because it claimed that some 80 percent of our genome is functional, a finding that some evolutionary biologists have vigorously disputed. Of course, ENCODE's findings are based upon empirical data, while the evolutionary objections are based upon theoretical concerns. That's why evolutionary biologists find ENCODE so threatening to their models. Though Carey doesn't mention this, Dan Graur said, "If ENCODE is right, then evolution is wrong." What's going on here?
The main evolutionary response, as Carey explains, is that "only about 5 percent of the human genome is conserved across the mammalian class." Evolutionary biologists assume that a sequence is functional only if it's under selection, and a lack of conservation suggests that a sequence is not under selection pressure. Hence, they conclude that the non-conserved remainder of the genome can't be functional.
Carey provides a very good response to this argument, which I'll get to in a moment. But from an ID perspective there's an obvious immediate rebuttal: If we abandon the assumption that only natural selection can generate functional elements in our genome, then we're free to consider that perhaps many functional elements of mammalian genomes did not arise by selection but were designed separately, with distinct DNA sequences, from the very beginning. That would produce exactly the sort of data that ENCODE is revealing -- vast quantities of non-conserved functional genetic elements -- that is vexing evolutionary biologists. Indeed, previous studies have revealed non-conserved DNA that has specific functions. As Jonathan Wells explains:
But while sequence conservation may imply function, non-conservation does not imply non-function -- as biologists have long recognized. Indeed, to whatever extent DNA differences play a role in distinguishing different species, non-conserved sequences must be functional.
Furthermore, biologists now know that as much as 30 percent of the protein-coding DNA in every organism consists of "orphan genes" that bear little or no similarity to DNA sequences in other organisms. While the functions of most orphan proteins are not yet known, few people would be so foolhardy as to suggest that they are non-functional. Yet in a search for evolutionary constraint such as the Oxford researchers used, these protein-coding regions would be judged non-functional.
Carey provides another intriguing hypothesis -- that even from an evolutionary perspective many functional genetic elements may be under relaxed selection for DNA sequence. She writes:
Protein-coding sequences are highly conserved in evolution because a particular protein is often used in more than one tissue or cell type. If the protein changed in sequence, the altered protein might function better in a particular tissue. But that same change might have a really damaging effect in another tissue that relies on the same protein. This acts as an evolutionary pressure that maintains protein sequence.
But regulatory RNAs, which don't code for proteins, tend to be more tissue-specific. Therefore they are under less evolutionary pressure because only one tissue relies on a regulatory RNA, and possibly only during certain periods of life or in response to certain environmental changes. This has removed the evolutionary brakes on the regulatory RNAs and allowed us to diverge from our mammalian cousins in these regions. (p. 195)
Her argument makes sense: genetic elements that interact with fewer aspects of an organism's genome, transcriptome, proteome, and other physiological components might face fewer sequence constraints than those with many such interactions. Evolutionary biologists tend to assume that selection works the same way in all physiological contexts, but if that's not true, then relaxed selection could allow genetic elements to be both functional and non-conserved.
Carey notes that ENCODE's critics, mostly evolutionary biologists, reacted harshly to ENCODE's claims that the vast majority of our genome is functional (pp. 196-197):
The most forthright responses were mainly from evolutionary biologists. This wasn't altogether surprising. Evolution is the biological discipline where emotions tend to run highest. Normally the bullets are targeted at creationists, but the Gatling guns may also be turned on other scientists. ... The angriest critique of ENCODE included the expressions "logical fallacy," "absurd conclusion," "playing fast and loose" and "used the wrong definition wrongly." Just in case we were still in doubt about their direction of travel, the authors concluded the paper with the following damning blast:
The ENCODE results were predicted by one of its lead authors to necessitate the rewriting of textbooks. We agree, many textbooks dealing with marketing, mass media hype, and public relations may well have to be rewritten.
She notes: "There are interesting scientific arguments on both sides, but it would be disingenuous to believe that the amount of heat and emotion generated by ENCODE has been purely about the science. We can't ignore other, very human factors." (p. 198)
In any case, I suspect that Carey herself holds an even more pro-ENCODE view than her book lets on, and what we see may be a toned-down version of what she really thinks. She seems to be diplomatically trying to avoid the "emotional" attacks of angry evolutionary biologists who stridently oppose ENCODE.
Much Research Remains to Be Completed
Carey doesn't openly take a side in the debate over ENCODE, and she doesn't claim that our genome will eventually turn out to contain no "junk" DNA whatsoever. But she is clear that the trend line in research is away from junk DNA, and she notes that one reason we still don't understand what much of it does is that we haven't yet developed the technologies to study it:
Part of the problem is that the systems we can use to probe the functions of junk DNA are still relatively underdeveloped. This can sometimes make it hard for researchers to use experimental approaches to test their hypotheses. We have only been working on this for a relatively short space of time. (p. 6)
In other words, it's very premature to conclude that our genome is full of truly functionless junk DNA. Lacking the technology to detect function, it's understandable that we might have missed many of the important functions at work in the genome.
As Carey notes, "We now know that in some cases just a single base-pair change in an apparently irrelevant region of the genome can have a definite effect" (p. 201), meaning there's a lot of work left to be done. After all, she points out: "One stretch of DNA can include a protein-coding gene, long non-coding RNAs, small RNAs, antisense RNAs, splice signal sites, untranslated regions, promoters and enhancers." (p. 287) Thus, she concludes, "When we really think about the complexity of our genomes, it isn't surprising that we can't understand everything yet." (p. 288) And that, along with much else in this excellent book, hits the nail on the head.