The ENCODE project became controversial last year when it suggested that 80%+ of the human genome is “functional”, meaning “transcribed”, meaning “let’s all argue about what we mean”. The argument continues to rage vituperatively though, of course, there is no disagreement whatsoever about the consensus science (fortunately for BioLogos, which is theologically wedded to the consensus), because science seems to be helpfully defined nowadays by what isn’t in dispute at any particular time. But in truth the stage was set for upset back in 2007, when an ENCODE paper suggested a new definition for the gene that said:
A gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.
It’s not clear to me that this is compatible with Wikipedia’s more conservative “working definition”, based on two papers from 2006-7, which is “a locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions, and/or other functional sequence regions.”
ENCODE’s suggestion is rather wider: its “union of genomic sequences” is more complex than “a sequence”, and its “coherent set of potentially overlapping functional products” more complex than “a unit of inheritance”. I’ll try to add some detailing colour from the original paper. For instance, compare this to the Wikipedia definition:
To quote Falk (1986), “…the gene is […] neither discrete […] nor continuous […], nor does it have a constant location […], nor a clearcut function […], not even constant sequences […] nor definite borderlines.” And now the ENCODE project has increased the complexity still further.
And:
…before the advent of the ENCODE project, there were a number of aspects of genes that were very complicated, but much of this complexity was in some sense swept under the rug and did not really affect the fundamental definition of a gene. The experience of the ENCODE project, particularly the mapping of transcriptional activity and regulation using tiling arrays, has extended these puzzling and confusing aspects of genes, bringing them to the forefront, where one has to grapple more directly with them in relation to the definition of what a gene is.
Even their new definition deliberately omits important aspects of genomic function that the old theory never even considered:
Although regulatory regions are important for gene expression, we suggest that they should not be considered in deciding whether multiple products belong to the same gene… Regulation is simply too complex to be folded into the definition of a gene, and there is obviously a many-to-many (rather than one-to-one) relationship between regulatory regions and genes.
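To make that many-to-many point concrete, here is a minimal sketch in Python (toy data, with hypothetical names like enhancer_A and gene_1 that are mine, not ENCODE’s): a single regulatory region can act on several genes, and a single gene can answer to several regions, so regulation cannot simply be folded into the definition of any one gene.

```python
# Toy illustration (invented data): regulation as a many-to-many relation.
# A one-to-one model would give each regulatory region exactly one gene;
# in the picture quoted above, neither direction is unique.

regulates = {
    "enhancer_A": ["gene_1", "gene_2"],      # one region, several genes
    "enhancer_B": ["gene_2"],
    "promoter_C": ["gene_2", "gene_3"],
}

# Invert the relation: which regions act on each gene?
regulated_by = {}
for region, genes in regulates.items():
    for gene in genes:
        regulated_by.setdefault(gene, []).append(region)

print(regulated_by)
# {'gene_1': ['enhancer_A'],
#  'gene_2': ['enhancer_A', 'enhancer_B', 'promoter_C'],
#  'gene_3': ['promoter_C']}
# gene_2 answers to three regions, and enhancer_A serves two genes: there is
# no single gene into whose definition the regulation could neatly be folded.
```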
Elsewhere, ENCODE’s John Stamatoyannopoulos has stated:
Although the gene has conventionally been viewed as the fundamental unit of genomic organization, on the basis of ENCODE data it is now compellingly argued that this unit is not the gene but rather the transcript (Washietl et al. 2007; Djebali et al. 2012a). On this view, genes represent a higher-order framework around which individual transcripts coalesce, creating a polyfunctional entity that assumes different forms under different cellular states, guided by differential utilization of regulatory DNA.
In other words, their new definition of the gene must be taken together with the realisation that it also downgrades the gene as the fundamental unit of organisation. Genes (if they can indeed be isolated) do not control the genome, but are used by it. One attempt in the literature to accommodate this idea has been to analogise genes to frequently used subroutines within an operating system – the big thing to be explained being the global function of the “program” that calls the genes. But even this, our author says, does not go far enough:
The new ENCODE perspective does not, of course, fit with the metaphor of the gene as a simple callable routine in a huge operating system. In this new perspective, one enters a gene “routine” in many different ways in the framework of alternative splicing and lattices of long transcripts. The execution of the genomic OS does not have as neat a quality as this idea of repetitive calls to a discrete subroutine in a normal computer OS. However, the framework of describing the genome as executed code still has some merit. That is, one can still understand gene transcription in terms of parallel threads of execution, with the caveat that these threads do not follow canonical, modular subroutine structure. Rather, threads of execution are intertwined in a rather “higgledy-piggledy” fashion, very much like what would be described as a sloppy, unstructured computer program code with lots of GOTO statements zipping in and out of loops and other constructs.
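As a rough illustration of the contrast being drawn – a toy sketch in Python with invented names and sequences, not anyone’s actual model – compare a classical, modular “gene call” with a region whose shared exons are entered and stitched together differently by overlapping transcripts:

```python
# Toy contrast (invented names and sequences): a modular "subroutine" versus
# a region entered at many points by overlapping transcripts.

# Classical picture: one locus, one entry point, one product.
def classical_gene(dna_segment):
    return f"protein({dna_segment})"

print(classical_gene("exon1-exon2-exon3"))

# Post-ENCODE picture: a pool of exons shared by several transcripts, each
# "thread of execution" entering and leaving the region in its own way.
exons = {"e1": "ATG", "e2": "GGC", "e3": "TTA", "e4": "CAG"}

splice_variants = [
    ["e1", "e2", "e4"],   # transcript A skips e3
    ["e2", "e3"],         # transcript B starts mid-"routine"
    ["e1", "e3", "e4"],   # transcript C takes yet another path
]

for i, variant in enumerate(splice_variants, start=1):
    product = "-".join(exons[e] for e in variant)
    print(f"transcript_{i}: exons {variant} -> product {product}")
# There is no single, fixed entry or exit point: the "routine" has no clean
# call boundary of the kind a modular subroutine would have.
```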
For any still unfamiliar with the research that has undermined the original idea of the gene as a digital recipe sequence, that codes for a protein, that performs a function (that lives in the house that Jack built), this paragraph sums it up:
Overall, the ENCODE experiments have revealed a rich tapestry of transcription involving alternative splicing, covering the genome in a complex lattice of transcripts. According to traditional definitions, genes are unitary regions of DNA sequence, separated from each other. ENCODE reveals that if one attempts to define a gene on the basis of shared overlapping transcripts, then many annotated distinct gene loci coalesce into bigger genomic regions. One obvious implication of the ENCODE results is that there is less of a distinction to be made between genic and intergenic regions. Genes now appear to extend into what was once called intergenic space, with newly discovered transcripts originating from additional regulatory sites. Moreover, there is much activity between annotated genes in the intergenic space.
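To see why defining genes by shared overlapping transcripts makes annotated loci coalesce, here is a minimal interval-merging sketch in Python (the coordinates and locus names are invented for illustration): any two loci bridged by overlapping transcripts end up inside the same merged region.

```python
# Toy sketch (invented coordinates): merge overlapping transcripts and see how
# previously distinct annotated loci coalesce into one bigger genomic region.

transcripts = [
    (100, 250, "locus_A"),   # (start, end, locus it was originally annotated to)
    (240, 400, "locus_A"),
    (390, 600, "locus_B"),   # overlaps the previous transcript, so A and B merge
    (900, 1000, "locus_C"),  # isolated; stays a region of its own
]

transcripts.sort()
merged = []                  # each entry: [start, end, set of loci it absorbs]
for start, end, locus in transcripts:
    if merged and start <= merged[-1][1]:      # overlaps the current region
        merged[-1][1] = max(merged[-1][1], end)
        merged[-1][2].add(locus)
    else:
        merged.append([start, end, {locus}])

for start, end, loci in merged:
    print(f"{start}-{end} contains {sorted(loci)}")
# 100-600 contains ['locus_A', 'locus_B']   <- two "distinct" loci coalesce
# 900-1000 contains ['locus_C']
```

On this toy scale the merge is trivial; the paper’s point is that on the real genome the same operation joins up loci that annotation had always kept separate.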
The paper is quick to add that there is no dispute that “the genotype” is responsible for “the phenotype”. But the significance of all this for the status of the gene as an actual entity can hardly be overstated. Here the authors describe the background to their decision to redefine it:
At this point, it is not clear what to do: In the extreme, we could declare the concept of the gene dead and try to come up with something completely new that fits all the data. However, it would be hard to do this with consistency. Here, we made a tentative attempt at a compromise, devising updates and patches for the existing definition of a gene.
In other words, a deliberate attempt has been made to patch a leaking paradigm so that it can accommodate the new data. The new definition was deliberately worded to be compatible with any instances that might be found of a single length of DNA coding for a single functional protein alone (have any been shown to exist?), as well as with the far more common situation where this is not the case.
One analogy to the new viewpoint might be that the genome is not a program to be executed, but a book to be understood, with a range of related concepts presented as an integrated whole. A computer can execute a set of subroutines. But though a politician might consider Machiavelli’s The Prince to have formed their entire theory of government, it might be quite rare to find a specific example of policy taken directly from the text. A level of explanation orders of magnitude above the algorithmic is needed to make sense of such a scenario.
Perhaps anticipating, and attempting to defuse, the furore that would later erupt over ENCODE’s critique of “Junk DNA” and so on, the article suggests:
However, we probably will not be able to ever know the function of all molecules in the genome. It is conceivable that some genomic products are just “noise,” i.e., results of evolutionarily neutral events that are tolerated by the organism (e.g., Tress et al. 2007). Or, there may be a function that is shared by so many other genomic products that identifying function by mutational approaches may be very difficult. While determining biological function may be difficult, proving lack of function is even harder (almost impossible).
In commenting on all this I want to follow up two strands from a number of recent posts on The Hump. The first is that the fuzzier the understanding of genes becomes (and therefore the less useful as a working concept), the more weight is given to a more holistic understanding of life and its processes, such as we find in Aristotle of old and in Goethe more recently. It’s far from clear from the data, as opposed to the ENCODE definition, to what extent the “potentially overlapping functional products” can ever actually be organised into a “coherent set” that doesn’t blend around the edges into every other genetic process. Perhaps everything does cause everything else, in a “higgledy-piggledy” way.
As I commented on another thread, whilst some might see this as evidence of non-design, it could also (and more plausibly, given the sophistication observed) be a sign of the highest possible level of organisation. The Origin of Species, after all, was according to its author “one long argument”, not a set of separable components, though it has sentences, words and letters. Designed or not, though, it makes a reductionistic understanding of life as a set of machine-like processes increasingly problematic.
The other strand for comment is to pick up on James Clerk Maxwell’s division of theories into analogies, physical theories and mathematical theories.
The gene was, par excellence, a physical theory. It arose from Mendel’s discovery that (some) phenotypic traits are “digitally” organised, in contradiction to Darwin’s idea of “gemmules” blending their characteristics. “Genes” though, like gemmules, were deeply rooted in the corpuscularism/atomism of early-modern science, and were seen as the theoretical “irreducible particle” of heredity. They were looked for, as such, in the cell, and the discovery of DNA helpfully delivered “traits on a string”. The semantic nature of DNA may have been a surprise, but in effect that just gave the genes a “subatomic” structure of nucleotides. But up until recently – and very much up to the present in public awareness – the gene was the atomic particle of heredity. ENCODE (not uniquely) demolishes that as much as the classical corpuscularism of Boyle has been demolished in physics.
In physics it may still be mentally helpful to conceptualise atoms as “solid particles”, and even useful to deal with them as such in limited scientific situations. But such a view has shifted from a “physical theory” to a mere analogy. Likewise, I’ve no doubt that the abstract concept of an isolated “gene”, somehow responsible for an equally abstract concept of a discrete “trait”, might have practical use – as an analogy. In the same way one might still find Darwin’s gemmules to have a limited explanatory utility.
But an analogy for what? Unlike the case of physics, there is no set of partial differential equations that describes the “higgledy-piggledy” organisation of the genome. And since living processes already go far beyond our nearest physical comparator – digital technology – we currently have no physical theory to understand genome function either.
One cannot understand a biological function, like walking, without seeing its context in the whole animal. Perhaps the same is true for what we have called “genes”. It seems we can only begin to understand life in terms of itself, cut off from any all-encompassing theoretical framework – which is remarkably Goethian. That may prove a fruitful approach, but it is certainly very different from any other branch of science of which I’m aware.
So would it be fair to say the gene has gone the way of the specie?
Dunno, Merv – we’ll just have to wait and see (as we always say in our household). But when a definition seems to be made mainly to preserve the very existence of the entity, one has to wonder. At least you can see a species, even if the boundaries seem fuzzy. But that fuzziness is partly because of assuming the gene theory of piecemeal change: maybe a more holistic genomics might even reinstate the status of the species as a natural kind. Perhaps Aristotle isn’t dead after all.
When one considers it, the gene was a theoretical, generalised construct from the start, proposed in the light of Mendel’s rather specific experiments. Biology, and even evolution, would do quite well without it… provided one accepts that one isn’t even at first base yet in understanding how it works.
But it is a time, if not of paradigm shift, then of paradigm shaking: I can’t believe ENCODE will end up taking us back to where we were before. Ways of thinking will change radically – maybe it’ll still be called Neo-Darwinism, just as secularised liberals still call themselves Christians.
Even though genes are not the nice neat packages we once thought them to be, the concept is still a reasonable approximation, as is atomic theory. The efficiency with which we can change crops by genetic modification shows that the gene concept is still quite real. I’ve spent this week talking with plant scientists at the Carnegie Institution for Science, and they are doing things with genes that boggle my mind, things that would have been unimaginable just ten years ago, and which would not work if genes were the fuzzy things that Jon suggests. The gene is still a very fruitful working concept or approximation.
Well Lou, I don’t think ENCODE were that influenced by me when they wrote their definition, or the supporting papers.
But to be a fruitful concept has, in the context of what we’ve been discussing these few posts, relatively little bearing on underlying validity. Many targets were effectively hit using artillery calculations based on Aristotle’s theories of motion, and many accurate navigations made on Ptolemy’s astronomy.
I think Dawkins had more in mind for the gene than a working approximation.
A general comment. Ordinarily I would not find fault with how biologists understand genes, if they openly admitted they were providing the ‘best’ approximation. The scientific community, and the community at large, expect a balanced and sensible outlook – especially when dealing with knowledge of complicated areas such as these, and with its application to the wider community.
As previous remarks on the use and misuse of science have indicated, the ‘escape clause’ for those who think science is the truth itself has been ‘pure science’, or ‘the pursuit of knowledge’, or ‘academic freedom’, and many other such excuses for reneging on their responsibility. I am of the view that many are motivated by other factors besides the pursuit of so-called pure knowledge. This of course goes to the matter of what type of person one is, or chooses to be.
I was reminiscing about the debates during my college years, when Darwin was beginning to make a comeback on the strength of Mendel’s work, and the scandal of Piltdown Man was public knowledge. Many made much of the fraud, yet I remember that no matter how often I turned the debate to the real issue, which was the argument for the missing link, those who accepted Darwin without reservation would ignore this important part of Darwinism and carry on about Piltdown. Yet here is another example where a concept central to Darwin was simply brushed under the carpet. I and all other students had been taught since junior school this basic concept regarding the descent of man as a fact of science (no doubt or questions were tolerated), with stepwise progressions from ape-like creatures to cave men, complete with splendid illustrations. This determination to hide speculation and incomplete ideas, even when error has been shown, and to insist that ‘science has spoken’, is one major factor in the demise of science’s standing within our community. Nowadays if one reminds them of this great error, they become aggressive, insisting their current concept is the last word on the matter, and the cycle begins again. Yet I think we scientists can never progress unless we learn from error as well as from experimentally verified knowledge – we must at all times work with the notions of ‘known/unknown’ and ‘correct/in error’. It is inconceivable to me that scientific work can progress in any other way.
Yet the trend continues, with even greater emphasis from atheists, who now put forward the view that if people believe Darwin they are scientific, but otherwise they are uneducated religious bigots, stuck in a dark past.