The ENCODE project became controversial last year when it suggested that 80%+ of the human genome is “functional”, meaning “transcribed”, meaning “let’s all argue about what we mean”. The argument continues to rage vituperatively though, of course, there is no disagreement whatsoever about the consensus science (fortunately for BioLogos, which is theologically wedded to the consensus), because science seems to be helpfully defined nowadays by what isn’t in dispute at any particular time. But in truth the stage was set for upset back in 2007, when an ENCODE paper suggested a new definition for the gene that said:
A gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.
It’s not clear to me that this is compatible with Wikipedia’s more conservative “working definition”, based on two papers from 2006–7, which is “a locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions, and/or other functional sequence regions.”
ENCODE’s suggestion is rather wider: its “union of genomic sequences” is more complex than “a sequence”, and its “coherent set of potentially overlapping functional products” more complex than “a unit of inheritance”. I’ll try to add some detail and colour from the original paper. For instance, compare this to the Wikipedia definition:
To quote Falk (1986), “. . . the gene is [. . .] neither discrete [. . .] nor continuous [. . .], nor does it have a constant location [. . .], nor a clearcut function [. . .], not even constant sequences [. . .] nor definite borderlines.” And now the ENCODE project has increased the complexity still further.
…before the advent of the ENCODE project, there were a number of aspects of genes that were very complicated, but much of this complexity was in some sense swept under the rug and did not really affect the fundamental definition of a gene. The experience of the ENCODE project, particularly the mapping of transcriptional activity and regulation using tiling arrays, has extended these puzzling and confusing aspects of genes, bringing them to the forefront, where one has to grapple more directly with them in relation to the definition of what a gene is.
Even their new definition deliberately omits important aspects of genomic function that the old theory never even considered:
Although regulatory regions are important for gene expression, we suggest that they should not be considered in deciding whether multiple products belong to the same gene… Regulation is simply too complex to be folded into the definition of a gene, and there is obviously a many-to-many (rather than one-to-one) relationship between regulatory regions and genes.
Elsewhere, ENCODE’s John Stamatoyannopoulos has stated:
Although the gene has conventionally been viewed as the fundamental unit of genomic organization, on the basis of ENCODE data it is now compellingly argued that this unit is not the gene but rather the transcript (Washietl et al. 2007; Djebali et al. 2012a). On this view, genes represent a higher-order framework around which individual transcripts coalesce, creating a polyfunctional entity that assumes different forms under different cellular states, guided by differential utilization of regulatory DNA.
In other words, their new definition of the gene must be taken in conjunction with the realisation that it is also the downgrading of the gene as the fundamental unit of organisation. Genes (if they can indeed be isolated) do not control the genome, but are used by it. One attempt in the literature to accommodate this idea has been to analogise genes to frequently-used subroutines within an operating system – the big thing to be explained being the global function of the “program” that calls the genes. But even this, our author says, does not go far enough:
The new ENCODE perspective does not, of course, fit with the metaphor of the gene as a simple callable routine in a huge operating system. In this new perspective, one enters a gene “routine” in many different ways in the framework of alternative splicing and lattices of long transcripts. The execution of the genomic OS does not have as neat a quality as this idea of repetitive calls to a discrete subroutine in a normal computer OS. However, the framework of describing the genome as executed code still has some merit. That is, one can still understand gene transcription in terms of parallel threads of execution, with the caveat that these threads do not follow canonical, modular subroutine structure. Rather, threads of execution are intertwined in a rather “higgledy-piggledy” fashion, very much like what would be described as a sloppy, unstructured computer program code with lots of GOTO statements zipping in and out of loops and other constructs.
For any still unfamiliar with the research that has undermined the original idea of the gene as a digital recipe sequence, that codes for a protein, that performs a function (that lives in the house that Jack built), this paragraph sums it up:
Overall, the ENCODE experiments have revealed a rich tapestry of transcription involving alternative splicing, covering the genome in a complex lattice of transcripts. According to traditional definitions, genes are unitary regions of DNA sequence, separated from each other. ENCODE reveals that if one attempts to define a gene on the basis of shared overlapping transcripts, then many annotated distinct gene loci coalesce into bigger genomic regions. One obvious implication of the ENCODE results is that there is less of a distinction to be made between genic and intergenic regions. Genes now appear to extend into what was once called intergenic space, with newly discovered transcripts originating from additional regulatory sites. Moreover, there is much activity between annotated genes in the intergenic space.
The paper is quick to add that there is no dispute that “the genotype” is responsible for “the phenotype”. But the significance for the consideration of the gene as an actual entity cannot be stressed too much. Here the author suggests the background to their decision to redefine it:
At this point, it is not clear what to do: In the extreme, we could declare the concept of the gene dead and try to come up with something completely new that fits all the data. However, it would be hard to do this with consistency. Here, we made a tentative attempt at a compromise, devising updates and patches for the existing definition of a gene.
In other words, a deliberate attempt has been made to retrofit the new data to patch a leaking paradigm. The new definition was deliberately worded to be compatible with any instances that might be found of a single length of DNA coding for a single functional protein alone (have any been shown to exist?), as well as the far more common situation where this is not the case.
One analogy to the new viewpoint might be that the genome is not a program to be executed, but a book to be understood, with a range of related concepts presented as an integrated whole. A computer can execute a set of subroutines. But though a politician might consider Machiavelli’s The Prince to have formed their entire theory of government, yet it might be quite rare to find a specific example of policy taken directly from the text. A level of explanation orders of magnitude above the algorithmic is needed to make sense of such a scenario.
Perhaps anticipating, and attempting to defuse, the furore that would later erupt over ENCODE’s critique of “Junk DNA” and so on, the article suggests:
However, we probably will not be able to ever know the function of all molecules in the genome. It is conceivable that some genomic products are just “noise,” i.e., results of evolutionarily neutral events that are tolerated by the organism (e.g., Tress et al. 2007). Or, there may be a function that is shared by so many other genomic products that identifying function by mutational approaches may be very difficult. While determining biological function may be difficult, proving lack of function is even harder (almost impossible).
In commenting on all this I want to follow up two strands in a number of recent posts on The Hump. The first is that the fuzzier the understanding of the gene becomes (and therefore the less useful it is as a working concept), the more weight is given to a more holistic understanding of life and its processes, as we find in Aristotle of old and Goethe more recently. It’s far from clear from the data, as opposed to the ENCODE definition, to what extent the “potentially overlapping functional products” can ever actually be organised into a “coherent set” that doesn’t blend around the edges into every other genetic process. Perhaps everything does cause everything else in a “higgledy-piggledy” way.
As I commented on another thread, whilst some might see this as evidence of non-design, it could also (and more plausibly, given the sophistication observed) be a sign of the highest possible level of organisation. The Origin of Species, after all, was according to its author “one long argument”, not a set of separable components, though it has sentences, words and letters. Designed or not, though, it makes a reductionistic understanding of life as a set of machine-like processes increasingly problematic.
The other strand for comment is to pick up on James Clerk Maxwell’s division of theories into analogies, physical theories and mathematical theories.
The gene was, par excellence, a physical theory. It arose from Mendel’s discovery that (some) phenotypic traits are “digitally” organised, in contradiction to Darwin’s idea of “gemmules” blending their characteristics. “Genes” though, like gemmules, were deeply rooted in the corpuscularism/atomism of early-modern science, and were seen as the theoretical “irreducible particle” of heredity. They were looked for, as such, in the cell, and the discovery of DNA helpfully delivered “traits on a string”. The semantic nature of DNA may have been a surprise, but in effect that just gave the genes a “subatomic” structure of nucleotides. But up until recently – and very much up to the present in public awareness – the gene was the atomic particle of heredity. ENCODE (not uniquely) demolishes that as much as the classical corpuscularism of Boyle has been demolished in physics.
In physics it may still be mentally helpful to conceptualise atoms as “solid particles”, and even useful to deal with them as such in limited scientific situations. But such a view has shifted from a “physical theory” to a mere analogy. Likewise, I’ve no doubt that the abstract concept of an isolated “gene”, somehow responsible for an equally abstract concept of a discrete “trait”, might have practical use – as an analogy. In the same way one might still find Darwin’s gemmules to have a limited explanatory utility.
But an analogy for what? Unlike the case of physics, there is no set of partial differential equations that describes the “higgledy-piggledy” organisation of the genome. And since living processes already go far beyond our nearest physical comparator – digital technology – we currently have no physical theory to understand genome function either.
One cannot understand a biological function, like walking, without seeing its context in the whole animal. Perhaps the same is true for what we have called “genes”. It seems we can only begin to understand life in terms of itself, cut off from any all-encompassing theoretical framework – which is remarkably Goethean. That may prove a fruitful approach, but it is certainly very different from any other branch of science of which I’m aware.