Another interesting paper, again brought to my attention by an ID website (sorry). But again, I was interested in looking at the original in PLoS Genetics, which fortunately for us all has open access. The basic finding is a surprisingly high number of de novo protein-coding genes in the human genome, to the tune of 60. This was compared to chimpanzee and other primate sources. This, they say, is three times more than what was found hitherto – but then they looked harder.
I hope I’m summarising fairly when I say they were extremely conservative in their estimate, excluding not only similar existing genes in apes from which the new genes could have evolved, and long sequences without start/stop codons that could represent genes inactivated during evolution, but also quite short sequences that could have recombined to make new genes. In other words, as far as they could, they looked only for completely new genes. They also looked for the peptides coded by these new genes, to show that they were in fact expressed and so genuine protein-coding genes.
Looking for function, and therefore selection, is hard, and they do report that most of the genes found were expressed only weakly. But since they were, with only one exception, fixed in the population, and in many cases expressed only in particular tissues, notably the brain, it seems likely that these genes are at least moderately advantageous, rather than being neutral or deleterious. Indeed the authors speculate, given the number involved in cortical tissue, that this group of genes may code for a significant part of the higher mental function of H. sapiens – in other words, completely new genes may play a major role in what makes us biologically human.
The Scientific American report on the paper says that the genes code for quite short proteins, but Table S1 and Table S2 in the paper itself show this is only relatively true: they range from 348 down to 126 residues, averaging around the 150 mark used by William Dembski in his rather conservative probability calculations.
OK, so what do we have here? The standard theory says that changes occur by random mutation and natural selection, but that is heavily stretched nowadays by the incorporation of changes of more limited randomness, like gene duplication, activation or recombination – all excluded by this study. At the same time, selection is now regarded by many as largely limited to purifying selection, weeding out the rubbish rather than selecting the excellent. Kimura’s neutral theory (somewhat preferred now, as far as I can see) says most changes are due to near-neutral mutation rather than to selection, though some final adaptive role for selection seems to be maintained – whether because the science demands it or because it’s too iconic to ditch, I’m not entirely sure.
Be that as it may, it appears in this study that 60 genes of some (possibly great) adaptive advantage have arisen in the 5-6 million years since we are assumed to have diverged from our common ancestor with chimpanzees. This does not seem much, but it is a conservative estimate, as the authors say, and must be set against a basal mutation rate of about 10^−8 per base pair per generation – though that rate is higher in “junk DNA”, where, it seems, most of these useful proteins arose.
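For scale, here is a rough supply-side sketch of that mutation rate. The genome size and rate are standard round figures; the generation time and population size are my own illustrative assumptions, not numbers from the paper:

```python
# Rough, illustrative arithmetic on the mutational supply.
# Genome size and mutation rate are standard round figures; the
# generation time and population size are assumptions for scale only.
genome_size = 3e9       # base pairs in the human genome
mutation_rate = 1e-8    # per base pair per generation (as in the text)
generations = 3e5       # ~6 million years at ~20 years per generation
population = 1e5        # assumed long-term effective population size

mutations_per_lineage = genome_size * mutation_rate * generations
total_mutations = mutations_per_lineage * population

print(f"~{mutations_per_lineage:.0e} mutations along one lineage")
print(f"~{total_mutations:.0e} mutations across the population")
```

On these assumed figures that comes to roughly 9 × 10^6 point mutations along any one lineage, and something like 10^12 across the whole population – plenty of raw material, but each unit of it a single base change, not a new gene.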
Since the study is about entirely new genes, rather than tinkering with existing ones, the only mechanism available for these new genes and new proteins would seem to be stochastic, genuinely random mutation: a sequence coding for 126-348 residues assembles in the genome at random, acquires start and stop codons, and turns up serving a function in a specific tissue now. Or to be more exact, 60 individual sequences coding for 126-348 residues arise in this way.
At this point it’s easy to do a probability calculation, because it’s just the same situation postulated by Dembski for an origin-of-life scenario. There’s no shortage of nucleotides to use, and plenty of amino acids and all the complex apparatus needed to transcribe and translate the gene into protein. That’s a big simplifying assumption in OoL matters, but a given here. If I were less lazy I could do an exact calculation from the data in the study’s figures, but if we take the average protein here to be 150 residues long, the odds against each gene arising by random mutation are the number of possible base-triplet combinations (4^3 = 64) to the power of 150, ie 64^150 to one. For all 60 to arise, the probability (if my limited maths is holding out OK) is (64^150)^60, ie 64^9000. But that’s academic, because even the first figure is well outside the probabilistic resources of the Universe.
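The back-of-envelope arithmetic can be checked in a few lines, taking 4^3 = 64 as the number of possible base triplets and working in log10 so the numbers stay representable:

```python
import math

CODONS = 4 ** 3    # 64 possible base triplets (codons)
RESIDUES = 150     # average protein length assumed in the text
GENES = 60         # de novo genes reported

# Odds against one specific 150-codon sequence arising at random are
# 64^150 to one; 64^150 overflows a float, hence the log10 bookkeeping.
log10_one = RESIDUES * math.log10(CODONS)
log10_all = GENES * log10_one   # (64^150)^60 = 64^9000

print(f"one gene: 1 in 10^{log10_one:.0f}")   # 1 in 10^271
print(f"all 60:   1 in 10^{log10_all:.0f}")   # 1 in 10^16256
```

For comparison, Dembski’s own universal probability bound – his estimate of the total probabilistic resources of the observable Universe – is 1 in 10^150, which 10^271 comfortably exceeds.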
There are four get-outs for the conclusion that it couldn’t have happened, as far as I can see from what I’ve read in the past.
The first is design, which as we all know is That Which Must Not Be Uttered.
The second is that protein function is so flexible that an equally good job would have been done by a large number of different genes, thus reducing the odds dramatically. The question that arises for me is just how flexible proteins must be to reduce the odds from impossible-on-the-multiverse scale to achievable-in-6-million-years. For those of a mathematical bent, it wouldn’t be too hard to work out the likely probabilistic resources of 6 million years of human evolution, divide them into (64^150)^60, and arrive at a figure for how many proteins could equally do the same job. It would be a vast number, I think. Makes you wonder why cells bother with error correction.
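For those so inclined, the division can be sketched as follows. Every trial figure here is an assumption for illustration only – the paper gives no such numbers:

```python
import math

# How many interchangeable sequences F would be needed per gene for a
# random hit to become plausible? Roughly F ~ 64^150 / (number of trials).
log10_target = 150 * math.log10(64)   # odds against one specific gene

population = 1e6       # assumed hominin population per generation
generations = 3e5      # ~6 million years at ~20 years per generation
attempts_each = 1e2    # assumed relevant mutational "tries" per genome

log10_trials = math.log10(population * generations * attempts_each)
log10_F = log10_target - log10_trials

print(f"trials:   ~10^{log10_trials:.0f}")
print(f"needed F: ~10^{log10_F:.0f} equally serviceable sequences")
```

On these assumed inputs the trials come to about 10^13, so something like 10^257 of the 10^271 possible sequences would have to do the job equally well – the “vast number” in question.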
The third possibility is similar: that these genes originally arose with different selectable functions, and were able to evolve through a pretty flexible protein space, changing functions as they did so until they arrived at the brain- or testis-specific functions they now possess. The question of how easy it really is to migrate through protein space is highly controversial, but given the limited role of selection in modern interpretations of Neodarwinism, is it realistic to say this complicated process happened 60 times in 6 million years (just happening in the process, you’ll note, to increase the pre-existing primate trend towards greater intelligence)?
The fourth solution is that protein evolution is actually highly directed, as per the natural genetic engineering of James Shapiro and similar authors. But as Jerry Coyne said, “Shapiro is heterodox”. Self-organisation might help us towards the truth, but it doesn’t help us towards Darwinian truth, which is what is needed since Darwinism is fact, not mere theory, as we all know. In any case, we are not talking here about bright cells recombining functional protein folds in novel ways, loading the odds of producing a winner. The whole point of the paper is that these are entirely novel genes for entirely novel proteins. What level of self-organisation does it take for a cell to plan and configure 60 proteins from scratch?
There is a fifth possibility I didn’t mention, and that is that since we are here, probabilities don’t count, and random mutation plus or minus near-neutral mutation, purifying selection and adaptive selection did the job. I have a feeling that this is the best explanation, in terms of acceptance by the mainstream scientific community. So let’s run with it.
But it does rather remind me of the schizophrenic who thought he was dead. His doctor’s strategy was to convince him by reason that dead men don’t bleed, and then to prick the patient’s finger and show the blood. “What do you know!” said the patient. “Dead men do bleed after all.”