People speculate there may have been a wider variety of DNA coding schemes in the past, but natural selection, plus perhaps some trade-off between reaction energetics and complexity, settled on the current system.

There was probably a simpler two-nucleotide encoding before the current three-nucleotide one. About half of the amino acids only use the first two nucleotides and ignore the third.



That seems unlikely, because shifting your recognition domain count from 2 to 3 means you basically lose all the evolved information from before and have to rely on chance "correct encodings" everywhere.


The idea is that the initial tRNA was not specific enough: it only cared about the first two letters of each codon and ignored the third. So, for example, Proline was determined by the first two letters CC? and was associated with the four codons CCU, CCC, CCA and CCG. (This is actually still the current mapping.)

Other blocks of four codons were split for some reason. We can imagine that originally Isoleucine was determined by AU?, so initially AUU, AUC, AUA and AUG all encoded Isoleucine; now only the first three encode Isoleucine and the last one encodes Methionine instead.

This is somewhat based on the blocks of four codons that follow this pattern, where the first two bases determine 16 blocks that are sometimes split (https://en.wikipedia.org/wiki/Genetic_code), and on the fact that pairing at the third base is unusually loose (https://en.wikipedia.org/wiki/Wobble_base_pair).
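
To make the block structure concrete, here is a minimal Python sketch. It uses nothing beyond the standard RNA codon table (packed in the usual NCBI ordering), groups the 64 codons by their first two bases, and counts how many of the 16 blocks never look at the third base:

    from itertools import product

    BASES = "UCAG"
    # standard genetic code, 64 letters in NCBI order ("*" = stop)
    AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
    CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AA)}

    unsplit = 0
    for first, second in product(BASES, repeat=2):
        block = {CODE[first + second + third] for third in BASES}
        unsplit += len(block) == 1
        print(f"{first}{second}N -> {sorted(block)}")

    print(unsplit, "of the 16 blocks never depend on the third base")

Running it reports 8 of the 16 blocks as fourfold degenerate, i.e. the "unsplit" ones described above.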

Anyway, IIRC this is a reasonable speculation but it's not confirmed. So don't take this explanation too literally.

With this idea, the initial DNA could evolve for a few (zillion) years as a list like

important-important-whatever-important-important-whatever-important-important-whatever-important-important-whatever-important-important-whatever-important-important-whatever-important-important-whatever

and then make the "whatever" letters also important with an almost backward-compatible code, so in most cases the third letter still doesn't matter, but in a few cases it does.

[Note: the official letter for "whatever" is "N", not "?".]
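
As a toy illustration of the "almost backward compatible" step, one can compare a reader that only sees the first two bases with the modern three-base reader. The ancestral two-letter assignments below are pure guesswork (each NN? block just keeps the majority meaning of its modern block), so treat this as a sketch of the idea, not a claim about history:

    from collections import Counter
    from itertools import product

    BASES = "UCAG"
    AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
    CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AA)}

    # hypothetical ancestral code: each NN? block keeps the most common
    # meaning of its modern block (an assumption made only for illustration)
    OLD = {a + b: Counter(CODE[a + b + t] for t in BASES).most_common(1)[0][0]
           for a, b in product(BASES, repeat=2)}

    agree = sum(OLD[codon[:2]] == aa for codon, aa in CODE.items())
    print(agree, "of 64 codons read the same under both codes")

Under this toy assignment most of the tape still decodes identically; the disagreements are confined to the split blocks.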


That's a great explanation! To add a cool point, the wobble position is frequently modified by highly specific enzymes to make it matter more. It's like some random protein mutated to do this modification and all of a sudden the organism got more RAM, increasing its fitness.


Yes, that is generally the accepted idea, but a two-letter codon is not the same thing as a three-letter codon whose wobble position is completely ignored.


What?

You start with a two-letter code, then something evolves that puts an (initially) rare third letter at a few locations on the tape. All the old "gear" that reads the two-letter code can still read most of the tape.


It is difficult to imagine that as a possibility. The three-nucleotide spacing of codons is structurally important in the process of translation from mRNA to protein via the ribosome. I can't see how a ribosome could operate on an arbitrary mix of two- and three-nucleotide codons, unless I am misunderstanding your comment.

A translational reading frame consists of non-overlapping codons of three nucleotides. If one nucleotide is skipped, the entire downstream message is garbled. So how would the translational machinery operate if each codon arbitrarily consisted of two or three nucleotides?
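
For what it's worth, that fragility is easy to see in a small sketch (the mRNA below is made up; the code table is the standard one):

    from itertools import product

    BASES = "UCAG"
    AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
    CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AA)}

    def translate(mrna):
        # read fixed, non-overlapping three-base codons
        return "".join(CODE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))

    mrna = "AUGCCUGAAAUUCGU"               # Met-Pro-Glu-Ile-Arg
    print(translate(mrna))                 # MPEIR
    print(translate(mrna[:4] + mrna[5:]))  # drop one base: every downstream codon shifts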


I see what you are getting at, but I think what the other comments are saying is that for the third nucleotide position of a given codon, it does not matter which nucleotide it is. The amino acid to be used would depend only on the first two nucleotides, while the third could be any of A, U, C or G.


Nah, I misunderstood completely.

I was thinking of how a hypothetical code might, in the abstract, have evolved from binary, through ternary, up to the current base-4.

I haven't got enough biochem knowledge to speculate on how two nucleotides per amino acid could evolve into three.


They have done this with an expanded four-letter codon. The fitness of the bacteria tanks.


>> About half of the amino acids only use the first two nucleotides and ignore the third.

I've often thought some of that redundancy in the code could be a feature. Important (more sensitive) sequences could evolve to a coding that is more robust against mutations, while things that are less important could be more brittle in their encoding. This seems hard to prove though.

It also gives a particular amino acid more neighbors in codon space, meaning you can go from one amino acid to more options in a single step, without going through intermediates.
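
A rough way to see the "more neighbors" effect (Leucine vs. Tryptophan is just an illustrative pick): collect, for an amino acid, every amino acid reachable by a single point mutation from any of its codons.

    from itertools import product

    BASES = "UCAG"
    AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
    CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AA)}

    def point_mutants(codon):
        for i, b in product(range(3), BASES):
            if b != codon[i]:
                yield codon[:i] + b + codon[i + 1:]

    def reachable(aa):
        codons = [c for c, a in CODE.items() if a == aa]
        return {CODE[m] for c in codons for m in point_mutants(c)} - {aa, "*"}

    for aa in ("L", "W"):   # Leucine has six codons, Tryptophan only one
        print(aa, len(reachable(aa)), sorted(reachable(aa)))

Leucine, with six codons, reaches considerably more amino acids in one step than Tryptophan, which has only one.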



