Here I systematically examine the information complexity of the primary sequences of all natural proteins deposited in the Swiss-Prot database. Sequence complexity is assessed by determining the frequency of occurrence of each amino acid type in sequence windows of fixed length, calculating the Shannon entropy of each window, and then averaging over all windows covering the sequence. The minimum information content found in the present-day record imposes a lower limit on the number of letters that a primeval amino acid alphabet must have had.
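The windowed entropy measure described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the window length of 20, the one-residue sliding step, the use of base-2 logarithms, and the function names are all assumptions, since the abstract does not specify them. The helper `min_alphabet_size` relies only on the general fact that an alphabet of k letters cannot exceed an entropy of log2(k) bits per position, so an entropy of H bits requires at least 2^H letters; whether the bound is derived this way in the paper is likewise an assumption.

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy (bits per residue) of the residue frequencies in one window."""
    n = len(window)
    counts = Counter(window)
    # Sum -p * log2(p) over the residue types that actually occur in the window.
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def mean_window_entropy(sequence: str, window_length: int = 20) -> float:
    """Average window entropy over all overlapping windows of a sequence.

    Assumptions (not taken from the source): windows slide one residue at a
    time, and a sequence no longer than the window is treated as one window.
    """
    if not sequence:
        raise ValueError("sequence must be non-empty")
    if window_length < 1:
        raise ValueError("window_length must be at least 1")
    if len(sequence) <= window_length:
        return window_entropy(sequence)
    n_windows = len(sequence) - window_length + 1
    total = sum(
        window_entropy(sequence[i:i + window_length]) for i in range(n_windows)
    )
    return total / n_windows


def min_alphabet_size(entropy_bits: float) -> int:
    """Smallest alphabet size k with log2(k) >= entropy_bits.

    The small tolerance guards against floating-point noise when the
    entropy is numerically equal to an exact power of two.
    """
    return math.ceil(2 ** entropy_bits - 1e-9)
```

Applied to a hypothetical set of sequences, one would compute `mean_window_entropy` for each entry, take the minimum over the set, and pass that value to `min_alphabet_size` to obtain the corresponding bound on alphabet size.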
© 1946 – 2014: Verlag der Zeitschrift für Naturforschung
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.