Proteins are the 'active' molecules in the design of life. They do not merely encode or translate information; they perform the vast majority of biological functions. Their roles include the following:
Proteins are long polymers composed of sequences of amino acids joined by peptide bonds. There are 20 different amino acids. The human genome project aims to reveal the DNA sequence of our genes, which encodes the proteins that perform biological functions. Proteins vary in length from tens to thousands of amino acids, and may contain over 10,000 atoms. They form complexes with each other, and also with sugars, nucleic acids and many small-molecule ligands.
In vivo, most proteins adopt a unique three-dimensional (3D) structure, which is vital for their function. If we are to understand how these molecules perform their biological roles, we must first reveal their 3D structures. These structures can be determined by three methods: X-ray crystallography, which was the first to achieve atomic resolution, and, more recently, NMR spectroscopy and electron microscopy, which have also become useful. The first structures, those of myoglobin and haemoglobin, were solved in 1960 by Sir John Kendrew, Max Perutz and colleagues at the Laboratory of Molecular Biology in Cambridge (where the double-helix structure of DNA was first revealed by Watson and Crick). It took them almost 20 years' work. Today it can still take years to solve the structure of a protein, although with recent advances in molecular biology and the use of synchrotron radiation, a structure may now be solved in only a few weeks in favourable cases.
To date we know the structures of about 1000 'unique' proteins, derived from all forms of life (from humans to bacteria and plants). Although this is just the beginning (there are an estimated 70,000 different proteins in humans), this work has already revealed a wonderful universe of protein structures. Some examples of protein structures are shown below, in different representations.
In reality, proteins are rather solid objects (as seen in Fig. 1), held together by non-covalent forces. Within this 'blob', the protein chain weaves to and fro, as becomes clear in a representation that traces the linear protein chain (Fig. 2). Certain conformations, the alpha-helices and beta-sheets, occur very frequently, and structures can be simplified further by colour-coding these motifs (Fig. 3).
Just as the animal and plant kingdoms were classified and categorised in the eighteenth century, the molecular universe of proteins has also been classified into families, which tell us about their origins and evolution. Not surprisingly, we find that nature has been 'economical', reusing the same protein structures, slightly tuned, to perform many different tasks.
As we learn more about proteins and their structures, we can use this information to understand why proteins sometimes go wrong, as in inherited genetic disorders. The structures can also provide the basis for rational drug design and environmentally friendly crop protection. The genome projects will reveal the protein sequences, but it is the structures, combined with kinetic and thermodynamic biochemical data, that will for the first time reveal how life works at the molecular level.
Janet Thornton FRS is the Director of the European Bioinformatics Institute. She holds professorial positions at University College London and Birkbeck College.