Bioinformatics Tools to understand Human Diseases


Background

Genes and Genomes

Genes are distributed around the genome. Since the Human Genome DNA sequence was completed, it has been analysed to show the locations of all the known genes, and further genes have been predicted by computational methods which identify typical gene features in the genomic sequence. All this data is available in a public database, with a user-friendly interface to search and browse it. The annotated Human Genome, together with those of other organisms, can be found at http://genome.ucsc.edu.

Gene to Protein

Most of the genes you find in the Genome Browser will appear to be discontinuous, with large gaps (introns) between regions of functional sequence (exons). When a gene is expressed, the entire length of DNA in the gene is first copied into a similar, but single-stranded, molecule called RNA. This process is called transcription. The RNA copy is then spliced to remove the introns and join up the exons. The resulting molecule, the mature mRNA, has, in addition to the coding region, some extra sequences at the beginning (5' end) and the end (3' end), which are involved in the control of the following stage of expression. The mature mRNA is then translated, a process in which each group of 3 bases (a codon) of the coding sequence is read in turn to add the corresponding amino acid to a peptide. This peptide may undergo post-translational modification, which can involve removal of segments, folding, combination with other peptides, and addition of other chemicals, to produce the mature protein.
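The steps described above (transcription, splicing, translation) can be sketched in a few lines of Python. This is only a toy illustration: the sequence, the exon coordinates and the function names are invented for this example, and the codon table lists only the five codons the example needs rather than all 64.

```python
# Toy codon table: only the codons used below. "*" marks a stop codon.
CODON_TABLE = {"AUG": "M", "UUU": "F", "GGU": "G", "AAA": "K", "UAA": "*"}


def transcribe(dna):
    """Copy the DNA coding strand into RNA (T is replaced by U)."""
    return dna.upper().replace("T", "U")


def splice(rna, exons):
    """Join the exons, given as (start, end) half-open index pairs."""
    return "".join(rna[start:end] for start, end in exons)


def translate(mrna):
    """Read codons from the first AUG until a stop codon is reached."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")  # "X" = not in toy table
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical gene: exon 1, an intron, then exon 2 (made-up sequence).
gene = "CCATGTTT" + "GTAAGCAG" + "GGTAAATAAGC"
exons = [(0, 8), (16, 27)]

pre_mrna = transcribe(gene)
mature_mrna = splice(pre_mrna, exons)
peptide = translate(mature_mrna)
print(mature_mrna, peptide)
```

The two bases before AUG and the two bases after the stop codon stand in for the extra 5' and 3' sequences: they are part of the mature mRNA but are not translated.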


Example 1: Cystic Fibrosis

Cystic fibrosis (or mucoviscidosis) is one of the most common fatal hereditary diseases. About 1 in 2500 children is born with cystic fibrosis, and those affected often die early. The symptoms include breathing difficulties and frequent lung infections, among many other symptoms affecting the entire body. The disease is caused by a mutation in a gene called cystic fibrosis transmembrane conductance regulator (CFTR). This gene is needed for the production of sweat, digestive juices and mucus, and its gene product functions as a chloride channel. Cystic fibrosis is an autosomal recessive disease. This means that the disease develops only if both copies of the gene carry a mutation. People with one functional copy of the CFTR gene (carriers) do not develop the disease.
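What autosomal recessive inheritance means for the children of two carriers can be worked out with a small Python sketch. It assumes simple Mendelian inheritance, in which each parent passes on one of their two gene copies with equal probability. The allele labels ("A" for a functional copy, "a" for a mutated copy) and the function name are chosen for this example only.

```python
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Return the probability of each child genotype for two parents.

    Each parent is a two-letter string such as "Aa"; the child receives
    one letter (one gene copy) from each parent.
    """
    counts = {}
    for copy1, copy2 in product(parent1, parent2):
        genotype = "".join(sorted((copy1, copy2)))  # "aA" and "Aa" are the same
        counts[genotype] = counts.get(genotype, 0) + 1
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}


# Both parents are carriers: one functional and one mutated copy each.
carriers = offspring_genotypes("Aa", "Aa")
for genotype, probability in sorted(carriers.items()):
    print(genotype, probability)
```

Under these assumptions, only the "aa" children, who inherit a mutated copy from both parents, develop the disease; "Aa" children are carriers like their parents.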

The CFTR gene is located on the long arm of chromosome 7. Look up the gene in the Human Genome Browser. By clicking on the gene you can find out more about the gene's properties and the protein it encodes. Can you find the length of the gene?

The most common mutation in cystic fibrosis is a deletion of 3 base pairs in the CFTR gene (PHE508DEL), which results in the deletion of a single amino acid from the protein sequence. (You can find information on other mutations in the OMIM database.) The mutation lies in the nucleotide-binding domain of the chloride channel. Although the shortened protein sequence does not lead to a global change in the conformation of the protein, it changes the local surface at the position of the deletion. As a result of the changed surface, the individual subunits of the chloride channel cannot interact properly and remain functionally restrained. The position that is commonly deleted in cystic fibrosis is highlighted in red; it is located at the subunit interface, so its deletion disrupts proper binding.
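Why a deletion of exactly 3 base pairs removes one amino acid and leaves the rest of the protein unchanged can be seen in a short Python sketch. The 15-base sequence below is a made-up illustration, not a CFTR reference sequence, and the helper names are invented for this example. The codon table is the standard genetic code, written in the usual compact T, C, A, G order.

```python
# Standard genetic code; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate complete codons from the start of a coding sequence."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def delete(dna, position, length):
    """Remove `length` bases starting at index `position`."""
    return dna[:position] + dna[position + length:]


normal = "ATCATCTTTGGTGTT"            # illustrative sequence only
three_bases_lost = delete(normal, 6, 3)
one_base_lost = delete(normal, 6, 1)

print(translate(normal))              # IIFGV
print(translate(three_bases_lost))    # IIGV  (one amino acid missing)
print(translate(one_base_lost))       # IILV  (reading frame shifted)
```

Deleting a multiple of 3 bases keeps the reading frame, so the amino acids after the deletion are read as before. Deleting a single base shifts the frame, and every codon after that point is read differently.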

The Protein Data Bank (PDB) provides protein structures that have been experimentally determined by X-ray crystallography or NMR spectroscopy. A substructure of the cystic fibrosis chloride channel can be found in the repository.


Example 2: Breast Cancer

The breast cancer 2 gene (BRCA2) functions as a tumor suppressor gene and prevents cells from growing and dividing out of control. In particular, it regulates the cell cycle in the breast. Mutations in this gene lead to an increased susceptibility to breast cancer.

The BRCA2 gene has been mapped to the long arm of chromosome 13. Have a look at its location and structure in the Human Genome Browser.

To perform its function in DNA repair, BRCA2 binds to other proteins. The figure shows a subregion of BRCA2 (the BRC repeat, which is found several times in BRCA2), highlighted in blue, bound to RAD51 (RecA-homology domain), highlighted in white. Mutations in this region of BRCA2 disrupt the interaction with RAD51 and increase cancer susceptibility. You can retrieve the protein structure from PDB.


Example 3: Neuropsychiatric disorders: Speech and Language disorder

Research at the Wellcome Trust Centre for Human Genetics in Oxford includes the quest for a deeper molecular understanding of speech and language development. The centre has helped to connect speech and language disorder to the FOXP2 gene, which functions as a transcription factor. Transcription factors switch other genes on and off and are important regulators in developmental processes. Transcription factors have a DNA-binding domain, which allows them to bind to DNA and activate transcription of a gene.

FOXP2 is found on chromosome 7 (7q31) and its gene is structured into 17 exons. View FOXP2 in the Human Genome Browser. A mutation in exon 14 of FOXP2 (R553H), which affects a protein helix in the DNA-binding domain, has been shown to result in speech and language disorder (OMIM). This position is of great functional importance, as it makes contact with the DNA molecule and is responsible for recognizing a specific DNA pattern. If the FOXP2 gene is compared to its equivalent gene in other organisms, we find that position 553 is absolutely conserved from yeast to vertebrates. The figure shows the DNA-binding domain (belonging to the Forkhead family of DNA-binding domains) bound to DNA and highlights the position of the R553H mutation in red. The structure of the protein complex can be retrieved from PDB.


Links to Bioinformatic Databases on the WWW

  • UCSC Genome Browser: Excellent browser for the human genome.
  • Ensembl: Database for eukaryotic genomes (Human, chimp, mouse, dog, fish, frog, etc.)
  • Protein Data Bank (PDB): Experimentally determined structures of biological molecules
  • OMIM - Online Mendelian Inheritance in Man: Catalog of human genes and genetic disorders
  • NCBI: National Center for Biotechnology Information
  • Swissprot: Protein sequence database with information on the function of a protein, its domain structure, post-translational modifications, variants, etc.
  • PubMed: Literature database



Last modified: Mon Mar 19 11:50:31 GMT 2007