Background
Genes and Genomes
Genes are distributed throughout the genome. Since the human genome
sequence was completed, it has been annotated with the locations of all
the known genes, and further genes have been predicted by computational
methods that identify typical gene features in the genomic sequence. All
of this data is available in a public database, with a
user-friendly interface to search and browse it. The annotated Human
Genome, together with those of other organisms, can be found at
http://genome.ucsc.edu.
Gene to Protein
Most of the genes you find in the Genome Browser will appear to be
discontinuous, with large gaps (introns) between regions of functional
sequence (exons). When a gene is expressed, a copy of the entire length of
DNA in the gene is first made as a similar, but single-stranded, molecule
called RNA. This process is called transcription. The transcript is then
spliced to remove the introns and join up the exons. The resulting
molecule, the mature mRNA, has, in addition to the coding region, some
extra sequences at the beginning (5' end) and the end (3' end) which are
involved in the control of the following stage of expression. The mature
mRNA is then translated, a process in which each group of 3 bases of
coding sequence is read in turn to add the corresponding amino acid to a
peptide. This peptide may undergo post-translational modification, which
can involve removal of segments, folding, combination with other peptides,
and addition of other chemicals, to produce the mature protein.
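The steps from gene to peptide can be sketched in a few lines of Python. This is a toy illustration only: the 23-base "gene", its exon coordinates, and the function names are made up for the example, the untranslated 5' and 3' sequences are left out, and transcription is simplified to copying the coding strand with T replaced by U.

```python
# Standard genetic code, built from the usual compact TCAG-ordered table.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(dna):
    """Copy the coding strand into RNA (simplified: T becomes U)."""
    return dna.replace("T", "U")


def splice(pre_mrna, exons):
    """Join the exon regions, given as (start, end) pairs, dropping the introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def translate(mrna):
    """Read the coding sequence 3 bases at a time until a stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon ends translation
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical toy gene: exon 1, an 11-base intron, exon 2.
TOY_GENE = "ATGGCC" + "GTAAGTCCTAG" + "TTTTAA"
TOY_EXONS = [(0, 6), (17, 23)]

pre_mrna = transcribe(TOY_GENE)
mature_mrna = splice(pre_mrna, TOY_EXONS)
print(mature_mrna)             # AUGGCCUUUUAA
print(translate(mature_mrna))  # MAF
```

Real genes are far longer, and their introns are usually much larger than their exons, which is why they look discontinuous in the Genome Browser.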
Example 1: Cystic Fibrosis
Cystic fibrosis (or mucoviscidosis) is one of the most common fatal
hereditary diseases. About 1 in 2500 children is born with cystic
fibrosis, and those affected often die early. The symptoms include
breathing difficulties and frequent lung infections, among many other
symptoms affecting the entire body. The disease is caused by a mutation in
a gene called cystic fibrosis transmembrane conductance regulator (CFTR).
This gene is needed for the production of sweat, digestive juices and
mucus, and its gene product functions as a chloride channel. Cystic
fibrosis is an autosomal recessive disease. This means that the disease
develops only if both copies of the gene carry a mutation. People with one
functional copy of the CFTR gene do not develop the disease.
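The recessive inheritance pattern can be worked through with a small Punnett-square calculation. This is a minimal sketch assuming simple Mendelian inheritance at a single gene; the allele labels ("A" for a functional copy, "a" for a mutated copy) and the function name are chosen for illustration only.

```python
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Return the expected fraction of each child genotype.

    Each parent is a two-letter string such as "Aa"; a child inherits
    one allele from each parent with equal probability.
    """
    counts = {}
    for allele1, allele2 in product(parent1, parent2):
        genotype = "".join(sorted((allele1, allele2)))
        counts[genotype] = counts.get(genotype, 0) + 1
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Two healthy carriers, each with one functional and one mutated copy.
children = offspring_genotypes("Aa", "Aa")
for genotype, fraction in sorted(children.items()):
    print(genotype, fraction)
# AA 1/4  (two functional copies)
# Aa 1/2  (carrier, no disease)
# aa 1/4  (both copies mutated, disease develops)
```

Under these assumptions, two carrier parents have a 1 in 4 chance of having an affected child with each pregnancy.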
The CFTR gene is located on the long arm of chromosome 7. Look up the
gene in the Human Genome Browser. By clicking on the gene you can find out
more about
the gene's properties and the protein it encodes. Can you find the length
of the gene?
The most common mutation in cystic fibrosis is a deletion of 3 base
pairs in the CFTR gene (PHE508DEL), which results in the deletion of one
amino acid from the protein sequence. (You can find information on other
mutations in the OMIM database.) The mutation maps to the nucleotide
binding domain of the chloride channel. Although the shortened protein
sequence does not lead to a global change in the conformation of the
protein, it changes the local surface at the position of the deletion. As
a result of the changed surface, the individual subunits of the chloride
channel cannot interact properly and remain functionally restrained. The
position that is commonly deleted in cystic fibrosis is highlighted in red
and is located at the subunit interface, so its deletion disrupts proper
binding.
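To see why a deletion of exactly 3 base pairs removes one amino acid while leaving the rest of the protein sequence intact, it helps to translate a short stretch before and after the deletion. The sketch below uses the 15 bases around codons 506 to 510 as they are commonly quoted for CFTR; treat the fragment as illustrative and check it against a reference sequence before relying on it. The helper names, and the single-base deletion used as a contrast, are hypothetical additions for this example.

```python
# Standard genetic code, built from the usual compact TCAG-ordered table.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate complete codons only; leftover bases at the end are ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def delete_bases(dna, start, length):
    """Return the sequence with `length` bases removed from index `start`."""
    return dna[:start] + dna[start + length:]


# Assumed fragment: codons 506-510 (Ile Ile Phe Gly Val).
WILD_TYPE = "ATCATCTTTGGTGTT"

# The 3-base deletion (CTT) spans the end of codon 507 and the start of 508.
three_base_deletion = delete_bases(WILD_TYPE, 5, 3)

# Hypothetical contrast: deleting a single base shifts the reading frame.
one_base_deletion = delete_bases(WILD_TYPE, 5, 1)

print(translate(WILD_TYPE))            # IIFGV
print(translate(three_base_deletion))  # IIGV  (only F is missing)
print(translate(one_base_deletion))    # IILV  (downstream residues change)
```

Because the deletion is a whole multiple of 3 bases, the reading frame is preserved: the phenylalanine (F) disappears, but the amino acids on either side are unchanged.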
The Protein Data Bank (PDB) provides protein structures that have been
experimentally determined by X-ray crystallography or NMR spectroscopy. A
substructure of the cystic fibrosis chloride channel can be found in the
repository.
Example 2: Breast Cancer
The breast cancer 2 gene (BRCA2) functions as a tumor suppressor gene and
prevents cells from growing and dividing out of control. In particular, it
regulates the cell cycle in the breast. Mutations in this gene lead to an
increased susceptibility to breast cancer.
The BRCA2 gene has been mapped to the long arm of chromosome 13. Have a
look at its location and structure in the Human Genome Browser.
To perform its function in DNA repair, BRCA2 binds to other proteins.
The figure shows a subregion of BRCA2 (the BRC repeat, which is found
several times in BRCA2), highlighted in blue, bound to RAD51
(RecA-homology domain), highlighted in white. Mutations in this region of
BRCA2 disrupt the interaction with RAD51 and increase cancer
susceptibility. You can retrieve the protein structure from PDB.
Example 3: Neuropsychiatric disorders: Speech and Language disorder
Research at the Wellcome Trust Centre for Human Genetics in Oxford
includes the quest for a deeper molecular understanding of speech and
language development. The centre has helped to connect speech and language
disorder to the FOXP2 gene, which functions as a transcription factor.
Transcription factors switch other genes on and off and are important
regulators in developmental processes. They have a DNA-binding domain,
which allows them to bind to DNA and activate transcription of a gene.
FOXP2 is found on chromosome 7 (7q31) and its gene is structured into
17 exons. View FOXP2 in the Human Genome Browser. A mutation in exon 14 of
FOXP2 (R553H), which affects a protein helix in the DNA-binding domain,
has been shown to result in speech and language disorder (OMIM).
This position is of great functional importance, as it makes contact
with the DNA molecule and is responsible for recognizing a specific DNA
pattern. If the FOXP2 gene is compared to its equivalent genes in other
organisms, we find that position 553 is absolutely conserved from yeast to
vertebrates. The figure shows the DNA-binding domain (belonging to the
Forkhead family of DNA-binding domains) bound to DNA and highlights the
position of the R553H mutation in red. The structure of the protein
complex can be retrieved from PDB.
Links to Bioinformatic Databases on the WWW
- UCSC Genome Browser: Excellent browser for the human genome.
- Ensembl: Database for eukaryotic genomes (human, chimp, mouse, dog, fish, frog, etc.)
- Protein Data Bank (PDB): Experimentally determined structures of biological molecules
- OMIM (Online Mendelian Inheritance in Man): Catalog of human genes and genetic disorders
- NCBI: National Center for Biotechnology Information
- Swissprot: Protein sequence database with information on the function of a protein, its domain structure, post-translational modifications, variants, etc.
- PubMed: Literature database