In this article we will discuss: 1. Introduction to Human Genome Project 2. Goals of Human Genome Project 3. Features 4. Completion.
Introduction to Human Genome Project:
The Human Genome Project or HGP was a 13-year-long international effort to sequence the human genome, which formally began in October 1990. The project was coordinated by the U.S. Department of Energy (DOE) and the National Institutes of Health (NIH). It was initially planned to last for 15 years, but because of rapid technological advances it was completed in 2003.
The HGP was called a mega project because of the magnitude and requirements of the project. Eighteen countries participated in the worldwide effort, with significant contributions from the Sanger Center in the United Kingdom and research centers in Germany, France, and Japan.
The countries which have participated in the human genome research are Australia, Brazil, Canada, China, Denmark, European Union, France, Germany, Israel, Italy, Japan, Korea, Mexico, Netherlands, Russia, Sweden, the United Kingdom, and the United States.
The Human Genome Organization or HUGO has helped to coordinate this international project. The Department of Energy’s Human Genome Program and the National Human Genome Research Institute (NHGRI) together sponsored the U.S. Human Genome Project.
Goals of Human Genome Project:
The goals of the human genome project were to accomplish the following things:
a. To identify all the genes in the human DNA.
b. To determine the sequence of approximately three billion base pairs (3 x 10^9) that make up the human DNA.
c. To store the information in databases.
d. To improve tools of data analysis.
e. To use the data and transfer related technologies to the biotechnology industry and to develop new medical applications.
f. To address the ethical, legal and social issues (ELSI) that may arise from the project.
g. To sequence, along with the human genome, the DNA of a few model organisms such as E. coli, to help interpret human gene function.
The completion of the human DNA sequence in 2003 coincided with the 50th anniversary of Watson and Crick’s description of the fundamental structure of DNA. The first working draft was completed in June 2000, but the high-quality “finished sequence” of the human genome was completed in 2003. The final report of the HGP was published in 2006. However, in-depth analyses of complete chromosomes continue to be published.
The Procedure of Sequencing the DNA:
The determination of the exact order of the base pairs in a segment of DNA is called sequencing. Since the bases exist in pairs, identifying the bases on one strand determines the other member of each pair.
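The pairing rule can be illustrated with a minimal Python sketch. The function names `complement_strand` and `reverse_complement` are chosen here for illustration only, and the example strand is made up; the sketch assumes the standard pairing of A with T and G with C.

```python
# Standard Watson-Crick pairing: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement_strand(strand: str) -> str:
    """Return the partner base for each base, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand.upper())

def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own 5' to 3' direction."""
    return complement_strand(strand)[::-1]

print(complement_strand("ATGCCG"))   # TACGGC
print(reverse_complement("ATGCCG"))  # CGGCAT
```

Reading one strand is therefore enough: the second strand adds no new information and can always be derived in this way.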
The primary method used by the HGP to produce the finished version of the human genetic code is map-based or BAC-based sequencing. BAC is the acronym for ‘bacterial artificial chromosome’. A BAC is a vector, that is, a carrier of DNA, into which a piece of human DNA is fitted and which is replicated in bacteria. Yeast can also serve as the host, in which case the vector is called a yeast artificial chromosome (YAC).
Human DNA is fragmented into pieces that are relatively large but still manageable in size (between 150,000 and 200,000 base pairs). The fragmentation of DNA is known as ‘shearing’. The sheared fragment is inserted into a plasmid (pUC18), which is introduced into E. coli for replication. The fragments are introduced into the middle of the lacZ gene. The plasmid also possesses an antibiotic resistance gene.
The bacterial culture is plated on another culture plate containing ampicillin, an antibiotic. Only those cells that have taken in the plasmid with the ampicillin resistance gene are able to survive and reproduce (Fig. 1). Wherever a transformed cell lands on the agar plate, it survives and begins dividing once every 20 minutes. Each colony contains about a million bacteria.
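As a rough check on these figures, the sketch below estimates how long a single transformed cell would need to grow into a colony of about a million cells. It assumes idealized, uninterrupted doubling every 20 minutes, which real colonies only approximate; the function name `doublings_needed` is illustrative.

```python
import math

DOUBLING_TIME_MIN = 20        # division time quoted in the text
COLONY_SIZE = 1_000_000       # approximate cells per colony, from the text

def doublings_needed(target_cells: int) -> int:
    """Smallest number of doublings that takes one cell to at least target_cells."""
    return math.ceil(math.log2(target_cells))

n = doublings_needed(COLONY_SIZE)
minutes = n * DOUBLING_TIME_MIN
print(n, minutes, round(minutes / 60, 1))  # 20 400 6.7
```

Under these assumptions, about 20 doublings, or a little under seven hours, would be enough to produce a colony of the stated size.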
Each bacterial cell contains a single fragment. About 20,000 different BAC clones are required to contain the 3 billion pairs of bases of the human genome. A collection of BAC clones containing the entire human genome is called a ‘BAC library’.
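The figure of about 20,000 clones follows from the fragment sizes given earlier. The sketch below is a back-of-the-envelope calculation that assumes the fragments tile the genome end to end with no overlap; a real library would need additional clones, since neighbouring fragments must overlap to be ordered. The helper name `min_clones` is illustrative.

```python
GENOME_BP = 3_000_000_000                 # approximate genome size from the text
BAC_INSERT_RANGE_BP = (150_000, 200_000)  # fragment size range from the text

def min_clones(genome_bp: int, insert_bp: int) -> int:
    """Clones needed if inserts tiled the genome end to end with no overlap."""
    return -(-genome_bp // insert_bp)  # ceiling division

for insert_bp in BAC_INSERT_RANGE_BP:
    print(insert_bp, min_clones(GENOME_BP, insert_bp))
# 150000 20000
# 200000 15000
```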
During sequencing, each BAC clone is cut into smaller fragments which are about 2000 bases long. These smaller pieces are known as ‘sub-clones’. A sequencing reaction is carried out on the sub-clones. The products of the sequencing reaction are loaded into an instrument known as a sequencer.
The steps involved in the sequencing reaction are as follows:
a. The plasma membrane of the bacterial cell is broken to release the plasmid. The fragment is amplified by a reaction known as the rolling circle amplification or RCA, which is similar to the polymerase chain reaction or PCR.
b. The sequencing reaction requires primers, free bases, DNA polymerase and the template DNA. Template DNA is prepared by unzipping the DNA into single strands. Primers are added to the template, and the new strand is elongated in the 5′ to 3′ direction. Among the free bases are some that are attached to a fluorescent dye. Elongation halts whenever a fluorescently tagged base is inserted instead of a normal base.
c. The completed sequencing reaction contains an array of coloured DNA fragments of different lengths. The products of the reaction are then separated by electrophoresis, which separates the strands according to the charge and length of the fragments. The shorter fragments move faster than the longer ones.
In a sequencing machine, a laser beam excites the fluorescent dye and a camera detects the light emitted by the excited dye. One by one the sequencing machine reads the DNA molecules passing down the gel. The information is sent to a computer where the information is assembled and the DNA is sequenced.
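The logic of reading a sequence from terminated fragments can be sketched in Python. This is an idealized model, not the sequencer's actual software: it assumes exactly one dye-terminated fragment for every possible length, ignores strand polarity, and uses a made-up template. The function names are illustrative.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def terminated_fragments(template: str) -> list[str]:
    """One fragment per position: the new strand, stopped at a dye-labelled base."""
    new_strand = "".join(PAIRS[b] for b in template)
    return [new_strand[: i + 1] for i in range(len(new_strand))]

def read_sequence(fragments: list[str]) -> str:
    """Sort by length (shortest runs fastest) and read each fragment's last base."""
    return "".join(f[-1] for f in sorted(fragments, key=len))

template = "TACGGATC"  # hypothetical template, written in the order it is copied
fragments = terminated_fragments(template)

# The order in which fragments are loaded does not matter; length sorts them.
print(read_sequence(fragments[::-1]))  # ATGCCTAG
```

Because each fragment ends in a labelled base and the fragments differ in length by one base, reading the dye colours from the shortest fragment to the longest spells out the newly synthesized strand.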
About 50 million sequencing reactions are required to read the entire human genome and each base is read at least 9 times. The close association of HGP with computers has led to the development of a new area in biology called Bioinformatics.
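These two figures together imply an average read length per reaction. The calculation below is a rough estimate that assumes reads are spread evenly over a genome of about 3 billion bases and that each base is read exactly 9 times; the result is derived here and is not a figure reported by the project.

```python
GENOME_BP = 3_000_000_000   # approximate genome size from the text
REACTIONS = 50_000_000      # sequencing reactions quoted in the text
COVERAGE = 9                # minimum reads per base quoted in the text

# coverage = reactions * read_length / genome_size, solved for read_length
implied_read_length = COVERAGE * GENOME_BP / REACTIONS
print(implied_read_length)  # 540.0
```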
Salient Features of Human Genome Project:
The salient features of the human genome project are:
a. The human genome has a total of 3164.7 million nucleotide bases.
b. The total number of genes is estimated to be 30,000.
c. The functions of over 50% of discovered genes are not known.
d. The average gene consists of 3000 bases. However, the largest known gene, dystrophin, has 2.4 million bases.
e. A large portion of the human genome is made of repeated sequences.
f. Chromosome 1 has the largest number of genes (2968) and the Y chromosome has the fewest (231).
g. Scientists have identified about 1.4 million locations that show single-base DNA differences (SNP – single nucleotide polymorphism, pronounced ‘snips’). This information will help in locating disease-associated sequences and in tracing human history.
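Two rough ratios follow from the figures listed above. The sketch below treats the gene count and average gene length as exact and assumes the SNP sites are spread evenly, so the results are only order-of-magnitude estimates derived here, not values reported by the project.

```python
GENOME_BASES = 3_164_700_000   # total bases, from the list above
GENE_COUNT = 30_000            # estimated number of genes
AVG_GENE_BASES = 3_000         # average gene length
SNP_SITES = 1_400_000          # identified SNP locations

# Share of the genome occupied by genes of average length.
gene_fraction = GENE_COUNT * AVG_GENE_BASES / GENOME_BASES
# Average spacing between SNP sites if they were evenly distributed.
bases_per_snp = GENOME_BASES / SNP_SITES

print(f"{gene_fraction:.1%}")    # 2.8%
print(f"{bases_per_snp:.1f}")    # 2260.5
```

On these assumptions, genes of average length would account for only about 3 per cent of the genome, and an identified SNP would occur roughly once every 2,000 bases.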
Completion of the Human Genome Project:
The human genome project is as complete as it can be with existing technologies. Although the gene-containing portion of the genome is complete for the purposes of scientific research, there are small gaps which cannot yet be sequenced. New technologies will have to be invented to obtain the sequence of these regions.
On June 26, 2000, the rough draft of the human genome sequence was announced. The draft sequence covered 90 per cent of the genome at an error rate of one in 1,000 base pairs, but there were more than 150,000 gaps and only 28 per cent of the genome had reached the finished standard. In April 2003, the finished version of the human genome sequence was announced, with fewer than 400 gaps and 99 per cent of the genome finished at an error rate of less than one in 10,000 base pairs.
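To put the two error rates in perspective, the sketch below converts them into expected numbers of erroneous bases. It assumes a genome of about 3 billion base pairs and applies each rate uniformly to the covered fraction; since the finished rate is quoted as "less than" one in 10,000, the second number is an upper bound. The function name `expected_errors` is illustrative.

```python
GENOME_BP = 3_000_000_000  # approximate genome size used elsewhere in the article

def expected_errors(fraction_covered: float, errors_per_bp: float) -> float:
    """Expected erroneous bases for a given coverage fraction and error rate."""
    return GENOME_BP * fraction_covered * errors_per_bp

draft = expected_errors(0.90, 1 / 1_000)       # 2000 draft figures
finished = expected_errors(0.99, 1 / 10_000)   # 2003 finished figures (upper bound)
print(round(draft), round(finished))  # 2700000 297000
```

Even though the finished sequence covers more of the genome, the tenfold improvement in accuracy reduces the expected number of errors by roughly a factor of nine.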
Every part of the genome sequenced by the Human Genome Project was made public as soon as it was discovered. The Human Genome Project could not have been completed as quickly and as effectively without the strong participation of international institutions.
The ENCODE project is related to the HGP. It is a comprehensive encyclopedia of the DNA sequence, comprising a catalogue of the protein-coding and non-protein-coding genes within a genome. It is worth mentioning that the human genome reference sequences do not represent a single person’s genome. The DNA resources used for these studies came from anonymous donors of European, African, American (North, Central, South), and Asian ancestry.