The gene is the fundamental unit of inherited information in DNA and is defined as a stretch of base sequence that serves as a template for the copying process called transcription. Genes carry the information needed to encode particular protein structures. Genes comprise only a fraction of the DNA carried on the chromosomes of a human cell. Although some extragenic sequences function as regulatory elements that control gene expression, a role has yet to be assigned to the extensive stretches of non-coding DNA ("junk" DNA) in the genomes of higher organisms. It has been conjectured that these sections of the genome participate in the higher-order structure of chromosomes, or that they interact with cytoskeletal components to localize certain regions of DNA to specific nuclear locations. In higher organisms, most protein-coding gene sequences are unexpectedly interrupted by stretches of non-coding sequence called introns. Intronic sequences often contribute more to the overall length of a gene than do the coding regions, called exons. Introns must be removed from the nascent RNA chain to bring the separate portions of the protein-coding sequence, the exons, into a continuous nucleotide chain for translation.
The DNA sequences that are transcribed into RNA are collectively called the gene and include exons (expressed sequences) and introns (intervening sequences). Introns almost invariably begin with the nucleotide sequence GT and end with AG. An AT-rich sequence in the last exon forms a signal for processing the end of the RNA transcript. Regulatory sequences that make up the promoter, including the TATA box, occur close to the site where transcription starts. Enhancer sequences are located at variable distances from the gene.
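The exon and intron layout described above can be illustrated with a minimal Python sketch. It assumes a gene is given as a DNA string together with a list of intron positions; the function names, the half-open (start, end) coordinate convention, and the toy sequence are all hypothetical and chosen only for illustration. The sketch checks whether each intron begins with GT and ends with AG, and derives the exon spans as the parts of the gene not covered by introns.

```python
def check_intron_boundaries(gene: str, introns: list[tuple[int, int]]) -> list[bool]:
    """For each half-open (start, end) intron span, report whether the
    intron begins with GT and ends with AG."""
    results = []
    for start, end in introns:
        intron = gene[start:end].upper()
        ok = len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")
        results.append(ok)
    return results


def exons_from_introns(gene_length: int, introns: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return the exon spans, i.e. the stretches of the gene between introns.
    Assumes the introns do not overlap and lie inside the gene."""
    exons = []
    pos = 0
    for start, end in sorted(introns):
        exons.append((pos, start))
        pos = end
    exons.append((pos, gene_length))
    return exons


# Toy gene (invented sequence): exon 1, one intron, exon 2.
EXON1 = "ATGGCC"
INTRON1 = "GTAAGTCCCTTTCAG"  # begins with GT, ends with AG
EXON2 = "GATTGA"
toy_gene = EXON1 + INTRON1 + EXON2
toy_introns = [(len(EXON1), len(EXON1) + len(INTRON1))]

boundary_ok = check_intron_boundaries(toy_gene, toy_introns)
toy_exons = exons_from_introns(len(toy_gene), toy_introns)
```

A check of this kind only confirms that an annotated intron is consistent with the GT...AG pattern. It does not find introns, since the dinucleotides GT and AG also occur by chance throughout exons and introns alike.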
Gene expression begins with the binding of multiple protein factors to enhancer and promoter sequences. These factors facilitate the formation of the transcription initiation complex, which includes the enzyme RNA polymerase and polymerase-associated proteins. The primary transcript (pre-mRNA) includes both the exon and intron sequences. Post-transcriptional processing introduces changes at both ends of the transcript. At the 5′ end, enzymes add a special nucleotide cap; at the 3′ end, an enzyme clips the pre-mRNA about 30 bases after the AAUAAA sequence in the last exon. Another enzyme adds a poly(A) tail, which consists of up to 200 adenine nucleotides. Next, spliceosomes remove the introns by cleaving the RNA at the boundaries of introns and exons. The spliced mRNA is now mature and can leave the nucleus for translation into protein in the cytoplasm.
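The processing steps above can be traced on a toy sequence with a short Python sketch. This is a simplified illustration under stated assumptions, not a model of the real enzymology: the cap and the poly(A) tail are recorded as fields rather than as nucleotides, cleavage is placed exactly 30 bases past the end of the AAUAAA signal, intron positions are supplied by hand as half-open (start, end) spans, and the names `Transcript`, `transcribe` and `process_pre_mrna` are hypothetical.

```python
from dataclasses import dataclass

POLYA_SIGNAL = "AAUAAA"


@dataclass
class Transcript:
    body: str               # RNA sequence, written 5' to 3'
    capped: bool = False    # True once the 5' cap has been added
    poly_a_length: int = 0  # number of adenines in the 3' tail


def transcribe(coding_strand_dna: str) -> str:
    """The pre-mRNA matches the coding strand, with U in place of T."""
    return coding_strand_dna.upper().replace("T", "U")


def process_pre_mrna(pre_mrna: str,
                     introns: list[tuple[int, int]],
                     cleavage_offset: int = 30,   # assumption: exactly 30 bases
                     tail_length: int = 200) -> Transcript:
    # Look for the AAUAAA signal in the last exon only,
    # i.e. downstream of the final intron.
    last_exon_start = max((end for _, end in introns), default=0)
    signal_pos = pre_mrna.find(POLYA_SIGNAL, last_exon_start)
    if signal_pos == -1:
        raise ValueError("no AAUAAA signal found in the last exon")

    # 3' end: clip the transcript a fixed distance past the signal.
    cleave_at = min(signal_pos + len(POLYA_SIGNAL) + cleavage_offset, len(pre_mrna))
    cleaved = pre_mrna[:cleave_at]

    # Splicing: keep the exon pieces and drop every intron span.
    pieces = []
    pos = 0
    for start, end in sorted(introns):
        pieces.append(cleaved[pos:start])
        pos = end
    pieces.append(cleaved[pos:])

    # 5' cap and poly(A) tail are recorded as annotations.
    return Transcript(body="".join(pieces), capped=True, poly_a_length=tail_length)


# Toy gene (invented): exon 1, intron, exon 2 carrying the signal, 40 extra bases.
toy_dna = "ATGGCC" + "GTAAGTCCCTTTCAG" + "GATTGA" + "AATAAA" + "C" * 40
pre_mrna = transcribe(toy_dna)
mature = process_pre_mrna(pre_mrna, introns=[(6, 21)])
```

In this toy case the mature transcript keeps both exons, the AAUAAA signal and 30 downstream bases, while the intron and the final 10 bases of the primary transcript are discarded.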