ANALYSIS OF GENE EXPRESSION DATA. Gene expression data is a high-throughput data type (like DNA and...

ANALYSIS OF GENE EXPRESSION DATA

• Gene expression data is a high-throughput data type (like DNA and protein sequences) that requires bioinformatic pattern recognition.
• The complete set of mRNAs that are transcribed in a cell is often called its transcriptome.
• Transcriptomes are at the moment studied with DNA microarray technology.
• DNA microarray data are even more valuable when integrated with other types of data.

• Gene expression profiling is often used to find genes that are differentially used by the cell under particular circumstances.
• The expression profiles can be used to predict the molecular or cellular function of genes: in a list of similarly expressed genes, those with an unknown function can be assigned a function based on their pattern of shared regulation with genes whose function is known.
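
The function-prediction idea above can be illustrated with a small Python sketch. It is only an illustration, not a method taken from the slides: the gene names, expression values and annotations are hypothetical, and the choice of Pearson correlation as the measure of "similarly expressed" is an assumption.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical expression profiles (one value per experimental condition).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [4.0, 3.0, 2.0, 1.0],
    "geneX": [1.1, 2.1, 2.9, 4.2],  # function unknown
}

# Hypothetical annotations for the genes whose function is known.
known_function = {"geneA": "ribosome biogenesis", "geneB": "stress response"}

def predict_function(unknown, profiles, known_function):
    """Assign the function of the most strongly correlated annotated gene."""
    best = max(known_function,
               key=lambda g: pearson(profiles[unknown], profiles[g]))
    return known_function[best]

predicted = predict_function("geneX", profiles, known_function)
print(predicted)
```

Here geneX rises across the conditions in step with geneA, so it inherits geneA's annotation. With real data one would normally require a minimum correlation before transferring any annotation.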

• Monitoring gene expression over time can be used to order the activation and repression of transcription, suggesting which genes regulate the expression of other genes.
• Transcriptional variance is an essential tool in evolutionary genetics, allowing differences between individuals, populations, and species to be determined under a variety of environmental conditions.
• On the clinical side, gene expression profiles are being developed for the prognosis of cancer progression, the classification of infections, and the prediction of drug side effects.
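
As a rough illustration of ordering transcriptional activation in time, the Python sketch below ranks genes by the first time point at which their expression reaches a cut-off. The time courses, gene names and threshold are hypothetical, and a simple threshold crossing is an assumed stand-in for a proper time-series analysis.

```python
def activation_time(series, threshold):
    """Index of the first time point whose value reaches the threshold, else None."""
    for t, value in enumerate(series):
        if value >= threshold:
            return t
    return None

# Hypothetical time courses (expression measured at time points 0..4).
time_courses = {
    "regulator": [0.1, 2.5, 3.0, 3.1, 3.0],
    "target1":   [0.1, 0.2, 2.4, 3.0, 3.2],
    "target2":   [0.0, 0.1, 0.3, 2.2, 2.9],
    "silent":    [0.1, 0.1, 0.2, 0.1, 0.1],
}

THRESHOLD = 2.0  # assumed cut-off for calling a gene "activated"

onsets = {gene: activation_time(series, THRESHOLD)
          for gene, series in time_courses.items()}

# Keep only genes that were activated, ordered from earliest to latest onset.
activated = sorted((g for g in onsets if onsets[g] is not None), key=onsets.get)
print(activated)
```

An early onset only suggests that a gene may act upstream of later ones; it does not by itself show regulation.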

DNA microarrays

• DNA microarrays, or DNA chips, are devices for checking a sample of DNA simultaneously for the presence of many sequences.
• DNA microarrays can be used (1) to determine expression patterns of different proteins by detection of mRNAs; or (2) for genotyping, by detection of different variant gene sequences, including but not limited to single-nucleotide polymorphisms (SNPs).

• To determine the expression pattern of all of a cell's genes, it is necessary to measure the relative amounts of many different mRNAs. Hybridization is an accurate and sensitive way to detect whether a particular sequence is present in a sample of DNA. The key to high-throughput analysis is to run many hybridization experiments in parallel. This is what microarrays achieve.

• To achieve parallel hybridization analysis, a large number of DNA oligomers are affixed to known locations on a rigid support, in a regular two-dimensional array. The mixture to be analysed is prepared with radioactive or fluorescent tags, to permit detection of hybrids. After the array is exposed to the mixture, each element of the array to which some component of the mixture has become attached bears the radioactive or fluorescent tag.
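
The parallel hybridization described above can be mimicked in a few lines of Python. This is a toy model under strong assumptions: the probe and sample sequences are invented, and a spot is considered tagged only when a sample fragment is the exact reverse complement of the whole probe, whereas real hybridization tolerates mismatches and depends on experimental conditions.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Sequence of the strand that pairs with seq, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

# Hypothetical probes affixed at known (row, column) positions of the array.
array = {
    (0, 0): "ATGCCGTT",
    (0, 1): "GGATCCAA",
    (1, 0): "TTTTGGGG",
    (1, 1): "CATGAATG",
}

# Hypothetical tagged fragments in the mixture to be analysed.
sample = ["AACGGCAT", "CCCCAAAA", "AAAAAAAA"]

def hybridize(array, sample):
    """Positions whose probe is perfectly complementary to some sample fragment."""
    targets = set(sample)
    return {pos for pos, probe in array.items()
            if reverse_complement(probe) in targets}

lit_spots = hybridize(array, sample)
print(sorted(lit_spots))
```

Because every probe sits at a known position, the set of tagged positions translates directly into the set of sequences present in the sample.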

• A DNA array, or DNA chip, may contain 100 000 probe oligomers. Note that this is larger than the total number of genes even in higher organisms. The spot size may be as small as ~150 μm in diameter. The grid is typically a few centimetres across.
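
A quick back-of-the-envelope check of these figures, written in Python. The square layout and the edge-to-edge spacing of spots (pitch equal to spot diameter) are assumptions made for the calculation, not values from the slides.

```python
from math import ceil, sqrt

n_probes = 100_000       # number of probe oligomers, from the text
spot_diameter_um = 150   # spot diameter in micrometres, from the text

# Assumption: square grid with spots placed edge to edge.
spots_per_side = ceil(sqrt(n_probes))
side_length_cm = spots_per_side * spot_diameter_um / 10_000  # 10 000 um per cm

print(spots_per_side, side_length_cm)
```

Under these assumptions the array needs about 317 spots per side and is a little under 5 cm wide, in line with a grid a few centimetres across.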

• To measure expression patterns, the oligomeric probes are cDNAs (DNA complementary to a given RNA, synthesized by reverse transcriptase using that RNA as a template) or fragments of cDNAs, reflecting the mRNAs for different genes. Oligomers of length ~50–80 bp are used. For genotype analysis, genomic DNA fragments of length 500–5000 bp are used.

Basic Procedure in DNA microarray

• Gene expression can be measured by spotting oligonucleotide sequences, known as probes, on a slide (also called a chip), with different sequences spotted at different locations.
• The mRNA from the biological sample is normally converted to complementary DNA (cDNA) by reverse transcription, then labeled and put on the glass slide. If a cDNA sequence from the sample is complementary to the oligonucleotide sequence of one of the probes on the slide, it will hybridize to it (i.e. bind to its complementary part). By labeling the cDNA sequences in the sample with a fluorescent dye, the concentration of cDNA in the sample can be quantified by a scanner.

• Two types of chips are often used. The first type is the custom chip, where a robot is used to spot cDNA on a glass slide, and two different fluorescent labels are used to distinguish between sample and control. The second type of array is a prefabricated oligonucleotide chip where the oligonucleotide sequences are synthesized on the chip using photo-lithography. The sample and reference are hybridized on two different chips. The most common vendor is Affymetrix.
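
For the custom (two-colour) chip, the sample and control signals at each spot are commonly compared as a log ratio. The Python sketch below shows the arithmetic only; the intensities and gene names are hypothetical, and the values are assumed to be background-corrected already.

```python
from math import log2

# Hypothetical intensities per spot: (sample channel, control channel).
spots = {
    "gene1": (8000.0, 1000.0),
    "gene2": (500.0, 2000.0),
    "gene3": (1500.0, 1500.0),
}

def log_ratio(sample_intensity, control_intensity):
    """log2 of sample over control: positive = higher in sample, negative = lower."""
    return log2(sample_intensity / control_intensity)

log_ratios = {gene: log_ratio(s, c) for gene, (s, c) in spots.items()}

for gene, value in log_ratios.items():
    print(f"{gene}: {value:+.2f}")
```

Using a logarithm makes up- and down-regulation symmetric: an eightfold increase gives +3 and an eightfold decrease would give -3.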

• The conventional chips of both types typically cover predefined genes from an entire genome. In the newest versions the entire set of exons from a complete organism, or even the complete genomic sequence, is covered. Other, more flexible technologies exist where researchers themselves can produce custom-made DNA chips. For example, NimbleGen makes DNA microarrays based on micromirror technology (used in data projectors), where the user can define the exact sequences of the probes.

• After the image processing, the results are often normalized by adjusting the expression levels relative to a gene or group of genes that are assumed to have a constant expression level between samples. Normally, housekeeping genes, which are presumed to be equally expressed under all conditions, or the total amount of mRNA in the sample, are used for this normalization. We will now explain how to normalize custom chips. The normalization of Affymetrix chips follows the same principle.
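
A minimal Python sketch of this kind of normalization follows. It assumes a simple scaling in which every intensity on a chip is divided by the mean intensity of the reference genes; the gene names and intensities are hypothetical, and this is not necessarily the procedure the slides go on to describe.

```python
def normalize(intensities, reference_genes):
    """Scale all intensities so that the mean of the reference genes equals 1."""
    ref_mean = sum(intensities[g] for g in reference_genes) / len(reference_genes)
    return {gene: value / ref_mean for gene, value in intensities.items()}

# Hypothetical raw intensities from two chips; chip_b was scanned at half the signal.
chip_a = {"hk1": 1000.0, "hk2": 3000.0, "geneX": 4000.0}
chip_b = {"hk1": 500.0, "hk2": 1500.0, "geneX": 4000.0}

housekeeping = ["hk1", "hk2"]  # assumed to be equally expressed in both samples

norm_a = normalize(chip_a, housekeeping)
norm_b = normalize(chip_b, housekeeping)

print(norm_a["geneX"], norm_b["geneX"])
```

The raw geneX intensities are identical on the two chips, but after normalization geneX is twice as high on chip_b. Passing every gene on the chip as `reference_genes` gives the alternative mentioned above, normalization to the total signal.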

• a. Extracting mRNA from sample and control.
• b. Converting the sample to cDNA or cRNA, amplification and labeling with fluorescent dye.
• c. Hybridization of the sample to the probes on the chip.
• d. Washing.
• e. Scanning of the chip.
• f. Image processing of scanned image on a computer.

APPLICATIONS OF DNA MICROARRAYS

DNA microarrays can be applied in:
• Investigating cellular states and processes.
• Diagnosis of disease.
• Genetic warning signs.
• Drug selection.
• Classification of disease.
• Target selection for drug design.
• Pathogen resistance.