User:FedeFede89/sandbox

From Wikipedia, the free encyclopedia

Gene expression profiling is the measurement of which genes are transcribed in a cell, and at what level, at a given time. Because transcription changes in response to events and during normal functioning, an expression profile reflects how the genome is responding to the cell's circumstances. The set of all RNA molecules in the cell (the transcriptome) can be measured with newer techniques such as high-throughput sequencing, while the expression of individual genes can be measured with older techniques such as Northern blotting and real-time polymerase chain reaction.

Gene expression profiling is particularly useful in conjunction with genome sequencing. The genome shows which genes a cell could express, whereas the transcriptome shows which genes are actually active at a given time, depending on the promoters and inhibitory factors in the cell's microenvironment. Proteomic analysis is also valuable alongside transcriptomic analysis, because regulation at the level of translation means that a transcribed gene may not be actively translated into protein, or may be translated more strongly than other genes with comparable amounts of mRNA.

Because mRNAs can be very short-lived and their levels can change rapidly (e.g. the clock genes of the suprachiasmatic nucleus), gene expression profiling must be carried out under carefully chosen conditions with appropriate controls. Depending on the method used, profiling can be qualitative or quantitative, so the choice of method should be guided by how much information about a particular transcriptome is required.

Applications

There are many research sectors where gene expression profiling may be applied.

Changes in gene expression in specific diseases such as cancer may be investigated to better understand their genomic basis, and in particular to determine, by comparing expression profiles with genome sequences, whether the changes are due to epigenetic alterations or to changes in the DNA sequence.

The cellular response to drugs and other treatments may be investigated by determining which genes are upregulated or downregulated as cells cope with the treatment. This is particularly relevant to understanding resistance to certain anti-cancer chemotherapeutic agents, for example through the abrogation of certain cell-cycle checkpoints, and may better inform strategies to overcome that resistance.

Specific signalling pathways may be investigated by profiling the genes that make them up, as identified from the genome sequence. This is particularly relevant where many genes orchestrate a series of events, sometimes within very fine time-keeping systems, such as the complex interactions of the clock genes in the suprachiasmatic nucleus.

Example investigation

  1. Determine the subject cell type
  2. Determine conditions
    1. Determine experimental conditions
    2. Determine at least one control condition
  3. Perform the experiment
  4. Lyse the cells
  5. Extract mRNA
  6. Analyse with method of choice
  7. Read gene expression levels from the output
  8. Perform statistical analyses to determine changes in expression
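Step 8 can be sketched for a single gene as follows. This is a minimal illustration, not a prescribed method: it assumes expression values for one gene have already been read from the output (step 7) for a few replicates of the experimental and control conditions, and it uses an exact permutation test, one of several possible tests, alongside the log2 fold change. The values are hypothetical.

```python
from itertools import combinations
from math import log2
from statistics import mean


def log2_fold_change(experimental, control):
    """log2 of mean experimental expression divided by mean control expression."""
    return log2(mean(experimental) / mean(control))


def permutation_p_value(experimental, control):
    """Exact two-sided permutation test on the difference in group means.

    Every way of splitting the pooled values into groups of the original
    sizes is enumerated, so this is only practical for a few replicates.
    """
    pooled = list(experimental) + list(control)
    n_exp = len(experimental)
    observed = abs(mean(experimental) - mean(control))
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), n_exp):
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count splits at least as extreme as the observed one (the small
        # tolerance guards against floating-point rounding).
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical expression values for one gene, three replicates per condition.
experimental = [40.0, 44.0, 42.0]
control = [10.0, 11.0, 9.0]
print(log2_fold_change(experimental, control), permutation_p_value(experimental, control))
```

With three replicates per condition there are only 20 possible splits, so the smallest p-value this test can return is 2/20 = 0.1, which illustrates why the number of replicates matters when planning the experiment.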

Databases

The Gene Expression Omnibus (GEO) is one of the main public repositories of gene expression profiling experiments from a wide variety of fields. It holds the results and methods of numerous experiments in which gene expression was profiled. Stored data come from next-generation sequencing, microarrays and older sequencing-based methods, and can be useful for preliminary analysis or to guide particular lines of research. Data in the database are expected to conform to the MIAME (Minimum Information About a Microarray Experiment) standards, which describe the information required to document an experiment so that it is verifiable and scientifically valid.
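GEO can also be searched programmatically through NCBI's Entrez E-utilities. The sketch below only builds an `esearch` query URL for the GEO DataSets database (`db=gds`); the search term is a hypothetical example, and actually fetching the URL (for example with `urllib.request.urlopen`) needs network access and should respect NCBI's usage guidelines. The response is a list of matching record IDs, in XML by default.

```python
from urllib.parse import urlencode

# NCBI Entrez E-utilities search endpoint; "gds" is the GEO DataSets database.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def geo_search_url(term, max_results=20):
    """Build an esearch URL that returns IDs of GEO records matching `term`."""
    params = {"db": "gds", "term": term, "retmax": max_results}
    return ESEARCH_URL + "?" + urlencode(params)


# Hypothetical search term; fetching the URL requires network access.
print(geo_search_url("suprachiasmatic nucleus", max_results=5))
```

Building the URL separately from fetching it keeps the query easy to inspect and reuse, for example in a browser or a scripted download.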


Techniques

Many techniques can be employed in the profiling of gene expression. The main techniques are compared in Table 1.

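For each technique: advantages, disadvantages, sensitivity, simplicity and expense ratings (more + or $ symbols mean a higher rating), time required, number of genes analysed, and type of information.

High-throughput sequencing
  • Advantages: processes massive amounts of information simultaneously; can detect novel sequences
  • Disadvantages: currently still not feasible for smaller research groups, due to high cost
  • Sensitivity ++++; simplicity ++++; expense $$$$
  • Time required: hours to days (depends on technique); number of genes: millions; type of information: qualitative or quantitative

Microarray
  • Advantages: quick turnaround; high throughput; DNA sequence knowledge not required
  • Disadvantages: limited by the resolution of the scanner used; huge volumes of data may be difficult to analyse
  • Sensitivity ++; simplicity ++++; expense $$$
  • Time required: under 24 hours; number of genes: up to 20,000 on one chip; type of information: qualitative

SAGE
  • Advantages: analyses the whole transcriptome without prior selection of known genes; gives quantitative data for known and unknown genes; good for monitoring changes resulting from an action or treatment
  • Disadvantages: multiple genes may share the same tag; high error rate (the chance of one or more errors is 10% for 10 bases); library construction can be difficult; sensitive to contamination
  • Sensitivity ++; simplicity ++++; expense $$$
  • Time required: 3 days; number of genes: variable; type of information: quantitative

PCR (real-time RT-PCR)
  • Advantages: fast and easy to perform; only a very small amount of target RNA is needed
  • Disadvantages: liable to inhibition by environmental chemicals in the sample; very prone to contamination
  • Sensitivity ++++; simplicity ++++; expense $
  • Time required: 30 minutes to 2 hours; number of genes: one; type of information: quantitative

Northern blot
  • Advantages: blots can be stored long-term for re-probing
  • Disadvantages: hazardous reagents; risk of RNA degradation
  • Sensitivity +++; simplicity +++; expense $
  • Time required: under 72 hours; number of genes: one or very few; type of information: qualitative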
Table 1. Comparison of gene expression profiling techniques.
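The SAGE error figure in Table 1 (a 10% chance of one or more errors in 10 bases) follows from a simple calculation if errors are assumed to occur independently at about 1% per base. That per-base rate is an assumption chosen to match the table, not a measured value; the sketch applies the same formula, 1 - (1 - p)^n, to the tag lengths this article gives for SAGE, LongSAGE and SuperSAGE.

```python
def prob_at_least_one_error(tag_length, per_base_error):
    """Probability that a tag contains one or more errors.

    Assumes each base is read wrongly with the same probability,
    independently of the other bases.
    """
    return 1.0 - (1.0 - per_base_error) ** tag_length


# Tag lengths from this article: SAGE (10), LongSAGE (21), SuperSAGE (26).
# The 1% per-base error rate is an assumption chosen to match Table 1.
for length in (10, 21, 26):
    print(length, round(prob_at_least_one_error(length, 0.01), 3))
```

This prints roughly 0.096, 0.19 and 0.23: at a fixed per-base error rate, longer tags are more likely to contain an error, even though they map to genes more specifically.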

High-throughput sequencing

High-throughput sequencing refers to platforms such as Roche (454), Illumina and SOLiD, which sequence cDNA to obtain information about the RNA content of a cell. These methods are at the forefront of gene sequencing and gene expression profiling. They have become increasingly attractive because they process millions of sequences in parallel, rather than 96 at a time as in earlier capillary-based sequencing. They also introduce less bias than capillary-based methods, which require cloning into a vector, and need only a few micrograms of DNA to construct a library. The most popular platforms, Illumina, Roche (454) and SOLiD, are summarised in Table 2.

Feature | Roche (454) | Illumina | SOLiD
Sequencing chemistry | Pyrosequencing | Polymerase-based sequencing-by-synthesis | Ligation-based sequencing
Amplification approach | Emulsion PCR | Bridge amplification | Emulsion PCR
Paired ends (separation) | Yes (3 kb) | Yes (200 bp) | Yes (3 kb)
Output per run (megabases, Mb) | 100 | 1,300 | 3,000
Time per run | 7 hours | 4 days | 5 days
Read length | 250 bp | 32-40 bp | 35 bp
Cost per run (US$) | 8,439 | 8,950 | 17,447
Cost per Mb (US$) | 84.39 | 5.97 | 5.81
Table 2. Comparison of high-throughput sequencing techniques.
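Sequencing-based profiling reports expression as the number of reads assigned to each gene, so samples sequenced to different depths are not directly comparable. A common first step, not described in this article, is to scale counts to counts per million (CPM). The sketch below assumes reads have already been assigned to genes; the gene names and counts are hypothetical.

```python
def counts_per_million(counts):
    """Scale one sample's raw read counts so that they sum to one million.

    `counts` maps gene name -> number of reads assigned to that gene.
    Dividing by the library size (total assigned reads) removes differences
    in sequencing depth between samples.
    """
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("sample has no assigned reads")
    return {gene: n * 1_000_000 / library_size for gene, n in counts.items()}


# Hypothetical counts: sample_b is the same library sequenced twice as deeply.
sample_a = {"geneA": 500, "geneB": 1500, "geneC": 8000}
sample_b = {"geneA": 1000, "geneB": 3000, "geneC": 16000}
print(counts_per_million(sample_a) == counts_per_million(sample_b))
```

CPM corrects only for sequencing depth, not for gene length; measures such as TPM also account for length, and dedicated analysis tools apply further normalizations.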

Illumina

Illumina's sequencing by synthesis (SBS) technology is one of the most widely adopted next-generation sequencing platforms. Its TruSeq chemistry supports massively parallel sequencing through a reversible terminator-based method that detects single bases as they are incorporated into growing DNA strands. As each fluorescently labelled terminator nucleotide is added it is imaged, and the terminator is then cleaved to allow incorporation of the next base. The result is base-by-base sequencing that gives accurate data for a broad range of applications.[1] SBS supports both single-read and paired-end libraries. It is described as the only platform offering both short-insert paired-end reads for high-resolution genome sequencing and long-insert paired-end reads, using the same chemistry, for efficient sequence assembly, de novo sequencing and detection of large-scale structural variation. Combining short inserts with longer reads increases the ability to fully characterize a genome. A wide range of sample preparation methods supports applications including whole-genome and candidate-region resequencing, transcriptome analysis, small RNA discovery, methylation profiling and genome-wide analysis of protein-nucleic acid interactions.[2]
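The imaging step can be illustrated with a deliberately simplified base caller: in each cycle a cluster is imaged in four dye channels, one per base, and the brightest channel is taken as the incorporated base. This is a sketch under stated assumptions (four channels in A, C, G, T order and hypothetical intensities), not Illumina's base-calling software, which also corrects for effects such as phasing and cross-talk between dyes.

```python
BASES = "ACGT"  # channel order assumed for this sketch


def call_bases(cycles):
    """Call one base per sequencing cycle from four-channel intensities.

    `cycles` is a list of (A, C, G, T) intensity tuples for one cluster.
    Returns the called sequence and, per cycle, the brightest channel's share
    of the total signal as a crude confidence value.
    """
    calls, confidences = [], []
    for intensities in cycles:
        total = sum(intensities)
        best = max(range(len(BASES)), key=lambda i: intensities[i])
        calls.append(BASES[best])
        confidences.append(intensities[best] / total if total else 0.0)
    return "".join(calls), confidences


# Hypothetical readings for one cluster over four cycles.
cycles = [(900, 50, 30, 20), (40, 60, 880, 20), (30, 950, 10, 10), (25, 30, 20, 925)]
print(call_bases(cycles))
```

Because the terminator allows only one base to be added per cycle, each image corresponds to exactly one position in the read, which is what makes this per-cycle decision possible.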

Microarrays

Example microarray readout

Microarray technology evolved from Southern blotting and was first used in 1982 in a study of the cloning and screening of sequences expressed in a mouse colon tumour.[3] A microarray is a small solid support onto which sequences from thousands of different genes are immobilized at fixed locations. It is the most commonly used technique for profiling thousands of transcripts simultaneously. The main use of arrays is to identify candidate genes expressed under a given set of conditions, for example the genes expressed during yeast sporulation.

Two types of platform are commonly used: cDNA arrays and oligonucleotide arrays. In cDNA arrays, cDNAs from a clone collection or cDNA library are spotted onto a nylon membrane or glass slide. Oligonucleotide arrays carry oligonucleotides that are either synthesized directly on a chip or printed onto glass slides.

The spotted oligonucleotide or cDNA array is hybridized to cDNA synthesized from the mRNA or total RNA extracted from the cells or tissue of interest. cDNA from two different samples, which may be different cell populations or treatment conditions, is labelled with different fluorescent dyes such as Cy3 (green) and Cy5 (red).[4] The two labelled samples are mixed and hybridized to the same array, where they compete for the same probe spots. After hybridization and washing, the array is scanned at two wavelengths and the intensity of each spot at each wavelength is measured. The results are interpreted by calculating the ratio, or log ratio, of the two fluorescence intensities. Alternatively, radioactive labels can be used to increase the sensitivity of the assay, at the cost of a lower array density.
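The ratio calculation for a single spot can be sketched as follows. Local background subtraction and the clamping floor are assumptions added for illustration, and the intensities are hypothetical; which sample is labelled with which dye is up to the experimenter. A positive log2 ratio means the spot is brighter in the Cy5 channel.

```python
from math import log2


def spot_log_ratio(cy5_fg, cy5_bg, cy3_fg, cy3_bg, floor=1.0):
    """log2(Cy5 / Cy3) for one spot after local background subtraction.

    Background-corrected intensities below `floor` are raised to `floor`
    so that the logarithm stays defined for very faint spots.
    """
    red = max(cy5_fg - cy5_bg, floor)
    green = max(cy3_fg - cy3_bg, floor)
    return log2(red / green)


# Hypothetical foreground/background intensities for one spot.
print(spot_log_ratio(cy5_fg=4200, cy5_bg=200, cy3_fg=1100, cy3_bg=100))
```

Here the corrected intensities are 4000 and 1000, giving a log2 ratio of 2.0, a four-fold difference. Log ratios are convenient because they treat up- and down-regulation symmetrically: a two-fold increase gives +1 and a two-fold decrease gives -1.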

Competitive hybridization detects relative levels of expression by comparing the fluorescence intensities from each sample at each spot. A two-fold change (induction or repression) is generally taken to represent a biologically meaningful change in gene expression.[5] However, a simple ratio of the fluorescence produced by each sample is affected by several confounding factors, such as dye effects, differences in sample amount and intensity, background signal and slide-to-slide variation. It is therefore better to assess the statistical significance of a difference in signal strength. This is often done with a t-test, which tests whether experiment-to-reference ratios differ from one another, or with an ANOVA, which compares normalized expression levels to the mean.[5]
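The normalization and testing described above can be sketched for replicate two-colour arrays as follows. Median-centring each array is one common way to reduce dye bias and is an assumption of this sketch (it presumes most genes are unchanged); the one-sample t-statistic tests whether a gene's mean log2 ratio differs from zero, and the two-fold threshold corresponds to an absolute log2 ratio of at least 1. Gene names and values are hypothetical, and turning the t-statistic into a p-value requires the t distribution with n - 1 degrees of freedom, for example from a statistics library.

```python
from math import sqrt
from statistics import mean, median, stdev


def median_centre(log_ratios):
    """Subtract the array-wide median log2 ratio from every spot on one array."""
    m = median(log_ratios.values())
    return {gene: r - m for gene, r in log_ratios.items()}


def one_sample_t(values):
    """t-statistic for the null hypothesis that the mean log2 ratio is 0.

    Needs at least two replicates with non-identical values.
    """
    return mean(values) / (stdev(values) / sqrt(len(values)))


def summarise_gene(arrays, gene):
    """Mean normalized log2 ratio, t-statistic and two-fold flag for one gene."""
    normalized = [median_centre(array)[gene] for array in arrays]
    m = mean(normalized)
    return m, one_sample_t(normalized), abs(m) >= 1.0  # |log2 ratio| >= 1 is two-fold


# Hypothetical log2(Cy5/Cy3) ratios from three replicate arrays.
arrays = [
    {"geneX": 2.0, "geneY": 0.5, "geneZ": 0.4},
    {"geneX": 2.2, "geneY": 0.5, "geneZ": 0.6},
    {"geneX": 1.9, "geneY": 0.3, "geneZ": 0.5},
]
print(summarise_gene(arrays, "geneX"))
```

Requiring both a large t-statistic and a two-fold change guards against calling genes changed on the basis of either a tiny but consistent shift or a large but noisy one.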

Microarray technology has proved very useful in genomics, bioinformatics and gene expression profiling. It has been widely used in the comparative genomics of important bacterial strains; for example, Behr et al.[6] used whole-genome DNA microarrays to study the comparative genomics of M. tuberculosis and M. bovis and identified specific virulence-associated regions in their genomes.

SAGE analysis

Serial analysis of gene expression (SAGE) is a method first developed in 1995.[7] It is used to compare gene expression between two mRNA populations. SAGE produces short sequence tags, each taken from a specific position within the cDNA from which it is derived, which allows specific identification of cDNAs among large quantities of different transcripts. Pairs of tags are joined into ditags, which are then ligated together to form concatemers, and these concatemers are cloned. The cloned samples are run on an automated sequencing gel, and more than 30 individual tags can be read from each lane. The expression level of a gene is given directly by the abundance of its tag. Because thousands of tags are analysed serially, the identity of the genes expressed in a given tissue and their expression profile can be determined simultaneously.[8] The entire transcriptome can be analysed with high sensitivity, even for low-abundance mRNAs. The method has been of great use in oncology research because it can identify markers in malignant samples. One drawback of SAGE was that tissue samples could not easily be cross-compared; however, in 2010 Yang et al. applied set theory to the analysis so that common and tissue-specific SAGE tag sequences could be grouped into ‘sets’. SAGE is very flexible in its applications and can be used to build digital gene expression databases.[9][10][11]

Other forms of SAGE analysis include LongSAGE and SuperSAGE. LongSAGE produces 21 bp tags from individual transcripts, which can be matched to the human genome. It gives a higher percentage of accurate tag-to-gene assignments than SAGE, although standard SAGE remains suitable for applications focused on known expressed genes where cost is a key factor. LongSAGE is more useful where gene discovery is the main objective, and when a large database is in use, so that standardization can be achieved along with a low error rate.[12] Despite LongSAGE's advantage in tag-to-gene mapping, the extra bases noticeably increase the cost of analysis, especially in large-scale gene expression projects such as the Cancer Genome Anatomy Project (CGAP).[13] SuperSAGE, a further development, produces 26 bp tags from a cDNA template and can identify novel genes in any eukaryotic organism.[14] Besides gene expression profiling, SAGE can also be used for transcript detection. Expressed sequence tag (EST) sequencing was originally the common method for identifying transcripts, but SAGE has since been shown to be far more sensitive for detecting low-abundance transcripts.[15][16]
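The tag-counting step of SAGE can be sketched as follows. The details are assumptions for illustration: the anchoring site is taken to be CATG (the NlaIII recognition site used in the classic protocol), tags are assumed to be 10 bases long, and fragments outside an assumed ditag length range are discarded. Because the two tags in a ditag are joined tail to tail, the second tag is read as the reverse complement of the ditag's last bases. The sequences are hypothetical.

```python
from collections import Counter

ANCHOR = "CATG"  # assumed anchoring-enzyme site (NlaIII)
TAG_LEN = 10     # assumed tag length in bases
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written in A/C/G/T."""
    return seq.translate(COMPLEMENT)[::-1]


def count_tags(concatemer, min_ditag=20, max_ditag=26):
    """Split a sequenced concatemer at anchor sites and count the tags in each ditag.

    The first tag is read from the start of the ditag; the second lies on
    the opposite strand, so it is the reverse complement of the last bases.
    Fragments outside the assumed ditag length range are skipped.
    """
    counts = Counter()
    for ditag in concatemer.split(ANCHOR):
        if not min_ditag <= len(ditag) <= max_ditag:
            continue
        counts[ditag[:TAG_LEN]] += 1
        counts[reverse_complement(ditag[-TAG_LEN:])] += 1
    return counts


# Hypothetical concatemer built from two tags, A and B.
tag_a, tag_b = "ACGTTAGCCA", "GGATCCTTAG"
ditag_1 = tag_a + reverse_complement(tag_b)
ditag_2 = tag_b + reverse_complement(tag_a)
concatemer = ANCHOR + ditag_1 + ANCHOR + ditag_1 + ANCHOR + ditag_2 + ANCHOR
print(count_tags(concatemer))
```

Each tag is counted three times here. Real analyses usually also discard repeated ditags such as the duplicated `ditag_1`, which are likely PCR duplicates, and must allow for sequencing errors in the tags.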

Real-time reverse transcription polymerase chain reaction

Reverse transcription polymerase chain reaction (RT-PCR) is another technique used to detect and quantify a known mRNA sequence in a sample. It is highly sensitive and is used in gene expression studies to test whether a specific gene is active. The enzyme reverse transcriptase converts the RNA into cDNA, which is then amplified by PCR. However, because the product grows exponentially with each cycle until the reaction reaches a plateau, the amount of product at the end point is an unreliable measure of the starting amount of template. For this reason real-time RT-PCR, which measures the product as the reaction proceeds, is used instead.[17] Real-time RT-PCR is the preferred technique for quantitative gene expression analysis: it is highly sensitive, has superior reproducibility, and reliably detects and measures the products generated during PCR, which makes it well suited to quantifying differences in mRNA levels. The technique became possible with the introduction of oligonucleotide probes. An oligonucleotide probe is a short, synthetic nucleotide sequence designed to match a specific region of the target DNA or RNA and labelled so that the target sequence can be detected. In probe-based assays, amplification of the target-specific product can be followed during PCR because the 5' nuclease activity of Taq polymerase cleaves the bound probe, releasing its signal. The iCycler is one of many instruments used to monitor amplification; it accommodates up to 96 samples, so many samples can be monitored simultaneously. PCR arrays contain SYBR Green-optimized primer assays for a thorough study of a panel of genes relevant to a particular pathway or disease. These assays are designed for the high amplification efficiency and specificity that reliable real-time results require. Fluorescence in each well of the 96-well plate is monitored by a sensitive camera built into the instrument.
Because of their simplicity, PCR arrays can be designed for routine use, making gene expression profiling widely accessible to research laboratories.[17][18]
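Real-time RT-PCR results are commonly reported as a threshold cycle (Ct), the cycle at which fluorescence first crosses a set threshold; the more starting template, the lower the Ct. One widely used way to turn Ct values into relative expression, not described in this article, is the comparative Ct (2^-ΔΔCt) method, sketched below. It assumes that both the target gene and a reference (housekeeping) gene amplify with close to 100% efficiency, so the product doubles every cycle; the Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Relative expression of a target gene (treated vs control) by 2^-ΔΔCt.

    Assumes target and reference genes both amplify with ~100% efficiency,
    i.e. the amount of product doubles every cycle.
    """
    delta_treated = ct_target_treated - ct_ref_treated  # normalize to reference gene
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: relative to the reference gene, the target crosses
# the threshold 3 cycles earlier in treated cells than in control cells.
print(fold_change_ddct(22.0, 18.0, 25.0, 18.0))
```

Three cycles earlier corresponds to 2^3 = 8, so the target is estimated to be about eight-fold more abundant in the treated sample. If amplification efficiency differs markedly from 100%, this simple formula over- or underestimates the change.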

Northern blot

Flow diagram outlining the general procedure for RNA detection by northern blotting.

The Northern blot was first described in 1977. It is used to evaluate gene expression by detecting specific RNAs in a sample, and is often used to compare gene expression under different conditions, such as during embryogenesis or tumour development.[19] To compare expression between conditions, samples are collected and analysed in parallel. The first step is to extract RNA from a homogenised sample, which allows the mRNA to be isolated. The RNA is separated by size using gel electrophoresis and then transferred to a nylon membrane, where it is immobilised and hybridized to a labelled probe that allows the target RNA to be detected. In the washing phase, unbound probe is washed off the membrane to reduce background signal and give a clearer result. The probe signals are detected on X-ray film and quantified by densitometry.[20]
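Densitometry results are commonly compared after normalizing each band to a loading control from the same lane, so that differences in the amount of RNA loaded do not appear as differences in expression. The sketch below assumes integrated band densities have already been measured and uses a single hypothetical background value; the choice of loading control (for example, a housekeeping transcript) is an assumption not specified in this article.

```python
def normalized_signal(target, loading, background=0.0):
    """Target band density divided by the loading-control density in the same lane."""
    return (target - background) / (loading - background)


def fold_vs_control(sample_lane, control_lane, background=0.0):
    """Normalized signal of a sample lane relative to a control lane.

    Each lane is a (target_density, loading_control_density) pair.
    """
    return (normalized_signal(*sample_lane, background) /
            normalized_signal(*control_lane, background))


# Hypothetical integrated band densities (arbitrary units).
print(fold_vs_control(sample_lane=(2200, 2000), control_lane=(1200, 2000), background=200))
```

After background subtraction the sample band is twice as intense as the control band while the loading controls are equal, so the estimated relative expression is 2.0.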

A more recent variant, the reverse Northern blot, allows more specific detection of RNA. Here the nucleic acid fixed to the membrane is a collection of DNA fragments, namely cDNAs of RNA transcripts. RNA extracted from a sample is radioactively labelled and brought into contact with the membrane, where it hybridizes with the matching DNA fragments. This technique is useful for determining whether particular genes are expressed in particular samples, for example whether a gene is expressed during tumour growth.[21]

For an accurate analysis of gene expression, this technique should be followed by proteomic analysis, as the presence of an RNA does not always mean that it is being translated.

Example protocols

Microarray

Northern blot

SAGE

RT-PCR

Illumina (High-throughput)

References

  1. ^ Wellcome Trust. "DNA sequencing - the Illumina method". URL: http://www.wellcome.ac.uk/Education-resources/Teaching-and-education/Animations/DNA/WTX056051.htm. Accessed on 03/05/2012
  2. ^ Reis-Filho, J. S. (2009). "Next Generation Sequencing". Breast Cancer Research, 11 Suppl 3:S12. PMID 20030863
  3. ^ Augenlicht, L. H., Kobrin, D. (1982). "Cloning and screening of sequences expressed in a mouse colon tumor". Cancer Research, 42(3):1088-1093. PMID 7059971
  4. ^ Nguyen, D. V., Arpat, A. B., Wang, N., Carroll, R. J. (2002). "DNA microarray experiments: biological and technological aspects". Biometrics, 58(4):701-717. PMID 12495124
  5. ^ a b Tuimala, J., Laine, M. M. (2005). "DNA microarray data analysis". CSC Finnish IT Centre for Science, Helsinki, 2nd edition, pp. 16-18
  6. ^ Behr, M. A., Wilson, M. A., Gill, W. P., Salamon, H., Schoolnik, G. K., Rane, S., Small, P. M. (1999). "Comparative genomics of BCG vaccines by whole-genome DNA microarray". Science, 284(5419):1520-1523. PMID 10348738
  7. ^ Velculescu, V. E., Zhang, L., Vogelstein, B., Kinzler, K. W. (1995). "Serial analysis of gene expression". Science, 270(5235):484-487. PMID 7570003
  8. ^ Powell, J. "Enhanced concatemer cloning: a modification to the SAGE (Serial Analysis of Gene Expression) technique".
  9. ^ Lal, A., Lash, A. E., Altschul, S. F., Velculescu, V., Zhang, L., McLendon, R. E., Marra, M. A., Prange, C., Morin, P. J., Polyak, K., Papadopoulos, N., Vogelstein, B., Kinzler, K. W., Strausberg, R. L., Riggins, G. J. (1999). "A public database for gene expression in human cancers". Cancer Research, 59(21):5403-5407. PMID 10554005
  10. ^ Lash, A. E., Tolstoshev, C. M., Wagner, L., Schuler, G. D., Strausberg, R. L., Riggins, G. J., Altschul, S. F. (2000). "SAGEmap: a public gene expression resource". Genome Research, 10(7):1051-1060. PMID 10899154
  11. ^ Boon, K., Osorio, E. C., Greenhut, S. F., Schaefer, C. F., Shoemaker, J., Polyak, K., Morin, P. J., Buetow, K. H., Strausberg, R. L., De Souza, S. J., Riggins, G. J. (2002). "An anatomy of normal and malignant gene expression". Proceedings of the National Academy of Sciences of the United States of America, 99(17):11287-11292. PMID 12119410
  12. ^ Lu, J., Lal, A., Merriman, B., Nelson, S., Riggins, G. (2004). "A comparison of gene expression profiles produced by SAGE, long SAGE, and oligonucleotide chips". Genomics, 84(4):631-636. PMID 15475240
  13. ^ Riggins, G. J., Strausberg, R. L. (2001). "Genome and genetic resources from the Cancer Genome Anatomy Project". Human Molecular Genetics, 10(7):663-667. PMID 11257097
  14. ^ Matsumura, H., Reich, S., Ito, A., Saitoh, H., Kamoun, S., Winter, P., Kahl, G., Reuter, M., Kruger, D. H., Terauchi, R. (2003). "Gene expression analysis of plant host-pathogen interactions by SuperSAGE". Proceedings of the National Academy of Sciences of the United States of America, 100(26):15718-15723. PMID 14676315
  15. ^ Sun, M., Zhou, G., Lee, S., Chen, J., Shi, R. Z., Wang, S. M. (2004). "SAGE is far more sensitive than EST for detecting low-abundance transcripts". BMC Genomics, 5(1):1. PMID 14704093
  16. ^ Velculescu, V. E., Zhang, L., Vogelstein, B., Kinzler, K. W. (1995). "Serial analysis of gene expression". Science, 270(5235):484-487. PMID 7570003
  17. ^ a b Bustin, S. A. (2002). "Quantification of mRNA using real-time reverse transcription PCR (RT-PCR): trends and problems". Journal of Molecular Endocrinology, 29(1):23-39. PMID 12200227
  18. ^ Valasek, M. A., Repa, J. J. (2005). "The power of real-time PCR". Advances in Physiology Education, 29(3):151-159. PMID 16109794
  19. ^ Kevil, C. G., Walsh, L., Laroux, F. S., Kalogeris, T., Grisham, M. B., Alexander, J. S. (1997). "An improved, rapid Northern protocol". Biochemical and Biophysical Research Communications, 238(2):277-279. PMID 9299493
  20. ^ Alwine, J. C., Kemp, D. J., Stark, G. R. (1977). "Method for detection of specific RNAs in agarose gels by transfer to diazobenzyloxymethyl-paper and hybridization with DNA probes". Proceedings of the National Academy of Sciences of the United States of America, 74(12):5350-5354. PMID 414220
  21. ^ Dilks, D. W., Ring, R. H., Khawaja, X. Z., Novak, T. J., Aston, C. (2003). "High-throughput confirmation of differential display PCR results using reverse Northern blotting". Journal of Neuroscience Methods, 123(1):47-54. PMID 12581848