User:Rcrzarg/SmrC14 RNA

From Wikipedia, the free encyclopedia

Introduction to αr14 sRNA

αr14 is a family of bacterial small non-coding RNAs with representatives in a broad group of α-proteobacteria. The first member of this family (Smr14C2) was found in a chromosomal (C) locus of Sinorhizobium meliloti 1021. Further homology and structure conservation analysis identified two other chromosomal copies and three plasmid-borne ones. Moreover, full-length Smr14C homologs have been identified in several nitrogen-fixing symbiotic rhizobia (i.e. R. leguminosarum bv. viciae, R. leguminosarum bv. trifolii, R. etli, and several Mesorhizobium species), in plant pathogens belonging to Agrobacterium species (i.e. A. tumefaciens, A. vitis, A. radiobacter, and Agrobacterium H13), as well as in a broad spectrum of Brucella species (B. ovis, B. canis, B. abortus, B. microti, and several biovars of B. melitensis). αr14 RNA species are 123-130 nt long (Table 1) and share a well-defined common secondary structure (Figure 1). Most of the αr14 transcripts can be catalogued as trans-acting sRNAs expressed from well-defined promoter regions of independent transcription units within intergenic regions (IGRs) of the α-proteobacterial genomes (Figure 4).

Discovery and Structure

Smr14C2 sRNA was described by del Val et al. [1] as a result of a computational comparative genomics approach that integrated complementary strategies designed to search for novel sRNA-encoding genes in the intergenic regions (IGRs) of the reference S. meliloti 1021 strain (http://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi). Northern hybridization experiments confirmed that the predicted smr14C2 locus expressed a single transcript of the expected size, which accumulated differentially in free-living and endosymbiotic bacteria. TAP-based 5'-RACE experiments mapped the transcription start site (TSS) of the full-length Smr14C transcript to the 1,667,491 nt position in the S. meliloti 1021 genome (http://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi), whereas the 3'-end was assumed to be located at the 1,637,244 nt position, matching the last residue of the consecutive stretch of Us of a bona fide Rho-independent terminator (Figure 5). Parallel and later studies [2], [3], in which the Smr14C2 transcript is referred to as sra38 or Sm7', independently confirmed the expression of this sRNA in S. meliloti and in its closely related strain 2011. Recent deep sequencing-based characterization of the small RNA fraction (50-350 nt) of S. meliloti 2011 further confirmed the expression of Smr14C2 (there referred to as SmelC397) and mapped the 5'- and 3'-ends of the full-length transcript to the same positions in the S. meliloti 1021 genome [4].

Figure 1: Consensus secondary structure of Smr14C2 and the αr14 family predicted by RNAalifold. The coloring of base pairs represents: red, base pair occurring in all sequences used to generate the consensus; yellow, two types of base pairing occur; green, three types of base pairing occur. The shading of base pairs represents: saturated, no inconsistent sequences; pale, one inconsistent sequence; very pale, two inconsistent sequences. The gene strand is represented with the file direction.
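The coloring rule in the Figure 1 caption can be illustrated with a minimal sketch. It assumes a gapped alignment of equal-length sequences and a consensus structure in dot-bracket notation; the function names, the set of canonical pairs and the treatment of gaps as inconsistent are assumptions made for illustration, not RNAalifold's own code.

```python
# Sketch of the caption's rule: hue from the number of distinct canonical
# pair types at a paired column, shade from the number of sequences that
# cannot form a canonical pair there. Names are illustrative only.
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}
HUES = {1: "red", 2: "yellow", 3: "green"}
SHADES = {0: "saturated", 1: "pale", 2: "very pale"}


def paired_columns(structure):
    """Return sorted (i, j) column pairs from a dot-bracket string."""
    stack, pairs = [], []
    for idx, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(idx)
        elif symbol == ")":
            pairs.append((stack.pop(), idx))
    return sorted(pairs)


def annotate_pairs(alignment, structure):
    """Label each consensus base pair with a hue and a shade."""
    labels = []
    for i, j in paired_columns(structure):
        pair_types = set()
        inconsistent = 0
        for seq in alignment:
            pair = (seq[i] + seq[j]).upper().replace("T", "U")
            if pair in CANONICAL:
                pair_types.add(pair)
            else:
                # Assumption: gaps and non-canonical pairs both count here.
                inconsistent += 1
        hue = HUES.get(len(pair_types), "other")
        shade = SHADES.get(inconsistent, "other")
        labels.append((i, j, hue, shade))
    return labels


toy_alignment = ["GGAAACC", "GCAAAGC", "GAAAAUC"]
print(annotate_pairs(toy_alignment, "((...))"))
```

In the toy alignment the outer pair is G-C in every sequence (one pair type), while the inner pair varies between G-C, C-G and A-U (three pair types), so the two pairs receive different hues.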

Four of the five additional copies found in the S. meliloti genome with the αr14 family model have been independently confirmed in recent studies:

  • Smr14C3, referred to as sra38 [2], Sm7 [3] or SmelC398 [4]
  • Smr14psymA1, referred to as sma8 [3] or SmelA075 [4]
  • Smr14psymA2, referred to as SmelA099 [4]
  • Smr14psymB, referred to as SmelB161 [4]

To date there is no experimental evidence for the predicted copy Smr14C1.

The nucleotide sequence of Smr14C2 was initially used as a query to search the Rfam database (version 10.0; http://www.sanger.ac.uk/Software/Rfam). This homology search returned no matches to known bacterial sRNAs in this database. Smr14C2 was next used as a BLAST query, with default parameters, against all the bacterial genomes available at the time (1,615 sequences as of 20 April 2011; http://www.ncbi.nlm.nih.gov). The regions exhibiting significant homology to the query sequence (78-89% similarity) were extracted to create a Covariance Model (CM) from a seed alignment using Infernal (version 1.0) [5] (Figure 2). This CM was used in a further search for new members of the αr14 family in the existing bacterial genomic databases.
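The extraction step described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: it assumes BLAST hits in the standard 12-column tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and genomes held in memory as a dictionary. The function names and the 78% cutoff on percent identity (the text reports similarity) are hypothetical.

```python
# Sketch: pull hit regions out of subject genomes from BLAST tabular output.
# The identity cutoff and all names are assumptions made for illustration.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_hit_regions(blast_tab_lines, genomes, min_identity=78.0):
    """Return (subject_id, start, end, strand, sequence) for retained hits.

    Coordinates are 1-based and inclusive, as in BLAST tabular output.
    A hit with sstart > send lies on the minus strand of the subject.
    """
    regions = []
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        subject_id = fields[1]
        identity = float(fields[2])
        sstart, send = int(fields[8]), int(fields[9])
        if identity < min_identity:
            continue
        low, high = min(sstart, send), max(sstart, send)
        sequence = genomes[subject_id][low - 1:high]
        strand = "+"
        if sstart > send:
            strand = "-"
            sequence = reverse_complement(sequence)
        regions.append((subject_id, low, high, strand, sequence))
    return regions


toy_genomes = {"chr": "AAACCCGGGTTT"}
toy_hits = [
    "query\tchr\t85.0\t6\t0\t0\t1\t6\t4\t9\t1e-20\t50",
    "query\tchr\t90.0\t4\t0\t0\t1\t4\t6\t3\t1e-10\t30",
    "query\tchr\t60.0\t4\t0\t0\t1\t4\t1\t4\t1e-02\t10",
]
for region in extract_hit_regions(toy_hits, toy_genomes):
    print(region)
```

The extracted sequences would then be aligned to form the seed alignment; building and calibrating the covariance model itself is done with the Infernal command-line tools and is not reproduced here.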

The results were manually inspected to deduce a consensus secondary structure for the family (Figure 1 and Figure 2). The consensus structure was also independently predicted with the program locARNATE [6], with very similar results. Manual inspection of the sequences found with the CM using Infernal identified 26 true homolog sequences, all of them present as single chromosomal copies in the α-proteobacterial genomes. The rhizobial species encoding the 36 closer homologs to Smr14C2 were: S. medicae and S. fredii, two R. leguminosarum bv. trifolii strains (WSM304 and WSM35), two R. etli strains (CFN 42 and CIAT 652), the reference R. leguminosarum bv. viciae 3841 strain, and the Agrobacterium species A. vitis, A. tumefaciens, A. radiobacter and A. H13. All these sequences showed significant Infernal E-values (5.63E-29 - 8.16E-18) and bit-scores. The rest of the sequences found with the model showed higher E-values (between 1.33E-17 and 8.79E-03) and lower bit-scores, and are encoded by Brucella species (B. ovis, B. canis, B. abortus, B. microti, and several biovars of B. melitensis), Ochrobactrum anthropi and the Mesorhizobium species M. loti, M. ciceri and M. BNC.
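The two E-value ranges quoted above do not overlap, so the hits can be split into a closer and a more distant group with a single cutoff. The sketch below assumes a cutoff of 1e-17, which lies between the two reported ranges but is not a value stated in the source. The hit labels are placeholders, and only the four boundary E-values from the text are used as data.

```python
# Sketch: split covariance-model hits into two groups by E-value.
# The cutoff of 1e-17 is an assumption chosen to fall between the two
# ranges reported in the text; labels are placeholders, not real hits.
def parse_hits(lines):
    """Parse 'label E-value' lines into (label, float) tuples."""
    hits = []
    for line in lines:
        label, evalue = line.split()
        hits.append((label, float(evalue)))
    return hits


def split_by_evalue(hits, cutoff=1e-17):
    """Return (closer, distant) hit lists, each sorted by ascending E-value."""
    ranked = sorted(hits, key=lambda hit: hit[1])
    closer = [hit for hit in ranked if hit[1] < cutoff]
    distant = [hit for hit in ranked if hit[1] >= cutoff]
    return closer, distant


boundary_hits = parse_hits([
    "hit_a 5.63E-29",
    "hit_b 8.16E-18",
    "hit_c 1.33E-17",
    "hit_d 8.79E-03",
])
closer, distant = split_by_evalue(boundary_hits)
print([label for label, _ in closer])
print([label for label, _ in distant])
```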

Figure 3: Phylogenetic distribution of known and predicted αr14 genes. Gene numbers are based on computational analysis using the program Infernal.

Expression information

Parallel studies assessed Smr14C expression in S. meliloti 1021 under different biological conditions: bacterial growth in TY, minimal medium (MM) and luteolin-MM broth, and endosymbiotic bacteria (i.e. mature symbiotic alfalfa nodules) [1]; and high salt stress, oxidative stress, and cold and heat shock stresses [3]. Expression of Smr14C in free-living bacteria was found to be growth-dependent, with the gene being strongly down-regulated when bacteria entered the stationary phase. Interestingly, expression of Smr14C2 increased ~5-fold in nodules compared with free-living bacteria (log phase TY or MM cultures), suggesting the induction of these sRNAs during bacterial infection and/or bacteroid differentiation [1]. Recent deep sequencing data [4] revealed differential expression of the plasmid copies. Smr14psymA1 showed condition-dependent expression, with a very low expression level in complex medium and in the same medium at decreased temperature; however, it was strongly up-regulated by heat-shock stress [4]. Smr14psymB showed a greater than 8-fold increase in expression in the stationary phase. Moreover, it also showed a weak up-regulation (<8-fold) upon acidic, basic and oxidative stress [4].

Promoter Analysis

All the promoter regions of the αr14 family members examined so far are highly conserved in a sequence stretch extending up to 120 bp upstream of the transcription start site of the sRNA. All the closest homolog loci have recognizable σ70-dependent promoters showing a -35/-10 consensus motif CTTAGAC-n17-CTATAT, which has previously been shown to be widely conserved among several other genera in the α-subgroup of proteobacteria [7]. To identify binding sites for other known transcription factors, we used the FASTA sequences provided by RegPredict [8] (http://regpredict.lbl.gov/regpredict/help.html) and the position-specific weight matrices (PSWM) provided by RegulonDB [9] (http://regulondb.ccg.unam.mx). We built a PSWM for each transcription factor from the RegPredict sequences using the Consensus/Patser program, choosing the best final matrix for motif lengths between 14 and 30 bp; a threshold average E-value < 10E-10 was established for each matrix (see "Thresholded consensus" in http://gps-tools2.its.yale.edu). Moreover, we searched for conserved unknown motifs using MEME [10] (http://meme.sdsc.edu/meme4_6_1/intro.html) and used relaxed regular expressions (i.e. pattern matching) over the promoters of all Smr14C2 homologs. These studies revealed two well-defined groups of loci. The first, represented by the closest homologs (Figure 5), presented a highly conserved 26 bp region between positions -40 and -75, marked as the conserved MEME motif in Figure 5, but no significant similarity to known transcription factor binding site matrices could be established. The second group comprised more distantly related members of the αr14 family (Figure 6). These presented a different promoter region, very well conserved across all members, and an additional unknown 20 bp motif.
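Two of the ideas in this paragraph, relaxed pattern matching for the CTTAGAC-n17-CTATAT motif and scoring sites with a weight matrix, can be sketched in a few lines. This is not the Consensus/Patser or MEME procedure: the spacer tolerance of 16 to 18 nt, the one-mismatch allowance per box, the pseudocount and the uniform background frequency are all assumptions chosen for illustration.

```python
import math

# -35 and -10 consensus boxes quoted in the text.
BOX35, BOX10 = "CTTAGAC", "CTATAT"


def mismatches(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def scan_promoters(seq, spacers=(16, 17, 18), max_mismatches=1):
    """Return (start, spacer_length, total_mismatches) for relaxed matches.

    Assumption: each box may carry up to max_mismatches substitutions and
    the spacer may deviate by one nucleotide from the 17 nt consensus.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(BOX35) + 1):
        m35 = mismatches(seq[i:i + len(BOX35)], BOX35)
        if m35 > max_mismatches:
            continue
        for spacer in spacers:
            j = i + len(BOX35) + spacer
            candidate = seq[j:j + len(BOX10)]
            if len(candidate) < len(BOX10):
                continue
            m10 = mismatches(candidate, BOX10)
            if m10 <= max_mismatches:
                hits.append((i, spacer, m35 + m10))
    return hits


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds weight matrix from equal-length aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def score_site(pwm, site):
    """Sum the per-position log-odds scores of a candidate site."""
    return sum(column[base] for column, base in zip(pwm, site))


toy_upstream = "GG" + BOX35 + "A" * 17 + BOX10 + "GG"
print(scan_promoters(toy_upstream))
toy_pwm = build_pwm(["CTATAT", "CTATAT", "CTATAT", "CTAAAT"])
print(round(score_site(toy_pwm, "CTATAT"), 2))
```

A uniform background is a simplification; for GC-rich rhizobial genomes a genome-derived background would be a more realistic choice.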

Figure 5: Graphic representation of the αr14 seed members' promoter region. All members presented putative σ70 promoters with -35 and -10 boxes marked in green and red respectively.
Figure 6: Graphic representation of the promoter region of the more distantly related αr14 members.

Genomic Context

Most of the members of the αr14 family are trans-encoded sRNAs transcribed from independent promoters in chromosomal IGRs. Exceptions are the cis-encoded antisense Smr14C homologs of A. tumefaciens and B. microti, which are located on the opposite strand of annotated genes, partially overlapping ORFs. Most of the neighboring genes of the seed alignment's members were not annotated and were therefore manually curated [11][12][13]. The predicted protein products of these overlapping ORFs could not be assigned to any functional category on the basis of amino acid sequence homology. However, the genomic regions of almost all αr14 sRNAs exhibited a high degree of conservation, including the sRNA-coding sequence and the upstream and downstream genes, which have been predicted to code for a prolyl-tRNA synthetase (proS) and a transmembrane protein, respectively. Partial synteny of the αr14 genomic regions was observed in a few cases, such as S. medicae, where an FAD-dependent pyridine nucleotide-disulfide oxidoreductase gene was found upstream of the αr14 locus instead of a proS gene, and Mesorhizobium loti, where no transmembrane protein-coding gene was recognizable downstream of the sRNA gene. A special case is the Brucella group, where primary automatic annotation of the genomes identified ORFs smaller than 30 aa overlapping the predicted αr14 sRNA on the same strand. These predicted ORFs show neither similarity to database entries nor any motifs or signatures when searched against family and motif databases such as InterPro [14], Pfam [15] or SMART [16], and are therefore considered here to be misannotations and are not shown in the genomic context graph.
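The criterion applied to the Brucella annotations (an ORF shorter than 30 aa that overlaps the sRNA on the same strand) can be expressed as a simple interval check. The sketch below uses hypothetical feature names and coordinates and assumes that ORF coordinates are 1-based, inclusive and include the stop codon. The database searches that complete the argument are not reproduced.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """A genomic feature with 1-based inclusive coordinates."""
    name: str
    start: int
    end: int
    strand: str


def overlaps(a, b):
    """True if the two features share at least one position."""
    return a.start <= b.end and b.start <= a.end


def orf_length_aa(orf):
    """Protein length in residues, assuming the ORF includes its stop codon."""
    return (orf.end - orf.start + 1) // 3 - 1


def suspect_orfs(srna, orfs, max_aa=30):
    """Names of short same-strand ORFs that overlap the sRNA locus."""
    return [
        orf.name
        for orf in orfs
        if orf.strand == srna.strand
        and overlaps(orf, srna)
        and orf_length_aa(orf) < max_aa
    ]


# Hypothetical coordinates, for illustration only.
srna = Feature("sRNA", 1000, 1125, "+")
annotated = [
    Feature("orfA", 1050, 1121, "+"),  # 23 aa, same strand, overlapping
    Feature("orfB", 1050, 1121, "-"),  # opposite strand
    Feature("orfC", 900, 1499, "+"),   # overlapping but 199 aa
    Feature("orfD", 2000, 2050, "+"),  # no overlap
]
print(suspect_orfs(srna, annotated))
```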

References

  1. del Val C, Rivas E, Torres-Quesada O, Toro N, Jiménez-Zurdo JI (2007). "Identification of differentially expressed small non-coding RNAs in the legume endosymbiont Sinorhizobium meliloti by comparative genomics". Mol Microbiol. 66 (5): 1080–1091. doi:10.1111/j.1365-2958.2007.05978.x. PMID 17971083.
  2. Ulvé VM, Sevin EW, Chéron A, Barloy-Hubler F (2007). "Identification of chromosomal alpha-proteobacterial small RNAs by comparative genome analysis and detection in Sinorhizobium meliloti strain 1021". BMC Genomics. 8 (467). doi:10.1186/1471-2164-8-467.
  3. Valverde C, Livny J, Schlüter JP, Reinkensmeier J, Becker A, Parisi G (2009). "Prediction of Sinorhizobium meliloti sRNA genes and experimental detection in strain 2011". BMC Genomics. 9 (406). doi:10.1186/1471-2164-9-416.
  4. Schlüter JP, Reinkensmeier J, Daschkey S, Evguenieva-Hackenberg E, Janssen S, Jänicke S, Becker JD, Giegerich R, Becker A (2010). "A genome-wide survey of sRNAs in the symbiotic nitrogen-fixing alpha-proteobacterium Sinorhizobium meliloti". BMC Genomics. 11 (245). doi:10.1186/1471-2164-11-436.
  5. Nawrocki EP, Kolbe DL, Eddy SR (2009). "Infernal 1.0: inference of RNA alignments". Bioinformatics. 25 (10): 1335–1337. doi:10.1093/bioinformatics/btp157.
  6. Will S, Reiche K, Hofacker IL, Stadler PF, Backofen R (2007). "Inferring Noncoding RNA Families and Classes by Means of Genome-Scale Structure-Based Clustering". PLoS Comput Biology. 4 (65). doi:10.1371/journal.pcbi.0030065.
  7. MacLellan SR, MacLean AM, Finan TM (2006). "Promoter prediction in the rhizobia". Microbiology. 152: 1751–1763. doi:10.1099/mic.0.28743-0.
  8. Novichkov PS, Rodionov DA, Stavrovskaya ED, Novichkova ES, Kazakov AE, Gelfand MS, Arkin AP, Mironov AA, Dubchak I (2010). "RegPredict: an integrated system for regulon inference in prokaryotes by comparative genomics approach". Nucleic Acids Research. 38 (Web Server issue): W299–W307. doi:10.1093/nar/gkq531.
  9. Gama-Castro S, Salgado H, Peralta-Gil M, Santos-Zavaleta A, Muniz-Rascado L, Solano-Lira H, Jimenez-Jacinto V, Weiss V, Garcia-Sotelo JS, Lopez-Fuentes A, Porron-Sotelo L, Alquicira-Hernandez S, Medina-Rivera A, Martinez-Flores I, Alquicira-Hernandez K, Martinez-Adame R, Bonavides-Martinez C, Miranda-Rios J, Huerta AM, Mendoza-Vargas A, Collado-Torres L, Taboada B, Vega-Alvarado L, Olvera M, Olvera L, Grande R, Morett E, Collado-Vides J (2010). "RegulonDB version 7.0: transcriptional regulation of Escherichia coli K-12 integrated within genetic sensory response units (Gensor Units)". Nucleic Acids Research. 39 (Database issue): D98–D105. doi:10.1093/nar/gkq1110.
  10. Bailey TL, Elkan C (1994). "Fitting a mixture model by expectation maximization to discover motifs in biopolymers". Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, California: 28–36.
  11. Vinayagam A, del Val C, Schubert F, Eils R, Glatting KH, Suhai S, König R (2006). "GOPET: a tool for automated predictions of Gene Ontology terms". BMC Bioinformatics. 7: 171. doi:10.1186/1471-2105-7-161. PMID 16549020.
  12. Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M (2005). "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research". Bioinformatics. 21 (18): 3674–3676. doi:10.1093/bioinformatics/bti610. PMID 16081474.
  13. del Val C, Ernst P, Falkenhahn M, Fladerer C, Glatting KH, Suhai S, Hotz-Wagenblatt A. "ProtSweep, 2Dsweep and DomainSweep: protein analysis suite at DKFZ". Nucleic Acids Res. 35 (Web Server issue): W444–W450. doi:10.1093/nar/gkm364. PMID 17526514.
  14. Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, Finn RD, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Laugraud A, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Mulder N, Natale D, Orengo C, Quinn AF, Selengut JD, Sigrist CJ, Thimma M, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C (2009). "InterPro: the integrative protein signature database". Nucleic Acids Res. 37 (Database issue): D224–D228. doi:10.1093/nar/gkn785.
  15. Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K, Holm L, Sonnhammer EL, Eddy SR, Bateman A (2010). "The Pfam protein families database". Nucleic Acids Res. 38 (Database issue): D211–D222. doi:10.1093/nar/gkp985.
  16. Letunic I, Doerks T, Bork P (2008). "SMART 6: recent updates and new developments". Nucleic Acids Res. 38 (Database issue): D211–D222. doi:10.1093/nar/gkn808.