Pfam

From Wikipedia, the free encyclopedia

Pfam is a database of protein families that includes their annotations and multiple sequence alignments generated using hidden Markov models.[1][2][3]

For each family in Pfam you can:

  • Look at multiple alignments
  • View protein domain architectures
  • Examine species distribution
  • Follow links to other databases
  • View known protein structures

74% of protein sequences have at least one match to Pfam. This number is called the sequence coverage.
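
The coverage figure is a simple ratio, which the following minimal Python sketch illustrates. The function name, the input layout and the toy identifiers are assumptions made for this example and are not part of Pfam itself.

```python
def sequence_coverage(hits_by_sequence):
    """Return the fraction of sequences with at least one family match.

    hits_by_sequence maps a sequence identifier to the list of family
    accessions matched by that sequence (an empty list means no match).
    """
    if not hits_by_sequence:
        raise ValueError("need at least one sequence")
    matched = sum(1 for hits in hits_by_sequence.values() if hits)
    return matched / len(hits_by_sequence)


# Toy data: identifiers and accessions are placeholders for illustration.
example = {
    "seq1": ["PF99991"],
    "seq2": [],
    "seq3": ["PF99992", "PF99993"],
    "seq4": ["PF99991"],
}
print(f"{sequence_coverage(example):.0%}")  # 3 of 4 sequences have a match
```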

The Pfam database contains information about protein domains and families. Pfam-A is the manually curated portion of the database and contains over 10,000 entries. For each entry, a protein sequence alignment and a hidden Markov model are stored. These hidden Markov models can be used to search sequence databases with the HMMER package written by Sean Eddy. Because the entries in Pfam-A do not cover all known proteins, an automatically generated supplement called Pfam-B is provided. Pfam-B contains a large number of small families derived from clusters produced by an algorithm called ADDA.[4] Although of lower quality, Pfam-B families can be useful when no Pfam-A families are found.
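
HMMER's search programs can write their hits as a whitespace-delimited table (the `--tblout` option), with comment lines starting with `#`. The Python sketch below shows one way such a table might be read and filtered by E-value. The function name, the cutoff value and the sample rows are assumptions for illustration. The sample is abbreviated to the first seven columns (target name, target accession, query name, query accession, full-sequence E-value, score, bias) and uses made-up family names and accessions.

```python
def parse_tblout(text, max_evalue=1e-3):
    """Collect hits from HMMER per-target tabular output.

    Only the leading columns are read: target name, target accession,
    query name, and the full-sequence E-value and bit score.
    max_evalue is an illustrative cutoff, not a Pfam threshold.
    """
    hits = []
    for line in text.splitlines():
        # Skip blank lines and the '#' comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        evalue = float(fields[4])
        if evalue <= max_evalue:
            hits.append({
                "target": fields[0],
                "accession": fields[1],
                "query": fields[2],
                "evalue": evalue,
                "score": float(fields[5]),
            })
    return hits


# Abbreviated, made-up sample rows (real output has further columns).
SAMPLE = """\
# target name  accession  query name  accession  E-value  score  bias
FamA  PF99991.1  protX  -  1.2e-30  105.3  0.1
FamB  PF99992.1  protX  -  0.5  8.1  0.0
"""

for hit in parse_tblout(SAMPLE):
    print(hit["query"], hit["target"], hit["evalue"])
```

With the sample above, only the `FamA` row passes the cutoff; the `FamB` row is discarded because its E-value is too high.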

The iPfam database[5] builds on the domain descriptions of Pfam. It investigates whether different proteins described together in the protein structure database PDB are close enough to potentially interact.
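
The idea of "close enough to potentially interact" can be pictured as a distance test between atoms of two domains in the same structure. The Python sketch below is only an illustration of that idea: the function name, the coordinate layout and the 4.0 Å cutoff are assumptions, and it is not a description of the criteria iPfam actually applies.

```python
import math


def domains_in_contact(coords_a, coords_b, cutoff=4.0):
    """Return True if any atom pair across the two domains is within cutoff.

    coords_a and coords_b are sequences of (x, y, z) coordinates in
    angstroms. The default cutoff is an assumed, illustrative value.
    """
    return any(
        math.dist(a, b) <= cutoff
        for a in coords_a
        for b in coords_b
    )


# Made-up coordinates for two small sets of atoms.
domain_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
domain_2 = [(4.5, 0.0, 0.0), (20.0, 0.0, 0.0)]
domain_3 = [(30.0, 30.0, 30.0)]

print(domains_in_contact(domain_1, domain_2))  # closest pair is 3.0 apart
print(domains_in_contact(domain_1, domain_3))  # no pair within the cutoff
```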

See also

  • TrEMBL: a database that performs automated protein sequence annotation
  • InterPro: an integration of protein domain and protein family databases

References

  1. Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, Bateman A (2008). "The Pfam protein families database". Nucleic Acids Research 36 (Database issue): D281–8. doi:10.1093/nar/gkm960. PMID 18039703.
  2. Finn RD, Mistry J, Schuster-Böckler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, Durbin R, Eddy SR, Sonnhammer EL, Bateman A (Jan 2006). "Pfam: clans, web tools and services" (Free full text). Nucleic Acids Research 34 (Database issue): D247–51. doi:10.1093/nar/gkj149. ISSN 0305-1048. PMID 16381856. PMC: 1347511. http://nar.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=16381856.
  3. Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, Studholme DJ, Yeats C, Eddy SR (Jan 2004). "The Pfam protein families database" (Free full text). Nucleic Acids Research 32 (Database issue): D138–41. doi:10.1093/nar/gkh121. ISSN 0305-1048. PMID 14681378. PMC: 308855. http://nar.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=14681378.
  4. Heger A, Wilton CA, Sivakumar A, Holm L (Jan 2005). "ADDA: a domain database with global coverage of the protein universe" (Free full text). Nucleic Acids Research 33 (Database issue): D188–91. doi:10.1093/nar/gki096. ISSN 0305-1048. PMID 15608174. PMC: 540050. http://nar.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=15608174.
  5. Finn RD, Marshall M, Bateman A (Feb 2005). "iPfam: visualization of protein-protein interactions in PDB at domain and amino acid resolutions" (Free full text). Bioinformatics 21 (3): 410–2. doi:10.1093/bioinformatics/bti011. ISSN 1367-4803. PMID 15353450. http://bioinformatics.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=15353450.
