Pfam
From Wikipedia, the free encyclopedia
Pfam is a database of protein families that includes their annotations and multiple sequence alignments generated using hidden Markov models.[1][2][3]
For each family in Pfam you can:
- Look at multiple alignments
- View protein domain architectures
- Examine species distribution
- Follow links to other databases
- View known protein structures
74% of protein sequences have at least one match to Pfam. This percentage is called the sequence coverage.
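As a minimal sketch of how a sequence coverage figure of this kind can be computed, the Python function below takes a mapping from sequence identifiers to the families each sequence matched and returns the fraction of sequences with at least one match. The function name, the data layout and the identifiers are assumptions made for illustration; they are not part of Pfam or of any Pfam tool.

```python
def sequence_coverage(hits_per_sequence):
    """Return the fraction of sequences with at least one family match.

    hits_per_sequence maps a sequence identifier to the list of family
    identifiers matched by that sequence (an empty list means no match).
    """
    if not hits_per_sequence:
        raise ValueError("need at least one sequence")
    covered = sum(1 for hits in hits_per_sequence.values() if hits)
    return covered / len(hits_per_sequence)


# Made-up identifiers, for illustration only (not real Pfam data).
example_hits = {
    "seq1": ["FAM_A"],
    "seq2": [],
    "seq3": ["FAM_A", "FAM_B"],
    "seq4": ["FAM_C"],
}
print(f"{sequence_coverage(example_hits):.0%}")  # prints 75%
```

A sequence counts once no matter how many families it matches, which is why "seq3" does not raise the coverage above three out of four.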
The Pfam database contains information about protein domains and families. Pfam-A is the manually curated portion of the database and contains over 10,000 entries. For each entry, a protein sequence alignment and a hidden Markov model are stored. These hidden Markov models can be used to search sequence databases with the HMMER package written by Sean Eddy. Because the entries in Pfam-A do not cover all known proteins, an automatically generated supplement called Pfam-B is provided. Pfam-B contains a large number of small families derived from clusters produced by an algorithm called ADDA.[4] Although of lower quality, Pfam-B families can be useful when no Pfam-A families are found.
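To give a rough idea of how a model built from a family alignment can score other sequences, the Python sketch below derives a gapless position-specific log-odds profile from a toy alignment. This is a deliberately simplified stand-in: it is not a hidden Markov model (it has no insert or delete states), it is not how HMMER works internally, and the uniform background, the pseudocount and the toy alignment are all assumptions chosen for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Build per-column log-odds scores from equal-length, gap-free rows."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    profile = []
    for col in range(length):
        # Start every residue at the pseudocount so unseen residues
        # get a finite (negative) score instead of minus infinity.
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for row in alignment:
            counts[row[col]] += 1
        total = sum(counts.values())
        profile.append(
            {aa: math.log2((c / total) / background) for aa, c in counts.items()}
        )
    return profile


def score_sequence(profile, sequence):
    """Sum the column scores for a sequence as long as the profile."""
    if len(sequence) != len(profile):
        raise ValueError("sequence length must match profile length")
    return sum(column[aa] for column, aa in zip(profile, sequence))


# Toy alignment, not taken from any real Pfam family.
toy_alignment = ["ACDE", "ACDE", "ACDF"]
toy_profile = build_profile(toy_alignment)
print(score_sequence(toy_profile, "ACDE") > score_sequence(toy_profile, "WWWW"))
```

A sequence resembling the alignment scores higher than an unrelated one. A profile hidden Markov model extends this idea with states for insertions and deletions, so that sequences of differing lengths can be matched.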
The database iPfam[5] builds on the domain description of Pfam. It investigates whether different proteins described together in the protein structure database PDB are close enough to potentially interact.
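The "close enough" criterion can be illustrated with a simple distance test. The Python sketch below reports whether any atom of one domain lies within a cutoff distance of any atom of another. The function name, the coordinate layout and the 5.0 cutoff are hypothetical choices for illustration; the actual criteria used by iPfam are not described here.

```python
import math


def domains_in_contact(atoms_a, atoms_b, cutoff=5.0):
    """Return True if any atom of A is within `cutoff` of any atom of B.

    Each atom is an (x, y, z) tuple. The default cutoff is an
    illustrative value, not one taken from iPfam.
    """
    return any(
        math.dist(a, b) <= cutoff
        for a in atoms_a
        for b in atoms_b
    )


# Made-up coordinates, for illustration only.
domain_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
domain_b = [(4.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(domains_in_contact(domain_a, domain_b))
```

This all-pairs comparison is quadratic in the number of atoms; for whole structures a spatial index would be the usual choice.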
See also
- TrEMBL: database performing automated protein sequence annotation
- InterPro: integration of protein domain and protein family databases
References
- Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, Bateman A (2008). "The Pfam protein families database". Nucleic Acids Research 36 (Database issue): D281–8. PMID 18039703.
- Finn RD, Mistry J, Schuster-Böckler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, Durbin R, Eddy SR, Sonnhammer EL, Bateman A (Jan 2006). "Pfam: clans, web tools and services". Nucleic Acids Research 34 (Database issue): D247–51. ISSN 0305-1048. PMID 16381856. PMC 1347511. http://nar.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=16381856.
- Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, Studholme DJ, Yeats C, Eddy SR (Jan 2004). "The Pfam protein families database". Nucleic Acids Research 32 (Database issue): D138–41. ISSN 0305-1048. PMID 14681378. PMC 308855. http://nar.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=14681378.
- Heger A, Wilton CA, Sivakumar A, Holm L (Jan 2005). "ADDA: a domain database with global coverage of the protein universe". Nucleic Acids Research 33 (Database issue): D188–91. ISSN 0305-1048. PMID 15608174. PMC 540050. http://nar.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=15608174.
- Finn RD, Marshall M, Bateman A (Feb 2005). "iPfam: visualization of protein-protein interactions in PDB at domain and amino acid resolutions". Bioinformatics (Oxford, England) 21 (3): 410–2. ISSN 1367-4803. PMID 15353450. http://bioinformatics.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=15353450.
External links
- Pfam - Protein family database at Sanger Institute UK
- Pfam - Protein family database at Janelia Farm Research Campus USA
- Pfam - Protein family database at Center for Genomics and Bioinformatics Sweden
- iPfam - Interactions of Pfam domains in PDB

