Template talk:Infobox protein family

From Wikipedia, the free encyclopedia
WikiProject Molecular Biology: MCB (Template‑class)
This template is within the scope of WikiProject Molecular Biology, a collaborative effort to improve the coverage of Molecular Biology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This template does not require a rating on Wikipedia's content assessment scale.
This template is supported by the Molecular and Cell Biology task force.
WikiProject Chemicals (Template‑class)
This template is within the scope of WikiProject Chemicals, a daughter project of WikiProject Chemistry, which aims to improve Wikipedia's coverage of chemicals. To participate, help improve this template or visit the project page for details on the project.
This template does not require a rating on Wikipedia's content assessment scale.

Prosite vs PROSITE

Both are now accepted fields thanks to Boghog2. Abergabe (talk) 13:29, 24 June 2010 (UTC)[reply]

Suggested changes

Boghog suggested changes in this template here. I see three issues here.

  1. Making PDB RCSB query. Can we actually make a query to identify all PDB structures that belong to Pfam family PF0XXXX? The suggested template does not allow it. If we can, that would be a significant improvement, and I strongly support it because such a version would automatically update the list of PDB files that belong to each Pfam family! This should be possible because the current RCSB PDB version provides a PDB->Pfam mapping, but I do not know how to do it, especially in the template.
  2. Links to list of PDB files in Pfam. That might be excessive because we have a link to PFAM already, but it does not hurt. Support.
  3. Links to PDBsum entries. They are not present in the new version proposed by Boghog. If we can make a query as for PDB (see #1), we could indeed replace all current links by a query (this should be possible because PDBsum provides a PDB-Pfam mapping). If not, let's keep the current links to the specified PDB files. Hodja Nasreddin (talk) 02:44, 10 March 2011 (UTC)
ad 1) I just added such a link to the RCSB source code. It will be available on the public site soon. Also sent a link to a test server to Boghog and I think the link will work fine for him. I'll post an update here with the details once this is available for the public. --Andreas (talk) 03:34, 10 March 2011 (UTC)[reply]
In response to Hodja's three points:
  1. Making PDB RCSB query. The sandbox version (see testcases) already returns all the structures that contain the Pfam family PF0XXXX. The sandbox version currently uses the {{Pfam2pdb}} template (which in turn is based on this pdb to pfam accession number list) to accomplish this. Because of the enormous size of this template, I asked Andreas if he could enable a Pfam query link to the RCSB PDB. As he mentioned above, he sent me a test link that I have verified works. As soon as a public version becomes available, we will include it in the {{Infobox protein family}} template.
  2. Links to list of PDB files in Pfam. The advantage of this link is that it provides detailed information about the precise location of the Pfam domain within each structure. I know this information is also provided in some of the other links graphically, but this particular link provides a concise text summary of this information.
  3. Links to PDBsum entries. As can clearly be seen in the testcases, individual PDBsum links are included in the sandbox version. In summary, what is currently implemented in the sandbox are query links to (1) Pfam, (2) RCSB PDB, and (3) PDBe, and each of these links returns all of the structures associated with a given Pfam domain. It would be nice if PDBsum could also provide such a link. In the meantime, the individual PDBsum links will still be displayed. Boghog (talk) 07:59, 10 March 2011 (UTC)
All right, everything sounds great. Let's make these modifications in the template using the link/query provided by Andreas. BTW, are the Pfam-PDB mappings identical in the Pfam and PDB databases? I had the impression that PDBe does such mapping independently, as soon as new PDB files are released. That's important because Pfam is normally updated once a year, but PDB is updated every week. Hodja Nasreddin (talk) 17:52, 10 March 2011 (UTC)
The {{Pfam2pdb}} and {{Pfam2PDBsum}} templates were created from the same pdb to pfam accession number list so they both should return identical lists of structures. These templates are currently up-to-date. Ideally I would like to replace both templates with direct query links to the external databases, but if these take a long time to implement, the templates can be updated from time to time. Boghog (talk) 22:12, 12 March 2011 (UTC)[reply]
Just to add, RCSB PDB loads PDBe-SIFTS files (which provide the mapping) as well as Pfam on a weekly basis --Andreas (talk) 22:53, 12 March 2011 (UTC)[reply]

 Done The new version of the {{Infobox protein family}} has now been put into production (diff). I made a new {{Pfam2PDBsum}} that is transcluded into the infobox and removed completely the PDB parameter (all external PDB links are now derived from the Pfam parameter and the PDB parameter has been deprecated). As soon as pfam query links to the RCSB PDB and PDBsum databases are available, these can replace the {{Pfam2pdb}} and {{Pfam2PDBsum}} templates respectively. Cheers. Boghog (talk) 16:27, 12 March 2011 (UTC)[reply]

This is a serious improvement. I quickly tested it for PH domain. As expected, some of the most recently released PDB files now appear in the PDB (and PDBsum) links, but not in the Pfam link (e.g. 3pp2). However, something strange is happening with the PDBe link [1]. It searches for PF00104 (Ligand-binding domain of nuclear hormone receptor), in addition to PH domain (PF00169), and therefore retrieves a much larger number of files. This should be fixed. Otherwise, great work! Hodja Nasreddin (talk) 21:33, 12 March 2011 (UTC)
Thanks for catching the bug. Hopefully it is now fixed. Sorry about that. Boghog (talk) 21:56, 12 March 2011 (UTC)[reply]
Great! PDBe search provides a nice sortable table [2], but they forgot to include a field with the UniProt code (this should be done as in Pfam: [3]). But this is their problem. Hodja Nasreddin (talk) 22:09, 12 March 2011 (UTC)
Perhaps we should not make all three links under "Available protein structures" collapsible, but only collapse the list of PDBsum files. Another question: should we remove "OPM protein" and leave only "OPM family"? I think this is something for you to decide since you work so much with this template. Hodja Nasreddin (talk) 22:28, 12 March 2011 (UTC)
Concerning the OPM family/protein links and the collapsible views, I don't have a strong feeling one way or the other. Perhaps we should experiment with the sandbox version first and leave the production version as is for a few days to see if others express an opinion. If no one objects, I would be happy to change the production version. Boghog (talk) 22:41, 12 March 2011 (UTC)
Great work Boghog, thanks for this improvement! As promised, I will post an updated RCSB link here once it is available to the public. Currently ETA is early April. If you like customizable tables, Hodja, check the "Generate Reports" drop down at RCSB... --Andreas (talk) 22:53, 12 March 2011 (UTC)
Agree. My personal suggestion would be to keep everything as it is right now. I really like this new version. Hodja Nasreddin (talk) 23:17, 12 March 2011 (UTC)[reply]

New PDBsum hooks

The good people at PDBsum have added support for queries of their site with Pfam accessions, e.g. [4]. These should be much more consistent with the other structure links. I hope these are of use here. --Paul (talk) 15:58, 21 March 2011 (UTC)

 Done I am very impressed with the quality of this new PDBsum layout. I have already included the new link in the template (see beta-lactamase for an example). I am open to suggestions for tweaking the display of the link (I wasn't quite sure what to call it). Thank you Paul Gardner, Alex Bateman, and especially Roman Laskowski for implementing the link and for producing this great looking report! These new pfam structure links greatly increase the value of the pfam infoboxes and at the same time, eliminate the need to update the links. A double win. Thanks again to all for your help. Cheers. Boghog (talk) 20:26, 21 March 2011 (UTC)[reply]
Looks good Boghog! All the hard work was done by Roman. I like the nice clean new boxes. Many thanks for putting this all together so quickly and efficiently. If you're looking for a job let us know. ;-) --Paul (talk) 22:54, 21 March 2011 (UTC)[reply]
  • Amazing work with PDBsum! Protein boxes on wiki probably need to be updated accordingly (compare the list of PDB files for VTNC_HUMAN here and here). I will look more carefully. Hodja Nasreddin (talk) 14:38, 27 March 2011 (UTC)
PDBsum retrieves 148 rhodopsin-like GPCRs [5] but PDB itself retrieves only 67 rhodopsin-like GPCRs! Why? There are three reasons. (1) PDBsum includes theoretical models (it would be great to mark theoretical models with a sign like "*" or something here). (2) PDBsum also finds complexes with GPCR peptides missed by the PDB query (e.g. 2pux). This is a PDB problem. (3) PDBsum finds GPCR structures that do not include a transmembrane domain (e.g. 1xwd). Good job! Hodja Nasreddin (talk) 04:34, 28 March 2011 (UTC)
The problem of course with theoretical models is that they have a limited useful lifetime. With the recent explosion in the number of GPCR experimental structures, most of the published GPCR homology models have become very dated since much better experimental templates are now available. The PDB made a deliberate decision to exclude theoretical models from their database, a decision which I strongly support. Whether including structures that contain short sequences or only domains that are not common to the whole family (and in fact may contain other pfam domains) is a good or bad thing is debatable and depends on what you are looking for. The PDB list contains less noise whereas the PDBsum list is more exhaustive. Boghog (talk) 05:40, 28 March 2011 (UTC)
I am using PDBsum-Pfam queries right now, but there are some minor issues with the PDB-Pfam mapping (which is not a PDBsum problem). Taking lipases as an example, it seems that PDB structures 1ku0, 1ji3 and 2dsn are not assigned to any Pfam family, and other structures (like 2z8x) are assigned to only one Pfam entry, although they should be assigned to several Pfam entries because they consist of several domains. I also think it would be a good idea to look more carefully at the differences between the Pfam and SCOP classifications. While I agree that the family of fungal lipases (in SCOP) should be split, as it is in the current release of Pfam, this is more questionable for bacterial lipases (e.g. placing LIP_BURCE and ESTA_BACSU in different families, PF01674 and PF00561). PF00561 looks like a dump for alpha/beta hydrolases that were not assigned to other families. Other than that, having these queries is a huge asset! Hodja Nasreddin (talk) 21:57, 6 April 2011 (UTC)
One minor suggestion would be to replace the words "Structures in the PDB containing this domain" by "Proteins in the PDB containing this domain" in the output of the query, because it shows other domains. For example, this tells: "Structures in the PDB containing this domain: 14", but in fact none of the structures contains this domain, only proteins do. Hodja Nasreddin (talk) 00:25, 15 April 2011 (UTC)

New RCSB PDB link

This week's RCSB PDB website update added a new Pfam ID search and link-in by Pfam ID. This is based on the discussion earlier on this page. Here is an example link: [6]. This should be useful for simplifying the template. --Andreas (talk) 17:21, 20 April 2011 (UTC)

 Done Thank you Andreas! The new link works great and I have already incorporated it in the template. I now no longer need to worry about maintaining the link myself. I greatly appreciate that you followed up on our request. Cheers. Boghog (talk) 19:49, 20 April 2011 (UTC)[reply]
Thanks for the quick response and updating the template, Boghog! Let us know if you have any more feature requests. --Andreas (talk) 23:39, 20 April 2011 (UTC)[reply]

SUPERFAMILY link?

Is it possible to add a link to SUPERFAMILY to the automatic generation of these templates? Taking the PDB Code you can link to the database using http://supfam.org/SUPERFAMILY/cgi-bin/search.cgi?search_field=PDB_CODE_HERE -- MattOates (Ulti) 16:47, 5 January 2012 (UTC)[reply]

Doesn't the SCOP link that is already included in the {{Infobox_protein_family}} template provide the class/fold/superfamily/family information for each Pfam entry? See for example 1LBD. Boghog (talk) 20:54, 5 January 2012 (UTC)[reply]
Yes but SUPERFAMILY isn't just SCOP, it takes all SCOP classifications and PDB sequence per structure and builds profile HMMs that are scored against all sequenced genomes. You can also do phylogenetic analysis on the site and look at where the structure lies in the tree of life etc. That search result is just the landing page for analysis through the SUPERFAMILY data using a PDB structure to start with. Essentially it's a PFam like resource but starting with human annotation from SCOP. -- MattOates (Ulti) 11:54, 6 January 2012 (UTC)[reply]
OK, it wasn't immediately clear to me which individual PDB accession number to use. However since the SCOP link already uses a PDB code, we can use the same PDB accession number to link to SUPERFAMILY (see for example 1LBD). For an example of how this would look in the pfam infobox, see the example on the right hand side Template:Infobox_protein_family/testcases. Does this look OK? Boghog (talk) 12:47, 6 January 2012 (UTC)[reply]
Looks great, the CDD links look really useful too! Out of interest, where do you get all the accession numbers from to begin with? Both SCOP and SUPERFAMILY have their own accession but it's not the PDB structure by default. I might be able to get hold of a data file mapping these for you if you wished? There are definitely better links that could be given for SUPERFAMILY if you had the sunid number. In the example structure the sunid is 48508 and the following link could have been used: http://supfam.org/SUPERFAMILY/cgi-bin/scop.cgi?sunid=48508 Thanks for your efforts, it's making Wikipedia really useful for bioinformatics! -- MattOates (Ulti) 11:40, 7 January 2012 (UTC)
Thanks, but I think User:Biophys (for creating the template) and User:Alexbateman (for migrating Pfam to Wikipedia) deserve a lot more credit than me. Also I am embarrassed to say that I didn't notice the CDD link that was recently added by Kalafati (talk · contribs). I would agree that the CDD link is very useful. One way to find the CDD link is to search the CDD database using the Pfam accession number (see for example PF00104). I agree that using a pdb accession number is not the ideal way of linking to SCOP and SUPERFAMILY. If you could provide me a mapping, we could add new parameters to the template that would use the native SCOP and SUPERFAMILY accession numbers and I could use a bot to replace the old pdb based links with new native accession number links. Cheers. Boghog (talk) 14:07, 7 January 2012 (UTC)[reply]
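For illustration only, the two SUPERFAMILY link styles mentioned in this thread (the search by PDB code and the direct link by sunid) could be sketched in Python as below. The function names are hypothetical and are not part of the template; only the two URL patterns come from the posts above.

```python
from urllib.parse import urlencode

# Base path taken from the URLs quoted in this thread.
SUPFAM_BASE = "http://supfam.org/SUPERFAMILY/cgi-bin"


def superfamily_search_url(pdb_code: str) -> str:
    """Landing-page search keyed on a PDB code (hypothetical helper)."""
    return f"{SUPFAM_BASE}/search.cgi?{urlencode({'search_field': pdb_code})}"


def superfamily_scop_url(sunid: int) -> str:
    """Direct link keyed on a native SCOP sunid (hypothetical helper)."""
    return f"{SUPFAM_BASE}/scop.cgi?{urlencode({'sunid': sunid})}"


if __name__ == "__main__":
    print(superfamily_search_url("1LBD"))
    print(superfamily_scop_url(48508))
```

The second form needs a PDB-to-sunid mapping of the kind offered above, which is why the PDB-code search is the only option while the infobox stores just a PDB accession.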

Edit request on 21 January 2012

Sync with {{Infobox_protein_family/sandbox}} to include changes as discussed here and here. (Also see testcases for test of requested changes.) Thanks.

Boghog (talk) 12:35, 21 January 2012 (UTC)[reply]

 Done. I also took the opportunity to strip some extraneous whitespace, adjusted the image handling code to fall back to the user's thumbnail size rather than a hard-coded 220px, and moved the auto-categorisation out of the main template logic for ease of maintenance. If there are any problems please let me know ASAP. Chris Cunningham (user:thumperward) (talk) 15:49, 1 February 2012 (UTC)
Thanks for implementing my requests and for the additional tweaks. You are amazing! Cheers. Boghog (talk) 06:01, 6 February 2012 (UTC)[reply]

Edit request on 25 April 2012

In the "Available protein structures" section, the link to the Pfam structures list is broken; since an intermediate tab was added some time ago, it shows the wrong tab in the Pfam page. Rather than linking to "tab8", it should link to "tab9". Alternatively, a more future-proof version would be to remove "#tabview=tab8" and add "?tab=pdbBlock", e.g. pfam.sanger.ac.uk/family/AAA?tab=pdbBlock. Hope that makes sense. Jgtate (talk) 12:56, 25 April 2012 (UTC)[reply]

 Done: I went with the second option as a permalink is preferable to an anchor which relies on user agent behaviour. If there are any problems with this please let me know ASAP. Chris Cunningham (user:thumperward) (talk) 14:38, 25 April 2012 (UTC)
Looks great. Thanks for doing that so quickly. Jgtate (talk) 09:44, 26 April 2012 (UTC)[reply]
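As a rough sketch of the fix adopted here, the structures link can be built from the family identifier with the `?tab=pdbBlock` query string instead of a positional `#tabview=tabN` fragment. The helper name and the `http://` scheme are assumptions added for illustration; the host, path and query string come from the example in the request.

```python
from urllib.parse import quote

# Host and path from the request above; the "http://" scheme is an assumption.
PFAM_FAMILY_BASE = "http://pfam.sanger.ac.uk/family"


def pfam_structures_url(family: str) -> str:
    """Link to the structures tab by name rather than by tab position."""
    return f"{PFAM_FAMILY_BASE}/{quote(family, safe='')}?tab=pdbBlock"


if __name__ == "__main__":
    print(pfam_structures_url("AAA"))
```

Naming the tab means the link does not depend on how many tabs precede it, which is what broke when an intermediate tab was added.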

Edit request on 4 May 2012

Please replace the words "OPM family" by the words "OPM superfamily" in the template. That is how it is named in the OPM database.

This probably should be:

{{#if:| ! style="background-color: #e7dcc3" | OPM superfamily | style="background-color: #eee" | superfamily}}} {{{OPM superfamily}}} |-

But it would be better to unprotect the template. Why was it protected?

My very best wishes (talk) 15:43, 4 May 2012 (UTC)[reply]

 Done the edit; if you think unprotection is appropriate then the best thing would be to contact the protecting admin directly, who is Closedmouth (talk · contribs). Chris Cunningham (user:thumperward) (talk) 13:56, 14 May 2012 (UTC)

Placement of infobox title

The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
The result was to move the title back into the rectangular border. Boghog (talk) 09:46, 18 August 2012 (UTC)

Previously the title of this infobox was placed within the rectangular border (see test cases). This was changed, apparently without discussion, so that the title of the infobox is now placed above the infobox (again, see test cases). I personally prefer the previous style since placing the title within the infobox removes any confusion about the connection of the title with the infobox, which becomes an issue if the infoboxes are stacked on top of each other. For arguments pro and con, see this previous discussion. I propose that we change back to the original placement of the title. Boghog (talk) 18:55, 19 July 2012 (UTC)

I strongly agree that placing the title inside the box adds clarity. I'm a relative newbie and tend to look at most things more like your average reader and less like an editor, so you can take my opinion with that grain of salt.--Biolprof (talk) 02:54, 20 July 2012 (UTC)[reply]
I have looked through a few examples and while I agree that having the title in the box might make it clearer in some cases, I haven't seen any cases where I became confused though. Please do point out some if you know about them. One user in the previous thread seemed to be suggesting that having the title outside the box gave some benefit to partially sighted users, but I do not understand the reasons for that. Perhaps the ideal would be to work out a way to put the title into the box while addressing the issue for partially sighted. So I remain neutral on this one. Alexbateman (talk) 09:32, 20 July 2012 (UTC)[reply]
I think it looks much better as suggested by Boghog (name within the box). Let's implement the change. My very best wishes (talk) 11:55, 20 July 2012 (UTC)[reply]
As explained here, I think the advantages of placing the title over the infobox for the sight impaired are overstated or even non-existent. For clarity and for consistency with other infoboxes, I believe it is better to have the title placed within the box instead of over it. Boghog (talk) 09:46, 18 August 2012 (UTC)
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Semi-protected edit request on 23 July 2017

Add link to CATH (protein structure classification database) for each PDB ID. Iansillitoe (talk) 11:48, 23 July 2017 (UTC)

Not done: please make your requested changes to the template's sandbox first; see WP:TESTCASES. jd22292 (Jalen D. Folf) (talk) 17:07, 23 July 2017 (UTC)[reply]
Thanks for your help, and apologies in advance for any accidental mistakes in protocol (still learning my way around). It may have been more appropriate for me to ask someone with more experience to add this change. I created a /sandbox and /testcase to preview this change and everything looked okay to me. It's entirely possible I've missed something though - any further pointers would be greatly appreciated. Iansillitoe (talk) 12:34, 26 July 2017 (UTC)[reply]
Done jd22292 (Jalen D. Folf) (talk) 16:46, 26 July 2017 (UTC)[reply]

Merops changed its url system

Sadly, http://merops.sanger.ac.uk/cgi-bin/merops.cgi?id={{{MEROPS}}} no longer works. There appears to be a different system of clans, families and peptides, which would probably require three parameters (or alternatively some {{{MEROPS_level}}} switching parameter):

https://www.ebi.ac.uk/merops/cgi-bin/clansum?clan={{{MEROPS_clan}}}
https://www.ebi.ac.uk/merops/cgi-bin/famsum?family={{{MEROPS_family}}}
https://www.ebi.ac.uk/merops/cgi-bin/pepsum?id={{{MEROPS_peptide}}}

e.g. TEV protease would be https://www.ebi.ac.uk/merops/cgi-bin/pepsum?id=C04.004 whereas the PA clan would be https://www.ebi.ac.uk/merops/cgi-bin/clansum?clan=PA T.Shafee(Evo&Evo)talk 01:02, 14 August 2018 (UTC)[reply]

TEV protease uses another box/template. However, something like Caspase indeed uses infobox protein family, and clicking on MEROPS leads to the front page of MEROPS, instead of the corresponding clan in the database. My very best wishes (talk) 19:57, 14 August 2018 (UTC)
So, I have changed it to MEROPS family because in all cases when this template is used it seems to redirect to a MEROPS family, rather than to a clan or protein (which is logical). I checked and it seems to work, e.g. Caspase, Metalloproteinase, etc. My very best wishes (talk) 20:10, 14 August 2018 (UTC)
The only pages where this would not work are those like PA clan of proteases, but there one can provide the Pfam clan (same authors and classification). More fine tuning in the template and pages can certainly be done, but maybe by someone else. My very best wishes (talk) 20:21, 14 August 2018 (UTC)
P.S. Of course links on all pages about individual proteins like Factor IX are broken, but this is not connected to this template. All such links should be either fixed or removed. One should notify authors. They did not make and use a separate template for such links. If they did, it would be easy to fix. Of course MEROPS is a very good, helpful database. My very best wishes (talk) 20:31, 14 August 2018 (UTC)[reply]
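To make the proposed clan/family/peptide switching concrete, here is a minimal Python sketch of the URL scheme described at the top of this section. The level names and the `merops_url` function are hypothetical stand-ins for a `{{{MEROPS_level}}}`-style parameter; the script names and query keys follow the example URLs above, with the peptide key taken as `id`, as in the TEV protease example.

```python
from urllib.parse import urlencode

# Base path taken from the URLs quoted in this section.
MEROPS_BASE = "https://www.ebi.ac.uk/merops/cgi-bin"

# Hypothetical level name -> (script name, query key), per the URLs above.
MEROPS_ENDPOINTS = {
    "clan": ("clansum", "clan"),
    "family": ("famsum", "family"),
    "peptide": ("pepsum", "id"),
}


def merops_url(level: str, accession: str) -> str:
    """Build a MEROPS summary link for the given classification level."""
    if level not in MEROPS_ENDPOINTS:
        raise ValueError(f"unknown MEROPS level: {level!r}")
    script, key = MEROPS_ENDPOINTS[level]
    return f"{MEROPS_BASE}/{script}?{urlencode({key: accession})}"


if __name__ == "__main__":
    print(merops_url("peptide", "C04.004"))  # TEV protease
    print(merops_url("clan", "PA"))          # PA clan
```

Since this infobox is used almost exclusively on family pages, defaulting to the `family` level matches the change described above.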

Family versus superfamily in OPM database

Making OPM superfamily and family is fine, but then one needs to include an additional field for the family to the template. Otherwise, this is becoming a mess with current numbering in infoboxes. I can look at this later. My very best wishes (talk) 19:03, 14 March 2019 (UTC)[reply]

I see: it is misleadingly named "family" in infoboxes on individual pages, while all of them are superfamilies. But fixing this would be tedious. My very best wishes (talk) 19:13, 14 March 2019 (UTC)[reply]

RCSB PDB links no longer work

This template uses the RCSB legacy API to query for related structures. This API has been shut down. Please update the template to generate links using the new search API. --108.52.199.101 (talk) 18:35, 5 January 2021 (UTC)

This is fixed, there was no need to use the API. bonob (talk) 18:55, 1 May 2021 (UTC)[reply]

Three other pdb-in-membrane databases

Just today I saw RCSB PDB give me links for pdb-in-membrane databases other than OPM. We might need to expand that part somewhat.

Artoria2e5 🌉 11:23, 26 November 2023 (UTC)[reply]

Adding ECOD links to Protein family infobox

We would like to start adding ECOD links to protein family infoboxes. ECOD is an important structural classification much like CATH or SCOP. ECOD can be found here: http://prodata.swmed.edu/ecod/.

I would propose that the following option be added to the infobox: ECOD = 148.1.1.1

and this would link to ECOD in the following way:

http://prodata.swmed.edu/ecod/complete/tree?id=148.1.1.1

Any comments on this proposal?

Thanks Alexbateman (talk) 12:01, 22 March 2024 (UTC)[reply]
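As a small illustration of the proposal, the mapping from the suggested ECOD parameter value to the link could be sketched in Python like this. The `ecod_url` name is hypothetical; the URL pattern and the example identifier are the ones given above.

```python
from urllib.parse import urlencode

# Tree endpoint quoted in the proposal above.
ECOD_TREE = "http://prodata.swmed.edu/ecod/complete/tree"


def ecod_url(ecod_id: str) -> str:
    """Turn an ECOD identifier such as '148.1.1.1' into a tree link."""
    return f"{ECOD_TREE}?{urlencode({'id': ecod_id})}"


if __name__ == "__main__":
    print(ecod_url("148.1.1.1"))
```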