Wikipedia:Peer review/Folding@home/archive1
Folding@home
This peer review discussion has been closed.
I've listed this article for peer review because I'm eager to improve this article as much as I can, with the ultimate goal of achieving FA status. As I am the primary editor, this article consists almost entirely of my writing, so it's becoming more difficult for me to determine what needs to be further improved. I've already achieved GA status, with some of the criteria passing quickly without comment. In response to my inquiry regarding how I could further improve the prose, the GA reviewer, Czarkoff, later commented "The current text actually impressed me with the balance between the depth of coverage and ease of apprehension, as well as between medical and computing topics. I don't think something should be changed in this regard." (diff) Thus, I'm beginning to feel that it's getting close to FA standards, but I'd appreciate a review to help me get there. I wouldn't mind a detailed and thorough analysis, but any assistance would be appreciated.
Thanks, Jesse V. (talk) 21:12, 14 May 2012 (UTC)
Folding@home review part 1 from JMiall (This is by no means my area of expertise so I am analysing it as it reads without looking at any of the refs)
- I was surprised that molecular dynamics linked to an article defining it as computer simulation. I would have naively assumed that something called x-dynamics was the general study of the dynamics of x as seems to be generally the case rather than limiting it to only computer simulations.
- 'to predict that final structure and determine how other molecules may interact with it' - to interact with the structure or the protein?
- 'The project is dedicated to understanding protein folding, the diseases that result from misfolding, and developing new methods for computational drug design' - hasn't this just been said in the paragraph or are you talking about a project within Stanford that includes F@h?
- 'Completed work units' are mentioned as returning before we know what a work unit is, how a computer gets it or what counts as completion.
- It is not clear what a 'credit point' is when it is first mentioned
- 'This global computing network' - you have not yet established that it is global, just that it uses 1000s of computers.
- 'the Pande lab has produced ninety-six scientific research papers as a direct result of the project using simulation methodology that is a paradigm shift away from traditional computational approaches' - there are 2 possible meanings to this that it would be good to clarify - have they produced 96 papers as a direct result of the methodology used? or produced 96 papers as a result of the project, which by the way happens to use a new method of simulation? Also isn't 'paradigm shift away from' something of a tautology?
- I've separated the statements, but the latter is the more accurate. The methodology is dividing up the simulation and using distributed computing, which is a paradigm shift because it's not just straightforward computation. It's now "from" instead of "away from". Jesse V. (talk) 02:26, 16 May 2012 (UTC)
- 'accuracy compared to' - do you mean comparable to?
- 'it typically proceeds smoothly' - what is meant by smoothly here? particularly as not long after the process is described as 'stochastic'.
- 'where state each'
- I'm not sure I properly understand the operation of the simulation from the description:
- I'll do my best to help, but I'm not a biochemistry expert and sometimes the technical papers lose me. It was challenging to write that article's paragraph! Jesse V. (talk) 06:48, 16 May 2012 (UTC)
- How are the local minima in the energy landscape from which the simulations are started found?
- I actually don't know for certain; I only have half-guesses. Do you think it's important to clarify this? Jesse V. (talk) 06:48, 16 May 2012 (UTC)
- Yes. Just finding the minima could be a massive computational challenge by itself. JMiall₰ 15:24, 19 May 2012 (UTC)
- All right. On my long-term to-do list. I'm currently trying to figure out how to access the publications from outside Utah State University. Jesse V. (talk) 06:26, 20 May 2012 (UTC)
- How is the phase space explored? In what way is the exploration different than "waiting" for the protein to leave the minimum and evolve to a new minimum?
- By moving from one minimum and finding a new one, you are exploring the phase space. At least, that's my understanding. I've reworded the statement. Jesse V. (talk) 06:48, 16 May 2012 (UTC)
- 'arbitrary resolution' - resolution on what scale?
- how is it known that no minima have been missed?
- linking to measurement uncertainty on something that is definitely not a measurement.
- 'In 2002 Folding@home used... and in 2011 they parallelized' but 'In January 2010 researchers used... Folding@home to' - Use and be used. Should this be consistent throughout?
- I'm confused as to what you're saying here. I did replace "they" with "MSMs" for clarity, and I think the tense is okay. Please clarify. Jesse V. (talk) 02:54, 16 May 2012 (UTC)
- Sorry, what I meant was sometimes in the article F@h is treated as an entity that does things by itself and sometimes as an object which is used by others. Consistency would be better.JMiall₰ 15:24, 19 May 2012 (UTC)
- Ah! Thanks for the clarification. I made changes so that F@h is now treated as a single entity when it's simulating something, but I make the distinction that scientists/researchers use it to study something or publish a paper. So in that sense, I will be consistent. See also Rosetta@home. Is this sufficient? Jesse V. (talk) 06:10, 21 May 2012 (UTC)
- 'including... among others' - more tautology
- 'Cellular infection by viruses such as HIV and influenza also involve folding events within cellular membranes,[29] and computer-assisted drug design has the possibility to expedite drug discovery' - why 'also'? drugs to do what? the part after ref 29 could be generally true and reads rather like it has been bolted on to the 1st half.
JMiall₰ 00:23, 16 May 2012 (UTC)
Folding@home review part 2 from JMiall (based on a printout from a couple of days ago)
- the article talks a lot about folding but do any proteins get created in a folded state, have to unfold and then fold into a new shape? If so does F@h simulate this?
- It mostly focuses on folding and misfolding. A protein's native state is a pretty deep energy minimum, maybe even the global minimum, so once it gets there AFAIK it won't spontaneously fold into something else. There's a YouTube video I've seen where Dr. Pande draws an analogy between protein folding and parking a car. The protein does something, realizes that it's now in an unworkable state, and backs up and tries something a bit different. I'll research whether F@h studies protein refolding and things like that. Jesse V. (talk) 07:19, 23 May 2012 (UTC)
- Why can't the F@h methodology be used on supercomputers? AFAIK many of these use a very parallel architecture that would seem well suited to the problem so 'strong scaling of molecular simulations to these architectures is exceptionally difficult' seems an odd comment.
- It's true that supercomputers have thousands of processors which aim to do the work in parallel. These processors are often connected by a high-speed bus. From what I've read, it's difficult to get straightforward simulations to use all available processors, and the bus is heavily used. F@h's MSM methodology wouldn't need this bus much at all, since the work is very parallelizable. But the supercomputer's processors may be slower than regular consumer-grade processors, and as the article states, supercomputers are very expensive to run and shared by many research groups. Anyway, I've added "traditional" before "molecular simulations" so that it's clearer. Jesse V. (talk) 07:19, 23 May 2012 (UTC)
- You use the 'While statement x, statement y' construction a lot. For a bit of variety and simpler English, many of these could be rewritten as 'Statement x, but statement y' or 'Even though statement x, statement y' etc.
- 'Once it is understood how a protein misfolds, therapeutic intervention can be the next step, which can use engineered molecules to:' - wouldn't engineering the molecules be the next step and therapeutic intervention be years down the line? I'm not sure the colon is needed either.
- 'Folding@home is dedicated to producing significant amounts of results towards protein folding' - rewrite
- 'In addition to the diseases listed below, Folding@home also studies' - there's a big list of diseases above at the start of the section. Why do we need another list here?
- The first list was about diseases that followed protein misfolding, the other list indicated other diseases that F@h studied. Since there was a great deal of overlap and malaria and Chagas Disease appear to be small pilot projects, I've removed the list entirely. Jesse V. (talk) 06:47, 22 May 2012 (UTC)
- 'As a part of Stanford University, a non-profit organization, the Pande lab does not sell' - this doesn't follow, non-profits can still sell things
- 'upon request, while some' -> 'upon request and some'
- 'so that the algorithms which benefited Folding@home will also aid other scientific areas' - will? Just sharing the algorithms doesn't ensure they will be useful.
- 'For example, in 2011 they released the open-source Copernicus software, which uses techniques developed on Folding@home to significantly improve the efficiency and scaling of molecular simulations on large clusters or supercomputers' - this contradicts earlier in the article. See previous point.
- 'The full publications are available online from a local municipal or academic library' - so if I go to my local library I'll have access?
- 'In excessive concentrations of misfolded Aβ, protein oligomers begin to form which in turn continues Aβ misfolding, while the oligomer itself slowly grows.' - how about 'High concentrations of misfolded Aβ cause protein oligomer growth. These oligomers also contribute to Aβ misfolding.' ?
- 'timescales in order of tens of seconds' - of the order of
- 'This was much longer than previously performed and significant as previous simulations had been limited to several hundreds of microseconds: six orders of magnitude short of experimentally relevant timescales' - this needs a rewrite but aren't you also saying then that tens of seconds is still not relevant?
- 'specific small molecules' - which ones? how small?
- 'as well as preparing them for future' - them = the small molecules?
- 'This is a promising approach' - is regarded as a?
- 'several small drug candidates to fight Alzheimer's Disease which' - get rid of 'to fight Alzheimer's Disease'
- 'which could aid in the development of therapeutic drug approaches to the disease' - this kind of thing has been said a lot recently in the article
- 'and although its behavior is not completely understood, it does lead to' - what are the it and its referring to? the aggregate, the excessive repeats, the excessive repeating process, the glutamine?
- link 'rational drug design'
- 'toxic aggregate formation' - mention toxicity earlier, all we've been told previously about the aggregate is that it leads (with unknown number of intermediate steps) to cognitive decline.
- I moved the word "toxic" to the Alzheimer's section, when I first mention aggregation. It's complicated stuff, but I'll look more into the "unknown intermediate steps" between aggregation and cognitive decline. I think I've already given a sufficient amount of general information about these steps in the Alzheimer's section, but it's a good idea to give more details. Jesse V. (talk) 07:19, 23 May 2012 (UTC)
- 'the root causes of cancer' - will it tell us anything about the large numbers of cancers not associated with p53?
- 'to other p53-related diseases.' - link this?
- To what? Closest thing I found was P53#Role in disease. Jesse V. (talk) 06:47, 22 May 2012 (UTC)
- 'a form of IL-2 which is three hundred times more effective in its immune system role but carries fewer side effects' - this seems odd. if this new version is so much better why hasn't it evolved previously?
- 'which fulfills a variety of structural roles and is the most abundant protein in mammals' - swap the order of the 2 statements
- 'This complexity and timescale make' - makes
- 'hemagglutinin' links to the influenza version but then talks about HIV. Is this correct?
- Now just links to hemagglutinin. Jesse V. (talk) 07:19, 23 May 2012 (UTC)
- Actually, I changed it back. The link was correct, it's just that the statement had some factual errors. I've now addressed them with a rewrite. I no longer talk about HIV specifically, but instead viruses in general. Jesse V. (talk) 06:19, 24 May 2012 (UTC)
- 'Researchers have also used Folding@home to study prime', 'Folding@home also took', 'The Pande lab has also used' - also, also, also.
- How did F@h do in SAMPL's blind experiment?
- I had a feeling someone was going to ask! :) In that particular instance, not too well unfortunately: according to this statement by a scientist, they didn't get as many computations done as they would have liked due to technical difficulties, so they weren't confident that their predictions would be accurate. Their choice of prediction algorithm apparently needed lots of computing power, and the difficulties prevented them from meeting their expectations in that instance. Do you suppose that explanation is worthy of mention? Jesse V. (talk) 06:26, 20 May 2012 (UTC)
- 'This large and powerful network allows Folding@home to do work not possible any other way, though the Pande lab has collaborated with other molecular dynamics systems such as the Blue Gene supercomputer.' - how does the 2nd half follow from the 1st?
- 'Folding@home gained popularity early in its history.' - this rather follows from the steady growth statement earlier
- I'm unconvinced by the 'PetaFLOPS milestones' section. Could the whole thing be better expressed as a graph of PetaFLOPS against time for F@h & the fastest supercomputer?
- Hmm. I suppose that's possible, though there are some important details in the paragraph that would be lost if the whole thing was a graph. I found Template:Line chart, do you think this would be good? Jesse V. (talk) 07:19, 23 May 2012 (UTC)
- 'Similar to other distributed' - similarly
- Donors / users - is there any difference?
- No. Rosetta@home does not use "donors" but instead uses "users". I suppose "participants" also works. Changes made. Jesse V. (talk) 07:19, 23 May 2012 (UTC)
- 'which are exceptionally demanding on a system' - so do users get more points for processing units that their system finds demanding?
- 'A user can start their own team, or they can join an existing team,[3] but existing points cannot be transferred to a new team or username.' - is this important?
- 'However, regardless of username or team affiliation, all contributions go to the same place and have the same scientific value' - you've just said that they don't all have the same scientific value
- 'and third parties may also offer additional statistics on their own site' - is this important? May?
- 'on the user's end' - at
- 'A Work Unit (WU)' - this is not the 1st use of work unit and the abbreviation is not used again for a while
- 'Once completed, the results are returned' - there's no mention of the simulation so this makes it sound like results are returned as soon as the download has completed
- 'before a final full release across all of FAH' - why abbreviate here when Folding@home has been written out in full so many times previously? There's more of these in the 'Client' section.
- 'Topics in the Folding@home forum can be used to differentiate between problematic hardware and an actual bad Work Unit' - Which topics? How can the first person to have the problem do this?
- I removed "Topics in". If a person encounters an error which has not been seen before, they can go to the forum, post about their problem, and ask for help, and admins and regular volunteers can help narrow down the problem. The problem will be reported to the Pande lab, and meanwhile the person will most likely be told to wait and see if several more WUs fail on their hardware. If they do, it's bad hardware. If not, that was a bad WU. Jesse V. (talk) 07:19, 23 May 2012 (UTC)
- 'one of the fastest and most popular molecular dynamics software packages available' - we've just been told that it is a molecular dynamics program.
- 'Each client is the software with which the user interacts, and manages the other software components behind the scenes' - is the user or the client doing the managing?
- 'the diversity and power of each hardware architecture provides Folding@home with the ability to efficiently complete many different types of simulations in a timely manner, (in a few weeks or months rather than years) which is of significant scientific value. Together, these clients allow Folding@home to address biomedical questions previously considered impossible to tackle computationally' - this type of thing has been said previously and probably belongs in another section.
- '(over port 8080, with 80 as an alternative)' - is this important?
- Rosetta@home is the only article about a distributed computing project which has achieved GA status (it's now FA). It mentions it. Jesse V. (talk) 06:10, 21 May 2012 (UTC)
- 'It will upload' - what is it?
- EULA - this abbreviation is never used again
- 'Folding@home's first client was a screensaver...' - can this be said earlier in the article?
- 'Folding@home also utilizes GPUs for distributed computing' - this has just been said in the previous section
- 'as of February 2012 GPU clients account for' - as it is no longer Feb this could be 'In February 2012 GPU clients accounted for'
- 'Small memory soft errors' - what are these?
- 'over PCs, power which could not' - processing power
- 'On March 23, 2007, the PS3 client was first released as a standalone application to the PlayStation 3, developed in a collaborative effort between Sony and the Pande lab' - I'd prefer 'A PS3 client that was developed in a collaborative effort between Sony and the Pande lab was released in March 2007.'
- 'as well as making Folding@home user friendly' - so is F@h not user friendly on other platforms?
- 'The PS3 also has the ability to...' - put this earlier in section
- 'these CPU cores complete single WUs over 4x faster than the standard uniprocessor client' - even on a machine with only 2 cores? or do you mean that the multi-core client runs four times faster on the same machine than the single core client does?
- 'some of the publications from Folding@home would not have been possible without this computing power' - why? wouldn't it have just taken them longer?
- 'Although it does not natively support the Linux operating system' & 'Although the clients performed well in Unix-based operating systems such as Linux and Mac's OS-X' - I had assumed that the 'it' referred to F@h.
- 'Despite these difficulties, SMP1 generated significant results that would have been impossible otherwise' - something very similar to this was said in the 1st paragraph of the section
- 'The SMP2 client also supported a bonus points system, which non-linearly rewards additional points' - is this different to the points system mentioned earlier?
- 'originally required minimum of'
- 'which that had'
- 'Comparison to other molecular systems' - why is it only compared to 2 systems?
- 'some Folding@home's projects'
- 'It is probable that a combination of Anton's and FAH's simulation methods would provide both a well-sampled simulation and completely cover the protein's phase space' - so at present F@h does not completely cover the protein's phase space? JMiall₰ 15:08, 19 May 2012 (UTC)