Talk:Research data archiving/Archive 1


Essay

Hi Ron,

I've put a few {{fact}} and {{cn}} tags into the introduction. I've used {{fact}} where I think you are plain wrong, {{cn}} where I think you need a citation because the statement is disputable (but I don't strongly disagree). All in all, this reads like a personal essay, not even WP:OR, but just your opinion at the moment.--Stephan Schulz 12:07, 14 March 2007 (UTC)

Hi Stephan, I have provided the citations you requested. I am afraid I have not been too consistent with the conventions. When citing books, it seems best to use references. But when citing online links, it seems easiest to use standard links. I am certain that is not proper Wiki formatting and needs to be changed but it will have to happen later. I did feel odd in some places because I was providing links to policies where the appropriate quote was being quoted just below. I have also added some appropriate literature citations.RonCram 14:44, 14 March 2007 (UTC)
Stephan, you made some good edits, corrected some of my mistakes and improved the article. Thank you. You also deleted some information that carries crucial concepts. In one case, it appears you think Falsifiability applies only to theories. The Wikipedia article applies it mainly to theories but it also applies to experiments. The wiki article on Reproducibility shows this. I changed one reference from testability to Reproducibility because of your comments. They are similar concepts but the new statement is better and more accurate. You also added the word "some" in the second sentence, hinting that some journals and funding agencies do not have archiving requirements. I have retained the word but would like to know if you can name a journal that does not have such a requirement. If not, I will remove it. Thank you for contributing.RonCram 01:23, 15 March 2007 (UTC)
See my overlapping edit below for some answers. For journals without a data archiving policy:
  • The Journal of Symbolic Computation, a very respected journal dealing with deduction, computer algebra, and related fields has no such policy that I could find or have been made aware of as an author or guest editor.
  • The Journal of Automated Reasoning, probably the foremost journal in the field, has no such policy that I could find or have been made aware of as an author or guest editor.
  • The SIAM Journal on Applied Mathematics has no such policy that I could find.
I could go on and on. As far as I can tell, the paper itself should be detailed enough to allow reproduction. --Stephan Schulz 01:41, 15 March 2007 (UTC)
As far as I know, Science and Nature are the only journals with an explicit data policy. Offhand I can't think of any journal in the earth sciences that has such a policy, but would be interested if someone has an example. Raymond Arritt 01:44, 15 March 2007 (UTC)
Raymond, have you heard of AGU? [13] Stephan, regarding your math journals, I am not certain what datasets might be archived if they had a policy. But I will look into it.RonCram 02:04, 15 March 2007 (UTC)
Large data sets in automated reasoning are e.g. problem sets given to theorem provers. The TPTP is a standard library of those, but hosted by a single researcher. Other problem sets may not be available at all, especially large ones from e.g. the verification of properties of a proprietary microprocessor. Similarly, I produce significant amounts of data running various theorem proving strategies on large test sets. I regularly clear out the collected data for lack of space and organization. Whoever wants to audit my work will have to regenerate them (which will be non-trivial, as e.g. computer systems are eliminated or updated, and while I archive all versions of my prover, operating systems and compilers evolve, and not all old code will work on new machines). The first prover I worked with (from 1992 to maybe 1996 or so), DISCOUNT, was developed for SunOS4.0.X, and will not even compile on any modern operating system. Back then, I had 20 MB of disk space available - barely enough to keep one large proof protocol around while working on it. There was no chance of holding hundreds. This does not make the work irreproducible, it just means that it takes a lot of work to reproduce it - possibly as much as it took to do the research in the first place. --Stephan Schulz 22:29, 15 March 2007 (UTC)
Interesting. I'd be surprised if many authors were aware of these requirements! I also wonder whether such requirements are practical these days, when it's not unusual for data sets to run in the tens to hundreds of terabytes. Raymond Arritt 02:17, 15 March 2007 (UTC)

Raymond, authors do not have any excuse for not knowing the requirements. They are clearly posted online and were first adopted in 1993. RonCram 06:00, 15 March 2007 (UTC)

Unwarranted Tag

Raymond has placed a tag at the top of the article claiming that it may contain original research or unattributed claims. He has not made any claim of this on the Talk page. I will leave the tag for the time being to give him the opportunity to explain what information he would like attributed or what he believes is original research. The only OR I am aware of was entered by William Connolley. I allowed it to stand because I believe it was generally true. However, I would prefer to remove it rather than have a tag. For now, I will remove that section by William to see if Raymond has any other claims of OR.RonCram 22:25, 23 March 2007 (UTC)

Doesn't sound like a very good idea William M. Connolley 22:29, 23 March 2007 (UTC)

Popper

Hi Ron. I think you are confusing a number of things here. Popper's criterion for a scientific theory is conceptual falsifiability. That is, the theory must make predictions that are, in principle, testable, and be rejected (or at least refined) if it fails these tests. This is something very different from "falsifying" any particular result by showing errors in how it was derived. Popper does not require any archiving of data or methods. If the research could be done in the first place, it can be done again. In the case of Mann et al that we are familiar with, the major prediction is that global temperature reconstructions will agree (within error bounds) with the reconstruction of Mann et al. To conceptually falsify this, you don't need their data or methods. You just do another reconstruction (as several researchers have done) and check if it agrees (as far as I know, nearly all do, within the claimed error bounds).

Again, the core requirement by Popper is falsifiability of the theory, not of any particular experiment. This is something very different from auditing, which indeed is concerned with a concrete piece of research (or other work).--Stephan Schulz 01:29, 15 March 2007 (UTC)

I agree that Popper discussed falsifying mainly in the context of a theory. But I also believe that Popper would say one way to falsify a theory is to falsify fraudulent or sloppy research published to support the theory. I will remove the Popper comment for the time being.
You are wrong. You can discredit a theory by showing that its supporting evidence is wrong. But to falsify it, you need to show that its predictions are wrong. If someone claims A, and A implies B, therefore B, neither showing A false nor showing the implication wrong will falsify B. It may well make us suspicious, of course. --Stephan Schulz 22:35, 15 March 2007 (UTC)
Regarding Mann and the Hockey Stick, you are very much in error. Almost all of the pre-Mann reconstructions show a very warm and global Medieval Warm Period and a very cold Little Ice Age.
Do you have any cite for a pre-MBH98 quantitative reconstruction of the global temperature record for the last millennium? As far as I know, the IPCC did not pick this paper (and Mann) "just so", but because it was groundbreaking.--Stephan Schulz 22:35, 15 March 2007 (UTC)
Since Mann, a number of reconstructions have been published with similar results. However, all of them have the same problems as Mann's studies - they rely on proxies that are unreliable like bristlecone pine series. They do this even though the National Academy of Sciences says bristlecones should not be used in reconstructions. In addition, they do not archive their data so their work can be fully audited. The concerted effort to "get rid of the MWP" is a little too transparent. If they really want to convince skeptics of global warming, their actions have to be above reproach. In my opinion, the Hockey Team is acting very guilty by not archiving their data - especially after archiving has been such a hot topic since 2003. RonCram 05:01, 15 March 2007 (UTC)
I guess we must agree to disagree here. Also, who is "the hockey team"? Wahl and Ammann, e.g., have all data and code online. --Stephan Schulz 22:35, 15 March 2007 (UTC)
Stephan, if that is true, it is to their credit. RonCram 23:01, 23 March 2007 (UTC)
See [14] --Stephan Schulz 23:04, 23 March 2007 (UTC)

Pseudoscience

I cut the following:

When scientists withhold data from other researchers, it is known as pseudoscience because the research does not meet the requirement of reproducibility. [1]

I see two problems with this. First, the phrasing implies that the definition of pseudoscience is "withholding data", which I suspect we will all agree is not what was intended. Second, and more importantly, withholding data does not imply that the work in question is pseudoscience. In fact, the work can still be reproducible: as Stephan said above, "If the research could be done in the first place, it can be done again." Withholding data does make it harder to reproduce some of the work, but it says nothing about the possibility of doing so. In fact, reproducing the results directly from the original source of the data---rather than taking the easy way out and using the existing data sets---is more independent and therefore a stronger validation of the results. --Nethgirb 04:18, 21 March 2007 (UTC)

Nethgirb, I disagree. Your first complaint is nonsensical. "Withholding data" is defined as pseudoscience, not the other way round. Presenting fraudulent data is also pseudoscience. Regarding your second point, when we talk about withholding data and methods, we are talking about pseudoscience. Stephan's comment does not apply because it is not the question of gathering the data alone but a question of how people prepared the data for analysis, what methods were used, what was known about the history of the data collected, the source code used, the statistical analysis used, etc. Another researcher cannot come along afterwards and reproduce all of these things. Stephan works in mathematics where, evidently, no information is archived because all of the necessary info is available in the article. That is not true with climate science or many other fields of science such as drug testing. Withholding data is pseudoscience because it does not meet the requirements of the NSF or the journals. If you are afraid to have your data and methods scrutinized, then you certainly look guilty of something. And you are. It is called pseudoscience. RonCram 13:53, 21 March 2007 (UTC)
Nethgirb, I found an interesting quote from a textbook chapter titled "Evidence-based practice and pseudoscience" for you:
Publically Verifiable Knowledge
The second principle involves the public nature of scientific knowledge. Knowledge gathered empirically does not exist solely in the mind of the scientist. In fact, it does not exist at all until the person disseminates it to the scientific community for critique, testing, and replicating of results. Knowledge or findings limited to one person or group and not verified can never have the status of scientific knowledge (Dawes, 2001). The person or group must present such findings to the scientific community in a way that others can achieve the same results. This process ensures that a particular finding is not the result of bias or error. [15]RonCram 14:07, 21 March 2007 (UTC)

You seem to me badly confused in multiple ways, Ron. For example, allow me to point you to your own reference [16] in Table 8.2 -- you appear to be ignoring secondary studies of Types I-III which do not require secondary data analysis. But I will not write a full rebuttal since regardless of what the right answer is, this whole thing is looking more and more like OR. The ref you have to back up the sentence in question does not even mention data withholding (please point that out if I missed it). And even if it did, the assemblage of statements in this article probably still constitutes OR. --Nethgirb 18:44, 21 March 2007 (UTC)

Nethgirb, I am not at all confused regarding the points I made. However, I am unclear on the point you are attempting to make regarding Table 8.2. You seem to think I am claiming that all studies need to have data archived. I do not agree. If a particular type of study can be verified without archived data, then no data needs to be archived. The point made in the article is not OR. If you wish, we can certainly change the wording so that it fits the reference. Instead of calling it "data withholding," we can call it "refusing to provide data and findings so others can achieve the same results." But isn't that really the same thing? Of course it is. On second thought, I recant. There is no reason to change the wording.RonCram 01:11, 22 March 2007 (UTC)
You're right, I did think that you were claiming that all studies need their data archived. The reason I thought so is because that's what your sentence said. It stated clearly that withholding data makes the study qualify as pseudoscience. The point is that nearly every study can be reproduced without the archived data, albeit perhaps with some difficulty---the data came from somewhere and it can be obtained from there again.
Now you have changed it to "withholding data and methods"... I ask again, regardless of our discussion of the correctness of your statement, can you please find a specific quote from a reliable source on this? and what relevance does withholding methods have in an article about scientific data archiving? It seems awfully like you're trying to link together these topics with tenuously associated threads, from global warming to data withholding to withholding methods to pseudoscience. You've created this article in order to make that association; none of the steps in that chain are well supported, and even if they were, your combination of them would constitute OR. --Nethgirb 04:43, 22 March 2007 (UTC)
Oh my gosh, Nethgirb. Did you happen to read the policies of the NSF and journals that are in the article? They clearly state that data and methods are to be archived, including source code. I noticed that I left the words "and methods" out of one sentence, even though the words are in other places in the article, and corrected it. Now you are trying to pretend that I am changing the argument to hold it together? That's ridiculous. I am trying to clarify it for you. Stephan pointed out that in his field most journals do not require archiving because all of the data and methods required to reproduce the study are made available in the article itself. Fair enough. But that state of affairs does not obtain in earth sciences or drug development research or probably most fields of research. I have provided for you a reliable source that researchers are required to provide all the information needed to reproduce the study. If that is not done, the work cannot be called science. It is pretty clear. Your attempt to obfuscate the obvious will not work. RonCram 05:10, 22 March 2007 (UTC)
I'll ask yet again: do you have a reliable source saying "withholding data and methods" constitutes pseudoscience? --Nethgirb 05:25, 22 March 2007 (UTC)
(Hint: the answer is "no.") Raymond Arritt 05:26, 22 March 2007 (UTC)

Nethgirb, I have already provided you one. The chapter I quoted is on separating science from pseudoscience. The quote states that researchers either provide all the information (data and methods) necessary to reproduce the study or it cannot be called science. If you read the Pseudoscience article, you will find other sources that say the same thing. There really is no basis for your complaint here.RonCram 05:53, 22 March 2007 (UTC)

I ask yet again: Please provide here the actual quote. --Nethgirb 06:16, 22 March 2007 (UTC)
Just to clarify this for my understanding. Text was cut because the text did not quote exactly the source. Interesting standard and I cannot find that in the WP policies or guidelines. -- Tony of Race to the Right 16:42, 22 March 2007 (UTC)
Look harder. The text didn't quote any source which supported the statement. --Nethgirb 23:15, 22 March 2007 (UTC)
(And in case it wasn't obvious, the quote you provided above titled "Publically Verifiable Knowledge" only implicitly refers to pseudoscience, and never mentions public dataset availability.) --Nethgirb 08:03, 22 March 2007 (UTC)
Nethgirb, the citation I provided did not address archiving per se but the withholding of data and methods. If a scientist archives data and methods but his data archived was not adequate for a researcher to reproduce his findings, then the researcher will ask for additional information. If the author refuses that request, he is seen to be practicing Pseudoscience. It is possible to archive some info but still be guilty of withholding data. Did you happen to read the Pseudoscience article? RonCram 14:55, 22 March 2007 (UTC)

It's pretty clear to me that Ron is indulging in OR. Firstly, the PS article talks about fields - not individual papers. Secondly, it talks about presenting findings - not about basic raw data. As Ron himself admits, nothing there addresses data archiving. Because... reproducibility doesn't require the original data, in general William M. Connolley 15:34, 22 March 2007 (UTC)

There is no place in an article on Scientific data archiving for pseudoscience. It doesn't belong there. That is clear on the face of it. This is an attempt to redefine pseudoscience from inside an article on data archiving - nonsense! The heart of any pseudoscience is a kind of fraud, whether intended by a particular user or not, a fraud that consists of dressing up NON-SCIENCE in phony science language to get it accepted as if it were science. It is NOT about a failure to follow what may or may not be good practice in what otherwise might be a good experiment. People can have perfectly legitimate disagreements over what should or shouldn't be archived. Using the word pseudoscience in this way just destroys the integrity of the word. Steve 20:45, 22 March 2007 (UTC)
Pseudoscience is discussed because it is a proper description of data withholding - the opposite of data archiving. If you had taken the time to read the article on Pseudoscience or the citations given here, you would understand that data withholding is one of the methods used to prevent research from being reproduced. RonCram 23:06, 23 March 2007 (UTC)
If you had read my message you would have seen that we are talking about bad science or fraud but not pseudoscience. I have read the article, and withholding data doesn't make something pseudoscience. You are misusing the term. Steve 00:42, 24 March 2007 (UTC)
The problem is that you go from "X (sometimes) has feature Y" to "Everything with feature Y is an X". AstronomyAstrology is (at best) a pseudoscience, no matter how many do-it-yourself manuals get published. Nuclear physics is a real science, no matter how much information the DoD keeps secret.--Stephan Schulz 23:13, 23 March 2007 (UTC)
I sure hope you meant astrology there... Raymond Arritt 01:12, 24 March 2007 (UTC)
I sure hope that an astronomer friend of mine never reads this. Thanks for pointing it out... --Stephan Schulz 06:38, 24 March 2007 (UTC)

Stephan, I do not think I make that mistake at all. You also seem to be confused by the fact the term "pseudoscience" is often used to describe a field like astrology. But the fact is that it applies to individual papers as well. Did you read the quote from the textbook? Almost every textbook on the scientific method will have the same definition. If a researcher withholds data, you cannot call his work "bad science" because it cannot be called "science" at all. It is Pseudoscience. Your point about nuclear physics is interesting, but I do not think it disproves the definition. The fact is the number of people who are given access to the research is much smaller, but the info is shared and tested. RonCram 01:25, 24 March 2007 (UTC)

SteveWolfer, I did read your message. If you had bothered to read the Talk page you would understand why you are wrong. I just left a similar note on the Talk page of Pseudoscience, but will explain it again here. You cannot call data withholding "bad science" because it cannot be called "science" at all. Here is a textbook quote again:
Publically Verifiable Knowledge
The second principle involves the public nature of scientific knowledge. Knowledge gathered empirically does not exist solely in the mind of the scientist. In fact, it does not exist at all until the person disseminates it to the scientific community for critique, testing, and replicating of results. Knowledge or findings limited to one person or group and not verified can never have the status of scientific knowledge (Dawes, 2001). The person or group must present such findings to the scientific community in a way that others can achieve the same results. This process ensures that a particular finding is not the result of bias or error. [17]
This definition can be found in textbooks again and again. If you read the Pseudoscience article under the section "Identifying pseudoscience," you will find there a subsection - "Lack of openness to testing by other experts." The textbook by Gauch is very clear on why papers cannot be called science if they do not provide all the information necessary to reproduce the research. One of the mistakes you appear to make is thinking that Pseudoscience can only refer to a "field" of science. That is not true as the definition of Pseudoscience makes clear in the article (the very first sentences). I hope this clears up the matter for you.RonCram 01:08, 24 March 2007 (UTC)

POV section

I tagged the "climate change research section" for POV concerns. My specific concerns are:

  • The statement that NSF violated its own policies. This is a serious accusation and must be supported by reliable external sources.
  • The contention that "the findings of McIntyre and McKitrick have been largely confirmed by these reviews." The only source provided is McIntyre's own web site -- it defies belief that this can be seriously proposed as an objective source.

If these concerns are addressed I'll gladly remove the tag. Raymond Arritt 02:30, 17 March 2007 (UTC)

Raymond, the fact the NSF violated its own policy is clear from the statement of its policy above and the evidence of the letter the NSF representative wrote in the link. However, if you are claiming this is OR - you may be correct. I think it is clear the Congressmen felt it was a violation of NSF policy but I cannot give a citation. I think I should remove the parenthetical thought for the time being. Regarding the statement M&M have largely been confirmed, I have already cited both the NAS report and Wegman report, which are the main sources McIntyre uses. The only reason I chose the M&M scorecard was for the brevity. Perhaps the article should list all three? RonCram 05:45, 17 March 2007 (UTC)
Raymond, I think I have a solution to your second issue. I will add the words "McIntyre and McKitrick claim..." This makes it clear to readers that the citation comes from their website and is completely NPOV. RonCram 12:50, 17 March 2007 (UTC)
"McIntyre and McKitrick claim..." is good. Both sides have claimed victory, so you'll need to include a counterclaim for balance. Raymond Arritt 13:11, 17 March 2007 (UTC)
Can you provide a link for Mann's claim of victory? RonCram 17:04, 17 March 2007 (UTC)
Raymond, for the time being I put in the point that Mann claimed the errors did not change his conclusions. That comes from the Corrigendum. I can cite that. I was wondering if there was a claim of victory and if so if you can cite it.RonCram 18:21, 18 March 2007 (UTC)
"Claimed victory" was a poor choice of words. Mann apparently isn't as concerned as McIntyre with winning and losing as an end in itself, so it's more a recognition in the scientific community that the NAS upheld most of Mann's scientific points. Raymond Arritt 18:37, 18 March 2007 (UTC)
The NAS agreed with McIntyre that bristlecone pine series is not reliable as a proxy and should not be used. Because of that position, the NAS also said it was not possible to say it is warmer now than in the last 1000 years. They would only go 400 years. I am not sure what scientific points Mann and supporters are claiming to have won. Can you cite an online source so the article can accurately reflect Mann's view? RonCram 20:35, 18 March 2007 (UTC)

This section appears to be in for attack reasons and is totally unbalanced. I've removed it. It needs a proper re-write before it goes back, not just a POV tag. William M. Connolley 21:47, 22 March 2007 (UTC)

Please explain what specifically is attacking what and by whom? We should not have accusations that are vague or non-specific. -- Tony of Race to the Right 17:43, 23 March 2007 (UTC)
You just have to look at the section I deleted to see William M. Connolley 21:58, 23 March 2007 (UTC)
In other words, you really do not want to explain the deletion? I cannot believe that is the case. What quotes are you considering to be attacks? Who is the attacker? Against whom are the attacks? Please, so we can understand what the actual reason is for the deletion, explain to those of us not as intelligent as you what justified the deletion. -- Tony of Race to the Right 03:34, 25 March 2007 (UTC)
Just thought I should point out the example you as an experienced editor are setting. It is never the responsibility of the editor making changes, deletions, etc to explain why, it is up to everyone else to find and figure it out for themselves why the editor is correct beyond discussion. Is that right? I mean, that is why you are placing the burden of justifying a statement like, "appears to be in for attack reasons and is totally unbalanced" on me to "look at the section" to know what you consider to be an attack, rather than you explaining why you chose to ignore the policy about deleting text (If you feel the edit is unsatisfactory, improve it rather than simply reverting or deleting it). -- Tony of Race to the Right 03:46, 25 March 2007 (UTC)

Lack of data archiving as "pseudoscience"

Bottom line is: there's a lack of equivalence between the two, so I removed the following from the lead.

  • When scientists withhold data and methods from other researchers, it is known as pseudoscience because the research does not meet the requirement of reproducibility. [1] [End of removed material--check edit box for full cite material]

... Kenosis 13:57, 24 March 2007 (UTC)

Hi. My apologies, but my rv of Rons edit overlapped and removed yours [18]. I don't mind which version of the lead we use, though. The main difference is between "The journals often rely on the good intentions of scientists to supply any supplementary data that may be required." vs "These policies are generally not enforced"... I'm not sure which is better. Do you think these policies are enforced? William M. Connolley 14:02, 24 March 2007 (UTC)
Kenosis, equivalence is not the question. Equivalence would mean that Pseudoscience always withholds data. That is not true. Sometimes it supplies data but the data is fraudulent. Data withholding is just one example of an unscientific method which makes the resulting paper Pseudoscience. As Gauch writes in his textbook, there can be "no wizards behind the curtain." When researchers refuse to provide data, they are saying "trust me." That is not science. If you can make the sentence better, please edit it to your liking so we can discuss it. But the fact data withholding is one example of Pseudoscience is not really in question.RonCram 14:10, 24 March 2007 (UTC)
RE RonCram's question/statement on my talk page: Data withholding is one thing; failure to archive all data points is another. The former is an indicator that may contribute to a judgment of a particular enterprise as being pseudoscience; the latter is not necessarily such an indicator if the operational definitions and summary statistics are intact in such a way that the relevant experiment or study can be replicated. ... Kenosis 03:30, 25 March 2007 (UTC)

William, your point is OR. It may be true but OR is not the policy of Wikipedia. If you can find a way to cite the point, you are free to add it.RonCram 14:12, 24 March 2007 (UTC)

Kenosis, thank you. From your comment, I can see that you grasp the point. I now think that perhaps the Intro needs to be clarified so more people understand it. With this encouragement, I will try again.RonCram 12:15, 25 March 2007 (UTC)
There goes my irony meter again... Raymond Arritt 00:57, 25 March 2007 (UTC)
Now Raymond, I have provided citations at every step. William has not. I think your comment is ironic.RonCram 05:26, 25 March 2007 (UTC)
Non sequitur. Using citations to construct your own argument is classic OR; you need the cites to do the OR. Raymond Arritt 06:32, 25 March 2007 (UTC)
Ron, it appears your point is OR. As mentioned above, researchers are free to collect, analyze, and prove/disprove the findings of other scientists, regardless of the archiving undertaken or deferred. A point like this does not even belong in SDA anyway. Skyemoor 01:03, 25 March 2007 (UTC)
Raymond and Skyemoor, I am not constructing anything. The refusal to supply data and methods so your research can be replicated is pseudoscience. It is plain and simple and all the textbooks agree. Raymond, you still haven't explained why William is allowed to add an OR point without citation.RonCram 12:15, 25 March 2007 (UTC)

Page protection

I have locked the page to give everyone some time for discussion. Tom Harrison Talk 13:33, 25 March 2007 (UTC)


Article Direction

It seems that this article focuses little on actual scientific data archiving and more on one particular situation that is only vaguely connected to SDA. From the start, it appears the article title should be "Rights to Access NSF Data". On top of that, the text I removed to discuss here has inaccuracies in it (having just re-read the Academy's paper again to confirm). So either a stronger case needs to be made for some shortened version of this text that stays on target of the article title, or the article title should be changed. Skyemoor 01:32, 25 March 2007 (UTC)

Let's invoke WP:SPADE: Ron started the article as a platform to promote Steve McIntyre's contempt of climate science. He copied and pasted some agency boilerplate to give the article a thin veneer of objectivity. The topic of scientific data archiving is most certainly worth an article, but the present article doesn't remotely fill the need.
Sometimes, when a building is in a bad state of disrepair the most logical course is to demolish it and build something more suitable to take its place. I think that's the best thing to do with this article. Raymond Arritt 01:40, 25 March 2007 (UTC)
What I've been missing are the real issues with data archiving: Paper records have limited lifetime, are hard to search and distribute, and take up massive amounts of real estate. Digital information solves some of these problems, but brings its own, namely incompatible data formats, changing hardware, disintegrating media, and so on. I dimly recall a report from a few years back stating that NASA had effectively lost access to quite a lot of irreplaceable data recordings from interplanetary spacecraft due to bit rot and computer system change. These topics should make up the main contents of this article. What Ron tries to do looks like it might be part of an article on scientific data sharing or accessibility of scientific data. --Stephan Schulz 01:49, 25 March 2007 (UTC)
(ec) Exactly! For example, I'm involved in a huge project that will involve archiving many terabytes of data from many different models. We're wrestling with problems like standardization of data formats, metadata, organizing data for efficient retrieval and so on. Those data archival issues arise in lots of other projects. But the present article reads like it's on a completely different topic. Raymond Arritt 01:59, 25 March 2007 (UTC)
When an article's content becomes a medium for pushing an agenda, concepts become twisted and tainted and corrupted as people get involved in a contest of weasel wording to win partisan points. When that becomes the case it is best that the article be deleted. Losing an article on data archiving is of no consequence as compared to watching language's integrity and intellectual honesty get damaged. Steve 01:57, 25 March 2007 (UTC)

I agree with the above. In deciding whether improvement or deletion is the best course of action, we should see what useful content there is. In the "Academic genetics" section, the article by EG Campbell [19] on data withholding is the kind of thing that might be useful, but it's not really relevant to an article on data archiving. The list of data archives is probably useful. In my opinion, this accounts for everything that is likely to be useful. So the plan I would propose is:

  1. Merge the Campbell reference to a more appropriate article, possibly Scientific method
  2. Merge the list of data archives into Data library (which itself is not a great article, it seems...)
  3. Delete this article

--Nethgirb 03:10, 25 March 2007 (UTC)

P.S. could also do

2. Merge the list of data archives into Repository

--Nethgirb 03:17, 25 March 2007 (UTC)

Since you are invoking WP:SPADE (like it is some kind of spell or something) and discussing merging 'worthless' articles whose sole purpose is for some kind of POV advancement, it is proper to discuss the merging of many of the Global Warming articles as many of them are POV-forks to isolate views that some want hidden and maintain separation to allow double-standards in editing. With few exceptions the entire subject is being done a disservice due to the myopic insistence that only their views should be heard. -- Tony of Race to the Right 03:53, 25 March 2007 (UTC)
I happen to agree that much of the Global Warming material is total crap. But that is a separate issue and doesn't justify trashing this article for that end. Science needs to be honest and direct. Steve 04:25, 25 March 2007 (UTC)

The value of this article can be readily seen from this Talk page. As you read through, you will notice that Stephan Schulz, Raymond Arritt and William Connolley (all of whom have published in the scientific literature) did not know about the NSF and journal policies on data archiving. It can be assumed that if they did not know, neither will most readers (or even editors) of Wikipedia. Yet this is an important issue and helps to explain the importance of several other articles, including National Archive of Computerized Data on Aging, National Climatic Data Center, National Snow and Ice Data Center, National Oceanographic Data Center, ESO/ST-ECF Science Archive Facility, CISL Research Data Archive and World Data Center. It seems these editors are mostly concerned that the article discusses the opposite of data archiving, data withholding. Data withholding is not possible if all of the supplemental information and source code is properly archived. These editors do not seem to be upset by two of the examples of data withholding, but appear to be very upset the article discusses data withholding by Michael Mann. This was an important issue that made the audit by McIntyre and McKitrick much more difficult to complete. Mann did not turn over his source code until Congress got involved. To claim this was not a notable event in science is ridiculous. These same editors also are upset that the article describes data withholding in the same manner as textbooks on the scientific method - as Pseudoscience. But that cannot be helped. Withholding data cannot be called "bad science" because it cannot be called "science" at all. Why should Wikipedia readers not be allowed access to that information? RonCram 05:06, 25 March 2007 (UTC)

If I may reiterate the point I made in response to RonCram's statement on my talk page: Data withholding is one thing; failure to archive all data points is another. The former is an indicator, one of many possible indicators, that may contribute to a judgment of a particular enterprise as being pseudoscience. The latter is not necessarily such an indicator if the operational definitions and summary statistics are intact in such a way that the relevant experiment or study can be replicated. ... Kenosis 09:57, 25 March 2007 (UTC)
You said, "Withholding data cannot be called 'bad science' because it cannot be called 'science' at all." That is fine. If the data that has been withheld keeps the research from being of value then it isn't science. But it doesn't make it religion. It doesn't make it a Thanksgiving dinner recipe. There are lots of things it doesn't make it - including pseudoscience. It doesn't matter that some text book made that mistake. You should let go of the pseudoscience thing and you would find more supporters. Is it your position that a person cannot do research with a proprietary piece of code and still publish a result that they believe would be helpful? Would that also be pseudoscience by your definition? Steve 05:47, 25 March 2007 (UTC)
Steve, if the research cannot be replicated, then it is not science. But if it is published in a scientific journal, it is nonscience pretending to be science, and therefore it is pseudoscience. The researcher has become the wizard behind the curtain, expecting people to trust in his superior intellect and morality to believe that he would never (intentionally or unintentionally) make a mistake that would invalidate his research. Lack of archiving is not pseudoscience and the article never said that it was. Refusing to provide data is pseudoscience.RonCram 12:25, 25 March 2007 (UTC)
FYI, Steve, Ron did not actually cite a textbook that proved his point -- never has he supplied a reference stating that withholding data of a study makes that study pseudoscience. He did supply a reference from which he (incorrectly) extrapolates his point. But we have gone round and round on this discussion and I'm not sure how it could possibly come to a useful conclusion... --Nethgirb 06:13, 25 March 2007 (UTC)
Nethgirb, the textbook I cited was distinguishing between science and pseudoscience. It clearly said that refusing to supply data and methods was not science. There can only be one conclusion. In addition to that text, I also referred to the textbook by Gauch which is cited in the Pseudoscience article. It says "no wizards behind the curtain." When a researcher refuses to provide data, he is asking people to trust him. That is pseudoscience. The concept is present in any textbook that spends much time on scientific method. RonCram 12:29, 25 March 2007 (UTC)
"The value of this article can be readily seen from this Talk page." I can't for the life of me see how material on this talk page is an argument for anything except "Alleged Violations of NSF Data Access Privileges" --Skyemoor 12:13, 25 March 2007 (UTC)
Skyemoor, the value is in the fact that the article conveys important information about policies that people who work in the field did not know. That is pretty significant. Data is supposed to be archived. And in many cases, some data is archived. The World Data Center and the national data archives are not empty. As a way to explain how important the issue is and why the NSF and journals have adopted these policies, the article also discusses what happens when a researcher refuses to provide data and methods so his work can be audited. RonCram 12:33, 25 March 2007 (UTC)
This sounds like a cross between a discussion of the merits of NSF policies and a handbook for the staff to implement them. I don't see the application to an encyclopedia any more than any other organization policy formulation discussion and employee handbook. Skyemoor 13:19, 25 March 2007 (UTC)
Skyemoor, the encyclopedia already has a number of related articles including National Archive of Computerized Data on Aging, National Climatic Data Center, National Snow and Ice Data Center, National Oceanographic Data Center, ESO/ST-ECF Science Archive Facility, CISL Research Data Archive and World Data Center. This article answers a number of questions: How does all the data get in these archives? Why is it done? Who puts it there? Who accesses the data and why? What might happen if the data is not archived? These are important and interesting questions to students and other Wikipedia readers.RonCram 13:33, 25 March 2007 (UTC)

RC added For example, mathematics journals do not have policies on data archiving. I'm curious - this has no citation to support it; and could only really be established by an extensive survey of maths journals. I'm sure it's true, but I wonder if the usual OR-zealots would object to it? William M. Connolley 13:12, 25 March 2007 (UTC)

William, on this Talk page Stephan went through a number of math journals he researched and claimed they had no policy on data archiving. He also mentioned that he did not think a policy was needed as the article itself would have the data required. If you think my wording should be changed, feel free to do so. RonCram 13:27, 25 March 2007 (UTC)
Actually, two of the three are computer science journals, not math journals. And I claimed I did not find any such policy, and that none was suggested to me for the ones I edited. Anyways, the survey is of course very incomplete. I would leave it at "some" (e.g. ...) vs. "others" (e.g. ...). I don't think this is related to the field, and as far as I know, archiving requirements are, at the moment, very much the exception. --Stephan Schulz 13:39, 25 March 2007 (UTC)
Ron, your answer is beside the point. You obviously couldn't use Stephan's survey as a source... the question is, do *you* think the wording you inserted is suitable? Do *you* think it's well backed up by verifiable sources? Do *you* think it matters? You wrote the words, after all William M. Connolley 13:46, 25 March 2007 (UTC)
William, now I understand your strategy. You revert so quickly that other editors do not get to read the latest version and do not understand what the conflict is about. Then you have the page frozen so no one can make edits. I think that is a bogus way to limit discussion. I am going to put the best versions of the sections you delete here so other editors can discuss them.RonCram 13:50, 25 March 2007 (UTC)

Is there a point to this?

I haven't studied the entire article: sorry, I really should do more than skim it.

But ... I have to wonder whether it was created to make a point or merely to describe what the standard practices are.

Editors like myself who are partisans in the global warming controversy must tread carefully here. In the interests of full disclosure, I will state that I am against the hoarding of data in excess of 6 to 12 months. Astronomers do this, to establish primacy of discoveries. But then, after they announce the discovery, they lift the data embargo.

What M&M protested was - as they claimed - that Mann made world-shaking pronouncements that affected public policy and then refused to share his data or methods. They had to pry it out of him, etc. And they also protest that journals wouldn't look at their critique of Mann's data selection policy or methods of analysis.

Is that what this page is about?

When I studied science, it was drummed into me that a scientific "finding" is UTTERLY WORTHLESS until other scientists can replicate the experiments, analyses, etc. In other words, it is not "science" until it has been checked and confirmed. The scientific humor journal "Irreproducible Results" is a cheery reminder of this.

The essence of the global warming controversy is a battle over whether AGW theories are based on evidence (i.e., data and methods and reasoning) which is reproducible. Lomborg, M&M, Singer, Lindzen, et al., vigorously deny this reproducibility. That is, they took their own look and said it was all balderdash.

When the scientific establishment vilifies skeptics, this is a breakdown in process.

I do not intend these comments as a personal essay, but as background for producing a better article on data archiving. It touches on other climate article writing, too.

Some contributing editors to Wikipedia believe that AGW has been objectively proven and apparently see no point in describing the "minority views" of Lindzen, et al. - unless clearly marked as "wrong", but this is an editorial mistake. These views should not be marked as wrong but (at most) as "being in the minority". Even then, we should cite sources who say those are minority views and not make even that judgement ourselves.

For example, say that the following sources say that the ideas of Lindzen, Baliunas, (list a few more) represent a minority view:

  • recent article by 'history of science' professor Oreskes

We should also indicate which of these sources is an advocate of any particular public policy recommendation (like the Kyoto Protocol), or just wants 'action to be taken' in general.

It's significant when we indicate sources who assess "facts" to also indicate things like their source of funding and the policies they push. It's a commonplace belief that money corrupts: "Big Oil" wants to downplay the threat of GW! EPA policy was set by the environmentalist Clinton-Gore team! Then there's the altogether human tendency to cherry-pick the statistics to support one's preconceived conclusions. "Let's see, I want Kyoto, so which studies and assessments support it?" --Uncle Ed 13:45, 25 March 2007 (UTC)

The article is much bigger than just Michael Mann and climate science, although I think that is an important example of why data archiving is important. If all of the data and methods are archived prior to publication, there can be no withholding of data. However, the article is also about the importance of data archiving in other fields, especially drug research and public health. This article touches on the scientific method, the history of science, modern scientific data centers, current policies of NSF and scientific journals and answers a number of important questions for students and other readers of Wikipedia. RonCram 13:56, 25 March 2007 (UTC)

Signing

I apologize for my frequent failures to sign my comments. If Dr. C does it, please forgive him too. We can always go back in later and insert the proper {unsigned tag} by checking the history. It is a nuisance, but maybe the degree of our annoyance indicates the degree of our need to slow down. What's the rush, anyway? --Uncle Ed 15:12, 25 March 2007 (UTC)

Ed, no problem. I sometimes forget to sign myself and have to go back to sign. Thanks for reminding me that I can check History whenever I do not know. RonCram 16:11, 25 March 2007 (UTC)

The Intro William deleted

Scientific data archiving refers to the long-term storage of scientific data and methods. Scientific journals and funding agencies generally have policies requiring scientists to store in a public archive any of their data and methods necessary to reproduce their studies. This is considered the best practice because it ensures other scientists can audit the data, replicate the research and build on their findings.

Data archiving is more important in some fields than others. For example, mathematics journals do not have policies on data archiving. All of the data necessary to replicate the work is already available in the journal article. That is not true among many fields of study. In drug development, a great deal of data is generated and must be archived so researchers can verify that the reports the drug companies publish accurately reflect the data.

The requirement of data archiving is a recent development in the history of science. It was made possible by advances in technology allowing large amounts of data to be stored and accessed from central locations. For example, the American Geophysical Union (AGU) adopted their first policy on data archiving in 1993.[2]

Prior to data archiving, the researchers who wanted to evaluate data and methods would have to contact the author of the study and rely on him to provide the supplemental information. This process was recognized as wasteful of time and energy and obtained mixed results. Information could become lost or corrupted over the years. In some cases, authors simply refused to provide the information. This type of data withholding is known as pseudoscience because the research does not meet the requirement of reproducibility. [1]

Data withholding still happens today. When researchers either fail to archive the data or they archive some - but not all - of the data they may be tempted not to provide the data. Alternatively, the data may have become lost or corrupted and unable to be provided. This is very embarrassing to the researcher and the journal who published the research.

The need for data archiving and due diligence is greatly increased when the research deals with health issues or public policy formation. [3] [4]

References

  1. ^ a b c For example, Hewitt et al. Conceptual Physical Science Addison Wesley; 3 edition (July 18, 2003) ISBN 0-321-05173-4, Bennett et al. The Cosmic Perspective 3e Addison Wesley; 3 edition (July 25, 2003) ISBN 0-8053-8738-2
  2. ^ "Policy on Referencing Data in and Archiving Data for AGU Publications" [1]
  3. ^ "The Case for Due Diligence When Empirical Research is Used in Policy Formation" by Bruce McCullough and Ross McKitrick. [2]
  4. ^ "Data Sharing and Replication" a website by Gary King [3]

End of Intro. Discussion can start here. I believe this newly written Intro makes more clear the fact that a failure to archive is not necessarily data withholding. This was the big issue with Kenosis. It is clear from the textbooks that data withholding is pseudoscience.RonCram 13:50, 25 March 2007 (UTC)

The PS bit remains unacceptable, as before William M. Connolley 14:15, 25 March 2007 (UTC)
Kenosis does not agree with you. And he edits the Pseudoscience article regularly. You cannot get around the textbook quotes, William. Besides, you did not even attempt to make the Intro better. You just did a wholesale revert. The Intro contains more information now and clarifies the common misunderstanding that lack of archiving was pseudoscience. Why not try to make the article better rather than just revert? RonCram 14:18, 25 March 2007 (UTC)
I suspect you refer to this edit. But Kenosis does not say what you claim, you just read it the way you want it to read. He later expanded his position here, and indeed shares the same position that I (and I suspect William) hold: "[Data withholding] is an indicator, one of many possible indicators, that may contribute to a judgment of a particular enterprise as being pseudoscience" (my emphasis). Data withholding alone is neither sufficient nor required for something to be pseudoscience. --Stephan Schulz 17:13, 25 March 2007 (UTC)

This article

I am unconvinced that this article covers an encyclopedic topic. It seems much more like an argument in a debate. To me, it reads like this: "The main thing you need to know about scientific data archiving is that research whose data is not publicly archived is suspect to the point of being pseudoscience, and this is especially important when the research concerns topics relevant to public policy, like.... global warming! McIntyre and McKitrick found problems in a GW paper whose data was not publicly archived; therefore, implicitly, important parts of climate science are pseudoscience." And then some policy statements from Science and Nature are copied & pasted in to make things look more respectable.

However I am happy to be convinced that there is a worthy topic here. Perhaps, Ron or others, could you describe the key goals for what information should ideally be presented in this article? I'd like to see discussion of the overall goals and their notability, not details like some of what has been discussed above, so that if there is a worthy topic to be found here, we can work on improving the article towards that goal. --Nethgirb 05:25, 22 March 2007 (UTC)

There's the germ of a worthwhile topic here, but it's poorly developed as you have described. There are lots of interesting issues that could be covered: data formatting, cost of data distribution for the very large datasets used in modern science, implications of intellectual property rights, and so on. But none of those things are addressed in the article. Ron could turn this into a decent article, but he'll need to focus on more difficult and subtle issues than he has so far shown an inclination to address. Raymond Arritt 05:40, 22 March 2007 (UTC)
You guys are certainly welcome to make improvements. I do not own the article. Information on the cost of data archiving would probably be interesting to readers as would other topics. This subject is very important and poorly understood, as this Talk page attests. I invite you to make edits as long as your goal is to make the subject well understood to readers. If you want to explain how difficult the policies of the NSF and journals are to the authors, that information would be interesting. But I think it is going to be difficult to convince readers that business people should be required to put together due diligence packages for investors, but that scientists cannot be bothered to do the same. RonCram 05:50, 22 March 2007 (UTC)
If you decide to read the Pseudoscience article, you can start with the first two sentences. Then you may want to focus on the section "Identifying Pseudoscience," especially the subsection "Lack of openness to testing by other experts." The quotes you have been given are from commonly used textbooks. These are not new definitions or descriptions of pseudoscience. And yes, "lack of openness" is the same thing as "withholding data." RonCram 06:14, 22 March 2007 (UTC)

This article... contains too much OR. At least it does in Ron's version. It also betrays a lack of familiarity with these things in practice: the policies may exist; but they are rarely used William M. Connolley 09:07, 22 March 2007 (UTC)

William, your point that the policies are not generally enforced is OR. I happen to think it is true mainly because the journals do not have time to determine exactly what data and methods info would be required for reproducibility. They are overly dependent on the authors on these issues. So I am willing to let the point stand. One point the article does not make clear yet is that the journals will get involved when authors refuse to provide researchers with data. Also, the article should not make it sound as if all of the data archives listed in the article (and this is only a partial list) are empty. They are not. A great deal of data and methods info is stored in these archives and is accessible by researchers. I am not willing to cut the well-sourced point that withholding data and methods is Pseudoscience. Any commonly used textbook on the scientific method will agree that researchers have to make their data and methods known to other researchers. A refusal to do so means their work is not science. RonCram 14:49, 22 March 2007 (UTC)
Ah. You're equating "make their data and methods known" to "provide the specific data set and line-by-line computer code." You're saying if they don't do the latter, they aren't doing the former. Therein lies the crux of the biscuit. Raymond Arritt 15:41, 22 March 2007 (UTC)
Raymond, I am saying the authors have to make available whatever is necessary to reproduce the study. If you read the article, you will know the NSF and the journal policies require authors to archive their source code. I'm not the one saying it. The NSF and the journals are saying it. RonCram 15:45, 22 March 2007 (UTC)
So then your argument is that work that doesn't adhere to NSF policy is "pseudoscience." What about science performed in other countries where NSF regulations do not apply? Raymond Arritt 16:08, 22 March 2007 (UTC)
Raymond, your statement is not exactly correct. NSF policy is in place to guard against pseudoscience. It is possible for science to be conducted where such a policy is not established. All the researcher has to do is make his data and methods known to researchers wishing to reproduce his study. As a matter of "best practices," NSF and journals have determined that archiving data prior to publication is the best way to prevent data loss or corruption that may prevent a study from being reproduced. RonCram 16:28, 22 March 2007 (UTC)
William, what part of the Pseudoscience article do you think does not apply here? Please explain yourself and also please see Wikipedia's rules on Reversions Help:Reverting--Zeeboid 16:05, 22 March 2007 (UTC)
(edit conflict) Just correcting a pet peeve I have. Referring people to a policy and not narrowing it down for the readers gives no guidance to either side in case there are conflicting interpretations, a misinterpretation or exceptions being assumed. I believe Zeeboid is referring to the When to Revert section, specifically the Do's (which most rv's do not qualify) and Don'ts (which most rv's directly violate). He can correct me if I am wrong. -- Tony of Race to the Right 17:31, 22 March 2007 (UTC)
Nope, that's correct. Sorry for not making it more clear, I was following poor examples given by some of the admins here. My mistake.--Zeeboid 17:38, 22 March 2007 (UTC)

Reproducibility AEB

Which part do you think does apply? The only sentence that seems to be germane that I can find is "Failure to provide adequate information for other researchers to reproduce the claimed results" from the list of "indicators of poor scientific reasoning". This is only one of the listed indicators, and as far as I can tell, very few (if any) of the other listed indicators are even claimed to apply. MBH98, e.g., was extensively peer-reviewed, has been verified, reproduced, and extended, and, indeed, some of the "sceptics" make the very conspiracy claim that is listed as one of the other indicators. In other words, first, there is sufficient data for reproduction (if not for "auditing"), and secondly, restricting access to data may be an indicator of pseudo-science, but it is neither sufficient (otherwise any secret service would engage in pseudo-science) nor required (astrology is not less of a pseudo-science even if all tables and calculations are openly published). --Stephan Schulz 17:26, 22 March 2007 (UTC)

(edit conflict) Stephan Schulz asks, "Which part do you think does apply?". The burden of justification should be on the initial reverter, since a rv should only be done for specific reasons (not for content disagreement) and logically should be explicitly justified. The burden of proving a lack of (or existence of) "application" should rest on the person making a potentially improper rv, not on the person attempting to correct a candidate for improper rv. Given the policy states a rv "is used primarily for fighting vandalism, or anything very similar to the effects of vandalism" the initial rv amounts to an implication of vandalism. The burden of proof/justification thus is only appropriately upon the prosecutor to justify why the rv was necessary (iow, how the reverted edit qualified as vandalism), and then each of those rv should properly be accompanied with actions laid out in the Dealing with Vandalism policies. -- Tony of Race to the Right 17:48, 22 March 2007 (UTC)
Mu. --Stephan Schulz 17:53, 22 March 2007 (UTC)
That's easy. Please refer to the first sentence of the Pseudoscience article which refers to the Scientific method#Elements of scientific method.--Zeeboid 17:35, 22 March 2007 (UTC)
I did, and I again found nothing applicable. --Stephan Schulz 17:53, 22 March 2007 (UTC)
Huh, that's strange. Let me try to simplify it for you. Come now, Stephan Schulz, we'll go through this together. Is something proven by the Scientific Method Reproducible? (for a hint, please refer to the wiki article Reproducibility)--Zeeboid 18:24, 22 March 2007 (UTC)
The scientific method does not prove things, of course. It generates and validates (or disproves) scientific theories. However, reproducibility is one requirement for scientific experiments and observations. I can't wait for the next step... --Stephan Schulz 18:35, 22 March 2007 (UTC)
Excellent. We agree that reproducibility is important to the Scientific Method. In fact, I would go so far as to say, if something is not Reproducible, it would be impossible to justify it by the Scientific Method. Correct?--Zeeboid 20:42, 22 March 2007 (UTC)
No, even in the way you formulate the question. In fact, this claim displays a stunning lack of understanding of the scientific method. Reproducibility refers to experiments and observations. Justifiability refers to scientific hypotheses and theories. How should a theory be "reproducible"? --Stephan Schulz 21:30, 22 March 2007 (UTC)
So, you are saying Reproducibility is not only Not important to the Scientific Method, but to say so shows a "Stunning lack of understanding for the scientific method," yet Reproducibility refers to experiments and observations? Are Reproducible Experiments not important to the Scientific Method? Have you read reproducibility?--Zeeboid 13:38, 23 March 2007 (UTC)
I have said no such thing. I said (and continue to say) that the fact that you think that reproducibility (which applies to observations and experiments) and justifiability (which applies to theories and hypotheses) somehow apply to the same concepts displays a stunning lack of scientific understanding on your part. Reproducibility of experiments and observations is of course important. But notice that again this is an abstract ideal. SN 1987a only explodes once. You can "reproduce" the observations only imperfectly, by looking at other supernovae. That does not make astronomy a pseudo-science. --Stephan Schulz 14:34, 23 March 2007 (UTC)
"Reproducibility of experiments and observations is of course important." Yes it is important. So much so that Reproducibility states: "Reproducibility is one of the main principles of the scientific method, and refers to the ability of a test or experiment to be accurately reproduced, or replicated, by someone else working independently." So, then if Something is not Reproducible, it cannot be proven by the Scientific Method, Correct? Because, as you well know, Falsifiability is "a gradual process that requires repeated experiments by multiple researchers who must be able to replicate results in order to corroborate them." Which is why the Big Bang Theory is just that, A Theory. Correct?--Zeeboid 16:01, 23 March 2007 (UTC)
Do you know what a "theory" is? Raymond Arritt 16:16, 23 March 2007 (UTC)
Zeeb, I find it amazing how you jump from "reproducibility [...] refers to the ability of a test or experiment to be accurately reproduced[...]" to "if Something is not Reproducible". There are more somethings than tests or experiments, you know. And for the approximately thousandth time: The scientific method is unable to prove anything. And your sentence about "falsifiability" makes no sense at all. Falsifiability is a philosophical concept. Do you mean falsification? But even in that case, the statement would be wrong - scientific theories can be falsified rather rapidly. --Stephan Schulz 16:32, 23 March 2007 (UTC)
You guys must have made very difficult students. Let me bottom line it for you by just jumping to the end. Belief in man-caused Global Warming is Pseudoscience because Pseudoscience is any body of knowledge, methodology, belief, or practice that claims to be scientific but does not follow the scientific method. Was that easier to understand?--Zeeboid 19:21, 25 March 2007 (UTC)

Pseudoscience AEB

From above: Firstly, the PS article talks about fields - not individual papers. Secondly, it talks about presenting findings - not about basic raw data. As Ron himself admits, nothing there addresses data archiving. Because... reproducibility doesn't require the original data, in general. I've just looked at Scientific_method#Elements_of_scientific_method as recommended by Zeeboid. I find nothing there about archiving. I notice that Z is being very coy about which bit he considers relevant, and has failed to answer Stephan's questions. William M. Connolley 19:25, 22 March 2007 (UTC)

William, the Pseudoscience article does not discuss only "fields." The first line says "Pseudoscience is any body of knowledge, methodology, belief, or practice that claims to be scientific but does not follow the scientific method." Notice that it includes methodology or practice which relates to individual papers. So your first point is void.
Your second point is also invalid. The Pseudoscience article references the definition in the Oxford American Dictionary as "pretending to be scientific, falsely represented as being scientific." This definition fits those researchers who publish articles but then refuse to provide the data and methods necessary to reproduce or audit their findings. The Pseudoscience article also has a section on "Identifying Pseudoscience." This section has a subsection - "Lack of openness to testing by other experts." Any researcher who refuses to provide data and methods on a published paper is guilty of lack of openness. The last two points of this subsection are particularly telling:
  • Failure to provide adequate information for other researchers to reproduce the claimed results.[1]
  • Assertion of claims of secrecy or proprietary knowledge in response to requests for review of data or methodology.[2]
Data archiving is a "best practice" to prevent researchers from losing data or refusing to provide data. It is embarrassing to journals when a researcher refuses a request for additional data and method information. It is a sad truth that scientists (especially climate scientists) rarely seek to reproduce or audit the work of others. If it was done more often, researchers would better understand what information is necessary to complete an audit and they would better understand what info they should archive. Archiving it one time is much easier for the researcher than having to reply to multiple requests for information.RonCram 00:43, 23 March 2007 (UTC)
Ron, I despair of your inability to read what you quote. PS says any body of knowledge - not individual papers. There is nothing about archiving in Sci Meth - it's about reproducibility, which is different. You are leaning too heavily on the PS article for fringe claims William M. Connolley 09:38, 23 March 2007 (UTC)
William, we all despair of your inability to use consistent standards when it comes to maintaining a NPOV. A body of knowledge is built up of individual papers. I would ask you to follow along with me here. You seem to have an issue with reading only what you want to believe. Ok, here we go. The first line of the Pseudoscience article says "Pseudoscience is any body of knowledge," Notice the comma. That means we continue. "methodology," okay, there is more. "belief, or practice that claims to be scientific but does not follow the scientific method." I believe even you can agree that individual papers qualify as "methodology, belief, or practice," So your first point is void.
RonCram is also accurate as to why your second point is invalid. Please click the hypertext link and read about Pseudoscience.--Zeeboid 13:48, 23 March 2007 (UTC)
William, are you going to start with the personal attacks again? I know how to read quite well. I will explain to you the meaning of the first sentence. "Pseudoscience" can be a proper and accurate description of "any body of knowledge, methodology, belief, or practice." A paper will employ any number of "methodologies" or "practices." A paper may contain both scientific and unscientific methods. If any part of the paper is infected or contaminated with methods or practices that are not scientific, then it is pseudoscience. You seem to be arguing that methods cannot be unscientific. Such a position is ridiculous. Of course "methods" can be pseudoscience. Notice the term "scientific method." The methods used are important. "Practices" can also be scientific or unscientific. The key unscientific practice here is the practice of withholding data. Such a lack of openness is contrary to the scientific method and is Pseudoscience. RonCram 22:13, 23 March 2007 (UTC)
Sigh. Methods can be pseudoscience. But since the methods aren't at issue, why do you bring it up? Withholding data isn't pseudoscience, however much you may dislike it. You are leaning far too heavily on a thin piece of one article William M. Connolley 22:29, 23 March 2007 (UTC)

William - Good! We are making progress. Earlier you were trying to argue that pseudoscience only applied to "fields" and not papers or the methods used in papers. The "issue" at hand is data archiving. Data archiving is a "best practice" that has been adopted as policy by the NSF and most journals publishing in fields that rely on supplemental information. As a way to highlight the importance of data archiving, the article also discusses the opposite - data withholding. Data withholding is Pseudoscience because it does not meet the objective of openness that the scientific method requires.RonCram 22:54, 23 March 2007 (UTC)

No, we are making no progress, you are simply making the same unsupported assertions as before William M. Connolley 13:01, 24 March 2007 (UTC)
William, my assertions are well supported. I am not relying on "a thin piece of one article." In addition to the definition in the Pseudoscience article (which was hashed out in a previous discussion like this), I have also quoted the textbook by Mohr [20] above. We also have the textbook by Gauch Scientific Method in Practice that is referenced in the Pseudoscience article when he talks about "no wizards behind the curtain." We also have the textbook by Hewitt "Conceptual Physical Science" and the book by Bennett "The Cosmic Perspective" which are both referenced in the Pseudoscience article. Almost any commonly used textbook that discusses the scientific method in much detail is going to say the same thing. Science requires openness. If researchers do not provide data to others, it cannot be called science.RonCram 14:02, 24 March 2007 (UTC)
William, I forgot to mention the 2001 book by Dawes that Mohr referenced. It is titled Everyday irrationality: How pseudo-scientists, lunatics, and the rest of us systematically fail to think rationally. RonCram 17:06, 24 March 2007 (UTC)

Data withholding

I did a little googling and found some mentions of data withholding - all in peer-reviewed journals like Science - and all saying it's "not good". Should we have an article on this, or just make it a section of Scientific data archiving, or what?

Do we already have enough coverage on withholding? --Uncle Ed 16:52, 25 March 2007 (UTC)

I think if you have some info from science journals talking about data withholding, it will probably fit well into this article. Please introduce it here along with the citation so we can see for ourselves. Thanks! RonCram 17:16, 25 March 2007 (UTC)
All of this stuff about data withholding is in the wrong place at this article. The alleged connection is "if the data had been archived, then it would not have been withheld." But then data withholding also belongs in articles about "Scientific honesty", "Responding to email requests for data in a timely manner", "Altruism", etc. Point is, data withholding deserves at most a "See also" bullet point in an article about data archiving, or perhaps a single sentence in the introduction giving one of several reasons that data archiving is a good thing. It does not deserve a whole section about the horrible things that may hypothetically have been avoided had data been archived.
It does seem to be info that could fit somewhere (e.g., the E.G. Campbell article that Ron found is interesting) but I'm not sure which article -- would be happy to see suggestions. --Nethgirb 22:09, 25 March 2007 (UTC)

See talk:data withholding for science journals + one regular newspaper. Seems to be an ongoing problem with genetics research, but not confined to that field. --Uncle Ed 00:02, 26 March 2007 (UTC)

I would be okay with an article dedicated to data withholding. We could move some of the info from this article and then link to the data withholding article. It is going to require some work, Ed, but I would be willing to work with you on it.RonCram 03:09, 26 March 2007 (UTC)

Pardon me for laughing. But I hadn't known about the Scientific American being involved in a data withholding controversy. (Sounds kinda like saying "What, you don't trust our statistical analyses of our own data?? So do your own experiments!!" or "Oh yes, it can all be found securely kept in a basement vault in the Ural mountains") Computers being what they are today, though, there appears today to be less justification for this kind of thing than in the past. ... Kenosis 10:07, 26 March 2007 (UTC)

The Climate Change Research section William deleted

In 1998, Michael Mann, Ray Bradley and Malcolm Hughes published an article on paleoclimatology.[21] In 2003, Steve McIntyre and Ross McKitrick decided to audit the published findings of Mann et al. Dr. Mann refused access to data and his source code.[3] After a long process - in which the National Science Foundation had supported Mann's effort to withhold the code - the code was finally turned over.[4] It happened because Congress investigated. In June 2005, Congress required Mann to testify before a special committee. Pursuant to the powers of Congress, the chairmen of the committees wrote a letter to Mann requesting he provide his data - including his source code. [5] When Mann complied, all of the data was available for a complete audit. Congress also requested that third party science panels review the criticisms of McIntyre and McKitrick. The Wegman Panel [6] and the National Academy of Sciences [7] both published their reports. McIntyre and McKitrick claim their findings have been largely confirmed by these reviews. [8] Dr. Mann published a Corrigendum in which he admitted some errors but denied others. Mann claims that the errors found made no difference to his conclusions.[9] Without access to the author’s data, methods and source code, a full audit could not have been made.

In 2006, Martin Juckes et al submitted an article to Climate of the Past which was then made available for comment on the Internet. The article claimed the source code used by McIntyre and McKitrick was not archived. McIntyre responded that the accusation was false and may be academic misconduct, with an implicit threat of legal action against Juckes and coauthors. [10] False claims regarding data archiving are usually easy to establish. Juckes blamed the inaccurate statement on a misunderstanding. [11]

References

  1. ^ Gauch (2003) op cit 124 ff
  2. ^ Gauch (2003) op cit 124 ff
  3. ^ "Mann on Source Code" by Stephen McIntyre[4]
  4. ^ "Title to MBH98 Source Code" by Stephen McIntyre [5]
  5. ^ "Letter from Congress to Dr. Mann dated June 23, 2005" [6]
  6. ^ "The Wegman Report" [7]
  7. ^ "Surface Temperature Reconstructions for the last 2,000 years" by National Academy of Science [8]
  8. ^ "A Scorecard on MM03" by McIntyre and McKitrick [9]
  9. ^ See Corrigendum of "Global-scale temperature patterns and climate forcing over the past six centuries" by Mann et al [10]
  10. ^ "Potential Academic Misconduct by the Euro Team" by Stephen McIntyre [11]
  11. ^ "Martin's Big Day" by Stephen McIntyre [12]

End of section on Climate Research data withholding. Discussion can start here. I will note that William has no problem with the other two sections that talk about other scientists who did not archive and withheld data, but he has a problem with this one. He is not willing to make any improvements or even say what is wrong. He just says it is too unbalanced to be saved. That is bogus. Every point is factual and well sourced. If any information is left out, he is welcome to add it. RonCram 13:50, 25 March 2007 (UTC)

(This comment is out of order on purpose - bear with me) => Good timeline of events, but as William said below it must not be one-sided. Even a well-referenced passage which is one-sided can be "biased" and thus fail to be in accordance with WP:NPOV policy. It's a good start, though. Now tack on the opposing views. You might glance at my essay, Wikipedia:Writing for the enemy, which is so well-regarded that jossi (?) linked to it recently. --Uncle Ed 14:51, 25 March 2007 (UTC)
You can't seem to appreciate the obvious - that it's written as an attack piece and unacceptable for that reason - it's written entirely from M&M's POV. As to comparisons with the other two sections... notice the disparity. The heart stuff starts with the researcher - not the people attacking him. Indeed, they aren't even mentioned. How curious. Rather than using fraudulent directly it would be better quoting from something, though. The genetics one seems quite nice - a proper study. If you could find something similar for climate, it would be very useful. William M. Connolley 14:37, 25 March 2007 (UTC)
If there is a controversy - outside of Wikipedia - about M&M's response to Mann, then perhaps it should be described in Michael Mann (scientist) or in an article on M&M. I would prefer the latter, or even a spin-off article on Criticism of the hockey stick.
Once we whip this into shape - i.e., agree that it has been written neutrally - we can incorporate it as a section of 'hockey stick' or 'Mann'. If it's too long, then we can write neutral summaries where needed and use {{main}} to refer to 'criticism'.
Errm - there you have the problem. This is all within hockey stick controversy anyway William M. Connolley 14:49, 25 March 2007 (UTC)
All in favor, say Aye!
Aye to William's comment. If it's already in the Hockey Stick controversy, then it is completely redundant here (and a thinly veiled POV fork). --Skyemoor 20:38, 25 March 2007 (UTC)
Someone is not signing their posts with four tildes like this ~~~~. Without the signature, it is impossible to know who is writing and difficult to know if it is one or more persons. Please sign your comments. William, your comment genuinely seemed to be trying to improve the article. I hope that attitude continues. I can certainly rewrite the piece so that it begins talking about Mann first. However, you are incorrect to say that the people investigating the others are not mentioned. The medical journal investigated Dr. Singh. I agree that a proper study into data withholding in climate science would be valuable. AFAIK, one has not yet been done. The fact Dr. Mann was guilty of data withholding is not seriously in doubt by anyone who knows the facts. It was not until Congress got involved that Mann finally agreed to turn over his source code. This event was very notable and deserves to be in the article.RonCram 15:01, 25 March 2007 (UTC)
Re: The fact Dr. Mann was guilty of data withholding is not seriously in doubt by anyone who knows the facts. If this is beyond dispute, there would not be a Hockey stick controversy. Surely somebody out there in science-land or media-world or the blogosphere disputes it. Anyway, this is an encyclopedia - or it's trying to be - so let's include some verifiable references, okay?
(edit conflict) Indeed. And of course Ron is conveniently forgetting that this is MBH, not M, as paper authors. But he (& M&M) like to personalise this stuff William M. Connolley 16:03, 25 March 2007 (UTC)
Ed, I understand that you are Wikipedia:Writing for the enemy (BTW, nice essay) but I need to correct you on one point. You write: "If this is beyond dispute, there would not be a Hockey stick controversy." This is not accurate. The HS controversy is about Mann's statistical methods and Mann's choice to use the bristlecone pine series which was determined by the National Academy of Sciences to be a poor temperature proxy and should not be used. McIntyre and McKitrick could not have performed a complete audit without gaining access to all of Mann's data and methods. The last thing to come in was the source code, something Mann had said he would never turn over. It was not until Congress convened hearings into the matter and required Mann to testify that Mann finally turned over his source code. This is a major event. That point is not in the article at this point because I need to relocate the reliable source again. Mann did everything he could to stall the audit by M&M. This is a fascinating subject, Ed. And one you would do well to study. I hope you will help me find reliable sources to better explain to readers the events relating to the data withholding.RonCram 15:57, 25 March 2007 (UTC)
There is a lot of tendentious stuff there which is not worth answering. Save the propaganda for elsewhere William M. Connolley 16:03, 25 March 2007 (UTC)
William, that always seems to be your answer when you don't like article content. The following facts seem established:
  • There is a controversy in regards to Mann et al. not releasing data/source code for review.
  • This controversy is well noted in external sources and was taken to congress for resolution.
  • There is a section on this page that attempts to chronicle notable controversies regarding scientific data archiving and/or the release of said data.
The logical conclusion seems to be that this is one of the most notable examples of what the article is about. It should contain a section and a link to hockey stick controversy. What's the debate here? If you don't like the way the section was written, feel free to add your own well-sourced sections. You can't just call it POV because you don't like the topic. Oren0 16:18, 25 March 2007 (UTC)
You don't seem to be reading what I wrote. I accept it's notable. I'm not calling it POV because I don't like the topic. But the current version is too badly one-sided to be tolerable William M. Connolley 19:32, 25 March 2007 (UTC)

And it is badly off-topic of the general Scientific Data Archiving subject. It should be retitled to "Hockey Stick Controversy: Data Access Fork". --Skyemoor 20:38, 25 March 2007 (UTC)

William, I am only trying to draw a distinction between the act of refusing data and the controversy itself. Ed seems to be confusing the two. Regarding my use of Mann's name rather than Mann et al, I am happy to make the change in the article, although I do believe Mann led the charge in the refusal to provide data and his source code. RonCram 16:09, 25 March 2007 (UTC)

William and Ed, I have found a reliable source for Congress requesting Mann's source code. I will add it to the section above.RonCram 16:49, 25 March 2007 (UTC)
William, your deletion of this section is obviously an attempt at a cover up of the pseudoscience behind global warming - a subject you have completely taken over on Wikipedia. Your job and your POV global warming blogs alone should make you recuse yourself from these articles if you were at all honest about your heavy involvement and financial self interest. However it is clear from your vanity page (William Connolley) that such considerations as fairness and NPOV are far from your methodology. This is more of your censorship - pure & simple. ~ Rameses 14:48, 26 March 2007 (UTC)

spinoff poll

Non-binding poll to get a quick read on the value of spinning off the M&M critique

Do a spinoff:

Do not spin it off:

  1. Oren0 16:18, 25 March 2007 (UTC)
  2. ~ Rameses 14:37, 26 March 2007 (UTC)

Leave it in the Hockey Stick article:

  1. Skyemoor 20:31, 25 March 2007 (UTC)
  2. Nethgirb 02:20, 26 March 2007 (UTC) -- It already has been spun off into Hockey stick controversy
  3. Kim D. Petersen 03:34, 26 March 2007 (UTC) -- I agree with Nethgirb - it's already covered.
On the contrary, I just read the Hockey stick controversy article and none of the information about Mann's refusal to turn over data is covered in the article. In addition, the article is severely POV and badly in need of a rewrite.RonCram 13:57, 26 March 2007 (UTC)
OK, but if the data withholding accusations are worth including, then the right thing to do is put it in Hockey stick controversy rather than starting a new article. --Nethgirb 18:08, 26 March 2007 (UTC)

Other:

Because of the controversy above, I have not had time to work on developing the data archiving aspects of this article. It appears that can be resolved by moving that info to a Data withholding article, so I want to return to discussing data archiving. Here are some areas of interest, along with some links. Please help me mold this into a cohesive, useful and interesting article.

Organizations and Policies

  • American Psychological Association [22]
  • CODATA [23]

Data archiving in Astronomy and Astrophysics

The future of astrophysics is tied to the progress in data archiving. One of the most interesting and ambitious goals in the field is to develop a "virtual observatory." [24]

Who can access the data?

Each data archive has its own rules about how data can be accessed. Most archives require a researcher to register. The Defense Technical Information Center has both classified and unclassified information. [25]

Problems in data archiving

Problems include changing formats, difficulty in cataloging and accessing data, etc. The problems are not short-term. [26]RonCram 04:24, 26 March 2007 (UTC)

I'm not yet convinced "Data withholding" deserves its own article -- a subsection of another article perhaps, though as I've said I'm not sure which one.
The above topics all look interesting and worthy of inclusion in Scientific data archiving, though. As far as shaping this article goes, I think it would help to refer to an external article on scientific data archiving (as opposed to an article on a particular field's archive) -- have you seen such a thing, Ron? --Nethgirb 07:47, 26 March 2007 (UTC)
By "article" do you mean in a science journal or popular magazine? This article from just above is interesting but maybe not as general as you want. [27] This piece from NAS is also interesting. [28] Gary King from Harvard links to a number of articles on data sharing and replication. [29] This article looks very interesting but is not free. [30] As does this one. [31]RonCram 14:17, 26 March 2007 (UTC)
The first one looks interesting. Second one doesn't seem to be about data archiving. Haven't looked at the others yet. Thanks for the links, Ron. --Nethgirb 18:16, 26 March 2007 (UTC)
Nethgirb, I thought the second one was interesting for a possible section on "History of data archiving" describing some of the philosophical underpinnings and realities of science since the 17th century that led to data archiving. RonCram 01:51, 27 March 2007 (UTC)