Talk:Entropy in thermodynamics and information theory


Entropy in thermodynamics and information theory[edit]

This article was created from content copied from the information theory article. I believe this topic justifies its own article for the following reasons (130.94.162.64 02:29, 5 December 2005 (UTC)):[reply]

  • There is popular interest in the topic. (As seen from various comments on talk pages and so forth.)
  • There are deep, undeniable connections between thermodynamics and information theory.
  • People have been contributing information on this topic into various articles where it doesn't properly belong, and that information should have its place.
  • It will take a considerable amount of space to adequately explain the topic.
  • It will avoid the need to duplicate this information and clutter up other related articles such as entropy, reversible computing, information theory, and thermodynamics.
  • Those who wish to deny certain connections between thermodynamics and information theory will have a place to contribute verifiable information that supports their viewpoint, rather than just deleting information that would tend to oppose it.
  • There are without doubt some proposed connections between information theory and thermodynamics that simply are not true. (Zeilinger's principle looks like one example to me.)
  • The Wikipedia community has successfully written articles from NPOV about far more controversial topics than this one.

Just a reminder: Wikipedia is not the place for Original Research. If you want to add something, you need verifiable sources to back it up. I'm not that familiar with Wikipedia policy, but it's probably O.K. to mention recent and ongoing research from a Neutral Point Of View, as long as one places it in proper context and does not make grandiose claims about it. (BTW, I'm the same person as 130.94.162.64; my computer just died on me.)-- 130.94.162.61 19:03, 6 December 2005 (UTC)[reply]

I feel this article is incomplete without at least a reference to Jaynes work on the relation between statistical mechanics and information theory. — Preceding unsigned comment added by 98.235.166.181 (talk) 13:02, 31 March 2016 (UTC)[reply]

Zeilinger's Principle[edit]

Does anyone want to write an article on this one? I know we don't have an article on every Principle that comes along (nor do we want to), but it looks like some researchers went to a lot of trouble to refute it. I'm not sure how important it is. Maybe someone can just explain it a little more in this article. -- 130.94.162.64 03:24, 5 December 2005 (UTC)[reply]

On second thought, I do think some more explanation of Zeilinger's principle is in order. I seem to recall it was popular a number of years ago. -- 130.94.162.61 18:55, 6 December 2005 (UTC)[reply]

Difficulty of Research[edit]

The whole subject of information theory is murky, difficult to research, and shrouded in secrecy (especially, for some reason I am as yet unaware of, as it relates to the things discussed in this article). Little academic research of note has been published on it for the last forty years. The journals that once covered information theory now cover mainly coding theory, its canonical application.

er shrouded in secrecy? If it's a scientific topic, then there are published papers on it. If there are no published papers on it, then it's not a scientific topic. This whole section reeks of POV. Take this off until more external citations are available. It's not that I'm not interested in the topic, it's just that I don't buy the whole shrouded in secrecy argument on any scientific topic.

4.249.3.221 (talk) 16:06, 28 June 2009 (UTC)[reply]

I would like to show, by a little example, why I believe this is so. Let me tell you, from an information-theoretic standpoint, that information surely is not quantized. Consider what Shannon's theorem tells us: that information can be transmitted across a noisy channel at any rate less than the channel capacity. That channel capacity can be made as small as one wishes, so that, for example, only a thousandth of a bit at a time can be transmitted through it over the noise, and yet little by little that information can be accumulated and put back together at the other end of the channel with near-perfect fidelity.
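To put rough numbers on that argument (a minimal sketch; the binary symmetric channel and the particular figures are my own illustration, not something claimed above), the capacity of a very noisy channel is tiny but positive, and Shannon's theorem says a long message can still cross it reliably given enough channel uses:

import math

def h2(p):
    # Binary entropy function, in bits.
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# A very noisy binary symmetric channel: crossover probability just below 1/2.
p = 0.49
capacity = 1 - h2(p)   # Shannon capacity in bits per channel use, roughly 3e-4
print(f"capacity = {capacity:.6f} bits per use")

# Any rate below capacity is achievable with arbitrarily small error,
# so a 1000-bit message just needs enough uses of the channel.
message_bits = 1000
print(f"roughly {message_bits / capacity:.0f} channel uses needed")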

Now consider a certain TLA (Three Letter Agency) that must work continually with information that it wishes to keep out of reach of an adversary. Every employee of that TLA (and there are many, many thousands of them) is, in effect, a communications channel that conveys classified information to the adversary. Employees are human. They talk in their sleep. They publish papers. They have friends. What they know influences their actions, perhaps in ever so subtle degrees. An academic might avoid mention of a certain subject in a paper or conversation, and inadvertently make it conspicuous by its absence. People continually let little bits of information slip that, taken individually, would be harmless slips.

But consider a top secret memo that must be circulated widely in the TLA. Now the redundancy of that information is very high, and those little bits of information that inevitably slip can by and by be aggregated by the adversary. A powerful adversary or a determined researcher can conceivably find, aggregate, and reconstruct much information in this way. And the aggregate "leakage" channel capacity of all those employees is no doubt high indeed for that TLA.

I hope this makes clear one potential use for this theory, one that might explain some of the difficulty in researching it.

As another user put it in another talk page:

"Information theory leaks information."

-- 130.94.162.61 08:10, 7 December 2005 (UTC)[reply]

This example sounds like a lengthy description of Steganography. However, the fact that Information Theory can help with the detection of hidden messages does not mean that the whole subject is "murky, difficult to research, and shrouded in secrecy"; at most, the example suggests that some applications of the subject may be "difficult to research." Similarly, the fact that Number Theory is the foundation of all public key ciphers does not imply that Number Theory is "murky, difficult to research, and shrouded in secrecy."
That said, the topic addressed by this particular page lies at the intersection of Statistical Mechanics, Information Theory and perhaps Thermodynamics, so the number of experts who can do the topic justice is probably small.

StandardPerson (talk) 04:33, 6 July 2011 (UTC)[reply]

Updated, re-editing needed[edit]

I expanded the text on this subject in the Information theory article, then spotted this page.

So I've copied it all across to here. Some editing may be needed to fit it into the evolving structure of the page here; it would then probably also be worth cutting down the treatment in the Information theory article.

But I'm calling it a night just for now. -- Jheald 00:01, 10 December 2005 (UTC).[reply]

I had to make a slight correction. We cannot speak of a "joint entropy" of two distributions that are not jointly observable. The "joint distribution" formed by considering them as statistically independent random variables is a completely artificial (not to mention rather misleading) construction. (It completely fails to take into account which variable we observe first!) Maybe someone who knows more about this will expand on it. All we can really include in the article is information from verifiable sources, such as Hirschman's paper. If you have Original Research on this topic, you will have to write a paper on it so that we can refer to it here in the article. But you might think about the Difficulty of Research before you do this. :) -- 130.94.162.64 15:20, 20 December 2005 (UTC)[reply]
We need an Expert on the subject to help with this article. -- 130.94.162.64 00:21, 19 March 2006 (UTC)[reply]


the continuous case[edit]

I see no difficulties in this representation as long as f(x) is a probability density function (p.d.f.). It is the mean value of information of f(x), and its exponential represents the volume of the region covered by a uniform distribution (in analogy to the cardinality in the discrete case). Why would it not represent the logarithm of the volume covered by any p.d.f.? A Gaussian p.d.f. covers a volume equal to

sqrt( 2 pi e variance )

Kjells 09:06, 29 March 2007 (UTC)[reply]
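A quick numerical check of this (just a sketch, with my own choice of sigma; it assumes the standard definition of differential entropy in nats): the exponential of the differential entropy of a Gaussian does come out to sqrt( 2 pi e variance ), the width of the uniform distribution with the same entropy:

import math
import numpy as np

sigma = 1.5                        # standard deviation, arbitrary example value
x = np.linspace(-12, 12, 200001)   # integration grid, wide enough for this sigma
dx = x[1] - x[0]
f = np.exp(-x**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))  # Gaussian p.d.f.

h = -np.sum(f * np.log(f)) * dx    # differential entropy h = -integral( f ln f dx ), in nats

print(math.exp(h))                               # numerical "volume" exp(h)
print(math.sqrt(2 * math.pi * math.e) * sigma)   # closed form sqrt( 2 pi e variance )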

continuous case needs explanation[edit]

It is easy to see that generalizing the Shannon formula to the continuous case leads to an infinite entropy. Most irrational numbers yield an infinite quantity of information. In fact, in the integral formula, the differential term dx, which is supposed to appear twice, has been removed from the log argument, avoiding an infinite result. I am surprised that this trick is never discussed.
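The hidden dx can be made concrete with a short sketch (my own illustration, assuming a standard Gaussian source): the Shannon entropy of the discretized variable behaves like h(X) + log2(1/dx) and diverges as the bin width shrinks, while the differential entropy h(X), with the dx quietly dropped from the log, stays finite:

import math
import numpy as np

sigma = 1.0
h_diff = 0.5 * math.log2(2 * math.pi * math.e * sigma**2)   # differential entropy, in bits

for dx in [0.1, 0.01, 0.001]:
    # Discretize the Gaussian p.d.f. into bins of width dx and take the Shannon entropy.
    centers = np.arange(-10, 10, dx) + dx / 2
    p = np.exp(-centers**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi)) * dx
    p = p / p.sum()                       # normalize the bin probabilities
    H = -np.sum(p * np.log2(p))           # discrete Shannon entropy, in bits
    print(f"dx={dx}: H = {H:.3f} bits, h_diff + log2(1/dx) = {h_diff - math.log2(dx):.3f}")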

Entropy anecdote[edit]

I already mentioned that no history of the concept would be complete without this famous anecdote: Shannon asked John von Neumann which name he should give to the new concept he had discovered. Von Neumann replied: "Call it H." Shannon: "H? Why H?" Von Neumann: "Because that's what Boltzmann called it."

Algorithms 20:04, 7 June 2007 (UTC)[reply]

Do you have a source for this anecdote? 138.231.176.8 Frédéric Grosshans (talk) 13:15, 28 June 2011 (UTC)[reply]
"My greatest concern was what to call it. I thought of calling it ‘information’, but the word was overly used, so I decided to call it ‘uncertainty’. When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, ‘You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, nobody knows what entropy really is, so in a debate you will always have the advantage."
Conversation between Claude Shannon and John von Neumann regarding what name to give to the “measure of uncertainty” or attenuation in phone-line signals: M. Tribus, E.C. McIrvine, Energy and information, Scientific American, 224 (September 1971), pp. 178–184 . Cuzkatzimhut (talk) 22:34, 26 February 2012 (UTC)[reply]

Units in the continuous case[edit]

I think there needs to be some explanation of the matter of units for the continuous case.

f(x) will have the unit 1/x. Unless x is dimensionless, the unit of entropy will include the log of a unit, which is weird. This is a strong reason why it is more useful for the continuous case to use the relative entropy of a distribution, where the general form is the Kullback-Leibler divergence from the distribution to a reference measure m(x). It could be pointed out that a useful special case of the relative entropy is:

H = - integral( f(x) * log2( (xmax - xmin) * f(x) ) dx )

which should correspond to a rectangular distribution for m(x) between xmin and xmax. It is the entropy of a general bounded signal, and it gives the entropy in bits.

Petkr 13:38, 6 October 2007 (UTC)[reply]
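For what it's worth, a small numerical sketch of the construction described above (the triangular test density on [0, 1] is my own example): the entropy relative to a uniform reference on [xmin, xmax] is a finite, dimensionless number of bits, zero for the uniform distribution itself and negative for anything less spread out:

import numpy as np

xmin, xmax = 0.0, 1.0
x = np.linspace(xmin, xmax, 100001)[1:-1]   # open interval, avoids log(0) at the endpoints
dx = x[1] - x[0]

def relative_entropy_bits(f):
    # -integral( f(x) * log2( (xmax - xmin) * f(x) ) dx ): entropy relative to a uniform reference
    return -np.sum(f * np.log2((xmax - xmin) * f)) * dx

uniform = np.ones_like(x)        # the reference distribution itself
triangular = 2 * (1 - x)         # a non-uniform p.d.f. on [0, 1]

print(relative_entropy_bits(uniform))      # ~0 bits (the maximum)
print(relative_entropy_bits(triangular))   # ~-0.28 bits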

Paragraph about evolutionary algorithms should be removed[edit]

The paragraph cited below is very confusing and should be removed. Not only does it fail to clarify anything about the topic of the article, it actually makes the topic much harder to understand by invoking a whole lot of concepts from the area of evolutionary algorithms. For readers with no knowledge of that area this paragraph will be completely incomprehensible. If an example is needed, it should be as simple as possible.

Average information may be maximized using Gaussian adaptation - one of the evolutionary algorithms - keeping the mean fitness - i. e. the probability of becoming a parent to new individuals in the population - constant (and without the need for any knowledge about average information as a criterion function). This is illustrated by the figure below, showing Gaussian adaptation climbing a mountain crest in a phenotypic landscape. The lines in the figure are part of a contour line enclosing a region of acceptability in the landscape. At the start the cluster of red points represents a very homogeneous population with small variances in the phenotypes. Evidently, even small environmental changes in the landscape, may cause the process to become extinct.

Enemyunknown (talk) 03:48, 17 December 2008 (UTC)[reply]


Paragraph should be expanded & clarified, or given its own topic

While I agree that the paragraph quoted above is very dense, surely this is an argument for expansion and clarification, rather than deletion.

This paragraph addresses questions that are widely misunderstood or misrepresented in Creationism / Intelligent_design and Evolutionary Biology, yet a clear understanding of the issue should be pivotal for intellectually honest and well-informed participants in the debate about design.

The excision of this paragraph would amount to intellectual cowardice, especially while stubs to other topics (e.g. Black Holes) remain.

StandardPerson (talk) 05:15, 6 July 2011 (UTC)[reply]

Proposed merger[edit]

FilipeS (talk · contribs) has proposed that the content of the article be moved into History of thermodynamics.

I would oppose this merger, because:

  • This is a distinct topic, of current interest and current disagreement, important in a full current understanding of thermodynamics, and well worthy of a full discussion at the length presented here. (See also the points made at the top of the page by the original anon who created this article).
  • A one-paragraph WP summary style mention would be appropriate in the history article; anything more would be excessive in that context. But a one-paragraph overview, directing the interested reader here for the full story, would be good both for there and for here. Jheald (talk) 09:04, 7 July 2009 (UTC)[reply]
Upon reflection, I must agree. I will withdraw the proposal. FilipeS (talk) 11:24, 7 July 2009 (UTC)[reply]

Information is physical[edit]

I think it would be appropriate to add a link to Reversible computing in the section "Information is physical", since this is an area of (at least remotely) practical implications.

Original research?[edit]

I like this article, but the sentence

"This article explores what links there are between the two concepts..."

confuses me.

Isn't an exploration of links between two concepts just original research? — Preceding unsigned comment added by 80.42.63.247 (talk) 10:21, 13 September 2013 (UTC)[reply]

Negentropy[edit]

This section needs work to clarify the disparity between what I think was Brillouin's initial over-broad belief (expressed in his 1953 book "Science and Information Theory") that any operation on one bit of information had a thermodynamic cost of kT ln 2 (where k is Boltzmann's constant), and our current understanding (due largely to Rolf Landauer) that some data operations are thermodynamically reversible in principle, while others have a thermodynamic cost. This understanding was already implicit in Szilard's 1929 analysis of his one-molecule engine (nicely explained at the end of the preceding section) where he showed that the cost is associated not with any one step, but with the whole cycle of the engine's operation, comprising the acquisition, exploitation, and resetting of one bit of information about the molecule. CharlesHBennett (talk) 20:12, 16 February 2014 (UTC)[reply]

Isn't there a more explicit deep connection?[edit]

From Shannon's booklet: "Thus when one meets the concept of entropy in communication theory, he has a right to be rather excited - a right to suspect that he has hold of something that may turn out to be basic and important. ... for unless I am quite mistaken, it is an important aspect of the more general significance of this theory."

H = - K*sum( pi*log2(pi) ) which corresponds directly to the classical general form of entropy S = - kB*sum( pi*ln(pi) ). With K=1 this H is Shannon information in units of bits (the base 2). S has units of kB=Joules/temperature. But temperature is a measure of the average kinetic energy per molecule, Joules/molecule, so the Joules in kB can cancel. This means S has a more fundamental unit of "molecules", a count very much like "bits". To precisely make the count in terms of molecules, the molecules would be counted in terms of the average kinetic energy. A slower-than-average molecule would get a count a little less than 1, a faster molecule would get > 1. The result would be converted to base 2 to make the count in bits. So I'm making a bit-wise count of physical entropy. I don't know how to get absolute entropy, but it should work for entropy changes, i.e. ΔS=ΔH*ln(2), where H is counted before and after as described, and ln(2) is just a log base conversion. This is the converse of what Szilard said in 1929, 1 bit = kB*ln(2). But my method of counting bits obviates the need for kB and shows the ln(2) to be a log base conversion factor instead of being explained as 2 required states. Ywaz (talk) 21:49, 21 February 2015 (UTC)[reply]
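Whatever one makes of the counting scheme proposed above, the conversion it leans on is standard and easy to state as a sketch (the temperature and bit count below are my own example values, not anything from the comment): a change of H bits of information entropy corresponds to a thermodynamic entropy change of kB*ln(2)*H, which at temperature T gives the familiar kT*ln(2) joules per bit:

import math

k_B = 1.380649e-23   # Boltzmann constant, J/K

def thermo_entropy_from_bits(H_bits):
    # S = k_B * ln(2) * H : convert an information entropy in bits to J/K
    return k_B * math.log(2) * H_bits

T = 300.0                              # kelvin, example value
dS = thermo_entropy_from_bits(1)       # entropy change corresponding to one bit
print(dS)                              # ~9.57e-24 J/K
print(T * dS)                          # ~2.87e-21 J minimum dissipation to erase one bit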

About negative entropy[edit]

Erwin Schrödinger gave opinions about negative entropy in his book "What is Life" from 1944 (recent edition 2004, Cambridge University Press, pages 72 and 73), derived from the work of Boltzmann and Gibbs. It begins with a familiar residual from the third law of thermodynamics derived from statistical mechanics.
  entropy = k * log D 

defined for natural logarithm base e, Boltzmann constant k, and disorder of the system D.

Then Schrödinger defined order of the system as (1/D) and gave a calculation for negative entropy.

-(entropy) = k * log (1/D)

In modern terms the first D has become Ω, the number of possible disordered states that can be randomly filled or emptied of energy. The order of the system (1/D) can be expressed as N, the number of states that are prevented from being randomly filled or emptied of energy. Then the net entropy is given by:

  S = k * Ln (Ω/N)

Examples of physical objects that obey this rule are electronic diodes producing dark current from heat or vibration, and parabolic reflectors focusing radiant energy to a hot spot. Radiant energy is always present because of the microwave background in space.

When the number of non-random states is larger than the number of random states, locally negative entropy can occur, as in the examples, which do not violate the second law of thermodynamics because of their non-random construction. The examples can be easily verified, although the power generated is small in both cases. Random and non-random states can also be represented as degrees of freedom in a set of variables.

Unlike the claims of Szilard, the method of Schrödinger does not require intelligent intervention, although the information is essential in both cases. Not just any information is sufficient to change entropy. The information must be linked to a counting method of non random action on the system. Astrojed (talk) 21:51, 28 June 2015 (UTC)[reply]

Landauer, Szilard, Brillouin[edit]

It's very confusing to figure out what Landauer said that Brillouin did not already say, or that Szilard said even earlier. So we have this sentence: "In 1953, Brillouin derived a general equation[4] stating that the changing of an information bit value requires at least kT ln(2) energy" (Leon Brillouin (1953), "The negentropy principle of information", J. Applied Physics 24, 1152-1163), which suggests that the claim that Landauer said it first is incorrect. Perhaps Landauer independently obtained this result, and stated it in a way that engineers could understand, in an engineering journal? Certainly, very few comp-sci researchers would ever read Brillouin... or is it something else? 67.198.37.16 (talk) 21:18, 23 September 2015 (UTC)[reply]

Boltzmann's constant[edit]

Yes, Boltzmann's constant is expressed in units of energy per unit temperature, and the choice of temperature scale affects its particular value. As with any universal constant, various scales of measurement can be chosen such that its value is unity. For a light beam, Δx=cΔt, but setting c=1 does not reveal that space and time are identical (Introducing relativity concepts here obscures the point, so perhaps it's not the best analogy). Given that we are dealing with thermodynamics, thermodynamic entropy and information entropy are perfect measures of each other, but they are not identical, just as, given that we are dealing with a light beam, Δx and Δt are perfect measures of each other, but they are not identical. PAR (talk) 22:08, 26 March 2016 (UTC)[reply]

Pseudoscience[edit]

The Szilard Engine is perpetual motion, and the linked Arxiv article claims that it has actually been built. This is obviously a fraud, and calls into question the entire theory. — Preceding unsigned comment added by 107.77.194.190 (talk) 03:21, 3 October 2016 (UTC)[reply]

No, it is not a perpetual motion machine. It takes energy to reset at every stage. See: https://www.scientificamerican.com/article/the-fundamental-physical-limits-of-computation/ 173.75.1.12 (talk) 02:26, 10 February 2017 (UTC)[reply]

"Criticism" section looks like utter nonsense[edit]

I can't put my finger on what's wrong with it. As a mathematical abstraction (which every physical model is) there is no material difference between information theoretic entropy and stat-mech entropy. Information theory doesn't care what constant you scale entropy by: That's why the distinction between nats and bits as units of entropy isn't important in information theory. As far as dimension theory goes, dimensions (like that of Boltzmann's constant) are just a heuristic to check if your equations are put together properly. You can't make a physical argument from it. I'm very close to deleting this section. --Svennik (talk) 17:49, 21 December 2021 (UTC)[reply]

Deleting now. Feel free to undo and explain. --Svennik (talk) 20:58, 21 December 2021 (UTC)[reply]
Good. I agree. Finally. kbrose (talk) 02:11, 22 December 2021 (UTC)[reply]

Entropy of What?[edit]

In information theory entropy is a property of a random variable, where such random variable holds a randomly chosen message, such as the most recent message to arrive on the channel. Messages themselves have information, not entropy. The article does not say what it is that has the property of entropy. 88.127.1.197 (talk) 21:49, 13 December 2023 (UTC)[reply]

I think it's the entropy of the microstate – which is unknown – given the macrostate – which is known. In a Bayesian framework, ignorance can be modelled using a probability distribution, which gives you the random variable you asked for. This answers your question. I don't know how easy it is to make this completely rigorous. --Svennik (talk) 15:55, 19 December 2023 (UTC)[reply]