It took me 10 years to understand entropy (cantorsparadise.com)
398 points by dil8 on April 28, 2022 | hide | past | favorite | 283 comments


I don't understand entropy, and this article did not change that. The issue I take is with the definition of "the most likely state".

Think of a series of random bits that can be either 0 or 1 with equal probability. How likely is it that they are all 0 or all 1? Not very likely. There is exactly one configuration. How likely is it that they have a specific configuration of 0 and 1? Equally likely. All states are equally likely. If you randomly flip bits you go from one state to another state but each one is equally likely to occur. There is no special meaning to a specific configuration if you don't give it one.

If you look at the average of all bits, you start grouping together all states with the same number of 1s. In terms of that average there is only one configuration that is all 1s, but most configurations have roughly 50% 1s. If you now start flipping bits you will meander through all possible bit-states, but the average will most likely stay close to 50% 1s most of the time.
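
A rough sketch of what I mean, in Python (the 20-bit size and the sample count are arbitrary choices, just for illustration):

    # Every specific bit string is equally likely, but the macro-variable
    # "number of 1s" concentrates near 50%.
    import random
    from collections import Counter

    N_BITS = 20          # size of the toy system (illustrative assumption)
    N_SAMPLES = 100_000  # number of random microstates to draw

    counts = Counter()
    for _ in range(N_SAMPLES):
        bits = [random.randint(0, 1) for _ in range(N_BITS)]
        counts[sum(bits)] += 1   # group microstates by their number of 1s

    # Each specific 20-bit string has probability 2**-20, yet the grouped
    # counts peak sharply around 10 ones out of 20.
    for ones in range(N_BITS + 1):
        print(ones, counts[ones])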

In physics we usually look at averages, such as the average kinetic energy of the molecules expressed as temperature. Therefore it makes sense to group states together by that average, and then the states with very low or very high averages are few.

But if you look deeper than that averaging it stops making sense to me. It's a completely different world. I don't know what entropy is supposed to mean on the level of individual states/configurations. I don't understand what kind of macroscopic "averaging" function we may use to group up those states. There could be more than one possibility, from which it would follow that there is more than one definition of macro-entropy. Ideally there should be one general definition of how we have to look at those microstates, and from that would follow our general definition of entropy. Sadly I didn't study physics and this topic still continues to confuse me. The usual explanations fail to enlighten me.
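
To make my confusion concrete, here is a rough sketch (the two grouping functions are arbitrary choices, purely to illustrate that different macroscopic groupings assign different entropies to the same microstate):

    # Entropy depends on the chosen coarse-graining function.  Microstates are
    # all 10-bit strings; two different "macro" functions (sum of bits vs. value
    # of the first bit) give different entropies for the same microstate.
    from itertools import product
    from math import log2
    from collections import defaultdict

    microstates = list(product([0, 1], repeat=10))

    def entropy_of_macrostate(coarse_grain, microstate):
        groups = defaultdict(int)
        for m in microstates:
            groups[coarse_grain(m)] += 1
        w = groups[coarse_grain(microstate)]   # microstates sharing this macrostate
        return log2(w)                         # Boltzmann entropy, in bits

    state = (1, 0, 1, 1, 0, 0, 1, 0, 1, 0)
    print(entropy_of_macrostate(sum, state))            # group by number of 1s (~7.98 bits)
    print(entropy_of_macrostate(lambda m: m[0], state)) # group by first bit only (9 bits)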


> But if you look deeper than that averaging it stops making sense to me. It's a completely different world.

I think you're less confused than you think you are!

As I posted elsewhere, it helps to think of entropy as a quantity that actually depends on how much you know about the system in question.

Typically when you calculate the entropy of a system at temperature X, that means all you know is that you stuck a thermometer in it and measured X. You don't know anything more than the average temperature. It could be in any state consistent with that temperature.

If you know more about the system, it has less entropy. If you know it down to the exact microstate, it has zero entropy.


This is how I have come to understand entropy. The words disorder and order are a proxy for information content.

> If you know more about the system, it has less entropy.

One question though. When you say "it", does that include you as well as the system, or just the system? To me "it" includes both, because it is "you" whose state has changed by acquiring more information. It could be in the form of neuronal rearrangement, or bits being stored in some digital medium, etc. New information content has thus been created.

There's an interesting side effect if one thinks deeply enough here. The system will keep changing its state, so the information one has becomes out of date, leading to more disorder (i.e., information loss) and increased entropy. One can keep the information updated, but it takes energy. And I read somewhere that the energy thus used will increase the overall entropy of the universe, in keeping with the 2nd law.


>This is how I have come to understand entropy. The words disorder and order are a proxy for information content

Does information content mean this? ... "How many bits from a random-number generator would I need to index all the micro-states in the macro-state?"


That is my mental model, yes. More bits are needed to capture more detailed (finer-grained, or micro-state as you called it elsewhere in this thread) information.

Let's say there's a stone and we want to know its details. If all we want to know is whether it weighs more than 100 kg or not, then one bit will do: 1 means > 100 kg and 0 means <= 100 kg. If we want to know its colour as well (as one of the 7 VIBGYOR colours), then we need 4 bits: 3 bits to encode 7 colours and 1 bit to encode yes/no for the weight. And so on; as we gain more and more information we need more bits to store it.
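
A toy sketch of that encoding (the 100 kg threshold and the colour list are just the illustrative assumptions from above):

    # 1 bit for "over 100 kg?" plus 3 bits for one of 7 colours = 4 bits total.
    COLOURS = ["violet", "indigo", "blue", "green", "yellow", "orange", "red"]

    def describe_stone(weight_kg, colour):
        heavy_bit = 1 if weight_kg > 100 else 0
        colour_bits = format(COLOURS.index(colour), "03b")  # 3 bits cover 7 colours
        return f"{heavy_bit}{colour_bits}"

    print(describe_stone(150, "green"))   # "1011"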

This is just for the storage, though; in order to gain the information we need to expend energy. More information requires more energy, leading to more disorder, as expending energy releases heat; hence the 2nd law of thermodynamics as well as the arrow of time. IMO our perception of time is purely based on memory, which is the information content of events stored in neurons.

Quite hand-wavy, I know. But this is a mental model I've developed over years of thinking and reading (and listening to lectures) about entropy, information, the arrow of time, and energy, and how they are interconnected.


The article does say that some crystalline structures can have more entropy (information) than their fluid state. How could that be? Any ideas on what that fluid state might be? The information content in a crystal is really low.


Unfortunately the author doesn't explain it beyond sharing a reference to this paper[1] which is way beyond my competence.

[1] https://www.nature.com/articles/nature08641


Beyond me too, but I'm going to assume that this is sort of an edge case where fluid crystalline structure can have less entropy than the static version (sort of sounds like the laminar flow state has less complex structure than its packed solid state). I doubt it contradicts your description above (which is similar to my understanding as well).


Entropy differences are an objective quantity which can be measured; there is no subjectivity about it. It is not about which parameters you know, it is about which parameters you hold fixed.


Fascinating discussion. I see some parallel here to the Bayesian vs Frequentist view of probability.

They are perhaps both valid points of view depending on the situation.

If you take a frequentist view of an unbiased coin, then the probability that it will land heads on the next flip is objectively, by definition, 50%. So the resulting calculation of entropy (-log2 0.5 = 1 bit) is also objectively defined. But if your 50% probability represents a subjective belief, the resulting entropy calculation should also be considered subjective, I would think.


Leonard Susskind disagrees with you. See his lectures on statistical mechanics, he is very clear that entropy is a matter of knowledge about the system. It has to be.


Susskind says that entropy is determined by selecting a macro-state. He doesn't claim that the entropy of a macro-state depends on whether we know which macro-state the real system is really in.

If we happen to know, then, sure. For example we could pick a weird-ass observable state, and when we saw it we would know the entropy of the system was low. But the entropy of each macro-state just depends on how many micro-states we define it to contain. It doesn't depend on our knowledge of the system state.


The concept of entropy wasn't invented so that we could calculate entropies of macrostates, it was so that we could calculate entropies of real systems and understand their behaviour. Macrostates are an accounting tool that helps us do this. You seem to be treating the calculation of macrostate entropy as an end-goal in itself, but also allowing yourself to somehow freely choose any macrostate you want. When it comes to applying thermodynamics in practice, you'll have to calculate the entropy of a real, or at least hypothetical, system.

The point of macrostates is that you ought to know which macrostate a given system is in. That's the thing that you know. You don't know which microstate it's in, but you do know which macrostate it's in.

For example, if I say "a cubic metre volume of air at room temperature and pressure", I've described a physical system. I've also described a macrostate.

If you're calculating the entropy of macrostates that are not consistent with a description of a system -- if you've defined your macrostates such that you don't know which macrostate a given system is in -- then in order to calculate that system's entropy you have to sum up over all such possible macrostates anyway, so you haven't saved yourself any work or earned any insights along the way.

So yes, you can calculate the entropy of a macrostate without knowing what macrostate a real system is in, but it kind of sounds like you're arguing that log(x) is not a function of variable x, because log(3) is a constant and log(4) is a constant, and you can divide up any x into constants of your choice.


We seem to be stuck in a loop of explaining basic first-year statistical mechanics back and forth to each other repeatedly. I'm not sure why.

I'm making a pedagogical point. The OP addresses how difficult entropy is to understand. I'm responding to that. We don't need to talk about "knowledge" when you define entropy, or in an initial explanation of entropy. We could, but we could decide not to.

The log(x) example is a good one. First-time students who are learning about logarithms don't need to be told that a logarithm depends on 'knowledge' or on 'information.' It's ok to just tell them how the logarithm is defined.

Sure, there is information. I'm saying it's confusing and unnecessary to introduce more big ideas like information, when the topic is "entropy is difficult to understand" or "logarithms are difficult to understand."


Alright. I agree we seem to be stuck in a weird loop where we agree about all the observable facts but somehow are on different wavelengths in spite of that.

And I totally agree, entropy is a property of a macrostate. The information step comes in, inseparably, when you go from a system description to a macrostate. And you might just shuffle the confusion from not knowing what entropy is to not knowing what a macrostate is.

If you think it's clearer to teach students by explaining that the entropy of a macrostate is an objective property of that macrostate, that's fine. Just don't leave them believing that the entropy of a brick is an objective property of that brick.


Why should the entropy of a brick not be an objective property (apart from a constant)? I mean, you can measure it; isn't it basically the integral of C/T dT?


That would be a circular calculation, because the foundational definition of temperature is rooted in entropy: T = 1/(dS/dQ).

You can measure the temperature of a brick with a thermometer, but then you should understand what that thermometer is really telling you: https://bayes.wustl.edu/etj/articles/theory.1.pdf

And when you do that, you'll see why a thermometer doesn't do a good job of measuring the effective temperature of, say, a pumped laser crystal.


I think you might enjoy Cosma Shalizi's paper "What is a macrostate?" https://arxiv.org/abs/cond-mat/0303625


I've been trying to reconcile these perspectives, and I think it really is both. And they are both physically relevant.

Consider the subjective entropy perspective. If you know the exact microstate of a system, then you can in theory play the part of Maxwell's demon. You could have a little gate that you open only for fast particles, and using your knowledge of the microstate, you can predict exactly when they will arrive.

But consider the objective perspective. If you take this very same system and put it in thermal contact with another system, then an objective entropy perspective is the relevant one. Those systems will equilibrate and your subjective knowledge is irrelevant to that process.

I haven't fully wrapped my head around it yet, but I do think that acknowledging both is a step in the right direction at least.


> If you take this very same system and put it in thermal contact with another system, then an objective entropy perspective is the relevant one. Those systems will equilibrate and your subjective knowledge is irrelevant to that process.

The subjective view handles this scenario just fine, though, and makes more accurate predictions than the objective view.

For example, there are systems where some aspects of the original microstate survive thermal contact with another system. We use such systems to store data! I bet your hard drive is in thermal contact with its environment right now! It's very hard to reconcile this with an objective take on entropy.

And there are some systems that will rapidly be scrambled. The subjective perspective has no problem admitting that your knowledge of a system can become inaccurate and useless. Even without thermal contact, you'd need to perform a tremendous amount of (perhaps reversible) computation in order to make a functioning Maxwell's demon with your initial microstate conditions, because the microstate will evolve in time in a complicated way. The subjective view is still totally consistent with entropy of a system increasing over time!


I wrote two replies to this, both of which I deleted. Then I had a good long thunk, and here's what I came up with.

The temperature of an object can be determined through 1/T = dS/dE. What is this S? How can it exist if you know the system perfectly? And here is where the great insight comes in. The thermometer! When you apply a thermometer to a system, you perturb it! The system may have started in one particular microstate, but the very nature of thermal contact involves random influence. Those random tiny influences from the thermometer allow the object (the hard drive in our case) to enter a bunch of microstates with certain probabilities. And that's what S measures.

So our subjective knowledge does not actually matter. (Classically speaking) the system is in a particular microstate, whether we know it or not, and it still manages to have a temperature. That is due to the states it could hypothetically enter (but hasn't yet)!

If we think back to the hard drive and its contents: very gently touching a hard drive with a thermometer will not scramble its contents. So we may say that microstates corresponding to different files than the ones you put there are actually not accessible. And they don't contribute to the entropy we used for the temperature.


No, it is subjective. We just only have such blunt instruments for practically measuring states, relative to the gargantuan amount of entropy in most real systems, that the subjective nature of entropy is easy to miss. But in a world where the frontiers of thermodynamics have moved from steam engines to lasers, computers, DNA, and black holes, the difference is increasingly obvious and important.

With steam engines, we got away with treating a volume of gas as having not only a few parameters that we knew and cared about, like mass, temperature and pressure, but we could further deceive ourselves into thinking that those were the only parameters that existed to describe the system. The only parameters that were knowable. But Boltzmann knew better.

Look at Boltzmann's formula, S = kB log W.

For any single particular system you describe to me, W will be 1, and so S will be 0. So it's only if you describe an ensemble of systems -- that is, if you describe a system vaguely, such that I am left to imagine the details -- that we have nonzero entropy. If you ask me to calculate the entropy of that "system", that macrostate, that ensemble, then sure, I'll end up with nonzero entropy. But if I ask you to keep transmitting more data about the scenario, then with each further description, you'll be narrowing the state space and thereby decreasing the entropy.
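
A rough sketch of that narrowing (the starting count of 2^80 consistent microstates is an arbitrary assumption, purely for illustration):

    # S = k_B * ln(W).  Each extra bit of description roughly halves W, the
    # number of microstates consistent with the description, lowering S by
    # k_B * ln 2, until W = 1 and S = 0.
    from math import log

    K_B = 1.380649e-23  # J/K

    def boltzmann_entropy(W):
        return K_B * log(W)

    W = 2**80            # microstates consistent with a vague description (assumption)
    for bits_known in range(0, 81, 20):
        print(bits_known, "bits known ->", boltzmann_entropy(W >> bits_known), "J/K")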

Look, since the entropy of a macrostate is nonzero, but the entropy of any single microstate which is consistent with that macrostate is zero, it's clear that entropy is not an intrinsic property of any real system. It's a property of how many other possible non-existent systems could be swapped out for the one in front of you, without you noticing the change.

If I swap out the air in your room for an equal volume of air at equal temperature and pressure, you probably won't notice.

If I swap out the hard drive in your laptop for an equal volume of hard drive at equal temperature and pressure, you probably will!


Maybe better to say that the universe does not appear to pick out a single coarse-graining or fine-graining procedure for practically any system.

For instance, following your Boltzmannian example, I think one would notice swapping 1 µm³ of the r/w head and 1 µm³ of the recording surface of a new, freshly powered on HDD more than one would notice substituting the entire HDD for a new one of the same model and turning that on. And here I am already using units of length (cf. "equal volume"), and we know neither units nor lengths are generally picked out by the universe.


Very few people know this, but:

Information entropy and statistical mechanical entropy are two different things.

They share the same equation and the same name but they are two unrelated concepts. You have conflated the two. The person you are responding to is referring to statistical entropy.

Basically in this entire thread nobody, including you, is fully grasping the situation.


They are not at all unrelated. It is not easy to grasp, so I understand the confusion. https://en.m.wikipedia.org/wiki/Landauer%27s_principle

A fun rabbit hole would start with the classic paper by Jaynes.

There are many more recent examples relating the bit-erasure costs of computation. Some names to look up if interested include Charles Bennett, David Wolpert, James Crutchfield, and Susanne Still, for starters.

Edit -- a collection of ideas related to this problem, mixing in "complexity", can be found in the SFI proceedings volume "Complexity, Entropy and the Physics of Information".


I respectfully disagree. Perhaps you'd like to present more than a mere assertion to make your case. I did.

If it helps, here's a paper that explains my stance in more detail. https://bayes.wustl.edu/etj/articles/theory.1.pdf


If you think there is no relation between the different things called entropy apart from the name maybe you're not fully grasping the situation either.


U, the internal energy is objective.

The free energy F = U - TS is the maximum amount of work you can extract from the system. This depends on how much you know about the system. S does indeed depend on what you know about the system.

See the Gibbs Paradox for more information.


If two people disagree on the maximum amount of work that could be extracted from a given system (with both of them basing their figure on their own evaluation of S), are there any cases where it would be impossible to empirically demonstrate that at least the proponent of the lower figure was wrong?

If only changes in S (and F) have measurable consequences, would that not merely mean that assigning an absolute value is an arbitrary choice? That would not be the same as it being subjective (there could still be an objective conversion between one basis and another, as there is for kinetic energy in different inertial reference frames).

In the Gibbs paradox, there is no subjectivity in whether the gases being mixed are the same or different, and no subjectivity in what the change of entropy is in either case. The paradox is that it does not feel right that identity makes an objective difference between the two cases, but the empirically demonstrable distinction between fermions and bosons shows that this intuition does not hold in general. I believe von Neumann came up with a QM resolution of the paradox.


> are there any cases where it would be impossible to empirically demonstrate that at least the proponent of the lower figure was wrong?

Isn't it more interesting to examine a situation where it would be possible to empirically demonstrate that the proponent of the lower figure was wrong?


In that case we could objectively say that one value of S does not yield F for that system (given that F is defined as a maximum), but this would not resolve the general question of subjectivity.


If that doesn't, I'm not sure what would. Maybe it would help if I taboo the word "subjective". Are you familiar with Maxwell's demon? Let's set up a variation of that experiment.

I have a partitioned box full of air at room temperature and pressure in both partitions. There's a frictionless door that can be opened and closed by an ultrafast servomechanism. The servo is connected to a computer which will read a very long bitstring from a magnetic hard-drive platter at a high frequency and open the door when the bit is '1' and close it when the bit is '0'.

Admittedly, this mechanism would be hard to construct in practice, but I hope it's clear enough as a thought experiment.

Now if you're familiar with Maxwell's demon, you'll agree that there are particular, albeit rare, joint configurations of gas microstate and hard drive bitstring, such that after the servo has finished its last motion, the gas will have been separated into hot and cold on either side of the partition. This temperature difference can be used to extract work.

For each possible bitstring on the drive, there are certain corresponding microstates of gas that will maximize the free energy extracted by this process.

And for each possible microstate of the gas, there are certain corresponding bitstrings that will maximize the free energy extracted by this process.

(For the vast majority of other combinations of hard drive bitstring and gas microstate, the operation will have no effect).

The claim "entropy is subjective" is basically just an acknowledgement that the energy extractable from the gas is dependent on both the state of the gas itself and also the data written to the hard drive. It means that two experimenters, tasked with writing the initial data on the hard drive to extract as much work as possible from the gas, will have different levels of success depending on whether they know the particular microstate of the gas (and can thus select the corresponding optimal bitstring) or if they don't know the microstate of the gas beyond "a box at room temperature and pressure", and have to guess a bitstring based on only that. And when the operation is successful, we can describe the data on the hard drive as "information about the gas microstate that was used to extract work".

This experiment, of course, is so impractical that it sounds ridiculous. But we can make a more controlled version of it on the small scale, with excited trapped atoms, and actually make it work.
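
For a sense of scale, the textbook single-bit (Szilard engine) version of this puts a rough upper bound on what one bit of microstate knowledge is worth, assuming room temperature:

    # Szilard-engine bound: one bit of knowledge about the gas is worth at most
    # k_B * T * ln 2 of extractable work -- about 3e-21 J at room temperature.
    from math import log

    K_B = 1.380649e-23   # J/K
    T = 300.0            # K, roughly room temperature (assumption)

    work_per_bit = K_B * T * log(2)
    print(f"{work_per_bit:.3e} J per bit")   # ~2.87e-21 J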


> If that doesn't, I'm not sure what would.

It is not clear to me in which direction you think the question would be resolved, given a situation in which it would be possible to empirically demonstrate that the proponent of the lower figure (for S) was wrong. Maybe that is because I do not see the connection between your thought experiment and this issue: neither of the candidate values for S entail a particular distribution of states, let alone a particular sequence of future times when a molecule will approach the gate in a particular direction.

As I see it, your experiment is a difficult-to-perform way to demonstrate that, due to the inherent randomness of thermal processes, the entropy of a closed system may decrease when the conditions are right. This is explicitly covered in the article (see also "Monkeys typing Hamlet.")

Furthermore, in the case where the microstates of the system are measured in detail and the arrival times and velocities of the molecules at the gate are computed, one must add in the change in entropy resulting from those measurements and calculations. I am pretty sure this has been done, and is in accordance with the 2nd. law.

Your definition of 'subjective' in your penultimate paragraph is contrary to both common usage and what is being discussed in this thicket of threads, and appears to be closer to 'stochastic'. The outcome of the spin of a roulette wheel does not become subjective when different gamblers place different bets on it, or even when someone who has recorded statistics for its outcomes is able to place better-than-random bets.


I think he is confusing the usage of entropy in physics and computer science. In computer science, entropy is defined from probabilities conditional on what we know about a system.


As it does in physics!

"which parameters [thermodynamic variables] you know" ~ "which parameters [thermodynamic variables] you hold fixed"

(or know in average, like the energy for a system in a heat bath where the temperature is fixed)

https://bayes.wustl.edu/etj/articles/theory.1.pdf

http://nicf.net/articles/thermodynamics-statistical-mechanic...


You can observe the movement of molecules beyond macro properties like temperature.


Sure, at least in principle. And if you knew what every molecule was doing the entropy would vanish.


I've thought that for a while, but I'm not a physicist. Do you know any prominent physicists that hold that view? It seems to contradict at least the popular narrative about entropy as a property of a system.


> Do you know any prominent physicists that hold that view?

The view that the entropy of a microstate (i.e. a perfectly defined physical state) is zero?

All of them, hopefully.


I only have an undergraduate degree in physics, but I think the point you may be missing is that information requires some physical medium to store it.

So entropy is very roughly speaking the property of the system that determines how big a hard drive you need to store a description of that system.


No this is completely and utterly wrong. Entropy is not a function of knowledge.

Two people having different levels of knowledge of a system does not mean the system has two different entropy values. Even if I knew the exact position of all atoms in a cup of water, the temperature of that water does not change due to that knowledge.

Entropy does rely on your chosen configuration of macrostates and microstates. Temperature is an arbitrary choice of macrostate.


> Even if I knew the exact position of all atoms in a cup of water, the temperature of that water does not change due to that knowledge.

It actually does! You would disagree with the other person about the temperature of that water. But I agree that this is admittedly not obvious at first.


No it does not. The thermometer does not change based off of my knowledge or opinion.


A thermometer doesn't measure temperature any better than a meterstick measures length. And we all know what Einstein had to say about the relativity of metersticks.

To paraphrase from the paper I linked in another reply to you, a thermometer is just a heat bath equipped with a pointer which reads its average energy, whose scale is calibrated to give the temperature T, defined by 1/T = dS/d<E>.

You can read the thermometer if you like, but if you know the exact microstate of the water to begin with, the thermometer reading will tell you much less than you already knew about the water. And precise knowledge of the water's microstate will (theoretically) allow you to extract much more work from that water than you would be able to with only the thermometer reading.


But entropy does not change with this knowledge.


You seem pretty convinced. Let me see if you're talking about the same pedantic distinction that oh_my_goodness was.

A: "an urn containing either a white ball or a black ball".

B: "I notice that the ball in the urn A is white".

I would say that initially the entropy of our ball-urn system is 1 bit, and that with observation B, we have reduced the entropy of our ball-urn system to 0 bits.

But if you are going to take the view that even knowing the ball in this particular urn is actually white doesn't change the fact that the entropy of <<"an urn containing either a white ball or a black ball">> is 1 bit and not 0, and that that's the entropy that we're discussing, then I won't argue about it any further.
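
A quick sanity check of those numbers (Shannon entropy in bits):

    # Entropy of the ball-urn system, before and after looking inside.
    from math import log2

    def entropy(probs):
        return -sum(p * log2(p) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))  # before observation B: 1 bit
    print(entropy([1.0]))       # after observing the white ball: 0 bits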

Except for the obligatory xkcd: https://xkcd.com/221/


No I'm not saying that.

The entropy of the system was always 0 bits. Knowledge is irrelevant.

If the urn actually contained nothing and would materialize a black or a white ball randomly, then this can occur with or without your knowledge. When the ball materializes and nothing more can be done, THEN the entropy has changed, because there are no more possible microstates.

You not having knowledge about microstates DOES NOT change available microstates. You seem to think that if you don't know about something, anything goes.

You're really arguing abstract philosophy. Did a tree in the forest fall if no one was around to see it? Yes it did, dude. Your knowledge of it has nothing to do with whether it fell. Same with entropy. And if you deny that the tree fell, then you're the one going off onto a pedantic tangent.


I'm surprised to hear that "the entropy of the system was always 0 bits."

Let's say that the urn contains a ball that changes its color from black to white and vice versa every thousand years (relative to January 1, 1970, midnight UTC).

Given that "macrostate" there are two possible and equally probable "microstates". The entropy is positive. If I look into the urn and find that the ball is white, was the entropy of the system always zero? Or is it always positive in this example?


Always positive, based on the choice of our macrostate.


Ok, that’s at least more coherent with the other things that you wrote.


Don't appreciate that comment at all. Rude.

What I'm seeing in your other reply is actually you not even reading my reply. You're making statements on things I already touched upon.

It makes you seem not intelligent. But that would be a rude thing to say, wouldn't it? There's no point. If you want to have a discussion, it's probably smart to say things that will keep the other person engaged rather than pissed off.

This is actually bad enough that I demand an apology. Say you're sorry, genuinely, or this thread won't continue and you're just admitting you're mean.


I'm sorry, I really didn't intend any offense.

I really mean what I wrote: that answer is consistent with other things that you wrote indicating that you view the macrostate as a theoretical collection of microstates which is defined under some assumptions which may be detached from what is known about the state of the system.

So you think that it's still somehow meaningful to refer to the macrostate "the ball may be black or white" and the associated entropy even if the color of the ball is known.

The coherence is in discarding the knowledge of the microstate / the future outcomes of the die / the color of the ball and claiming the original models are still valid (which they may be for some purposes - when that additional knowledge is irrelevant - but not for others).

If you had said "the macrostate I originally chose becomes meaningless because I know the microstate and the entropy is zero now [or was zero all along, as in your previous comment]" it would have been less coherent with the rest of your discourse - and it would have merited a longer reply.


>So you think that it's still somehow meaningful to refer to the macrostate "the ball may be black or white" and the associated entropy even if the color of the ball is known.

Yes. A probability distribution can still be derived from a series of events EVEN when the outcomes of those exact events are known prior to their actual occurrence.

I believe this is the frequentist viewpoint as opposed to the Bayesian viewpoint.

The argument comes down to this, as entropy is really just a derivation from probability and the law of large numbers.


I would say that - in this particular example - knowing that the ball is white and will remain white for the next 950 years and will then be black for the next 1000 years, etc. makes the macrostate "the ball may be black or white" irrelevant.

However, I agree that one can still define the macrostate as if the color were unknown - and make calculations from it. (It's just that I don't see the point - it doesn't seem a good or useful description of the system.)


Probability is invariant to the arrow of time.

If I can look at data collected from the past and generate a probability distribution from it, why can't I look to the future and do the same thing? Even with knowledge of the future you can still count each event and build up a probability from future data.

Think of it this way. Nobody looks at a past event and says that the probability of that past event was 100%. Same with a future event that you already know is going to occur. The probability number is communicating frequency of occurrence across a large sample size. Probability is ALSO independent of knowledge from the frequentist viewpoint. Entropy is based on probability, so it is based on this concept. Knowledge of a system doesn't suddenly reduce entropy, because of this.


That's not the only definition of probability - and it's a very limiting one.

Statistical mechanics is based on the probability of the physical state now and not on the frequency of physical states over time. Sometimes we can assume a hypothetical infinite time evolution and use averages over that, but a) it's just a way of arriving at the averages over ensembles that are really the object of interest, b) the theoretical justification is controversial and in some cases invalid, and c) it doesn't make sense at all in non-equilibrium statistical mechanics.

Don't be offended, but "Nobody looks at a future event that he already knows is going to occur and says that the probability of that future event is 100%" is a very strange thing to say. Everybody does that! Ask astronomers what's the probability that there is a total solar eclipse in 2024 and they answer 100%, for example.


>Don't be offended, but "Nobody looks at a future event that he already knows is going to occur and says that the probability of that future event is 100%" is a very strange thing to say.

Not offended by your words, but I am offended by the way you think. Maybe rather than assuming I'm wrong and you're superior, why don't you just ask questions and dig deeper into what I'm talking about.

Probability is a mathematical concept separate from reality. Like any other mathematical field it has a series of axioms, and the theorems built from those axioms are independent of our typical usage of it in applied settings.

Just because it's rare for probability to be used on future events that are already known doesn't mean the math doesn't work. We tend to use applied math, specifically probability, to predict events that haven't occurred yet, but the actual math behind probability is time-invariant. It can be applied to events that have ZERO concept of time. See Ulam spirals. In Ulam spirals, prime numbers have a higher probability of appearing at certain coordinates. This probability is determined independently of time. We have deterministic algorithms for calculating ALL primes. Yet we can still write out a probability distribution. Probability still has meaning EVEN when the output is already known.

That means I can look at a series of known past events and calculate a probability distribution from there. I can also look at a series of known future events and do the same thing. I can also look at events not involving time like where prime numbers appear on a spiral and calculate a distribution. Just look at the math. All you need are events.

English and traditional intuitions around probability are distractions from the actual logic. You used an English sentence to help solidify your point, but obviously our arguments are way past surface intuitions and typical applied uses of probability.

Look up frequentist and bayesian interpretations of probability. That is the root of our argument. You are arguing for the bayesian side, I am arguing for frequentist.


I'm aware that there are different interpretations of probability. I said as much in the first line of my previous message!

You may be content with an interpretation restricted to talking about frequencies. I prefer a more general interpretation which can also - but not exclusively - refer to frequencies.

Even from a frequentist point of view I find perplexing your suggestion that nobody says that the probability of something is 100% when they are able to predict the outcome with certainty.

Probability may be a mathematical concept separate from reality, but when it's applied to say things about the real world not all probability statements are equally good - just like the "moon made of cheese" model is not as good as any other, even if it's mathematically flawless. This has nothing to do with Bayesian vs frequentist, by the way; empirical frequencies are not mathematical concepts separated from reality.

It's a perfectly frequentist thing to do to compare a sequence of probabilistic predictions to the realised outcomes to see how well-calibrated they are.

The astronomer that predicts P(total solar eclipse in 2023)=0%, P(t.s.e. 2024)=100%, P(t.s.e. 2025)=0%, P(t.s.e. 2026)=100%, etc. will score better than one who predicts P(t.s.e. 2023)=P(t.s.e. 2024)=P(t.s.e. 2025)=P(t.s.e. 2026)=2/3 or whatever is the long run frequency.
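
To make "score better" concrete, here is a rough sketch using the Brier score (one standard calibration measure; lower is better, and the outcomes are the illustrative ones from the example above):

    # Brier score (lower is better) for the two eclipse forecasters.
    # 0 = no total solar eclipse that year, 1 = eclipse (illustrative values).
    outcomes       = [0, 1, 0, 1]          # 2023..2026, as in the example above
    sharp_forecast = [0.0, 1.0, 0.0, 1.0]  # predicts each year exactly
    base_rate_only = [2/3, 2/3, 2/3, 2/3]  # always predicts the long-run frequency

    def brier(forecasts, outcomes):
        return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)

    print(brier(sharp_forecast, outcomes))  # 0.0
    print(brier(base_rate_only, outcomes))  # ~0.28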

A weather forecaster that looks at satellite images will score better than one that predicts every day the average global rainfall. Being a frequentist doesn't prevent you from trying to do as well as you can.

I did already agree that you _can_ keep your model where the millenary-change ball may be either white or black and make calculations from it. (It just doesn't seem to me a good or useful description of that system once the precise state is known. You _can_ also change the model when you have more information, and the updated model is objectively better. I think we will agree that from a frequentist point of view predicting a white ball with 100% probability and getting it right every time is more accurate than a series of 50%/50% predictions. And the refined model can calculate the loooooong-term frequency of colours just as well.)


>It's a perfectly frequentist thing to do to compare a sequence of probabilistic predictions to the realised outcomes to see how well-calibrated they are.

Yeah, but it's a philosophical point of view. The Bayesian sees this calibration process as the probability changing with more knowledge. The frequentist sees it as exactly what you described, a calibration: a correction of what was previously a more wrong probability.

>A weather forecaster that looks at satellite images will score better than one that predicts every day the average global rainfall. Being a frequentist doesn't prevent you from trying to do as well as you can.

Look at it this way. Let's say I know that for the next 100 days there will be sunny weather every day except for the 40th day, the 23rd day, the 12th day, the 16th day, the 67th day, the 21st day, the 19th day, the 98th day, and the 20th day. On those days there will be rain.

Is there any practicality if I say that in the next 100 days there's a 9% chance of rain? I'd rather summarize that fact than regurgitate that mouthful. The statement and usage of the worse model is still relevant and has semantic meaning.

This is an example of deliberately choosing a model that is objectively worse than the previous one, but it is chosen because it is more practical. In the same way that we use other approximate models, entropy is the same thing.

Personally I think this is irrelevant to the argument. I bring it up because associating the mathematical definitions of these concepts with practical daily intuition seems to help your understanding.


> Even if I knew the exact position of all atoms in a cup of water, the temperature of that water does not change due to that knowledge.

If you knew the exact position of all atoms in a cup of water you wouldn't assign any temperature to it. Not a thermodynamic temperature at least.


The number of microstates does not change, even if you KNOW that the cup of water is in a specific microstate.

The Boltzmann equation is based on the total number of accessible microstates.


"accessible" means something only given a set of constraints.

Like the temperature, if you keep the temperature of the water fixed. And the number of molecules, if instead of a cup you have a closed container to prevent it from evaporating. Then what you have is water at some temperature that you control. And you could have the water at a different temperature with exactly the same microstate.

Or imagine gas at some fixed temperature within a cylinder with one movable wall. If you knew the location of every molecule of the gas it wouldn't make sense to talk about its pressure - you could compress it (reducing the number of accessible microstates) without doing any work.

Edit: In summary, thermodynamics loses its meaning if you know the microstate and can act on that knowledge.


>it wouldn't make sense to talk about its pressure -

If I have a pressure gauge that reads the same thing regardless of my knowledge, how is pressure meaningless? The tool that reads pressure gives me an accurate pressure number regardless of what I know or don't know. This number is correct.

Your argument is basically saying that the pressure gauge becomes wrong once you have more knowledge of the system. No it doesn't. The pressure gauge is still giving you a number defined as "pressure."

The gas in that cylinder is at a specific microstate within the macrostate defined as pressure.


> The pressure gauge is still giving you a number defined as "pressure."

As long as you define “pressure” as “the reading of the manometer” and not as “the variable that together with temperature specifies the state of the gas and measures the quantity of energy required to compress it further”.

Thermodynamics is based on state variables giving a complete description of the system. Statistical mechanics is based on looking at the ensemble of microscopic descriptions possible given what is known about the system and their probabilities.

If all you know is a handful of thermodynamic variables that ensemble is huge. If you know already the microscopic description of the physical system your ensemble has one single possible configuration in it.

As in jbay808’s xkcd example, if you have a random number generator and you know the sequence of numbers that will be generated, do you have a random number generator? The random number generator is still giving you a number defined as “random”, right?

I guess that it’s still random if you “forget” that you know it in advance and that the macrostate is still meaningful as a complete description of the physical system if you “forget” that you have a perfect knowledge of its state.

Edit: the GPS receiver in my phone is giving me some coordinates defined as “position” that happen to be in the middle of the road. However, I know precisely where I am. Don’t you think that the meaning of that “position” is somehow affected by this additional information?


>Edit: the GPS receiver in my phone is giving me some coordinates defined as “position” that happen to be in the middle of the road. However, I know precisely where I am. Don’t you think that the meaning of that “position” is somehow affected by this additional information?

No it is not affected by it. The meaning of position is never changed. Your knowledge of your position can change, but your actual position exists regardless of your knowledge or inaccuracies of your tools.

>As in jbay808’s xkcd example, if you have a random number generator and you know the sequence of numbers that will be generated, do you have a random number generator? The random number generator is still giving you a number defined as “random”, right?

Random number generators are a rabbit hole. There's not even a proper mathematical definition for them. We're not sure what a random number is... we just have an intuition for it. Case in point: the xkcd comic could not define it mathematically. This is the reason the joke exists, because we're not even truly sure what it is or whether random numbers are a thing. We have an intuition for what a random number is, but this is likely some kind of illusion, similar to the many optical illusions produced by our visual cortex. If formalization of our intuitions is not possible, then there is a likelihood that the intuition is not even real.

>Statistical mechanics is based on looking at the ensemble of microscopic descriptions possible given what is known about the system and their probabilities.

ok take a look at this: https://math.stackexchange.com/questions/2916887/shannon-ent...

They're talking about deriving the entropy formula for fair dice. But they talk about it as if we don't have knowledge about physics, momentum and projectile motion. We have the power to simulate the dice in a computer simulation and know the EXACT outcome of the dice. The dice is a cube and easily modeled with mathematics. So then why does the above discussion even exist? What is the point of fantasizing about dice as if we have no knowledge of how to mechanically calculate the outcome? The point is they chose a specific set of macrostates that have uniform distribution across all the outcomes. It is a choice that is independent of knowledge.


Thanks for your reply!

You didn't address the first line in my comment about the definition and meaning of "pressure" so maybe we actually agree.

To elaborate a bit, one may define "pressure" as the reading of a device that measures its exchange of momentum with the particles of gas averaged over time. The last bit is important because those microscopic impacts are discrete events. If we know [in a classical mechanics framework] the state of every particle in the gas we can predict when they will happen - and successfully calculate the (averaged) "pressure" measurement.

However, one may also define and interpret "pressure" as a variable that - together with volume and temperature - characterizes completely the behaviour of an ideal gas in equilibrium. But if we have a precise knowledge of the physical state we could in principle do impossible things - like compressing the gas without effort or creating a temperature gradient.

If we have a fish contaminated with mercury and the concentration of 0.01% characterizes completely its toxicity, we won't eat it. If we also know that the mercury is only on the surface we won't eat it either, but in principle we could if we are careful. The content of mercury in the fish remains the same although the meaning of that number changes - but of course if we're a bear unable to clean our fish the additional information doesn't change anything at all.

> They're talking about deriving the entropy formula for fair dice. But they talk about it as if we don't have knowledge about physics, momentum and projectile motion. We have the power to simulate the dice in a computer simulation and know the EXACT outcome of the dice. The dice is a cube and easily modeled with mathematics. So then why does the above discussion even exist? What is the point of fantasizing about dice as if we have no knowledge of how to mechanically calculate the outcome? The point is they chose a specific set of macrostates that have uniform distribution across all the outcomes. It is a choice that is independent of knowledge.

I can make a model where the moon is made of cheese. That model is independent of any knowledge about the true nature of the moon. But if I visit the moon and find that - surprisingly! - it's made of lunar rock I may re-evaluate the pertinence of that model.

The model where all the outcomes of the die are equally likely is particularly useful when all the outcomes of the die are equally likely. If you have no additional knowledge - apart from the number of outcomes - you have no reason to prefer one outcome to another. All of them are equally likely - to you. You can calculate the entropy of one event assuming that there are six equally probable possible outcomes.

If I know exactly the future outcomes of the die - 4, 2, 5, 1, ... - I can also calculate the entropy of each event assuming that there is one single possible outcome that will happen with certainty. You have one model. I have one model. Are all models created equal? If we play some game you'll painfully realize that my model was better than yours - or at least you'll believe that I'm incredibly lucky.


All mathematical formulas representing physical phenomena are called models. Some models are more accurate than other models.

Entropy is one such model. The mathematical input parameter that goes into this model is a macrostate. We are also fully aware that the model is an approximation, just like how we're aware that Newtonian mechanics and probability itself are approximations.

If you feel entropy is too vague a description, then you can choose to use another model for the system: one with billions of parameters that can record the exact state of the system. Or you can use entropy, which has its uses, just like how classical mechanics still has uses.


Ok, we agree then. Models may or may not represent a physical reality. They may be in conflict with reality - as in "the moon made of cheese". They may be incomplete - as in "the fish is 0.01% mercury". Those inaccuracies may or may not have practical relevance. Fundamentally it makes a difference though. In principle, someone with a better model of the die can consistently win bets contradicting the predictions of the "fair die" model and someone with a better model of the gas can do things forbidden by the "entropy is a measure of the energy unavailable for doing useful work" interpretation.

To reconcile those views in the context of your first comment: "Entropy is not a function of knowledge."

Entropy is a function of the macrostate. The macrostate is defined by state variables (the constraints on the system). Those state variables represent what is known about the system. Given P1, T1 we calculate S(P1, T1). Given P2, T2 we calculate S(P2, T2). The entropy obviously changes with our knowledge in the sense that if we know that the pressure is P1 and the temperature is T1 we calculate one value and if we know that the pressure is P2 and the temperature is T2 we calculate a different value. If we don't know P and T we cannot calculate _one_ "entropy value" for the system at all because the corresponding macrostate is not defined.

"Two people with varying and different levels of knowledge of a system does not mean the system has two different entropy values."

What is the “entropy value of the system”?

Imagine that the system is composed of two containers with equal volumes of an ideal gas at the same temperature and pressure that are then put together - the volume is now the sum of the volumes, the pressure and temperature don’t change.

Alice can calculate S1 and S2 and the final entropy is SA=S1+S2.

Bob knows something that Alice doesn't: that it was hydrogen in one container and helium in the other. They will mix, and he can calculate that in the end SB > S1+S2.

What is the “entropy value of the system”? It seems to be more a property of the description of the system than of the system itself.
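
For ideal gases, Bob's extra term is the standard entropy of mixing; a rough sketch, assuming (arbitrarily) one mole of gas in each container:

    # Entropy of mixing for equal amounts of two different ideal gases at the
    # same T and P: delta_S = 2 * n * R * ln(2).  For identical gases it is 0.
    from math import log

    R = 8.314462618  # J/(mol*K)
    n = 1.0          # mol in each container (arbitrary assumption)

    delta_S_mixing_different = 2 * n * R * log(2)  # H2 + He: Bob's extra term
    delta_S_mixing_same      = 0.0                 # identical gases: Alice's case

    print(f"Bob:   S_final = S1 + S2 + {delta_S_mixing_different:.2f} J/K")
    print(f"Alice: S_final = S1 + S2 + {delta_S_mixing_same:.2f} J/K")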

I'll say more about that in a reply to https://news.ycombinator.com/item?id=31201129 (somehow I've missed that comment until now)


>What is the “entropy value of the system”? It seems to be more a property of the description of the system than of the system itself.

Yes. That is what entropy is as defined.

>If we don't know P and T we cannot calculate _one_ "entropy value" for the system at all because the corresponding macrostate is not defined.

If the input is the macrostate and you don't know the macrostate, then you can't calculate the value. That's pretty basic, and this applies to ANY model. If you don't know the input variables, you can't calculate anything. Nobody talks about mathematical models this way. This applies to everything.

I don't think you picked up on my model argument either. You seem to think you made progress on us agreeing that entropy is a "model." I'm saying every single math formula representing physical phenomena on the face of the earth is a "model." Thus it's a pointless thing to bring up. It's like saying all mathematical formulas involve math. If entropy uniquely has a parameter called knowledge that affects its outcome, citing properties universal to everything doesn't lend evidence to your case.

Let's "reconcile" everything:

You're implying that there is some input parameter modeled after knowledge. And that input parameter affects the outcome of the entropy calculation. I am saying no such parameter exists. Now you're saying that knowledge of the input parameter itself is what you're talking about. If you don't know the input parameter, you can't perform the calculation.

The above is an argument for everything. For ANY model on the face of the earth, if you don't know the input parameters you can't derive the output. Entropy is not unique in this property, and obviously by implication we're talking about how you believe entropy is uniquely relative to knowledge.

>Alice can calculate S1 and S2 and the final entropy is SA=S1+S2.

Who says you can add these two entropies together? S1 and S2. The macrostates are different and mixing the two gases likely produces a third unique set of macrostates independent of the initial two.


> You seem to think you made progress on us agreeing that entropy is a "model.

I thought we had agreed that entropy is something you calculate with a model, in fact.

> You're implying that there is some input parameter modeled after knowledge.

I was trying to say that the inputs to S(...) are the things that we know because we did measure them or set their values. It seems that we agree on that because it's extremely obvious.

Hopefully we also agree that if there are other relevant things that we know in addition to the inputs to that model, we could refine our model. I fully acknowledge that we may choose to ignore the additional knowledge and keep using the old model - and it may be good enough for some uses. (We may also choose to incorporate the additional knowledge. Maybe it rules out some microstates and we could be using a smaller macrostate to represent what we know about the system.)

When all we know is the macrostate, the macrostate is the most detailed description - and gives the most precise predictions - available to us regarding the system. However, if we know more, the original macrostate is no longer "complete", because we do know - and we can predict - more precise things. There is a fundamental change from "the macrostate represents all we know and is the basis of everything we can predict" to "not the case anymore".

Which also seems obvious. Probably we agree on that as well! (Sure, it applies to everything. Anytime one ignores information one has a suboptimal model compared to the model one could have. The improved model may or may not be better for a particular purpose.)

> Who says you can add these two entropies together? S1 and S2.

Alice, who considers two equal volumes of an ideal gas at the same temperature and pressure.

> The macrostates are different

They were the same in my example. Same volume. Same temperature. Same pressure.

> and mixing the two gases likely produces a third unique set of macrostates independent of the initial two

For an ideal gas doubling the volume and the number of particles (so the pressure remains the same for a fixed temperature) doubles the entropy. If you have two identical systems the total entropy doesn't change when you put together the two containers, resulting in a single container twice as large with twice as many particles.

If you thought that the number of microstates - and the entropy - increases when you bring together two identical systems because they will mix with each other, that's not correct. (Even though there are still debates about this issue 120 years later.)

https://en.wikipedia.org/wiki/Gibbs_paradox

The entropy would increase however if they are different ideal gases (it doesn't matter how different). Bob - who knows that they are different - would calculate the correct entropy.

It could be the other way. Maybe they're actually the same gas but Bob treats them as different because he isn't aware and keeps the general case. He calculates an increase in entropy due to the mixing. While for Alice, who knows that they are the same gas, the total entropy hasn't changed.

As Maxwell wrote: "Now, when we say that two gases are the same, we mean that we cannot separate the one from the other by any known reaction. It is not probable, but it is possible, that two gases derived from different sources but hitherto regarded to be the same, may hereafter be found to be different, and that a method be discovered for separating them by a reversible process."

If we think that the two gases are the same the entropy is 2S but if we discover later a way to tell apart one from the other the entropy is higher (there are more microstates for the same macrostate that we thought initially).
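
To make the doubling concrete, here's a quick Python sketch using the Sackur-Tetrode form of the ideal-gas entropy. The temperature-dependent terms are lumped into a constant because they cancel in the comparison, and the particle numbers are arbitrary - this is only an illustration of the bookkeeping, not a full calculation:

    import math

    def ideal_gas_entropy(N, V, c=2.5):
        # S/k for an ideal gas, Sackur-Tetrode style, with the temperature-
        # dependent terms lumped into the constant c (they cancel below).
        return N * (math.log(V / N) + c)

    N, V = 1000.0, 1.0
    S1 = ideal_gas_entropy(N, V)               # Alice's first container
    S2 = ideal_gas_entropy(N, V)               # Alice's second, identical container

    # Same gas, partition removed: 2N particles in 2V.
    S_joined = ideal_gas_entropy(2 * N, 2 * V)
    print(S_joined - (S1 + S2))                # ~0: no mixing entropy for identical gases

    # Two *different* ideal gases: each species ends up with N particles in 2V.
    S_mixed = 2 * ideal_gas_entropy(N, 2 * V)
    print(S_mixed - (S1 + S2))                 # ~2N*ln(2), the entropy of mixing
    print(2 * N * math.log(2))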


>I thought we had agreed that entropy is something you calculate with a model,

We did agree. I never said otherwise. Where are you getting this idea? I'm saying our agreement on this fact is useless. Why don't you actually fully read what I wrote.

>I was trying to say that the inputs to S(...) are the things that we know because we did measure them or set their values. It seems that we agree on that because it's extremely obvious.

I spent paragraphs remarking on this ALREADY. I get what you're saying. You're not even reading what I wrote. Every mathematical model has this property you describe. It is not unique to entropy. If you don't know the parameters of even the Pythagorean theorem, then you can't calculate the length of the hypotenuse. Does this mean the Pythagorean theorem depends on your knowledge of the system? Yes, but that's kind of a pointless thing, right? If this is the point you're trying to make, which I highly doubt, then why are we focusing only on entropy? Because knowledge of any system is REQUIRED for every single mathematical model that exists or the model is useless.

I don't think you're clear about the argument either. If you're not talking about knowledge as a quantifiable input parameter then I don't think you're clear about what's going on.

>I fully acknowledge that we may choose to ignore the additional knowledge and keep using the old model - and it may be good enough for some uses.

Entropy is used with full knowledge that it's a fuzzy model. It's based on probability. It doesn't matter if we "ignore" or don't know the additional properties of the model. The model doesn't incorporate that data regardless of whether that information is known or not known.

>They were the same in my example. Same volume. Same temperature. Same pressure.

No. The Boltzmann distribution changes with gas type as well. The models are different.

>For an ideal gas doubling the volume and the number of particles

In this case yes. But only for an ideal gas. I don't recall if you mentioned the gases were both ideal. Let me check. You did mention it. But then you mention the gases are different. Hydrogen and helium. Neither gas is technically ideal, and the quantum mechanical effects would likely influence the Boltzmann distribution when mixed. There are contradictions in your example that make it unclear.

>https://en.wikipedia.org/wiki/Gibbs_paradox

The article you linked explains it away. It's the choice of Macrostates that affects the entropy outcome. The article says it's subjective in the sense that it's your choice of Macrostates. The Macrostates don't change based off your knowledge. You choose the one you want.


>>I thought we had agreed that entropy is something you calculate with a model,

>We did agree. I never said otherwise. Where are you getting this idea? I'm saying our agreement on this fact is useless. Why don't you actually fully read what I wrote.

It was a minor correction. I wouldn't say that entropy is a "model". But essentially we agree, that's what I meant. We agree that we agree!

>> I was trying to say that the inputs to S(...) are the things that we know because we did measure them or set their values. It seems that we agree on that because it's extremely obvious.

> I spent paragraphs remarking on this ALREADY.

Again, I was stressing that we had also reached a clear agreement on that point. (Except that I don't know what you mean by me implying something about "some input parameter modeled after knowledge" if every input corresponds to knowledge and that's a pointless thing to discuss.)

> The Macrostates don't change based off your knowledge. You choose the one you want.

And if you want, you can choose a new one when your knowledge changes! One that corresponds to everything you know now about the physical state. Then you can do statistical mechanics over the ensemble of states that may be the underlying unknown physical state - with different probabilities - conditional on everything you know. In principle, at least.

[it was an interesting discussion, anyway]


>Again, I was stressing that we had also reached a clear agreement on that point.

And I'm stressing the agreement was pointless and even bringing up the fact that entropy is a model doesn't move the needle forward in any direction. You haven't responded to that. I still don't understand why you brought it up. Please explain.

I also don't understand how zero knowledge of the input parameters applies here. This argument works for every model in existence and is not unique to entropy. Again not sure why you're bringing that up. Please explain.


I never said it was unique to entropy. (And I still don’t understand if there is some meaning that escapes me in calling entropy a “model”. Are temperature and pressure also “models” or is it unique to entropy?)

If I have a system in thermal equilibrium with a heat bath and know the macrostate P,V,T I can calculate the energy of the system only as a probability distribution - it’s undefined. If I knew the state precisely I could calculate the exact energy of the system.

If I define lycanthropy as “error in the determination of energy” it’s positive given the macrostate for the system in a heat bath and zero given the microstate of the system. Of course, given the microstate one can know the energy but can also pretend that the energy is still indeterminate.

While the distribution of energies - and the whole thermodynamical model - may still be useful, its meaning would change. It would no longer be the most complete description of the system that encodes what can be predicted about it. Of course if that was never the meaning for you, you’ll see no loss. But I thought we were talking about physics, not mathematics. The meaning of thermodynamics is a valid point of discussion. The interpretation of the second law remains controversial.

I think this discussion has run its course - I may no longer reply even if you do. Thanks again, it was interesting.


If the cup of water is in a specific microstate at time t=0, and evolves over time according to deterministic equations of motion, how will it "access" other microstates that aren't along that specific trajectory in phase-space?


It can't. But you're not typically defining ONLY microstates along that trajectory as accessible. You are defining all accessible configurations according to your defined macrostate.

Knowledge of future microstates does not change what was already defined as a macrostate. The definition and the rules you used to construct a macrostate are independent of knowledge of the system.

If you gain knowledge of the system and you would like to change your macrostate, then be my guest. You can certainly do that, but "entropy" as we know it does not actually change with more knowledge unless you change the parameters according to your gained knowledge.

Think of it this way. The thermometer ALWAYS reads the same thing EVEN if you have 100% knowledge of the current microstate. You can build a new thermometer using some other mechanism to get a different reading and to take advantage of your new found knowledge... but you'd be changing the definition of your macrostate.


I think that works ok. But I think it's an unnecessarily tricky explanation. Entropy per macro-state decreases as we look at finer-grained macro-states. It feels simpler to associate the entropy of each macro-state with that macro-state, rather than assuming we know which macro-state the system is in, and then attributing the lower entropy to our knowledge of the macro-state.

I think it can probably be expressed either way. I just think the "knowledge" part is tricky and can be left out.


Can you elaborate on the difference between a "fine-grained" macrostate and a macrostate that is not fine-grained?

I think you will find it hard to separate the concept of a macrostate from the state of knowledge (or ignorance) of an individual subjective observer.


Sure. A fine-grained macro-state contains fewer micro-states. A coarse-grained macro-state contains more micro-states.

Say I flip 8 coins and I don't look at the results. A fine-grained macro state is TTTT TTTT. A coarser-grained macro state is TTTT xxxx. The coarser one has 4 bits more entropy than the finer one. It works the same way in statistical mechanics. Call them spins.

We're just talking about some ensemble of micro states, and then we divide the ensemble up into macro-states. To do statistical mechanics at all, I think I have to define some macro-states according to which micro-states they contain. That doesn't mean I necessarily have any information about which macro-state the system is actually in.
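
As a sanity check on that 4-bit difference, a minimal brute-force count in Python (treating 'x' as "either H or T"; the pattern notation is just mine):

    from itertools import product
    from math import log2

    def count_microstates(pattern):
        # Count the 8-coin microstates consistent with a macrostate pattern,
        # where 'x' means "unspecified" and 'H'/'T' pin that coin down.
        return sum(all(p in ('x', c) for p, c in zip(pattern, micro))
                   for micro in product('HT', repeat=8))

    fine = count_microstates('TTTTTTTT')     # 1 microstate
    coarse = count_microstates('TTTTxxxx')   # 2**4 = 16 microstates
    print(log2(fine), log2(coarse))          # 0.0 bits vs 4.0 bits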


> A fine-grained macro state is TTTT TTTT.

This macrostate seems fine enough to be a microstate, but sure. For the macrostates with at least one 'x' in them, that 'x' seems to be a placeholder for the concept of subjective ignorance.

> That doesn't mean I necessarily have any information about which macro-state the system is actually in.

But the entire purpose of the exercise of assigning microstates to macrostates is so that you can match up a description of some system to a microstate ensemble and calculate its entropy! Otherwise there's no point to arbitrarily labelling various groups of microstates.

To follow your example more practically, let's say you have an 8-spin system, whose net spin is zero. (You know because you've measured its overall magnetic moment or something). I've just described a system that is in one of the following possible microstates:

TTTT HHHH, TTTH HHHT, ..., HHHH TTTT

Now you can go ahead and define the macrostates as fine-grained as you want, where TTTT HHHH and HHHH TTTT are in different macrostates, but to calculate the entropy of this system, you're going to have to sum up all of those macrostates anyway to get the one that's consistent with the described system.
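
Concretely, counting in bits rather than units of k_B (a small enumeration, nothing deep):

    from itertools import product
    from math import comb, log2

    # All microstates of 8 spins (H = up, T = down) with net spin zero,
    # i.e. exactly 4 up and 4 down.
    zero_net_spin = [s for s in product('HT', repeat=8) if s.count('H') == 4]

    print(len(zero_net_spin), comb(8, 4))   # 70 70
    print(log2(len(zero_net_spin)))         # ~6.13 bits for this macrostate

Splitting those 70 into finer macrostates and summing their counts gets you back to the same 70 - which is the point about having to sum them up anyway.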


Good review of common ground. At this point hopefully the active folks in the discussion can see that we're all describing statistical mechanics exactly the same way.

What I'm saying is pedagogical. We need to define our macro-states. We don't need to go on and talk about our definitions being information or 'knowledge.' We could just use the definitions and calculate. We can talk about 'knowledge' but we don't need to.

The exception is when we actually have some information about what macro-state some system is really in. Obviously we then have to build that information into our model, and the entropy changes. What I'm saying is this: it's not necessary to mix that into our definition of entropy. That definition is not going to help folks who don't understand entropy, and it's unnecessary.


How is a macrostate TTTTxxxx different from having the information about the TTTT part and not about the xxxx part?

Talking about the entropy of the macrostate TTTTxxxx makes sense only conditional on the TTTT information.


It's different because I can define a macro-state without any information about which macro-state any system is actually in. As I think you're also saying, the only information I need is information about how I've defined my own macro-states.

If we just define the macro-states, we're good to go. We don't need to talk about 'knowledge'. We can talk about 'knowledge', it's fine, but that lets in unnecessary woo.


I'm not sure I see what's the point of that distinction.

The entropy of a macrostate is a measure of the indetermination about the microstate conditional on the macrostate. If you don't want to call that 'knowledge' the substance of the matter doesn't change.

A macrostate is not an intrinsic property of a physical system. It's related to our description of the system. In general, the same microstate of the system of interest may be compatible with multiple macrostates.

Given the thermodynamical description of some system I can calculate the entropy if T=T_1 and the entropy if T=T_2 without knowing what's the actual temperature specifying the macrostate. But in the first case the calculation is conditional on the hypothetical information T=T_1 and in the second case conditional on T=T_2.


The point is pedagogical. Entropy takes a lot of time for people to understand clearly. That is the discussion from the OP.

Adding "knowlege" to the definition (or to an initial explanation) of entropy makes that learning process even more difficult. And it's unnecessary. It's better than the older talk about "disorder" but it's distracting. We can bypass 'knowledge' and come back later, with no penalty and plenty of time savings.

Apart from that single pedagogical point, we seem to be saying the same things back and forth to each other in different words. I'm not sure why.


I think that the "microstate counting" approach - if that's what you are defending - doesn't allow to understand entropy clearly because only works for the microcanonical description. It doesn't make sense to count the microstates for a volume of gas at some pressure and temperature. (Which is the standard thermodynamics problem.)

The concept of how much can we tell about the microstate given only the pressure and temperature seems quite natural and a better starting point. Boltzmann's entropy is a nice illustration but there is no reason to avoid the general concept.


> It doesn't make sense to count the microstates for a volume of gas at some pressure and temperature.

But nobody does that since the total value of entropy isn't important. What you do is count the factor difference in the count of microstates between two volumes - that is what you care about - and it is easy to see how the number of microstates changes when you double the volume or make other similar changes.


Is it easy to see how the number of microstates changes when you increase the temperature - everything else being equal?

How would you say that it changes then?

I'd say that the number of compatible microstates doesn't change. The probability of each microstate does change though.


Your statement doesn't make sense: temperature is defined in terms of entropy changes; you can't calculate temperature without first calculating entropy changes.


Have you heard of thermometers? I can have a container with 1l of some gas at room temperature T1 and proceed to heat the room - and the container - to temperature T2.

How do you calculate the number of microstates for the sample of gas before and after? How do you think these numbers are related? You said it was easy!


Thermometers are a way to measure temperature; they aren't its theoretical definition. Temperature is defined as energy required per change in entropy. There is no other reasonable way to define temperature, since at its core it measures which way energy flows when two macro systems are connected. Temperature tends to go up as you add energy to things, but not always, for example temperature doesn't go up when you melt ice, it starts and ends at 0 degrees C.

> How do you calculate the number of microstates for the sample of gas before and after? How do you think these numbers are related?

If you added energy to the gas by heating it, let's say you doubled the energy, then you now have twice as many energy packets to distribute between the particles. This adds a lot more microstates that weren't available before, and none of the old microstates are now possible since all old microstates had a total energy level half of what each new microstate has. You can calculate the change in states yourself, it is just discrete normal probability theory. Note that the base rate isn't interesting, you care about the change of the logarithm of the number of states.
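
If you want to see that counting in action, here's a toy sketch assuming the usual "q indistinguishable energy packets among N particles" model (an Einstein-solid-style simplification, not a real gas; the numbers are arbitrary):

    from math import comb, log

    def ways(q, N):
        # Number of ways to distribute q indistinguishable energy packets
        # among N distinguishable particles ("stars and bars" counting).
        return comb(q + N - 1, N - 1)

    N = 50                      # particles
    q1, q2 = 100, 200           # energy packets before and after doubling
    W1, W2 = ways(q1, N), ways(q2, N)
    print(W2 / W1)              # enormous factor: far more microstates
    print(log(W2) - log(W1))    # the change in ln(W), i.e. the change in S/k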


> This adds a lot more microstates that weren't available before, and none of the old microstates are now possible since all old microstates had a total energy level half of what each new microstate has.

As I'm sure you know, the microstates of that sample of gas at some fixed temperature don't have all the same energy. For each temperature there will be a distribution of possible energies. If the temperatures are close enough there will be a large overlap between those distributions.

You cannot just count the microstates of the sample of gas. (You can count microstates of the gas plus reservoir system though.)


> As I'm sure you know, the microstates of that sample of gas at some fixed temperature don't have all the same energy.

Depends if you do classical statistical physics or quantum. If you do classical they all have the same energy. If you do quantum you have to weight the states according to their probability densities, and the probabilities that the energy deviates are very small which is why classical works fine even when ignoring those.

But quantum statistical physics is way more complex, you should learn the classical statistical physics first before you try to discuss quantum statistical physics. Classical works a lot like the computer science version where you just count states, quantum doesn't.


> Depends if you do classical statistical physics or quantum. If you do classical they all have the same energy.

That's just completely wrong. With all due respect, you should (re?)learn the classical statistical physics :-)

https://en.wikipedia.org/wiki/Boltzmann_distribution


That is for a subset of a larger system, not for a closed system. A closed system has fixed energy in classical.

So for example, if you have a gas as a closed system and take a part of that, then you can measure that part's energy distribution by just accumulating the different micro states, yes. But if you view that part as a closed system, then if it has higher energy it has higher temperature (if it is in the same phase) and thus energy will on average flow out of it to the other parts; if it didn't have higher temperature then it would be a stable system and that part would just have higher energy, which we know doesn't happen in, for example, gases (but it can happen with phase transitions, like an ice cube in water).

Edit: And it doesn't make sense to talk about the temperature of subsets of systems. Temperature only deals with what happens when you connect two large systems - it is a macroscopic property - so when you calculate temperature you always calculate it via the entropy of closed systems.

So the Boltzmann distribution happens when you have calculated the temperature for a macroscopic system as if it were closed, and then you start to calculate properties of some subsystem of that; that is how you get the Boltzmann distribution. Not sure how you think these things were derived - have you tried reading a physics book on the subject and gone through how the formulas are derived?

Also, in these calculations you never ever mix two temperatures in a single system, as that isn't stable. The formulas only work for stable states. So if you have two different temperatures you have two different systems.

Edit Edit: I can't post more since I sometimes post stupid political stuff. Anyway, I answered your post already, the formulas we are talking about here don't deal with the case where systems of different temperatures interact. They only work for closed systems. If you heat a subset of a room to a higher temperature, then you now have a dynamic system where energy flows out of that subset, and none of the formulas we have discussed here applies in that scenario. You can use them to calculate approximate properties of the systems by treating them as if they were closed, though; still, the Boltzmann distribution doesn't apply there for the entirety of those subsystems, since it assumes that everything surrounding it has the same temperature, which in your scenario the surroundings do not.

Anyway, all of these formulas are derived based on completely closed systems with no transfer of energy or particles. That is the source of entropy calculations; if you want to understand entropy you have to deal with such systems. Boltzmann's law is an example of a formula derived from entropy calculations; you can't use that to talk about entropy.

(Note, I learned these things in another language than English, so I might use the wrong words for some things, but I know how statistical physics, temperature and entropy works, are derived etc)


> A closed system has fixed energy in classical.

Wrong again!

https://en.wikipedia.org/wiki/Closed_system

"In thermodynamics, a closed system can exchange energy (as heat or work) but not matter, with its surroundings."

Maybe you meant "isolated" instead of "closed" - but then I don't know what we have been talking about. (When I say "we" I mean "you".)

> It doesn't make sense to count the microstates for a volume of gas at some pressure and temperature.

A volume of gas at some pressure and temperature is a closed system but not an isolated system.

> I can have a container with 1l of some gas at room temperature T1 and proceed to heat the room - and the container - to temperature T2.

The containers in that example are closed systems but not isolated systems.

> the microstates of that sample of gas at some fixed temperature don't have all the same energy

That sample of gas at some fixed temperature is a closed system but not an isolated system.

Hopefully you'll agree with everything I wrote with the understanding that we were not talking about isolated systems.

[ have _you_ tried reading a physics book? :-) ]


And of course the "knowledge" part - the entropy being a function of the probability distribution for the microstate conditional on the macrostate - is there just the same in the microstate counting approach. (Where the latter is applicable!)

If given the macrostate all microstates are equally probable we can just count them. The more there are the higher the entropy.

In general we have a probability distribution for microstates conditional on the macrostate. To have a clear understanding of entropy that should be at least mentioned.


Of course you can count micro states of a gas within dE or delta-E of some total energy. The density-of-states approach is exactly that.

I thought we were discussing statistical mechanics.


> I thought we were discussing statistical mechanics.

This is from the message that you first replied to in this thread:

"Typically when you calculate the entropy of a system at temperature X, that means all you know is that you stuck a thermometer in it and measured X. You don't know anything more than the average temperature. It could be in any state consistent with that temperature."

Will you tell students to count the microstates consistent with the temperature?


I think in practice we select macrostates based on stuff we can easily measure. And stuff we can easily measure gets pretty close to intrinsic properties.


Oh my god, this explanation is gold. Thank you. I'm going to save it and refer to it in the future.


Yeah the author is conflating low entropy with a low number of microstates, which is consistent with the thermodynamic assumption that maximal entropy means a uniform distribution of microstates, but is confusing.

The purest mathematical justification for why low entropy means a low number of microstates probably comes from the fact that (classical) physical systems are dynamical systems that preserve the measure induced by the standard metric of the phase space. The measure theoretic definition of entropy then implies the entropy of a partition of the phase space (i.e. a set of macrostates) is indeed the average of the logarithm of the number of microstates.

So indeed if W is the number of microstates in the current macrostate then entropy = log(W) (on average).

And using the typical set you can show that the probability that the average of log(W) over n samples is within 'epsilon' of the 'exact' entropy goes to 1 as n goes to infinity. This is the mathematical justification for the second law of thermodynamics.

The trick is that all of this is true no matter how you partition phase-space. Though that does mean that what is and isn't a high entropy state depends on your perspective.


>The trick is that all of this is true no matter how you partition phase-space. Though that does mean that what is and isn't a high entropy state depends on your perspective.

That seems to be correct.

>Yeah the author is conflating low entropy with a low number of microstates

Since entropy is found by counting micro-states (for example your third paragraph), that should be ok. What am I missing?


The equivalence is an important theorem (important enough to be engraved on Boltzmann's gravestone), and if you're switching back and forth in an explanation of what entropy is then you're skipping over some important details that answer what it means for something to be a 'low entropy state'.


Here's a concrete example of entropy with just two macro-states, 'broken' and 'unbroken'. If it becomes unclear or unconvincing, can you point out where that happens? It's intended to be ELI5, clear enough to discuss coherently.

Question: Why is it when I drop a vase it smashes into a million pieces; however when I then drop the million pieces it does not form a vase?

Answer: Stop! Don't drop any more expensive vases. Start with these simpler systems that do repair themselves sometimes when you drop them.

Take a coin and align it so the 'heads' side faces up. 'Heads' means 'unbroken'. (The reason for that will become clearer as we do more experiments.) Now drop the coin on the floor. How often is the 'heads' side still facing up? Now if you drop it again, how often does it 'repair' itself so that the 'heads' side is up? (Really do this.)

Try the experiment with 2 coins. Align them all heads-up, drop them, then see if your pattern is 'broken'. ('Broken' means not-all-heads-up.) Drop the 'broken' coins again. How often do they 'repair' themselves? ('Repaired' means all heads-up.) ( Don't think about it! Don't solve for it! Do it! )

Try again with 5 coins. How often does a 5-coin system 'break' when you drop it? How often does a broken 5-coin system 'repair itself' when you drop it again?

How about 10 coins? How often does a broken pattern of 10 mixed heads/tails repair itself to all heads when you drop it again? Sometimes it does, but you'll have to be very lucky or patient to see it happen.

I think from here you can probably see (part of) the answer to your question about the vase. The word people use for this kind of thing is 'entropy'. With enough coins, the 'broken' state is much more probable than the 'repaired' state. The log of a probability is called 'entropy.'

https://www.quora.com/Why-is-it-when-I-drop-a-vase-it-smashe...
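
If you'd rather have a computer drop the coins, here's a rough Monte Carlo sketch of the same experiment (the trial count is arbitrary):

    import random

    def repair_probability(n_coins, trials=100_000):
        # Drop a 'broken' set of n coins `trials` times and count how often
        # they land all heads ('repaired'). Each drop is n fair flips.
        repaired = sum(all(random.random() < 0.5 for _ in range(n_coins))
                       for _ in range(trials))
        return repaired / trials

    for n in (1, 2, 5, 10):
        print(n, repair_probability(n))   # roughly 0.5, 0.25, 0.03, 0.001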


Have you looked at Huffman coding? I'd recommend the (free) book by David MacKay, it is secretly the "hackers guide to thermodynamics".

http://www.inference.org.uk/mackay/itila/book.html


This book is phenomenal and I highly second this recommendation.


> I don't know what Entropy is supposed to mean on the level of individual states/configurations. I don't understand what kind of macroscopic "averaging" function we may use to group up those states

I find it helpful to think of entropy as a property not of the system, or any individual state (micro- or macro-), but of the "compression" process that summarizes microstates with a coarser-grained macro-description.

Given a choice of compression, classical physics says a system will tend to spend most of its time in the most likely compressed state. Different choices of compressions can lead to different macro descriptions, with different "entropies" and different dynamics among their macroscopic variables.

In this light it's not meaningful to think of the entropy of individual states. You could think about the "identity" compression, but you would end up with a description that was exactly as complicated as the full micro-state time-evolution dynamics; you wouldn't end up with any smaller set of variables that could describe the equilibrium of the whole system (really this would not admit an "equilibrium" at all)


First, entropy is a macroscopic property; it makes no sense to talk about the entropy of a single particle. Second, entropy is not a fundamental property; it depends on what the observer cares about. Take the common example of a gas in the corner of a box: in that case we care about the density distribution in the box, a macroscopic property. To make this more concrete, one way to quantify the density distribution could be to tessellate the box with cubes of some size, find the ones with the lowest and highest density, and use the difference between those two densities as a measure of density variation.

Once we have decided that all we care about is the density variation in the box, we can go through all possible microscopic states and group them together by their density variation. Finally a low entropy macroscopic state is simply a macroscopic state - a certain density variation - for which there are only a few microscopic states that have the corresponding density variation. On the other hand a high entropy macroscopic state is a macroscopic state for which there are many microscopic states that have the corresponding density variation. You can also call the microscopic states low or high entropy but only with reference to the macroscopic property you use to group them, in themselves microscopic states are not low or high entropy.

If you observe a low entropy macroscopic state, then you know a lot about the microscopic state, after all there are only very few. If you observe a high entropy macroscopic state, then you know a lot less about the microscopic state, there are much more possibilities even though they are microscopically indistinguishable. And if there are no limiting constraints on how the microscopic states can evolve, if the evolution is essentially a random walk through all possible microscopic states, then the entropy of the system will increase with high probability as it is much more probable to randomly walk into one of the many microscopic states associated with a high entropy macroscopic state than to walk into one of the few microscopic states associated with a low entropy macroscopic state.
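
A stripped-down sketch of that grouping, using just two cells (left and right half of the box) instead of a full tessellation - it only illustrates the counting, not the dynamics, and the particle number is made up:

    from math import comb, log

    N = 100  # particles, each of which can be in the left or right half

    # Group the 2^N microstates by the macroscopic property
    # "number of particles in the left half of the box".
    for k_left in (0, 10, 25, 50):
        W = comb(N, k_left)          # microstates in this macrostate
        print(k_left, W, log(W))     # log(W) is the Boltzmann entropy in units of k

    # k_left = 0 ("all the gas in one half") has a single microstate, while
    # k_left = 50 has about 1e29 of them - which is why a random walk through
    # microstates overwhelmingly ends up near the even split.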


>there are much more possibilities even though they are microscopically indistinguishable.

You meant "macroscopically" ?


True, but too late to edit.


This is a very sensible confusion. The forms of macroscopic averaging functions which are useful and valid cannot be made up arbitrarily, but are determined by the microscopic physical laws of the system. There is a reason that the law of increase of entropy is the second law of classical thermodynamics, with conservation of energy being the first law. To state it explicitly: energy is a globally conserved quantity, which can be freely exchanged among the interacting microscopic parts of systems. So we can bring a test system (called a thermometer) into interaction with our system under study, (indirectly) observe the average energy per degree of freedom of the thermometer, and call that observation the temperature of the system under study. Similarly, it is a known physical phenomenon that a gas confined to a container will exert a steady average outward force per normal unit area on the walls of the container; we have ways to measure this force, and we call it pressure. And so on, and on: every useful macroscopic averaging function is a relatively stable, measurable quantity which is determined by the physics of the systems under study. If we discovered some new measurement technique tomorrow which enabled us to measure the "quintessence" of physical systems, and this measurement was stable and reproducible, and could be meaningfully aggregated from the microscopic parts of the system and measured on the macroscopic scale, our definition of entropy would change, to account for "quintessence".


It is also because “the system” is intrinsic to the notion: as you say, any configuration of bits is equally likely; this only takes into account the “system of the bits”. The moment your system is “the bits and their mean value” everything changes, as there are systems with a single possible configuration.

That is what happens when he starts the first example: “a system of particles INSIDE A VOLUME”. The volume is what makes the entropy larger or smaller. The particles in a different volume (or just by themselves) have a different entropy.


Not a physicist either, and I don't claim to understand entropy that well either but maybe it would help to consider that entropy may not be a universal variable of systems in the universe.

I think you should rather consider it as a mathematical construct that applies to some systems where the microscopic quantities are well defined, and where the 'averaging' that we can observe is also well defined. So if you look at thermodynamics, entropy is well defined, but you may be totally right that what we call "microscopic states" in a gas can be broken down further into elementary particles, which may or may not behave in quantum ways, and what not, and counting the micro-states considering the elementary particles is a whole different game.

But it doesn't really matter. What matters is that at the scale we're at and with the microscopic/macroscopic relation that's defined, entropy works. The calculations that give some numbers to entropy show that it looks like entropy cannot decrease. They call it a universal principle of thermodynamics, because there is nothing (to my understanding) that explains it microscopically.

And it works for a variety of situations in physics, such that it seems that it's a universal property of nature. But it's mostly mathematical. It seems to say that "given a system we know everything about, there is no way to go to a system that has some unknown things to us".

Anyways. I mostly wrote this to see if I could articulate it to myself, hopefully it helps you as well.


Thanks for writing that. From the perspective you’ve articulated, I sometimes wonder whether the idea of the heat death of the universe is a matter of perspective: it only applies to the matter and properties of the universe that we consider significant. Are we living within the heat deaths of past forms of the universe, in which physical interactions we have overlooked dominated?


>It seems to say that "given a system we know everything about, there is no way to go to a system that has some unknown things to us".

That's backwards: information is the negative of entropy. The 2nd law says that entropy never decreases, so information never increases (it can only be preserved or lost).


>I don't know what Entropy is supposed to mean on the level of individual states/configurations.

The entropy is a property of a probability distribution, not of a state. Entropy is defined as H = -sum(p_i log(p_i)). A 'state' implicitly defines a probability distribution: uniform probability over all the microstates compatible with the state description.[0] In the case of a microstate, the entropy of the probability distribution over microstates consistent with that state is zero - there's only one compatible state, so p_i = 0 for all other states, and log(p_i) = 0 for the compatible state. In the case of a macrostate, the entropy of the probability distribution over microstates consistent with the macrostate works out to -sum((1/N) log(1/N)) = log(N), where N is the number of consistent microstates. That's the Boltzmann entropy.

Sometimes people will write about the entropy of a 'state' in such a way that it sounds like they're talking about the entropy of a microstate -- but what they're probably talking about is "the entropy of the macrostate that this microstate belongs to." It's sloppy to talk like that, because "the" corresponding macrostate isn't unique. There are many sets of macrostates that could contain a microstate, depending on what properties of the microstates one considers 'macro.'

(Ex: 10100101 is a member of both "symmetric bit strings of length 8" and "bit strings of length 8 that average to 1/2". The entropy of "symmetric bit strings of length 8" is 4 bits, whereas the entropy of "bit strings of length 8 that average to 1/2" is ~6.1 bits. And of course, the entropy of "the bit string of length 8 that is exactly 10100101" is zero.)
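
The numbers in that parenthetical can be checked by brute force (counting in bits):

    from itertools import product
    from math import log2

    strings = list(product('01', repeat=8))

    symmetric = [s for s in strings if s == s[::-1]]        # palindromes
    half_ones = [s for s in strings if s.count('1') == 4]   # average 1/2

    print(len(symmetric), log2(len(symmetric)))   # 16, 4.0 bits
    print(len(half_ones), log2(len(half_ones)))   # 70, ~6.13 bits

    s = tuple('10100101')
    print(s in symmetric, s in half_ones)         # True True: one microstate, two macrostates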


For information theoretic entropy (I don’t know anything about thermodynamics):

The first thing you describe, that if you draw many symbols from your source distribution at random then you see a distribution of symbols that is equal to the probability distribution of the source, is called the asymptotic equipartition principle.

I think your confusion comes from two things. First, conflating bits and symbols and second assuming symbols are equiprobable.

Take English text where our symbols could be letters of the alphabet. These are not equiprobable; if you select a letter from a book at random you get a different distribution than 1/26. If you took as your symbols the individual bits that would encode those characters in ASCII you would get something closer to equally probable symbols. Another choice for symbols would be the words in the book.


I suppose the problem is that entropy is a proxy for the number of states of an abstract configuration space which has the same observed quantities as the concrete object whose entropy you want to measure. So, for example, if you know that your object has mass M and temperature T, then to measure its entropy you take all the possible states for an abstract object with mass M and temperature T, and the logarithm of that number of states is the definition of the entropy of an object that has mass M and temperature T. So the more you know about the concrete object, the fewer possible states there are for the abstract model, and so the entropy is not a property of the concrete object but rather a property of an abstract model with the same fixed global properties.


>There could be more than one possibility

This seems to connect with the idea behind Chaitin's incompleteness theorem. Making specific statements about the reducible complexity of something is not always possible.


>Think of a series of random bits that can be either 0 or 1 with equal probability. How likely is it that they are all 0 or all 1? Not very likely. There is exactly one configuration. How likely is it that they have a specific configuration of 0 and 1? Equally likely.

Well, there are only 2 states with all 1 or all 0.

But there are 2^N - 2 states of mixed 1 and 0.

Even if you treat the sets of bits as opaque items, and pick one from a bucket, I'd expect getting one of the 2^N - 2 configurations to be a far more likely outcome than one of the 2 remaining.

In fact, we could bet on it...


But there's only one state that is 10010001111110101000.


Sure, but that's irrelevant.

There are billions that are similar, and only one that's all 0.


"similar" is in the eye of the observer! This fact should be especially clear in the context of bit strings. If 10010001111110101000 is my login password, don't be surprised if other permutations fail to grant you access, even if you have the correct number of 1's and 0's.


>"similar" is in the eye of the observer!

Nope, also similar in algorithmic complexity theory (e.g. counting compressibility).


That's a very good, and deep, point, and I agree that there is something important there. However, algorithmic complexity is still only defined relative to an arbitrary reference computer. If my reference computer happens to, in hardware, XOR its inputs and outputs with the bitstring 10010001111110101000, then (in terms of bits that end up represented on the drive), that will be the one that has the lowest algorithmic complexity, although the algorithm might think that it's outputting all 0's.


In computer science a high entropy means a high informational value. So any information that is not the expected value has a high content of information and therefore a high entropy.

In thermodynamics the case is somewhat the opposite, as a high entropy means a low state of energy and therefore fewer internal processes within a system, or none at all.


Another thing that contributes to the confusion which you have noted but not fully realized is that there are two different concepts that use the same equation and the same word: "entropy".

Information entropy and statistical entropy are two different things.


A specific string of bits has zero entropy. It may be a sample from a distribution which has some entropy.

Unless you are selecting a random single bit from that string, in which case the entropy of that selection process is -p1 log p1 - p0 log p0.
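
For example (a tiny sketch of that selection process; the strings are arbitrary):

    from math import log2

    def selection_entropy(bits):
        # Entropy (in bits) of picking one position of a fixed bit string
        # uniformly at random and reporting the bit found there.
        p1 = bits.count('1') / len(bits)
        p0 = 1 - p1
        return -sum(p * log2(p) for p in (p0, p1) if p > 0)

    print(selection_entropy('10100101'))              # 1.0 bit (half ones)
    print(selection_entropy('00000000'))              # 0.0 bits
    print(selection_entropy('10010001111110101000'))  # 1.0 bit (10 ones out of 20)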


High entropy = less predictable system. Low entropy = highly predictable system.


To stay within your bits analogy, I imagine an increase in entropy would be the equivalent of each bit becoming base-3, base-4, and so on, hence increasing the number of possible states (and reducing your ability to predict them).


One aspect of entropy that I always find counterintuitive is that unlike mass, charge, etc. it is not a physical quantity. In fact, from the point of view of an experimenter with perfect information about a physical system, the entropy of the system is exactly conserved over time (as made precise by Liouville's Theorem). The Second Law survives in this setting only in the most trivial sense that a constant function does not decrease.

It's only when you start making crude measurements---lumping positions into pixels, clouds of particles each with their own kinetic energy into a single scalar called "temperature," etc---that you start to see a nontrivial entropy and Second Law. Different ways of lumping microstates into macrostates will give you different (and inconsistent) notions of entropy.


The way to make sense of entropy is to treat it as a subjective quantity. A subjective quantity is a function where the observer's state of knowledge is one of the input arguments.

The article describes it as a measure of hidden information in a system, which is a good description. But that's not a property of the system itself, it's a property of the observer, from whom the information is hidden.

So different observers with different information about a system will have different opinions about its entropy.

My password, for example, to me has zero entropy. I know its microstate. But it's quite secure from someone trying to guess it, and they will think it's quite high in entropy.

If all you know about a system is that it's a kilogram of air at room temperature, it will seem quite high in entropy to you, as many possible microstates are consistent with that description. But if you have godlike knowledge of the exact configuration of every particle in the container, it will seem very low in entropy to you, and that's more than just an accounting difference. Indeed you can use that information to operate a Maxwell's demon and turn the system into a heat engine, splitting the cold and hot molecules into separate spaces and extracting work as though the system really had low entropy to start with. Because it did. To you.

Most of the confusion about entropy comes from what Jaynes calls the mind projection fallacy: the tendency to treat our uncertainty about a system as a property of the system, rather than a property of ourselves.


> The article describes it as a measure of hidden information in a system, which is a good description. But that's not a property of the system itself, it's a property of the observer, from whom the information is hidden.

Was hoping to see someone point this bit out.. I wish references to entropy included this piece of information more frequently. When I was first trying to understand the concept I kept thinking of it as something objective, but as you say it’s a property of the observer


Speaking of observers always rubs me the wrong way... I don't want to touch on the observer problem, but just to mention something that should be obvious: there's ALWAYS hidden information in any system where time exists. Any "observer" can only know what the world looks like within its light cone. Because quantum mechanics shows that determinism is not possible, it's not possible for any "observer" to know the exact future state of the world outside what was observable within its light cone up until that moment. There's also the problem that you can only store a limited amount of information even given perfect theoretical storage... hence again, some information must be forgotten by whatever the "observer" is... talking about a "perfect observer" that knows all there is to know makes absolutely no sense.


Since entropy seems to be a measure of our ignorance, maybe there is no point in discussing a perfect observer (of something that is boundless). Edit: > talking about a "perfect observer" that knows all there is to know makes absolutely no sense.

if there was such a thing as a perfect observer, let’s say it is you, then you would still choose to measure some things, but not others. Unless by “perfect observer” you are referring to something that knows the states of all things simultaneously at all times, in which case that (to me) doesn’t sound like an observer at all, would just be someone/something that knows. So like you said “there is always hidden information” but that information is hidden to someone that is doing the observing and so is dependent on them. Otherwise from what or whom would the information be hidden? What would information even be without an observer?


Whatever is the driving factor behind the laws of physics seems to have “perfect information”. Not implying anything religious.


Entropy in thermodynamics is a statistical effect which acts like a "force" because of the immense number of particles and sub-states in play. A perfect simulation of gas particles bouncing in a two-chamber system will result in the "pressure" equalising because that is overwhelmingly the most likely state to end up in.

To be honest, I hadn't heard of Liouville's Theorem before, but it doesn't seem to imply what you're saying -- in fact it is used to prove the fluctuation theorem which quantifies the probability of entropy spontaneously decreasing (as thermodynamic entropy is a statistical effect).


Liouville's Theorem does indeed seem to imply that entropy doesn't change:

https://physics.stackexchange.com/questions/202522/how-is-li...


> One aspect of entropy that I always find counterintuitive is that unlike mass, charge, etc. it is not a physical quantity. In fact, from the point of view of an experimenter with perfect information about a physical system, the entropy of the system is exactly conserved over time

True of energy as well. It can't be directly measured except as a relation between two states.


> One aspect of entropy that I always find counterintuitive is that unlike mass, charge, etc. it is not a physical quantity.

Those physical quantities might be intuitive, but as a physicist Brian Greene once wrote, no one really knows what mass is. We only know that mass bends space-time curve, hence gravity.


> no one really knows what mass is. We only know that mass bends space-time curve, hence gravity.

Mass is much better understood by its role in inertia. Basically mass is the amount of energy you need to exchange with a thing to change its current speed. This observation works from Newtonian mechanics to QM and GR as well.

Now, why do things have mass? The famous E=mc² explains this for most things: they have mass because something inside them has potential or kinetic energy. This works all the way down to the atomic level - the mass of a proton for example is almost entirely explained by the potential energy of the quarks being held together in a small volume; the total mass of the quarks themselves is only a small fraction of that. Now, the mass of the elementary particles is somewhat more complicated, but the Standard Model does have explanations for those - symmetry breaking for fermions, and the Higgs mechanism for the massive bosons.

The next mystery is: why is inertial mass equal to gravitational mass? GR has essentially explained this, by showing that acceleration is equivalent to gravitational attraction depending on your frame of reference.

So overall, I'm not sure what Brian Greene means by that - mass is at least as well understood as other basic properties of particles (charge, spin, color charge).

This lecture by Leonard Susskind explains most of these things about mass in a way I found easy to follow:

https://www.youtube.com/watch?v=JqNg819PiZY


If I can pick your brain, there’s a related concept — entropy production. How does that relate with these ideas?


Not the GP, but entropy production is any process that increases your ignorance about a system (and usually, if you're doing it intentionally, everybody else's ignorance as well).

To produce entropy you have to grow the number of possible microstates that are consistent with available knowledge of the macrostate.

Usually you accomplish that by converting stored energy into heat somehow. A charged capacitor has low entropy for the amount of energy it holds; discharge it through a resistor, and you produce a bunch of entropy because that energy can now be distributed in a lot more ways among a lot more degrees of freedom, and nobody can possibly keep track of them.
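
To put rough numbers on that, a back-of-the-envelope sketch assuming the stored energy all ends up as heat in surroundings at room temperature (the component values are made up):

    C = 1e-6      # farads (a hypothetical 1 uF capacitor)
    V = 10.0      # volts
    T = 300.0     # kelvin, temperature of the resistor and surroundings
    k_B = 1.380649e-23

    E = 0.5 * C * V**2    # stored energy: 5e-5 J
    dS = E / T            # entropy produced once that energy is heat at T
    print(dS)             # ~1.7e-7 J/K
    print(dS / k_B)       # ~1.2e16 in units of k_B, i.e. that many nats of lost track-keeping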


It's a property of information so if you assume perfect information, of course it becomes trivial.


I would ask why you don't have the same problem with energy?


Not really. Even from a point of view of an experimenter with perfect information, the entropy of the system declines over time as fewer and fewer bits are needed to describe the system.

For example, start with a glass of warm water and an ice cube in it. Over time, the ice will melt and the range of different temperatures of the molecules declines. Consequently, you need fewer and fewer bits to describe the complete state of the system. It takes fewer bits to encode all the velocities of a million molecules that all move at a similar speed than to encode all the velocities of a million molecules that move at very different speeds.

The more similar the state of the molecules becomes, the shorter a text becomes that has to describe the complete state of a system. Therefore, entropy is decreasing even from the point of view of an observer with perfect information.


Because ice is solid you can argue it takes less information, because the particles aren't moving at all or are moving together in unison, so it will take fewer bits to encode.

Furthermore, velocity is a product of direction as much as speed, so if you take into consideration that a solid object may vibrate its particles in the same direction while a liquid can have its particles moving in infinitely many directions, you're talking about way more information you have to encode.


What is perfect information?

I understood perfect information to include infinite precision knowledge of non-quantized values like position and momentum. To store that information we'd almost always need infinite bits to express the state of even one particle.

(...and cough ignoring uncertainty...)

Using a finite number of bits was described in the comment above as "lumping positions into pixels".


Surely a glass of warm water + ice cube is a lower entropy state compared to the melted ice cube mixed in the water.


This reminds me of a great article that I saw on Hacker News that really helped explain the concept of entropy to me. Linked here:

https://news.ycombinator.com/item?id=24140808


Gotta love these interactive websites.


I thought entropy (in the Shannon sense) was a property of discrete and finite probability distributions. It's essentially a measure of how random a sample from such a probability distribution is. Notably, continuous probability distributions don't have meaningful entropy (or in some sense, their entropy is always infinite). It's worth considering the similarities and differences between entropy and standard deviation.

I thought the 2nd law of thermodynamics was saying that with incomplete knowledge, the probability distribution of possible states becomes more and more spread out as time goes on. It's almost a limit to how you can make predictions or simulations of physics when the initial state of the system is not fully known. Equivalently, it's a banal statement about chaos in the sense of chaos theory.

The only thing I don't get is how physicists get around the discrete and finite restriction. Maybe the state of the system is not what has entropy. Rather, one can define an arbitrary function f from the system to a finite set S, and then talk about the entropy of f(System at time t), because this is indeed a discrete and finite probability distribution which you can take the entropy of.

Hmmm. Maybe I understand entropy.


> The only thing I don't get is how physicists get around the discrete and finite restriction.

Actually, they don't! When you start doing the math about states in a quantum sense (i.e. statistical mechanics), the basic premise is that the available range of states _is_ discrete. Particles are quantized - so they can only possess certain allowable discrete energy levels. The broader laws of thermodynamics fall out of that and appear to be continuous as you scale up to the macro world across a huge number of microstates.


I think this comment is significantly more insightful than the article.

As for the thing you don't get: quantum mechanics means that the state space is actually discrete, which means there is no need to pass to a continuous distribution. And finiteness is not really a concern either: first of all it is not strictly necessary for the (Gibbs) entropy to be defined, and secondly the space state is actually often finite once e.g. the total energy in the system is fixed.


> I thought entropy (in the Shannon sense) was a property of discrete and finite probability distributions. It's essentially a measure of how random a sample from such a probability distribution is. Notably, continuous probability distributions don't have meaningful entropy (or in some sense, their entropy is always infinite).

True, but for continuous distributions you can use the KL divergence against a uniform distribution :)


One of the properties of entropy H(X) of a random variable X is that if f is a bijective function then H(f(X)) = H(X).

For relative entropy (or "KL divergence" as some people call it), we have that H(X||Y) = H(f(X)||f(Y)). But if you fix Y to have a continuous uniform distribution, then you lose this critical property because f(Y) may no longer have a continuous uniform distribution.
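
The discrete half of that claim is easy to check numerically - a small sketch with a label permutation as the bijection (the distributions are arbitrary; the continuous case with a uniform reference is the separate subtlety discussed above):

    from math import log2

    def kl(p, q):
        # Relative entropy D(P || Q) in bits, for distributions given as dicts.
        return sum(px * log2(px / q[x]) for x, px in p.items() if px > 0)

    P = {'a': 0.5, 'b': 0.3, 'c': 0.2}
    Q = {'a': 0.2, 'b': 0.5, 'c': 0.3}

    f = {'a': 'c', 'b': 'a', 'c': 'b'}            # a bijection (pure relabelling)
    fP = {f[x]: px for x, px in P.items()}
    fQ = {f[x]: qx for x, qx in Q.items()}

    print(kl(P, Q), kl(fP, fQ))   # identical values: the relabelling changes nothing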


Apparently this "critical property" is not so important to all the people who use relative entropy as a generalization to a continuous distribution defined on a space with an underlying measure.

Why would they care about arbitrary transformations mapping points in the space to other points in the space?


What I think it means, is that if you take two different parametrizations of the same physical phenomenon, then you get two different entropy values.

E.g. if you have a bunch of particles with fixed mass. You could look at the distribution of speeds and get one entropy. Then the distribution of kinetic energy (basically speed squared). Uniform speed means non-uniform speed squared so the entropies would disagree.

This sounds like it could pose issues.


Physical entropy is defined from the probability distribution over states. Velocities or squared-velocities are not states, they are derived quantities. Points in a phase space would describe states. Physical states are discrete anyway when you consider quantum physics :-)

As for the entropy of probability distributions in general, I think relative entropy is invariant under reparametrizations because both the probability of interest and the reference probability transform in the same way [1]. But I don't remember what that means exactly. [And I am not sure if that makes ogogmad wrong; I may not have understood his comment well.]

([Edit: forget this aside. You probably were talking about speeds as positive magnitudes.] By the way, using an example analogous to yours, discrete entropy wouldn't be invariant either: if you have a distribution over {-1,1} and square it, it collapses to a zero-entropy singleton {1}.)

[1] https://en.wikipedia.org/wiki/Kullback–Leibler_divergence#Pr...


+1. The commenter above also cared about bijective mappings, and squaring a random variable in [-1, 1] is not bijective. Squaring a random variable defined over positive real numbers would lead to a bijective mapping and the distribution would still remain uniform.

Actually, I find it hard to come up with a bijective mapping that leads to a non uniform distribution that's useful for anything practical.


Ok, so first, to have a uniform distribution we have to have a bounded set. Maybe you can do something clever with limits, but let's not overcomplicate things. Let's say we have 0 <= v < 10. Define E = v^2. Then 0 <= E < 100.

Uniformity of v would mean that p(0 <= v < 1) = 1 / 10

Uniformity of E would mean that p(0 <= E < 1) = 1 / 100

But by construction p(0 <= v < 1) = p(0 <= E < 1). So it's not possible for both to be uniform.
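
A quick Monte Carlo check of this, in case anyone wants to see it numerically (my own toy setup, same ranges as above):

    import numpy as np

    rng = np.random.default_rng(0)
    v = rng.uniform(0, 10, 1_000_000)      # v uniform on [0, 10)
    E = v ** 2                             # E then ranges over [0, 100)

    print(np.mean((0 <= v) & (v < 1)))     # ~0.10, as uniformity of v demands
    print(np.mean((0 <= E) & (E < 1)))     # also ~0.10, since E < 1 iff v < 1,
                                           # whereas a uniform E would require 0.01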


It's not necessary to have p(0 <= v < 1) = p(0 <= E < 1). Only that P(f(X)) is uniform.

But this does bring up a good point. H(X||Y) = H(f(X)||f(Y)) for any bijective f if the distributions are discrete. When they are continuous this is not true, even with a bijective f. For example f = x^2 doesn't work even though it yields a binary distribution. Interestingly however, affine transformations work.


(For non-negative v) v^2 is less than one if and only if v is less than one. That's why the probabilities have to agree in our specific case.


Yeah, you also have to transform the "reference" function, and then the entropy stays the same. I prefer to think of it as the "density of states" -- it's necessary to make the argument of the logarithm dimensionless, after all.


> I thought the 2nd law of thermodynamics was saying that with incomplete knowledge, the probability distribution of possible states becomes more and more spread out as time goes on. It's almost a limit to how you can make predictions or simulations of physics when the initial state of the system is not fully known. Equivalently, it's a banal statement about chaos in the sense of chaos theory.

I'm not sure I understand what you mean by "as time goes on". Classical thermodynamical entropy is defined for a system in equilibrium and it doesn't change with time. It changes when you do things to the system.


I don't think statistical mechanics entropy is limited in this way. I think the (incorrect? oversimplified?) definition given in the article is only valid under the conditions you've given. But I'm not sure.


Then it maybe depends on what you meant by "the 2nd law of thermodynamics".


In Shannon’s 1948 paper, part V deals with continuous sources. The key is to realise that you cannot measure a continuous signal exactly, and so you can define a rate of information relative to the fidelity of your measurement. (I only skimmed that part years ago, and never studied it carefully. But it makes perfect sense.)


If you mean differential entropy (which Shannon supposedly suggested as a generalisation to continuous random variables), this is not a good generalisation of entropy to continuous random variables. It lacks all the interesting properties of entropy.

The "proper" generalisation of entropy to continuous random variables is something called relative entropy, or in some books it's called KL divergence. But this is now a property of how two probability distributions relate to each other, rather than a property of a single probability distribution alone.

I'm not an expert in probability theory or physics, but this is what I've learnt from a brief study of these areas.


Relative entropy? KL? Ah, found it – Kullback–Leibler divergence, it’s called. Thanks, I’ll put that on my list of stuff to learn about.


>how physicists get around the discrete and finite restriction

By turning the sum into an integral. The probability 'density' is p(x) and the 'density of states' is n(x), so the entropy is then the integral of -p(x) log(p(x)/n(x)) over dx.
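
As a sketch of what that looks like numerically (my own toy p and n, and a grid discretisation rather than an exact integral):

    import numpy as np

    x  = np.linspace(-5, 5, 2001)
    dx = x[1] - x[0]

    p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # a probability density
    n = np.ones_like(x)                          # 'density of states', here flat

    # discretised version of  S = -integral of p(x) log(p(x)/n(x)) dx
    S = -np.sum(p * np.log(p / n)) * dx
    print(S)   # ~1.419, i.e. 0.5*ln(2*pi*e), the Gaussian result for flat n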


Right, it requires a sort of alphabet of discrete specific states. Discrete locations in space, discrete numbers of things and discrete kinds of things.


yeah. i think of minimum entropy as a dirac delta distribution and maximum entropy as a flat uniform random distribution.

i never really understood the physical definition, but always handwaved it away with "things dissipate over time into an undetectable signal, or a flat distribution"


You basically nailed it!


Nice writeup! BTW statistical thermodynamics has a name for that set of possible microstates, perhaps the most pretentious sounding name in all of physics, the "canonical ensemble".


Nah, Ultraviolet Catastrophe is worse, then there's wavefunction collapse. And there's gotta be something in particle physics that puts these terms to shame. Given that the particle names are drawn from literature and whimsy.

I looked up the dictionary definition of pretentious, and it doesn't really apply, but the hERG channel is a critical ion channel and various drugs block it and cause Really Bad Things. hERG stands for "the human Ether-a-go-go-Related Gene" - a pretty bloody stupid name, but whimsy is not restricted to particle physics.

The canonical ensemble, along with the microcanonical and grand canonical ensembles, is all over statistical mechanics. And I suspect there's not one bit of whimsy in their naming. There was not any humour in my stat mech course aside from me trying to make sense of it.


There's actually a sort of general principle in medical science that people should avoid whimsical names for things, since in all likelihood someone with a life-altering or fatal condition or their family shouldn't be told that it's due to a mutation in the Sonic Hedgehog protein [1]

[1] https://en.wikipedia.org/wiki/Sonic_hedgehog


"Ultraviolet Catastrophe"'s problem is that it sounds way cooler than it is. I mean, that's the title of a cyberpunk novel; more hyperbolic (and disappointing) than pretentious. Eigenthings would be second on my list of pretentiousness (oddly, not gedankenthings; I'm inconsistent I guess). Standard Model names strike me whimsical to the point of being undignified.


I'm still mad about top and bottom when truth and beauty were right there!


If I'm not mistaken, the "official" names are just "t" and "b", so top and bottom, just like truth and beauty, are more mnemonics. So referring to them as truth and beauty would be just as correct.


Thought one was their names and the other their attributes?


Seems like you're right and I misremembered.


Ultraviolet catastrophe is cool! It’s why we needed quantum physics, otherwise there would be catastrophic infinite energy emerging from random high frequency oscillations!


Well, that's not quite what it is, first of all, and second of all I don't like the practice of calling something a catastrophe when it's just predicted by a theory but didn't happen. I mean, bad theories always predict something horrible should be happening. What's next, do flat earthers get to have an "ocean water catastrophe" because their theory implies all the water drains away? Or a "superluminal catastrophe" because their theory requires infinite unending linear acceleration of the earth (and moon and sun)? I mean, it really is a catastrophe I guess...for the theorist.


My favorite is "gravothermal catastrophe" with "violent relaxation" being a close runner-up.


"well, I never!"

—the Grand canonical ensemble


> the most pretentious sounding name in all of physics

Also, "the free will theorem" and "the god particle".


the "god particle" was actually the "goddamn particle" https://www.businessinsider.com/why-the-higgs-is-called-the-...


Like the great von Neumann once quipped: "Why don't you call it entropy? In the first place, a mathematical development very much like yours already exists in Boltzmann's statistical mechanics, and in the second place, no one understands entropy very well, so in any discussion you will be in a position of advantage."


The typical measure of entropy (Shannon or Gibbs, and let's spare details for later and after you've read up on the theory of large deviations) is

- sum (p log(p))

which is not that different than the formula for the mean

sum (p 1/n)

the critical difference is the normalization constant is based on the probability of the state rather than assuming a uniform probability over all states.

So, in effect, the entropy is a measure of the mean. It is a measure adapted to the case where "mean" is ill-defined because the number of modes and/or the variation around those modes is not handled well by simpler metrics.


If there was anyone who taught you this then they should be fired.

More constructively, principal among the many things wrong with your comment is the formula for the mean; sum_i p_i = 1, so sum_i p_i / n = 1 / n. The mean would instead be sum_i p_i x_i.


Perhaps I'm misunderstanding or missing something, but I'm afraid this seems completely wrongheaded to me. (My apologies for being so blunt, but right now your comment appears to be the most-upvoted, and I therefore think it needs some pushback.)

[EDITED to add: I was looking at an old version of the page; by the time I wrote this the parent was no longer the top comment. I'll leave the bluntness in, especially as at least one other person was even blunter.]

You refer to "the mean" and I think you mean the mean of the probabilities. Now, when you've got a probability distribution, by far the usual thing for "the mean" to mean is the sum of Pr(x) x -- the mean of the values. Taking the mean of the probabilities is a really strange thing to do.

One reason why it's a really strange thing to do is that this thing you call n is really kinda meaningless. There's no difference between these two probability distributions: (a) 1, 2, 3, or 4, with probabilities 0.1, 0.2, 0.3, 0.4 respectively; (b) 1, 2, 3, 4, or 5, with probabilities 0.1, 0.2, 0.3, 0.4, 0 respectively. But (a) has n=4 and (b) has n=5. Maybe you want n to be the number of nonzero probabilities? But now consider (a) along with the following probability distribution parameterized by a (small, positive) number h: 1, 2, 3, 4, or 4+h, with probabilities 0.1, 0.2, 0.3, 0.4-h, h. Every version of this distribution with h>0 has n=5, but when h is very small it's practically indistinguishable from (a) with n=4.

Further, since the sum of probabilities is always 1, what you write as sum (p 1/n) is just the same as the number 1/n. You can call it "the mean" if you want to, but I don't see what this adds over calling it what it is: the reciprocal of the number of possibilities.

There is something to what you say: the entropy is kinda related to the number of possibilities; if the probabilities are all equal, the entropy is log(#possibilities); if the probabilities are equal-ish then it's modestly smaller than that. But note e.g. that this relationship is exactly the inverse of what you say, in that "the mean" decreases with the number of possibilities, and the entropy increases with the number of possibilities.

The entropy is not "a measure of the mean". It kinda-sorta is related to "the number of possibilities", which is the reciprocal of "the mean". It is not at all the case, as your last paragraph suggests, that for most purposes we should be using "the mean" but we need to use the entropy when "the number of modes ... is not handled well by simpler metrics", whatever that means; for most purposes we should be using the entropy, and in the special case where all the probabilities are equal we can get away with just counting possibilities.

(In some important situations it turns out that what you have is some number of possibilities with roughly equal probabilities, and a whole lot more whose probabilities rapidly decrease to almost zero, and then you can get away with counting the number of reasonably-probable possibilities and taking its log. E.g., various situations in communications theory can fruitfully be thought of this way. But the entropy is still the more fundamental quantity, and "the mean" is still a needless obfuscation of "the (effectively) number of possibilities".)
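
For what it's worth, here's the (a)/(b)/h example above run numerically (same toy probabilities): the entropy ignores padded zeros and varies continuously in h, while any "n"-based quantity jumps.

    import numpy as np

    def H(p):
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    a = [0.1, 0.2, 0.3, 0.4]
    b = [0.1, 0.2, 0.3, 0.4, 0.0]          # same distribution, padded with a zero

    print(H(a), H(b))                      # identical (~1.2799), though "n" is 4 vs 5

    for h in [0.1, 0.01, 0.001]:
        c = [0.1, 0.2, 0.3, 0.4 - h, h]    # n = 5 for every h > 0 ...
        print(h, H(c))                     # ... but the entropy tends to H(a) as h -> 0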


It can be related to compression. If some phrase has a probability p_i of occurring, then the optimal code length for it is -log(p_i). The entropy sum(-p_i log p_i) = mean(-log(p_i)) is the average code length you will use.
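
Small illustration (made-up source probabilities): for a dyadic distribution the -log2(p_i) lengths come out as whole numbers of bits, and the entropy is exactly the average code length.

    import math

    p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}    # toy source

    lengths = {s: -math.log2(q) for s, q in p.items()}   # optimal code lengths
    entropy = sum(q * lengths[s] for s, q in p.items())  # average length

    print(lengths)   # {'a': 1.0, 'b': 2.0, 'c': 3.0, 'd': 3.0}
    print(entropy)   # 1.75 bits per symbol, matched exactly by a Huffman code here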


The author mentions Boltzmann brains and that a human body could theoretically spontaneously form out of particles given a long enough time span. Of course, nothing like this can ever happen. It’s the fallacy of thinking infinite time means infinite possibilities.


> that a human body could theoretically spontaneously form out of particles given a long enough time span.

To be fair, isn't this precisely what happened?


If you leave out "spontaneously"!


Specifically if you leave it in! It's a matter of viewpoint or scope.

"Spontaneous" can be defined a few different ways: https://www.merriam-webster.com/dictionary/spontaneous

From the link "2: arising from a momentary impulse" you get the implied meaning from the original comment if you assume standard human experience of a "moment" IE a few seconds or less.

However you could argue that human evolution is but a moment in the scope of the universe.

And then with definition "5: developing or occurring without apparent external influence, force, cause, or treatment"

The only way it wouldn't be spontaneous would be if an external actor (Deity of your choice?) directed human evolution somehow. To say this is debatable is an understatement..

And so we have fun in word play and hopefully appreciation of different viewpoints.

:-)

edit: rearranged for better flow


"Oh, that was easy," says Man, and for an encore goes on to prove that black is white and gets himself killed on the next zebra crossing."


It's not a fallacy, it's a paradox. One that indicates that our theories of quantum fluctuations in an infinite universe are incomplete.

This podcast episode has a good discussion on the subject: https://universetoday.fireside.fm/745


The initial conversation rests on an entirely false premise: that in an infinitely large universe "anything that can happen, would happen." An infinitely large universe could be empty, and fit the bill, and there would be nothing like "there are infinite copies of myself that are slightly different than now." It's bad philosophy, not science.


Perhaps it's actually correct, but our intuitions about ridiculously long periods of time aren't good. Note that heat death is in ~10¹⁰⁰ years, whereas this Boltzmann body would take ~10^(10⁶⁹) years. That second time period is literally incomprehensible. So we think, of course a fully formed human body wouldn't appear; in practice, that's not how it actually works; the fact that it appears possible is at best a mathematical artifact, not reality. But we're talking about a timescale that's not just longer than the age of the universe, or than the total lifespan of the universe, not just orders of magnitude longer than those times, but on a completely different scale. Given that, I think we need to toss out those intuitions.

As to the author's last question of whether such a thing even makes sense at all given those time scales, I don't see why not. After all, once the universe reaches heat death, as far as we know nothing from the outside is going to come along and garbage collect it, so why couldn't it last for an arbitrary/infinite number of years? And compared to that, ~10^(10⁶⁹) years, or ~10^(10^(10⁵⁶)) years, or whatever, is nothing.


I think this is more about the arrow than the entropy.

Considering the ‘arrow of time’ as a function on each microstate lends itself better to “can” questions.

I think both the author and you are assuming your definition for the arrow is correct.

It is still unknown if the arrow allows any arbitrary state to become another, or if there is a strict genealogy for the change of the state that can be seeded from an existing state that ensures it will evolve into a specific other under some time scale, or whether either of those permit Boltzmann brains, or even if there is some control the state has over itself (free will?).


Why is it not bound to happen eventually?


For the same reason an infinite number of zeroes will never contain a one. Particles still have to follow the laws of how matter behaves, no matter the timescale. Nothing spontaneously forms like the author suggests, regardless of entropy. That’s not how matter works.


In short: the author makes a good summary of these ideas:

- Entropy in thermodynamic equilibrium is well understood. The early theory (before statistical mechanics was developed) fits well with our modern understanding.

- The analogies made about entropy are not always good and indeed, if you try to match the physics with "entropy is disorder" it does not always work.

- In non-equilibrium situations it is, as the author points out, more complex.

Regarding the last item, even Stephen Hawking postulated some strange ideas about the universe having to rewind past some point in time, so that the Big Crunch would be the mirror of the Big Bang.


Here's another head-spinning application of the concept of entropy, in quantum information theory:

https://www.cambridge.org/core/books/abs/quantum-information...

> "The first fundamental measure that we introduce is the von Neumman entropy. It is the quantum analog of the Shannon entropy, but it captures both classical and quantum uncertainty in a quantum state. The von Neumann entropy gives meaning to a notion of the information qubit. This notion is different from that of the physical qubit, which is the description of a quantum state in an electron or a photon. The information qubit is the fundamental quantum informational unit of measure, determining how much quantum information is in a quantum system."

Incidentally chem.libretexts.org, a collection of open-source chemistry textbooks, has a good overview of the physical-chemical applications. The site is kind of a mess but you'd want chapter 18.3:

https://chem.libretexts.org/Bookshelves/General_Chemistry/Ma...


Actually a quite nice article. After also spending years as a professional physicist not understanding entropy, I finally decided that I was not necessarily the problem, and spent the last 5 years or so trying to understand it better by rewording the foundations with my research group. (One of the papers the author cites is part of a series from our group developing "observational entropy" in order to do so.)

A lot of what makes this topic confusing is just that there are two basic definitions of entropy — Gibbs (-\sum p_i \log p_i) and "Boltzmann" (\log \Omega) — and they're really rather different. There's usually some confusing handwaving about how to relate them, but the fact is that in a closed system one of them (generally) rises and the other doesn't, and one of them depends on a coarse-graining into macrostates and the other doesn't.

The better way to relate them, I've come to believe, is to consider them both as limits of a more general entropy (the one we developed — first in fact written down in some form by von Neumann but for some reason not pursued much over the years.) There's a brief version here: https://link.springer.com/article/10.1007/s10701-021-00498-x.

This entropy has Gibbs and Boltzmann entropy as limits, is good in and out of equilibrium, is defined in quantum theory and with a very nice classical-quantum correspondence, and has been shown to reproduce thermodynamic entropy in both our papers and the elegant one by Strasberg and Winter: https://journals.aps.org/prxquantum/abstract/10.1103/PRXQuan...
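
For concreteness, here's my rough classical reading of how the quantities relate, as a toy Python sketch (my own made-up microstate probabilities and coarse-graining; see the linked papers for the actual definitions):

    import numpy as np

    p = np.array([0.05, 0.05, 0.10, 0.20, 0.20, 0.20, 0.10, 0.10])  # microstate probs
    macro = [[0, 1, 2], [3, 4, 5], [6, 7]]                          # a coarse-graining

    # Gibbs entropy of the microstate distribution
    print(-np.sum(p * np.log(p)))                 # ~1.956

    # "Boltzmann" entropy of a single macrostate M is just log of its volume
    print(np.log(len(macro[1])))                  # ln 3 ~ 1.099

    # Coarse-grained ("observational"-style) entropy, as I understand it:
    # S = sum over macrostates of P_M * (log V_M - log P_M)
    S = sum(p[M].sum() * (np.log(len(M)) - np.log(p[M].sum())) for M in macro)
    print(S)   # ~1.968: between the Gibbs value and log(8), the full state count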

After all this work I finally feel that entropy makes sense to me, which it never quite did before — so I hope this is helpful to others.

p.s. If you're not convinced a new definition of entropy is called for, ask a set of working physicists what it would mean to say "the entropy of the universe is increasing." Since von Neumann entropy is conserved in a closed system (which the universe is if anything is), and there really is no definition of a quantum Boltzmann entropy (until observational entropy), the answers you'll get will be either mush or properly furrowed brows.


The universe is an open system.


The universe is not a closed system


How do you define "the universe"?


Statistical mechanics is one way of representing entropy but you don’t need it. The second law of thermodynamics can be expressed in other much more general terms. Also, it requires that the system be isolated, not “thermally isolated”. There are other types of interactions, such as gravitational and electromagnetic.


I mean, come on. You know and I know that the statistical mechanics definition gets you 99% of the way there in terms of intuition. Obviously if I spin a rotor in my thermally insulated box with a magnet on the outside I can add energy to order things with; I don't think anyone is confused on that point.


Well, claiming to have understood entropy is no joke. He should be flawless then. Just like with enlightenment, he who claims to understand entropy really does not, and he who does will not say so.


Can you explain what you mean by "not needing stat mech" and thermo entropy being "more general"?


I had an art teacher who was very philosophical. One day he described to the class what entropy was. I took a lot of physics and even astrophysics. Little did I know he had a better conceptual understanding and explanation than I've ever heard before. Too bad I don't remember exactly what he said.


Not to poke holes in your nostalgia, but how do you know he had a great explanation if you don’t remember it after further study?


"Don't try to understand it, feel it" - from tenet, but does sort of apply to entropy as a way of looking at problems.

That being said, I heard a sports science student try to recall their working definition of entry, and it was some mess of locks and keys floating around randomly hitting each other?


> definition of entry and it was some mess of locks and keys floating around randomly hitting eachother?

Locks and keys does indeed sound like "entry". I thought we were discussing "entropy".


In art, a high-entropy painting is one that would be hard to tell apart from similar paintings, one example being paintings created by simply splattering paints all over the canvas.


That's the problem with randomness.

The required Dilbert reference: https://dilbert.com/strip/2001-10-25


> "Entropy is not Disorder One of the most popular belief about entropy is that it represents disorder."

This is what confused me the most about entropy in high school, the "order / disorder" lingo. Isn't "order" a metaphysical concept, something a conscious entity thinks about a system? How would nature know the difference? It took me some years to understand that that lingo is indeed misleading. (Still definitely not an expert of course.)


It's order in the different things in different places sense. Not the best term, since if I had to give an informational definition of order I'd probably set it up backwards, but not terrible. I like "separate" to communicate low entropy and "mixed" for high entropy, but that's just what particular examples of low and high entropy look like.


> Isn't "order" a metaphysical concept

Yes. The most scientific way of talking about "order" in that sense is the Kolmogorov complexity, which is still extremely poorly-defined. The best way to put it is that "ordered" states are ones that have low Kolmogorov complexity.


What is a way to articulate it then?


OP is about as good as any explanation I've seen.


"Compressibility" (in the software sense) would perhaps carry the most meaning.

Entropy is ultimately all about the ability to extract information (i.e. work) out of a system.


"Compressibility" is really not that great an illustration for physical entropy. Information-theoretic entropy is not quite the same thing as physical entropy, but close enough to confuse you if you're not paying attention.


I'm going to need an example of the differences, because insofar as statistical entropy is entropy, it's ultimately describing an information function (and the lean in quantum mechanics these days is that information describes physical properties as well - hence holography and the black hole information paradox).

The heat-deathed universe for example would be the ultimate compressible information: 1 measurable state, across all space, for the rest of infinite time.


Indeed, it was actually James Gleick's book The Information that helped me understand the concept better.


>Contrary to popular opinion, uniformly distributed matter is unstable when interactions are dominated by gravity (Jeans instability) and is actually the least likely state, thus with very low entropy. Most probable states, with high-entropy, are those where matter is all lumped together in massive objects.

That means over time the system becomes more ordered and starts organizing itself into spheres.

I once brought this question up on physics stack exchange and basically the answers were either some form of rolling their eyes at me or dismissing me outright. The people who did answer the question stated that as particles organize themselves into spheres some other part of the universe gets hotter as a result, and that the seeming self-organization I see going on in the solar system was just an isolated system.

This answer still seemed far-fetched to me. It still looks as if some overall self-organization is going on if the universe gets hotter on one side and matter gets organized into solar systems on the other.

It took me 3 years to somewhat understand what entropy is. If you have loaded dice that always roll 6s, then the dice rolling ALL 6s is the highest-entropy state. Rolling random numbers would then be a low-entropy state.

Entropy is simply a phenomenon of probability. As time moves forward, particles enter high-probability configurations. Like rolling dice. As you roll dice more and more... rolling random numbers has a higher probability than rolling all 6s...

It just so happens that disordered arrangements happen to have higher probabilities in most systems. But if you look at a system of loaded dice or the solar system... in those cases Ordered configurations have higher probabilities. That's really all it is. The entire phenomenon of entropy comes down to probability and the root of probability is the law of large numbers.


Entropy: "to describe energy loss in irreversible processes". We have no clue about what is or is not reversible. Complex systems exhibit self-organizing behavior for no reason (that we understand), and we continue to identify more conditions under which this occurs. How does a Nobel Prize get handed out for identifying/quantifying "self-organization" http://pespmc1.vub.ac.be/COMPNATS.html without bringing everything we think we know about entropy under scrutiny? Self-organization does not consume energy any more than entropic decay emits it. Irreversibility is a poor assumption.


> Self-organization does not consume energy any more than entropic decay emits it.

This statement is incredibly wrong - this is exactly what both these processes do. We calculate chemistry reaction kinetics by including entropy terms, and optimize reactions by manipulating the entropy on one side of the equation (a classic is getting a liquid phase to precipitate out as you produce it).

I mean the reason coal can be turned into electricity is because there's a big increase in entropy going from "solid carbon in a specific location" to "CO2 diffused everywhere".



Complex systems are net increases in entropy. The water is flowing downhill, but it takes a really weird organism-shaped path to get there. Self organization is supposedly interesting because we don't know why such a path manifests. Thus far, nothing has given the second law a second's (ha ha) pause. It's not impossible, but considering most of our foundational physics is time-symmetric it makes sense to call entropy irreversible. Even if it could be reversed (and don't hold your breath on that one), it's still the cause of the arrow of time.


Welp, I suppose the author has another 10 years to work on their article about how it took them 20 years to actually understand entropy.


Entropy isn’t measuring a loss of energy, but the loss of the ability for a closed system to do useful work.

Order is often used to describe what’s going on but it’s not the kind of order we normally think of. Sufficient cold water in a warm room is just as capable of performing work as warm water in a cold room.
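
A back-of-the-envelope way to see the point (standard exergy formula for a finite lump of water, with temperatures I've made up): the maximum Carnot-limited work is positive whichever side of room temperature the water sits on.

    import math

    def max_work(m_kg, T_body, T_env, c=4186.0):
        # exergy of an incompressible body at T_body in surroundings at T_env (J):
        # W = m*c*[(T_body - T_env) - T_env*ln(T_body/T_env)]
        return m_kg * c * ((T_body - T_env) - T_env * math.log(T_body / T_env))

    print(max_work(1.0, 275.0, 295.0))   # cold water in a warm room: ~3.0 kJ available
    print(max_work(1.0, 315.0, 295.0))   # warm water in a cold room: ~2.7 kJ available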


Gas molecules in a box - entropy seems quite straightforward there. An even distribution is the most likely state and has the highest entropy.

In space, at large scales, gravity starts dominating - so stars and planets are actually a higher-likelihood state than an even distribution.

Isn't this just about statistical independence? In a small amount of gas (almost by definition of what is a gas), the particles don't have much effect on each other. One can assume statistical independence.

While in space with gravity overwhelming other effects, the particles have very much effect on each other. Hence the statistics about their state are affected by these dependencies. So the previous intuition about entropy can't hold.


I enjoyed the article but have a very minor nitpick. I didn't understand why the author added this sentence.

"However, the timescales involved in these calculation are so unreasonably large and abstract that one could wonder if these makes any sense at all."

Apart from the fact that we could wonder about anything and everything, I think the author does not state what evidence we have to suspect that large enough timescales would change the laws of physics.

It could be the case of course, and it would be great to talk about it if such evidence exists, but without further justification I feel that this sentence is an unjustified opinion in what is otherwise a very nice article that helps better understand entropy.


Of course you don't need to really understand entropy for it to be useful. It's definitely an interesting concept, but when I was crunching equations for Thermodynamics, one of the weeder classes for ME, it becomes clear you need it for things to balance out. Once you've cranked through a dozen or so problems you get a feel for what it is, even if the physics and the spiritual side of it remains murky.

Now, 35 years later, when I marvel at my new engine or what have you, I still vaguely remember my entropy-problems days and appreciate that someone worked this stuff out.


I view entropy as a probability distribution of some set of configurations of something. Entropy is low if there’s only one configuration and high if uniformly distributed.

There’s also some observer/interaction effect, which is like introducing a conditional probability that would cause crystallization in an otherwise homogeneous system. Essentially a catalyst.

I also find it fascinating that when it is super cold outside and you throw a pan of boiling water out the window it turns to snow instantly vs a cup of room temperature water which does not. It probably fits in terms of activation energy as well.


I recommend Information Theory for Intelligent People: http://tuvalu.santafe.edu/~simon/it.pdf


Let me add: This PDF is only 13 pages and presents a few different ways of viewing entropy. It was easy for me to follow as an undergrad.


That's pretty good as far as I'm concerned. Took me a couple years to really grasp electrical impedance. Breakthrough for me was a concise book written in 1976 by Rufus P. Turner.

Subtle things take a while to get.


The Science of Can and Can't[1] is interesting in how it looks to address a number of fundamentals via counterfactuals including the 2nd Law of Thermodynamics.

Edit: See [2] for background about Constructor Theory.

[1] https://www.chiaramarletto.com/books/the-science-of-can-and-... [2] https://www.youtube.com/watch?v=8DH2xwIYuT0


Of possible related interest: https://arxiv.org/abs/chao-dyn/9603009

I think Bricmont is a clear thinker/presenter on these matters and this article actually showed up in a "for humanities people" anthology. [1]

[1] https://www.amazon.com/Flight-Science-Reason-Academy-Science...


I had the pain, and pleasure, of taking and then (assistant) teaching thermodynamics at MIT.

One of the tidbits that always stuck with me was that astronomers have estimated the observable universe’s total entropy.

When you compare that value to the maximum possible entropy, i.e. the heat death of the universe, and then to the ridiculously low entropy state of the beginning of the universe, we are currently halfway along in that ‘timeline’.

It always brought to mind a grandfather clock; the clock stops when the weight hits the floor, and we are halfway there…


My understanding of entropy: it is a measure of how big a system (matter + energy from a space region) is, and how much its components have interacted with each other: Entropy ~ log(number of possible system states). As the universe unfolds, systems originally isolated are starting to interact and to form bigger systems, hence the number of possible states increases, and entropy increases too.


Entropy implies that these states are indistinguishable from each other.


Of all of physics, entropy is the most depressing part.


Of all places, it was a conversation about the Socratic Forms in a Political Theory course I took in college that really brought the weight of the concept home to me. It went something like "Unlike the realm Socratic forms exist in, everything in our universe is subject to entropy; it is in everything's nature to degrade or decay over time."

Maybe there's more to that? I'm all ears.


The forms are a low energy state that materially arise from an entropic process. Degradation and decay can lead to more organization and beauty.


Small nit in case the author sees this: the image labelled "Entropy of each configuration of system with two dices where the observed macrostate is their sum" is either incorrect or mislabeled.

For example, 2 and 12 each have 1 microstate, and ln 1 = 0, so the entropy of 2 and 12 is 0, but the image says 0.028 (which is the probability of 2 or 12, not the entropy).
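
In case anyone wants to check it, a quick tabulation (nothing assumed beyond two fair six-sided dice):

    import math
    from collections import Counter
    from itertools import product

    # microstate count Omega for each macrostate (the sum), then ln(Omega)
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

    for s in sorted(counts):
        omega = counts[s]
        print(s, omega, round(math.log(omega), 3), round(omega / 36, 3))
    # sum 2: Omega = 1, ln(1) = 0.0, probability 1/36 ~ 0.028 -- the 0.028 in the
    # figure is the probability, not the entropy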


> Boltzmann imagined that our universe could have reached thermodynamical equilibrium and its maximal entropy state a long time ago, but that a spontaneous entropy decrease to the level of our early universe occured after an extremely long period of time, just for statistical reasons.

I’m interested in reading more about this. Any pointers?


I read with interest most well written articles explaining entropy. I often leave the article mildly satisfied that I understood it. Until the next day when I again have to figure out the difference between "high" and "low" entropy in a particular model, and invariably I mix up the two.


As a computer scientist it isn't helpful that the entropy in thermodynamics and that in computer science (informational content - I don't know a good English term) collide a bit.


but is it not hubris to think that we really know much about the origin and outcome of the universe? is it wise to make decisions based on this modicum of knowledge that we currently have regarding thermodynamics and the universe?

I suspect that the scientists of a trillion years from now will know a lot more than we do now... so I don't really have that much confidence in current pronouncements regarding the beginning and possible end of the universe.

and yes I do have a degree in science and courses in physics & thermodynamics


As the old quote runs...

"[W]hen people thought the earth was flat, they were wrong. When people thought the earth was spherical, they were wrong. But if you think that thinking the earth is spherical is just as wrong as thinking the earth is flat, then your view is wronger than both of them put together." -- Asimov

It isn't wise to say you're ignorant, it's wise to know how ignorant you are. If I see a coin come up HHH and have to bet on the next 2 flips, you can be damn sure I'll bet HH. You can bet HT, TH, or whatever else at equal probability, but I suspect I'll come out the winner more frequently than you.


Does it hurt to try?


it hurts to make decisions based on our current knowledge, which is a child's knowledge


How exactly? And how exactly would we ever get past a ‘child’s knowledge’ without learning and making mistakes?


The models we have work and are relatively parsimonious. It would be silly to assume they are final but equally silly to shy away from using them for obvious reasons.

The modernization of physics also means we have outlines for what theories should look like, so even if our current theories are wrong we can still use the principles of (say) symmetry and information to constrain future work.


Shannon called the function "entropy" and used it as a measure of "uncertainty," interchanging the two words in his writings without discrimination.


Scientific method: it started with thermodynamic entropy, but scientists found out that this truth is much more deeply ingrained in our universe. Then we got a mathematically generalized version, which is now used to explain the "arrow of time" that our time-reversible physics equations would not be able to explain alone.


The problem is that entropy is a subjective notion.

It's a measure of our lack of knowledge about the state of a system.


Alternatively it took ten years for Aurelien Pelissier's misunderstandings of entropy to decay.


Anyone who thinks they understand entropy is living in a state of sin.


For those missing the reference, the original quote is also great:

"Anyone who attempts to generate random numbers by deterministic means is, of course, living in a state of sin." John von Neumann


Google freewall? Guess I won't read this article...


Seeing "Read the rest of this story with a free account."

Nope.


What do you mean?


It is required that I log in to the article using my Google or Facebook account to read it.

Garbage. Your article isn't worth reading if I need to jump through hoops to get there.


Here's an archived copy:

https://archive.today/vRD7i


I think the word entropy is science’s largest mistake.


Shannon explained the name 'entropy' in (Tribus and McIrvine 1971):

My greatest concern was what to call it. I thought of calling it 'information,' but the word was overly used, so I decided to call it 'uncertainty.' When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, 'You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage.'


Very cool! I didn’t know that.


English was not my college professor's native language. It took me a while to realize entropy was not the same as enthalpy. Very confusing.


Yeah, and while computational modeling has a good handle on enthalpy in drug design, it's all handwaving and statistics for entropy. And yet it's entropy that breaks predictions and drives medchem.

Solve how to properly model the free energy of interaction between a ligand and a protein (or two proteins) with proper solvent treatment and (a) you'll be famous, not Kardashian famous, but famous and (b) a whole lot of people will buy or download your software.


Wait until you find out about enstrophy.


entropy does not increase. The universe has organized itself into people, brains, cities, iPhones


not to brag but it took only 2 months to forget anything about it



