Wolfram’s Computational Interpretation of the Second Law and a Physics Based View of Evolution.

David Galbraith
Feb 22, 2023


A Computational Understanding of the Second Law of Thermodynamics

This is an expansion of notes from Wolfram’s 3 part series which attempts to show a new theory of the Second Law of Thermodynamics, founded on a computational understanding vs an energy one based on heat exchange (Clausius) or the statistical interaction of gas molecules (Boltzmann). There is no mention of Shannon in Wolfram’s discussion, but in some ways the computational model goes beyond the information one (as I’ll outline below) to create a unified version of computational and mechanical entropy. I extrapolate Wolfram’s model to one which shows natural selection as an emergent property from the application of the Second Law.

What is Meant by Computational

What he means by computation is: iterative, algorithmic operations of a rule or equation over a given state to calculate a new one at a specific point in the future. Until recently much of science was based on equation based mechanisms which showed the state at any point in the future based on inputs to a formula (usually calculus based) and running it once to produce the output. This only applied for a small number of idealized systems since massive repeated interactions, feeding outputs as inputs into the same or further formulas, weren’t possible before computers. An obvious example of something only accessible from this repeated, algorithmic approach would be the discovery of fractals.

Computational Equivalence and Irreducibility

Many seemingly simple but technically complex systems, such as the positions of three planets revolving around each other, cannot in general be computed other than by brute force iteration of formulas, many times. Wolfram says this lack of shortcuts (single equation results or compression of the number of steps), or Computational Irreducibility (CI), is an inevitable product of the principle of Computational Equivalence (CE), where CE is similar to saying you need a universe to compute a completely accurate snapshot of systems embedded in the universe, without iteration.

Wolfram’s one dimensional, elementary cellular automaton where the next state of a single black or white pixel is dependent on its current state and that of its left and right neighbor and which has 256 possible rules governing this, is used to create the simplest possible model to test a theory of a computational approach to the Second Law based on the above principles. Wolfram used the same elementary cellular automaton in A New Kind of Science, to suggest that algorithmic vs equation based approaches were what the title suggested. And it was later shown that some of them could be Turing complete, which meant that although simple, their output was infinitely complex, but structured.

[An example rule for an elementary cellular automaton.]
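As a rough illustration of how such an automaton is iterated, here is a minimal Python sketch. It assumes the standard numbering convention, in which the rule number's binary digits give the output for each of the eight (left, center, right) neighborhoods, and it assumes wraparound edges. The function names `step` and `run` are mine, not Wolfram's.

```python
def step(cells, rule):
    """Apply one update of an elementary cellular automaton (wraparound edges assumed)."""
    n = len(cells)
    new_cells = []
    for i in range(n):
        left, center, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        index = 4 * left + 2 * center + right   # neighborhood read as a number 0..7
        new_cells.append((rule >> index) & 1)   # bit `index` of the rule number is the new cell
    return new_cells


def run(rule, width=31, steps=15):
    """Start from a single black cell and return the list of successive rows."""
    row = [0] * width
    row[width // 2] = 1
    history = [row]
    for _ in range(steps):
        row = step(row, rule)
        history.append(row)
    return history


if __name__ == "__main__":
    for row in run(110):
        print("".join("#" if cell else "." for cell in row))
```

There is no formula here that jumps straight to row 15; the only way this sketch gets there is by computing rows 1 to 14 first, which is the sense of 'computational' used above.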

Why Equivalence Implies Irreducibility

The reason that CI is implied by CE is that ‘analyzers’ of a system “cannot be expected to have any greater computational sophistication than the system itself, and so are reduced to just tracing each step in the evolution of a system to find out what it does.” Wolfram also suggests that CE means lots of simple systems (such as Rule 110 [of the 256 possible rules in the elementary cellular automaton] or even a bunch of molecules of gas) will be Turing complete.

In other words, observers may not be able to predict the state of an observed system several steps into the future other than by brute force calculation over that same number of steps, but many very simple systems will be able to do this. These systems are labeled Class 4 systems (see diagram below) in Wolfram’s analysis of the iterated outputs of the elementary cellular automaton rulespace, although it can be argued that they sit between systems which repeat ad infinitum (class 2) or evolve to be random, obeying the 2nd Law (class 3).

A diagram of these, when applied to his elementary cellular automaton is shown below, where a one dimensional cellular automaton is chosen to be an extremely simple system with inputs and outputs based on rules.

[Wolfram categorized four different classes of outputs from his one dimensional cellular automata. Strictly speaking (and as pointed out by Wolfram) Class 4 systems sit between Class 2 (constant entropy) and Class 3 (2nd Law obeying, increasingly random systems), just as fractals sit on the boundary between computation results which go nowhere and ones that spiral off into infinity (I wouldn’t be surprised if there was a link and some fractals are Turing complete). Class 4 systems like Rule 110 have some structure but are complex without being random, and Rule 110 has been proven Turing complete: they are computational systems. (Rule 30, by contrast, is usually placed in Class 3 and is not known to be Turing complete.) Class 4 doesn’t just comprise systems but the phase space these systems sit in within the universe of possibilities, which in turn is defined as a complex boundary between possible systems. They also share characteristics of the aperiodic crystal like structure that Schrödinger envisioned was necessary for life to be able to store information and evolve. In other words, living things, and the instructions given by the DNA that they are built from, are possibly similar to Class 4 systems.]

The 2nd Law from Imperfect Observation of Irreducible Systems

Wolfram suggests that the 2nd Law is the result of the interplay between observers with limited computational ability and systems which cannot be quickly computed by such observers (CI), where a theory of observers is needed to define things more rigorously.

In other words, the tendency for things to look progressively more random (2nd Law) is because systems progress in ways that are eventually beyond an observer’s finite (bounded) ability to comprehend/calculate. At the point where they reach a rate of change which accelerates beyond ability to fully comprehend it, everything looks the same — or more random — and the 2nd Law applies.

As a result, what looks random (high entropy) to one person is meaningful order (low entropy) to another with more knowledge or intelligence. Colloquially this is intuitively obvious: the mathematics of Einstein’s General Relativity looks like gibberish to most of us, and certainly to a toddler.

This relativistic approach to entropy (different for different observers) was arguably already a part of statistical mechanics and the concept of coarse graining, but few people seem to have interpreted it this way.

In a coarse graining approach, entropy is measured by the number of possible microstates per macrostate (formally, the logarithm of that number). A macrostate (e.g. gas pressure) means something to an observer and in turn consists of different configurations of microstates (any snapshot of gas molecules at specific positions can produce the same pressure) which all look the same.
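To make the microstates-per-macrostate idea concrete, here is a small Python sketch under toy assumptions of my own: four particles, each either in the left or right half of a box, with the macrostate being simply 'how many are on the left'. Entropy is taken in Boltzmann's form, S = k ln W, with k set to 1.

```python
import math
from collections import Counter
from itertools import product


def macrostate_counts(n_particles):
    """Count how many microstates fall into each macrostate.

    A microstate is a tuple of 0/1 flags (1 = particle is in the left half).
    The macrostate is the number of particles on the left (a toy assumption).
    """
    counts = Counter()
    for microstate in product((0, 1), repeat=n_particles):
        counts[sum(microstate)] += 1
    return counts


def entropy(n_microstates):
    """Boltzmann entropy S = ln W, with Boltzmann's constant set to 1."""
    return math.log(n_microstates)


counts = macrostate_counts(4)
for k in sorted(counts):
    print(f"{k} on the left: {counts[k]} microstates, S = {entropy(counts[k]):.3f}")
```

The 'two on each side' macrostate is compatible with the most microstates and so has the highest entropy. An observer who could tell the particles apart would be using a finer set of macrostates, and the same gas would score lower.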

What makes the Wolfram approach specifically attack the idea of relative entropy based on the calculation ability of the observer is the notion that systems tend towards finer and finer grained macrostates, till they seem random (and become microstates).

This tendency for finer and finer graining (microstates becoming macrostates) means that the entropy of certain observed systems lowers over time, which is equivalent to the process of learning.

Lowering of perceived entropy after learning (where learning = previous calculations having been performed) is not terminology that Wolfram uses when referring to calculation ability increasing over time. However, learning and calculation ability are potentially important and possibly linked when you consider the relevance of this ‘relativistic’ view of entropy vs traditional information entropy and a possible connection to a physics based understanding of evolution and natural selection which we’ll discuss later.

The tendency of ‘certain’ observed systems to have entropy which reduces over time for an observer, doesn’t violate the second law if the overall entropy of the environment increases. This holds for open systems (all calculating or observing systems are open ones) which are far from equilibrium, from planets absorbing a number of low entropy, high energy photons from the sun and re-radiating a larger number of lower energy ones back into space to a human gaining more and more understanding of a complicated system such as another living thing upon more and more detailed examination.

Observer Relativism and Entropy

“Yes, if an observer could look at all the details they would see great complexity. But the point is that a computationally bounded observer necessarily can’t look at those details, and instead the features they identify have a certain simplicity.”

This is the essence of a relativistic approach to entropy and one that, in the sense of information entropy (which says nothing about meaning), goes towards a theory of meaning as one person’s high entropy is another’s low entropy, depending on the level of resolution that can be perceived (number of bits varies according to observers), based on computational ability or (as I’ll argue later) number of past computations (knowledge or learning) which in turn yield meaning to certain states for certain observers and not for others.

Computationally bound here refers to the sophistication of the observer, which can improve over time through increased processing ability, expanded memory, or both (to use the computer analogy).

In other words, interaction between an observing and observed system, over time, results in expanding the limits of the computationally bound observer: it gets more sophisticated and more complex and collects together a bit of the universe that is lower entropy (such as the relatively low entropy [compared to surroundings] collection of molecules that make up a living thing, like us). We may have more molecules than an amoeba. But we will have lower entropy than an amoeba plus enough of the water it is swimming in to match the total number of molecules in a human. Much of the amoeba and us are structured and low entropy, the water isn’t.

As systems learn, they get more complex and create a local area of low entropy. So that the Second Law is preserved, this local entropy reduction must be compensated by an increase in entropy in the overall environment (living things eat and produce more high entropy, waste shit!).

This implies that high order living things sitting in an energy gradient (the sun shining on us, or us eating) must increase energy or information flows compared to if we weren’t there. Low entropy energy flows through a low entropy machine such as a human to create a larger amount of high entropy, unusable, waste energy output than if the sun shone on a barren earth and heat was re-radiated directly out into space.

When we apply the above conclusions to any open system, then we can see what the implications of a calculation based approach to entropy are on the physics of evolution by natural selection, which currently is a theory that only applies to biological organisms. It seems odd that such a profoundly low level mechanism as Darwinian natural selection, isn’t a product of the underlying physics of energy flows. The calculation based interpretation of the 2nd Law allows for this as a logically consistent thought experiment.

Observer Relativism, Information Entropy and Meaning

In colloquial terms, in the Wolfram model, a number sequence such as 1111 is lower entropy than 1066 for an observer with pure calculation ability, but 1066 is potentially (I’m not sure this is proven, but suspect it is a natural consequence of the relativistic view of entropy) low entropy for something with calculation ability and memory — specifically, a memory of European history. In the Shannon model, which says nothing about meaning, 1066 isn’t low entropy relative to other 4 digit, base 10 sequences.
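The contrast can be sketched numerically in Python, with heavy caveats. The digit-frequency entropy below is only a crude stand-in for 'pure calculation ability', and the 'memory' observer is a hypothetical prior of my own that puts half its probability on a short list of remembered dates. None of this comes from Wolfram or Shannon beyond the standard formulas for entropy and surprisal.

```python
import math
from collections import Counter

N_SEQUENCES = 10 ** 4                            # all 4 digit, base 10 sequences
REMEMBERED = {"1066", "1492", "1789", "1945"}    # hypothetical memory of history


def symbol_entropy(digits):
    """Shannon entropy (bits per digit) of the digit frequencies within one string."""
    n = len(digits)
    return sum(c / n * math.log2(n / c) for c in Counter(digits).values())


def surprisal(probability):
    """Bits of information in an outcome that the observer assigned this probability."""
    return -math.log2(probability)


def uniform_prior(seq):
    """Observer with no memory: every 4 digit sequence is equally likely."""
    return 1 / N_SEQUENCES


def memory_prior(seq, weight=0.5):
    """Assumed observer with memory: `weight` of the probability sits on remembered dates."""
    p = (1 - weight) / N_SEQUENCES
    if seq in REMEMBERED:
        p += weight / len(REMEMBERED)
    return p


for seq in ("1111", "1066"):
    print(seq,
          "pattern:", symbol_entropy(seq),
          "uniform:", round(surprisal(uniform_prior(seq)), 2),
          "memory:", round(surprisal(memory_prior(seq)), 2))
```

Under the uniform prior both sequences carry the same number of bits, which is the Shannon picture. Only the observer with a memory finds 1066 cheap to describe, which is the observer dependent picture argued for here.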

If you consider definitions of entropy as microstates per macrostate then it is possible that a process of learning allows us to perceive more finely grained macrostates as systems interact and learn. This would be analogous to where we are able to perceive the component pixels that make up each digit. I.e. 1111 on a digital display is rendered as a lot more than four pixels if you are able to see the pixels that make up each digit or the smaller bits that make up each larger bit.

Wolfram makes no mention of Shannon (information) entropy and while it’s true that in principle it is purely analogous to thermodynamic entropy, a belief upheld largely because of the apocryphal tale of Von Neumann telling Shannon to call it that because ‘nobody really knows what entropy is’, this is exactly the point that Wolfram shows in his extensive review of the history of the 2nd Law. Further, and really interestingly, Wolfram shows that the relationship between statistical mechanics and heat was long argued to be merely analogous. If anything, both Wolfram’s historical research and his posited theory argue that information entropy is equivalent, particularly when a relativistic approach (different observers or different computational ability actually see a different number of bits) is used, since this treats the Shannon instance as a special case, so his formula holds. Beyond this, formally linking information and thermodynamic entropy gives a much more trivial resolution of the Maxwell’s Demon paradox than Wolfram suggests (the Demon uses information to manage heat), and one which should surely have been resolved by Landauer (inevitable link between computation and heat), anyway.

In the Maxwell’s Demon model, the observer becomes part of the system. As Wolfram points out, Quantum theory and Relativity, although unresolved between them, are both theories that embed the observer in the system to create a different outcome, and this hasn’t been done for statistical mechanics.

Different Types of Observer

For the requirements of a theory of observers, I’d argue that sentient observers are a special case of observers (where observing = information syncing interaction, which is synonymous with learning). These, in turn, are a special case of general interaction between systems, so we need a theory of observers, defining both interaction (observing in the broad sense) and systems (observers).

In the Shannon-Weaver model of information exchange, there are five elements: a source, a transmitter, a channel, a receiver, and a destination. The observer and the observed system each combine two of these (source & transmitter on one side, destination & receiver on the other), which reduces the model to three elements. If we define a system as a boundary with inputs and outputs (since no system is truly closed), then both source and destination are merely complex, contorted channels, where a channel is the simplest type of system with a boundary on both sides and no contortion in the middle. This reduces the Shannon-Weaver model to connections between one type of element, an open system, with more or less complexity based on how ‘tied up in knots’ it is.

Colloquially, people usually describe learning and information exchange as information transfer. But if we examine information exchange, not just in an information channel, but in a system comprising observer, channel and observed system, information isn’t transferred, it is synced. To illustrate this: If you tell me a sequence of numbers, that same sequence is now in two places (your brain and my brain) and the overall number of bits needed to describe the states of both our brains is now less — it is lower entropy. The number hasn’t moved from one place to another; it has been duplicated and synced between observer and observed. Information isn’t ‘transferred’ in the process of learning and entropy between two systems is reduced while the entropy of the overall environment they sit in is increased from waste heat etc. (To define this rigorously needs a consideration of Landauer and the idea that waste heat is only necessarily generated when bits are deleted. I would argue that information syncing necessarily results in the deletion of bits).
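The 'fewer bits to describe both brains' claim can be checked in a toy Python model using the standard joint entropy of two variables. The assumptions are mine: each 'brain' holds one of 16 equally likely values, the two are independent before the exchange, and they are identical after it. This only counts bits of description; it says nothing about the heat cost, which is the Landauer question raised above.

```python
import math
from itertools import product

VALUES = range(16)   # toy assumption: each brain holds one of 16 possible values (4 bits)


def joint_entropy(joint):
    """Shannon entropy in bits of a joint distribution given as {(x, y): probability}."""
    return sum(p * math.log2(1 / p) for p in joint.values() if p > 0)


# Before syncing: the two states are independent, so all 256 pairs are equally likely.
before = {(x, y): 1 / 256 for x, y in product(VALUES, VALUES)}

# After syncing: the second state is a copy of the first, so only 16 pairs can occur.
after = {(x, x): 1 / 16 for x in VALUES}

print("bits to describe both, before syncing:", joint_entropy(before))
print("bits to describe both, after syncing: ", joint_entropy(after))
```

The drop is exactly the mutual information created by the copy, from the identity H(X, Y) = H(X) + H(Y) - I(X; Y).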

Given that observers and observed are both types of open ‘system’ (a truly closed system cannot be observed) we also need a strict definition of what an open system is.

A system is something with a permeable boundary (inputs and outputs, with something in between which changes the flow or processes what passes through) and contents defining its state. Its inputs and outputs mean its state changes over time or can be maintained as being different from its surroundings. The simplest open systems are where the changes to a flow go from a straight line to a full loop, such as what happens in the change from laminar to turbulent flow, where turbulence is actually a highly structured set of cascading loops (like gears) to allow a high rate of flow (a test of the low entropy of turbulence would be to see if it can actually increase flows compared to laminar ones).

[Laminar vs turbulent flow. Turbulent flow occurs as a phase change when flows (channels) go through a full loop to create the simplest form of open system: a loop (system boundary) with input and output. In turbulent flows, this sets off a cascade of smaller and smaller loops to dissipate flows efficiently. As such, turbulence is highly structured and low entropy; it merely looks like a mess to someone who doesn’t look hard enough or, as Wolfram puts it, to a computationally bound observer. The efficient dissipation channels low to high entropy states at a greater rate than laminar flows, to compensate for the presence of the low entropy turbulence mechanism and so preserve the 2nd Law even when someone ‘does look hard enough’.]

The state of an observed system is measured relative to an observer’s viewing (calculation) capability. A non-passive observer (strictly speaking, at the fine grained, quantum level there is no such thing as passive observation, so it’s always like this) can change the state of a perceived system, and the new state is based implicitly on the application of some known or unknown rule (all information processing happens according to implicit rules in Wolfram’s calculation based universe). But because all observations of information/energy flows are relative, there are no rules that are independent of observers and rules (laws of physics even?) are relative to the observer. A more sophisticated observer will perceive a change in a system from a current to a future state as if a different rule had been applied, and this can evolve over time as the observer learns.

The above is not as weird as it at first seems. An example would be a system which observes movement of planets based on Newtonian mechanics vs General Relativity. The latter has greater calculation ability based on a finer grained view of the world operating under different, more accurate rules.

In summary, the components of interacting systems in a calculation based model of the 2nd Law are systems and rules operating on them, where the apparent rule applied to a system depends on what resolution we look at. The specific rule applied will have a degree of probability based on the degree of coarse graining (since nothing can view the universe at the finest level of graining other than the universe itself). The rule may even cycle to a different rule based on that probability if the uncertainty created by that probability is sufficient to allow the rule to ‘accidentally’ migrate or mutate into another one. The potential cycling over rules due to this probability has potentially profound consequences.

Wolfram hints at the trend to less and less coarse graining (learning) over time: “and what seemed like “random noise” just a few decades ago is now often known to have specific, detailed structure.”

He also points out that the probabilistic nature of any interaction means that interactions are not deterministic and this is the origin of the fact that interactions operating under the 2nd Law are irreversible.

“If the underlying rules which govern the universe are reversible, then in principle it will always be possible to go back from that future “random heat” and reconstruct from it all the rich structures that have existed in the history of the universe. But the point of the Second Law as we’ve discussed it is that at least for computationally bounded observers like us that won’t be possible. The past will always in principle be determinable from the future, but it will take irreducibly much computation to do so — and vastly more than observers like us can muster.”

i.e. the Universe appears irreversible for any observer within it.

There is a fuzziness in the apparent application of any rule, relative to any observing system, introducing a degree of statistical uncertainty as to what deterministic rule has been applied. This means that running a process again, including running it in reverse, could involve a statistically different application of the rule, and therefore the process is irreversible. This fuzziness can be described as noise, and it is inevitable given the imperfect and subjective views of observers who are computationally bound.

If there is always noise in systems where rules are being applied, then rules can potentially be randomly swapped for other rules, due to this noise and all systems operate under a model of variable inheritance of the application of rules, which is how living systems evolve through the random mutation of DNA and its effect on survivability in terms of ability to reproduce. In the rule based model, the state of a system at a point in future is merely a special case of reproduction, equivalent to the death of the original system and a new incarnation.

In order for natural selection to operate, we need expansive reproduction (more offspring than ancestors) so that certain rules can predominate. This can be seen as a second phase change, where the first was the transition from laminar to turbulent flow at the point where flows go through a full loop.

Although our ability to process the universe and extract information from it will increase over time, the universe will still always proceed to perceived randomness (entropy increasing) as its progression evolves beyond our ability to process it (we are not just observing the state of a static universe but one which is itself changing over time). If we were to catch up, then entropy would reduce and eventually become zero, like the deterministic cellular automata that Wolfram describes.

This can apply to systems of any size (such as stars raining energy on planets and back into space). The components of the systems themselves are far from equilibrium and are therefore lower entropy than their surroundings. In other words, the number of bits required to understand the universe increases as stuff happens and at a rate that exceeds any summary by one observer.

Undifferentiable microstates mean macrostates can have multiple microstates that are in fact slightly different and therefore their difference will be perceived as noise, or more accurately as a statistical chance that a microstate will not register. This brings randomness to any observer. If we reduce natural selection to an abstract, calculation based notion of the ‘variable inheritance of rules’ where that variability is created by noise, then any system will evolve via natural selection. Wolfram doesn’t go this far and those that have suggested it (such as Jeremy England) have focused on energy vs calculation based approaches. I believe a physics (rather than biological) theory of natural selection can be proved to be a byproduct of an approach like Wolfram’s.

Imperfect Observation and Macrostates

The bounded computational ability of the observer means that some collections of microstates (e.g. gas particles in a box) will look effectively the same at the macro scale (some gas). This is the same as the conventional modern definition of entropy, the coarse grained view based on the number of microstates per macrostate (arrangements of molecules in a box which still look like ‘some gas in a box’). In this way, entropy has long been a relative measure, even if it isn’t formally acknowledged.

Wolfram makes an attempt to formalize this into a set of variables as described below, in ‘Towards a New Formulation of the 2nd Law’:

S: state of an observed system (e.g. simply the values of cells in a cellular automaton)

Θ: observer function which gives a ‘summary’ of S’s state.

Ξ: evolution function applied to S for a number of steps (“We might represent an individual step by an operator ξ, and say that in effect Ξ = ξ^t. We can always construct ξ^t by explicitly applying ξ successively t times.”)

Ξ is an algorithm, so the state of the system over time is governed by iterations of an algorithm/rule/equation.

The basic claim of the Second Law is that the “sizes” normally satisfy the inequality:

Θ[Ξ[S]] ≥ Θ[S]

“or in other words, that “compression by the observer” is less effective after the evolution of system, in effect because the state of the system has “become more random””
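A rough way to see the inequality in action is to pick a concrete, computationally bounded Θ. The Python sketch below uses the length of the zlib-compressed state as the observer's summary size. That choice is my assumption rather than Wolfram's definition, as are the use of Rule 30 for ξ, the 256 cell wraparound row and the 200 steps.

```python
import zlib

RULE = 30   # assumed choice of underlying rule for ξ


def xi(cells):
    """One step ξ of the elementary cellular automaton RULE, with wraparound edges."""
    n = len(cells)
    return [
        (RULE >> (4 * cells[i - 1] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]


def Xi(cells, t):
    """The evolution Ξ = ξ^t, built by explicitly applying ξ successively t times."""
    for _ in range(t):
        cells = xi(cells)
    return cells


def theta(cells):
    """Stand-in observer Θ: size in bytes of the zlib-compressed state."""
    return len(zlib.compress(bytes(cells), 9))


S = [0] * 256
S[128] = 1   # a simple initial state: one black cell

print("size of summary, initial state:", theta(S))
print("size of summary, after Ξ:      ", theta(Xi(S, 200)))
```

With this crude proxy the summary of the evolved state comes out larger than that of the simple initial state. It need not grow on every single step, which fits the word “normally” in the claim. A more capable observer, one that knew the rule and the starting state, could still compress the evolved row down to almost nothing, which is the observer dependence the whole argument rests on.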

i.e. the size of the observer’s summary of the state of a system increases over time. (Wolfram doesn’t say this, but another way of saying this is that it learns.)

“There are clearly some features Θ must have. For example, it can’t involve unbounded amounts of computation. But realistically there’s more than that. Somehow the role of observers is to take all the details that might exist in the “outside world”, and reduce or compress these to some “smaller” representation that can “fit in the mind of the observer””

Beyond the 2nd Law.

Claim of the Potential to Unify Relativity, Quantum and Statistical Mechanics on top of a Computational Approach to Physics.

This part of Wolfram’s analysis is the most speculative. He says that this approach, which is observer dependent, takes thermodynamics or statistical mechanics into an observer dependent, relativistic realm, alongside Relativity and Quantum Mechanics and that this calculation based approach underpins and potentially integrates all three. For example, he suggests the emergence of smooth, continuous spacetime from different discrete states of space defined by connections in a ‘hypergraph’ (spacetime itself emerges merely from properties of connections in a discrete, non physical, mathematical ‘space’) which ‘all looks the same’ at the scale which we perceive it, much like any high entropy system. Wolfram also argues that “the same core phenomenon responsible for randomness in the 2nd Law also appears to be what’s responsible for randomness in quantum mechanics.”

Natural Selection and Beyond Wolfram’s Thesis

Applying the idea of a relativistic (observer dependent) and computation based (cycling over rules) view of entropy to natural selection.

Although the section in Wolfram’s piece titled ‘The Mechanoidal Phase and Bulk Molecular Biology’ looks at biological systems and their relationship to Class 4 systems, he doesn’t look at the relationship between calculation based systems and natural selection.

A highly simplified, abstract model of natural selection can be described as the survival (dominance at the expense of alternatives) of a particular result (species) based on random mutations.

The ‘result’ here could be the dominance of a particular rule due to it being particularly sticky in an environment where rules are cycled through by some form of mutation, i.e. natural selection would be the variable inheritance and survival of rules, a gene centric view of the world, much like Dawkins’. By sticky, we mean predominant vs other rules and this typically happens by making it have more offspring than other rules. These offspring can be nested within the boundary of an existing system or outside, and at any scale.

If it so happened that the more complex (Turing complete, complex but lower entropy i.e. class 4 not class 2 or 3) outputs were more sticky as they acted as mechanisms to increase overall rate of entropy production of the system as a whole by creating duplicate systems as nested, cascading or parallel systems, then we would have the equivalent of low entropy living organisms which are compatible with the 2nd Law as although they themselves are low entropy they increase overall entropy production.

An example of this could apply to Wolfram’s one dimensional cellular automata, where the results of plots over the finite universe of rules would stand in for Schrödinger’s imagined ‘aperiodic crystal’ (which eventually turned out to be DNA), necessary for the persistence of information required to propagate living things operating under natural selection.

Imagine an environment where rules are imperfectly applied and can be cycled through the entire set of 256 rules, such that a Rule 109 could suddenly mutate to a Rule 110. If Rule 110 allowed for maximized entropy production (by, amongst other things, allowing nested cellular automata within itself, on account of being Turing complete) and this maximized entropy production somehow made it ‘more sticky’ and resilient to mutating further into a different rule, then it would come to dominate, but would still further evolve due to mutations on the nested components made possible by the potentially infinite complexity of a Turing complete machine.
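One way to picture 'mutation' between rules is to treat each rule number as an 8 entry lookup table (its binary digits) and a mutation as one entry being misapplied. That single-entry model is my assumption, not something stated by Wolfram, and the helper names in this Python sketch are hypothetical.

```python
def rule_table(rule):
    """The eight outputs of an elementary rule, for neighborhoods 111 down to 000."""
    return format(rule, "08b")


def table_distance(rule_a, rule_b):
    """Number of neighborhoods on which the two rules give different outputs."""
    return bin(rule_a ^ rule_b).count("1")


def single_flip_mutants(rule):
    """Rules reachable by changing exactly one entry of the lookup table (assumed mutation model)."""
    return sorted(rule ^ (1 << k) for k in range(8))


print("Rule 109:", rule_table(109))
print("Rule 110:", rule_table(110))
print("entries that differ:", table_distance(109, 110))
print("one flip away from Rule 110:", single_flip_mutants(110))
```

In this particular model the numerically adjacent Rules 109 and 110 are two flips apart rather than one, so a walk through rule space would follow lookup table neighbors rather than consecutive rule numbers. The scenario itself does not depend on that detail.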

We would then have a situation which would show that natural selection was an inevitable product of the variable inheritance of rules, and it would possibly show that natural selection was an inevitable product of physics and the 2nd Law, rather than merely biology.

What the observer dependent view of entropy allows for is a situation where a particular macrostate (e.g. a square in a one dimensional cellular automaton being black or white) would be dependent on underlying microstates that were invisible and equivalent to the observer and there would be a gray area where it could be wrongly interpreted as black or white, therefore allowing the ‘rule’ to mutate (variable inheritance or imperfect application of rules) and the universe of rulesets to cycle through until they became sticky.

To explain what this means consider a black square in such an automaton being dependent on the color of smaller squares in a 2x2 grid of microstates (at a resolution which are not identifiable as macrostates by the observer), under it (like a pixelated, or blurred image). If all 4 small pixels are black, then the bigger one is unconditionally black and if all 4 are white then it is unconditionally white. But if 1, 2 or 3 of 4 small pixels are black then the large pixel is shades of gray and there is a statistical likelihood of it being interpreted as white or black.
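The 2x2 picture can be written down directly in Python. The readout model here, where the chance of seeing black equals the fraction of black sub-pixels, is one simple assumption among many possible ones, and the function names are mine.

```python
import random
from itertools import product


def p_black(block):
    """Assumed readout model: chance of reading black = fraction of black sub-pixels."""
    return sum(block) / len(block)


def observe(block, rng):
    """What a coarse grained observer records for one large pixel (1 = black, 0 = white)."""
    return 1 if rng.random() < p_black(block) else 0


blocks = list(product((0, 1), repeat=4))                 # all 16 microstates of a 2x2 block
ambiguous = [b for b in blocks if 0 < p_black(b) < 1]    # the 'shades of gray'

print("microstates:", len(blocks), "of which ambiguous:", len(ambiguous))

rng = random.Random(0)   # seeded so that a run can be repeated
gray = (1, 0, 0, 0)
print("ten readings of a one-quarter black block:", [observe(gray, rng) for _ in range(10)])
```

Only the all black and all white blocks are read the same way every time. Feeding readings like these into a deterministic rule would make the rule look, to the observer, as if it were occasionally a different rule.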

In this way, the observer dependent view of entropy allows for statistical interpretations of macrostates which effectively mean that there are mutations in the application of deterministic rules and rules are cycled through until ‘sticky’ ones are found and dominate.

We can see examples of nested systems in the nested structures within plots of class 4 elementary cellular automata, examples of cascading systems in the gear like, connected spirals of turbulent flow and examples of duplicate systems in living things. Clearly natural selection operates in the last example. If it applies to the second, then we have a physics based theory of natural selection based on energy flows. If it operates in the first case, then we have a fundamental theory of natural selection as an emergent phenomenon in the interaction of any system.

Such a scenario would allow natural selection to emerge from the real world, observer dependent operation of one dimensional cellular automata. It would be a byproduct of Wolfram’s computation based model of the 2nd Law and it would mean that life was an inevitable byproduct of the physics of open systems, maintained far from equilibrium by information or energy flows acting under the 2nd Law.