Part I
The Structure of Emergent Computation
Framing
Suppose the world, at its base, is neither discrete nor symbolic, nor even lawful in the exact sense we write down in equations, but instead a ceaselessly evolving continuous process glimpsed only through the coarse lens of finite resolution. Then what we call physical law, stable object, and computation may each be a scale-dependent compression of that underlying flux, summoned into apparent solidity by the particular vantage of an observer.
Let \(Z\) denote the underlying process, and let \(X^{(\ell,\tau)}\) be the effective description visible at spatial scale \(\ell\) and temporal scale \(\tau\). The proposal is that emergence unfolds in stages, with each transition crystallising latent structure into something that can be named and tracked.
The central question is when this crystallisation goes all the way: when does a continuous physical process genuinely support a finite symbolic computation? The answer advanced here is precise: only when there exists a coarse-graining that is simultaneously decodable by an observer, stable over the relevant timescale, and dynamically closed, meaning the next coarse-grained state depends principally on the current one, not on the hidden microstructure within it. The first two conditions yield an operational alphabet. The third is what makes that alphabet compute.
Setup
Fix a sampling interval \(\Delta > 0\) and define the sequence of sampled effective states
\[ X_n := X^{(\ell,\tau)}(n\Delta) \in \mathcal{X}, \qquad n = 0, 1, 2, \ldots \]
Let \(\Pi : \mathcal{X} \to [k] := \{1, \ldots, k\}\) be a coarse-graining map, and define the induced symbolic process
\[ A_n := \Pi(X_n). \]
Let \(R_n\) be the observer's record at time \(n\Delta\). We interpret \(A_n\) as a candidate symbolic state, a discrete token purporting to represent the system's macroscopic situation. The question is when \(A_n\) behaves like a genuine computational state rather than an arbitrary label pinned to a restless continuum.
Three Conditions
(i) Decodability
An observer must be able to recover the current symbol from whatever record is actually accessible:
\[ I(A_n; R_n) \;\ge\; H(A_n) - \varepsilon, \]
or equivalently \(H(A_n \mid R_n) \le \varepsilon\). By Fano's inequality this furnishes a decoder \(\hat{A}_n = \delta_n(R_n)\) with error probability \(P_e^{(n)} \le p_\varepsilon\), where \(p_\varepsilon \to 0\) as \(\varepsilon \to 0\). A symbol that cannot be read is not a symbol at all.
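The interplay between \(H(A_n \mid R_n)\) and decoding error can be checked numerically. A minimal sketch, using an illustrative three-symbol joint distribution (not from the text) and a MAP decoder:

```python
import numpy as np

# Toy joint distribution over a symbol A and a record R, each on {0,1,2}:
# A is uniform and R is a noisy copy that errs with probability q.
# The numbers are illustrative, not taken from the text.
k, q = 3, 0.05
joint = np.full((k, k), q / (k - 1) / k)   # P(A=a, R=r) for a != r
np.fill_diagonal(joint, (1 - q) / k)       # P(A=a, R=a)

p_r = joint.sum(axis=0)                    # marginal of R

# Conditional entropy H(A | R) in bits: small when the record is readable.
H_A_given_R = -sum(joint[a, r] * np.log2(joint[a, r] / p_r[r])
                   for a in range(k) for r in range(k))

# MAP decoder: guess the most likely symbol given the record.
P_err = 1.0 - sum(joint[:, r].max() for r in range(k))

def H2(p):
    return 0.0 if p in (0.0, 1.0) else -p*np.log2(p) - (1-p)*np.log2(1-p)

# Fano: H(A|R) <= H2(P_err) + P_err*log2(k-1), so a readable record
# (small H(A|R)) forces a small decoding error.
print(H_A_given_R, P_err)
```

For this symmetric channel the decoder's error equals the confusion probability \(q\), and Fano's bound holds with equality.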
(ii) Stability
The coarse-grained state must persist across one time-step:
\[ \Pr(A_{n+1} \neq A_n) \;\le\; \rho. \]
This persistence condition ensures the alphabet does not churn faster than it can be used.
(iii) Approximate Lumpability
This is the decisive condition, the one that promotes a readable, persistent alphabet into a genuine computation. For every symbol \(i\) and every step \(n\),
\[ \sup_{x,\, y \,\in\, \Pi^{-1}(i)} \big\| \Pr(A_{n+1} \in \cdot \mid X_n = x) \;-\; \Pr(A_{n+1} \in \cdot \mid X_n = y) \big\|_{\mathrm{TV}} \;\le\; \lambda. \]
Once you know which macrostate the system occupies, the precise microscopic details within that macrostate tell you almost nothing more about where it will go next. The symbol is enough. The continuous texture beneath it is, to leading order, irrelevant to the future of the symbol itself.
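The condition can be operationalised numerically. A sketch, assuming \(\lambda\) is estimated as the largest total-variation gap between the projected one-step distributions of micro-states sharing a cell (the toy chain and partition are illustrative):

```python
import numpy as np

def lumpability_error(P, cells):
    """Estimate the lumpability parameter: the largest total-variation gap,
    over micro-states sharing a macro-cell, between their one-step
    distributions projected onto the cells."""
    proj = np.stack([P[:, c].sum(axis=1) for c in cells], axis=1)
    return max(0.5 * np.abs(proj[x] - proj[y]).sum()
               for c in cells for x in c for y in c)

# Two macro-cells {0,1} and {2,3}. This chain is exactly lumpable:
# states within a cell have identical escape probabilities.
P = np.array([[0.6, 0.3, 0.05, 0.05],
              [0.3, 0.6, 0.05, 0.05],
              [0.1, 0.0, 0.5 , 0.4 ],
              [0.0, 0.1, 0.4 , 0.5 ]])
print(lumpability_error(P, [[0, 1], [2, 3]]))   # 0.0: the symbol suffices
```

Perturbing any one row so its escape mass differs from its cell-mate's makes the estimate strictly positive.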
Emergent Finite-State Dynamics
Let \(X_n\) be a sampled continuous-state process and \(A_n = \Pi(X_n)\) its coarse-grained symbolic shadow. If decodability, stability, and approximate lumpability all hold on \(n = 0, \ldots, N\), then there exists a time-inhomogeneous Markov chain \(M_0, M_1, \ldots, M_N \in [k]\) with transition kernels \(K_n(j \mid i) := \Pr(M_{n+1} = j \mid M_n = i)\) such that
\[ \big\| \mathrm{Law}(A_0, \ldots, A_N) \;-\; \mathrm{Law}(M_0, \ldots, M_N) \big\|_{\mathrm{TV}} \;\le\; N \big( \lambda + 2 p_\varepsilon \big). \]
Stability implies metastability of the symbolic chain:
\[ \sum_{i \in [k]} \Pr(A_n = i)\, K_n(i \mid i) \;\ge\; 1 - \rho. \]
Decodability supplies a low-error decoder \(\hat{A}_n\) from the accessible record \(R_n\). Approximate lumpability then says that, conditioned on \(A_n = i\), the distribution of \(A_{n+1}\) depends only weakly on the hidden continuous state \(X_n\). Thus the symbolic dynamics tracks the Markov kernel \(K_n(\,\cdot \mid i)\) step by step. Each step incurs at most \(\lambda\) from lumpability error, plus \(p_\varepsilon\) for decoding the present state and another \(p_\varepsilon\) for decoding the next. Summing over \(N\) steps yields the bound \(N(\lambda + 2p_\varepsilon)\). Stability enforces diagonal dominance on average.
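The kernel \(K_n(\cdot \mid i)\) in the proof sketch can be assembled explicitly by averaging projected micro rows within each cell. A sketch, using stationary weights as a stand-in for the conditional law \(\Pr(X_n = x \mid A_n = i)\) (the chain and partition are illustrative):

```python
import numpy as np

def emergent_kernel(P, cells, pi):
    """Lumped kernel K(j|i): micro rows projected onto the cells and
    averaged under the stationary weights pi restricted to cell i."""
    k = len(cells)
    K = np.zeros((k, k))
    for i, ci in enumerate(cells):
        w = pi[ci] / pi[ci].sum()          # conditional occupation of cell i
        for j, cj in enumerate(cells):
            K[i, j] = w @ P[np.ix_(ci, cj)].sum(axis=1)
    return K

# Exactly lumpable 4-state chain from which a clean 2-state kernel emerges.
P = np.array([[0.6, 0.3, 0.05, 0.05],
              [0.3, 0.6, 0.05, 0.05],
              [0.1, 0.0, 0.5 , 0.4 ],
              [0.0, 0.1, 0.4 , 0.5 ]])
vals, vecs = np.linalg.eig(P.T)            # stationary distribution of P
pi = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
pi /= pi.sum()
K = emergent_kernel(P, [np.array([0, 1]), np.array([2, 3])], pi)
print(K)   # rows [0.9, 0.1] and [0.1, 0.9]
```

Because this chain is exactly lumpable, the weights do not matter; for an approximately lumpable chain the choice of weights moves \(K\) by at most \(\lambda\) in each row.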
This theorem is the structural core. Decodability together with stability yields an observer-relative operational alphabet, something that can be named and whose name endures. Approximate lumpability closes the loop: one obtains not merely symbols but an emergent symbolic dynamics, a finite-state machine that was nowhere explicitly written into the continuous substrate, yet arises inevitably from its organisation at the chosen scale.
Drift and Computation Epochs
Real physical systems need not maintain the same coarse-grained organisation indefinitely. Suppose the symbolic transition kernels drift slowly,
\[ \big\| K_{n+1}(\cdot \mid i) - K_n(\cdot \mid i) \big\|_{\mathrm{TV}} \;\le\; \eta \qquad \text{for all } i. \]
Over a window of length \(m\), the total discrepancy between the symbolic trajectory and a single fixed finite-state machine becomes
\[ m \big( \lambda + 2 p_\varepsilon \big) \;+\; \tfrac{1}{2} m (m-1)\, \eta. \]
Given tolerance \(\theta\), define the computation epoch as the longest window over which the emergent finite-state description remains reliable:
\[ \Tc \;:=\; \Delta \cdot \max \Big\{ m \;:\; m \big( \lambda + 2 p_\varepsilon \big) + \tfrac{1}{2} m (m-1)\, \eta \;\le\; \theta \Big\}. \]
The right question is no longer merely whether a system computes, but for how long, at what scale, and to what accuracy. Computation becomes a temporal phenomenon: a regime that opens when conditions align and closes as they drift apart.
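The epoch can be computed directly once the error parameters are fixed. A sketch, assuming each step contributes \(\lambda + 2p_\varepsilon\) of approximation error plus linearly accumulating kernel drift \(\eta\) (an illustrative model of the window bound; the numbers are made up):

```python
def computation_epoch(lam, p_eps, eta, theta, delta):
    """Longest reliable window m*delta, assuming each step adds
    lam + 2*p_eps of approximation error plus kernel drift that has
    accumulated linearly (illustrative model of the window bound)."""
    m = 0
    while (m + 1) * (lam + 2 * p_eps) + eta * m * (m + 1) / 2 <= theta:
        m += 1
    return m * delta

# Without drift the epoch is theta*delta/(lam + 2*p_eps); drift shortens it.
print(computation_epoch(0.125, 0.0625, 0.0, 1.0, 1.0))   # -> 4.0
print(computation_epoch(0.125, 0.0625, 0.25, 1.0, 1.0))  # -> 2.0
```

The quadratic drift term is what makes the epoch a genuinely temporal quantity: a small \(\eta\) is harmless over short windows but inevitably closes any sufficiently long one.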
Example: A Merging Double-Well
Consider the effective one-dimensional Langevin dynamics
\[ dx_t \;=\; -V'(x_t)\, dt \;+\; \sqrt{2T}\, dW_t, \qquad V(x) = x^4 - g\, x^2. \]
For \(g > 0\), the potential harbours two metastable wells at \(x_\pm = \pm\sqrt{g/2}\). Define the binary coarse-graining
\[ \Pi(x) \;=\; \begin{cases} L, & x < 0, \\ R, & x \ge 0. \end{cases} \]
When the barrier is high, crossing events are rare: \(\Pr(A_{n+1} \neq A_n) \ll 1\). The sign of \(x\) is easily observed, so decodability is excellent. And when noise is small relative to barrier height, transition probabilities depend only weakly on the precise position within each well; lumpability is good. Theorem 1 then certifies that the system behaves, over that epoch, like an emergent one-bit Markov machine. It is, in every meaningful operational sense, a memory.
Now let the control parameter decay: \(g(t) = g_0 - \alpha t\). As \(g(t) \to 0\), the two wells merge. Crossing events become frequent, stability deteriorates, and lumpability degrades as the particle's exact position begins to matter for its future. The coarse-graining \(\Pi\) persists syntactically; \(L\) and \(R\) remain well-defined labels, but they no longer track genuine metastable sectors. The epoch ends. The one-bit computation dissolves. The example makes vivid a point the theorem states in the abstract: the boundary of computation is the boundary of metastability.
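The double-well picture can be simulated directly. A sketch using Euler-Maruyama for \(dx = -V'(x)\,dt + \sqrt{2T}\,dW\) with \(V(x) = x^4 - gx^2\); the temperature, step size, and duration are illustrative choices, not values from the text:

```python
import numpy as np

def flip_rate(g, T=0.1, dt=1e-3, steps=200_000, seed=0):
    """Euler-Maruyama for dx = -V'(x) dt + sqrt(2T) dW with V = x^4 - g x^2;
    returns the per-step rate of sign flips of the coarse symbol A = sign(x)."""
    rng = np.random.default_rng(seed)
    x = np.sqrt(g / 2)                     # start in the right well
    flips, prev = 0, 1
    for dW in rng.standard_normal(steps) * np.sqrt(2 * T * dt):
        x += -(4 * x**3 - 2 * g * x) * dt + dW
        a = 1 if x >= 0 else -1
        flips += a != prev
        prev = a
    return flips / steps

# Deep wells (g = 1): crossings are rare and A is a stable one-bit memory.
# Shallow wells (g near 0): crossings proliferate and the symbol degrades.
print(flip_rate(1.0), flip_rate(0.05))
```

Sweeping \(g\) downward reproduces the closing of the epoch: the flip rate, and with it the stability parameter \(\rho\), rises as the barrier \(g^2/4\) melts away.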
Part II
The Thermodynamic Foundation
What the Framework Left Unexplained
Part I established conditions under which a continuous process admits emergent computation and quantified the epoch over which that computation endures. But it treats lumpability as a datum, a property one either has or does not, and says nothing about where it comes from, how it can be achieved, or what it costs. Two questions were left entirely open.
First: what physical property of a system causes lumpability to be good (small \(\lambda\)) rather than poor? Classical lumpability theory (Kemeny & Snell 1960; Buchholz 1994) characterises lumpable Markov chains in purely structural terms but never asks what dynamics produces those structures in physical systems.
Second: even granting perfect lumpability, the framework does not distinguish between a system that merely mimics a finite-state machine and one that is genuinely causally organised at the macro level. The distinction, present in philosophy of mind and causal modelling, has no formal home in the existing construction. Dennett (1991) introduced the vocabulary of real patterns; Hoel et al. (2013) gave it a measure. Neither is connected to lumpability.
We address both gaps. The answers turn out to be intimately related: the same thing that makes lumpability good makes causal emergence possible, and that thing is non-equilibrium dissipation.
Entropy Production and the Sharpening of Macro-Boundaries
Let the micro-dynamics be a continuous-time Markov chain on a state space \(\Omega\) with generator \(Q\) and stationary distribution \(\pi\). The entropy production rate is (Schnakenberg 1976)
\[ \sigma(Q) \;=\; \tfrac{1}{2} \sum_{x \neq y} \big( \pi_x Q_{xy} - \pi_y Q_{yx} \big) \ln \frac{\pi_x Q_{xy}}{\pi_y Q_{yx}} \;\ge\; 0, \]
with equality if and only if \(Q\) satisfies detailed balance. For the sampled discrete-time chain \(P = e^{Q\Delta}\), let \(\lambda(P, \Pi)\) denote the lumpability parameter under coarse-graining \(\Pi\).
The intuition is straightforward. Lumpability fails when two micro-states \(x, y\) in the same macro-cell \(i\) have noticeably different escape distributions to other cells. In an equilibrium system, detailed balance ties escape rates rigidly to local Boltzmann factors, generically producing \(\lambda > 0\). A non-equilibrium probability current can be tuned to equalise the escape rates from within each cell, at the cost of dissipation, thereby reducing \(\lambda\).
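Schnakenberg's entropy production is straightforward to compute for a small generator. A sketch on an illustrative three-state cycle, driven clockwise so that detailed balance fails while the stationary distribution stays uniform:

```python
import numpy as np

def entropy_production(Q, pi):
    """Schnakenberg entropy production rate for generator Q with stationary
    distribution pi; vanishes exactly when detailed balance holds."""
    n = len(pi)
    sigma = 0.0
    for x in range(n):
        for y in range(n):
            if x != y and Q[x, y] > 0 and Q[y, x] > 0:
                flux = pi[x] * Q[x, y] - pi[y] * Q[y, x]
                sigma += 0.5 * flux * np.log((pi[x] * Q[x, y]) / (pi[y] * Q[y, x]))
    return sigma

# Three-state cycle driven clockwise: rows and columns of Q both sum to zero,
# so the stationary distribution is uniform, yet the current forces sigma > 0.
Q = np.array([[-1.2,  1.0,  0.2],
              [ 0.2, -1.2,  1.0],
              [ 1.0,  0.2, -1.2]])
pi = np.full(3, 1 / 3)
print(entropy_production(Q, pi))   # 0.8 * ln(5), about 1.29
```

Replacing the asymmetric rates with symmetric ones restores detailed balance and sends \(\sigma\) to zero exactly.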
The two-cell model
Suppose \(k = 2\) macro-states \(\{L, R\}\) with \(m = 4\) micro-states \(\{l_1, l_2, r_1, r_2\}\), with \(\Pi(l_i) = L\) and \(\Pi(r_i) = R\). The lumpability condition on the \(L \to R\) escape probability is
\[ \sum_{y \in \Pi^{-1}(R)} P_{l_1 y} \;=\; \sum_{y \in \Pi^{-1}(R)} P_{l_2 y}. \]
Let \(P = P_{\mathrm{eq}} + \varepsilon D\), where \(P_{\mathrm{eq}}\) satisfies detailed balance and \(D\) is an antisymmetric perturbation aligned with \(\Pi\): it drives probability from \(l_1\) toward \(l_2\), equalising their escape rates, without directly driving inter-cell transitions. Then to first order in \(\varepsilon\),
\[ \lambda(P, \Pi) \;=\; \lambda(P_{\mathrm{eq}}, \Pi) \;-\; c(\Pi)\, \varepsilon \;+\; O(\varepsilon^2), \]
where \(c(\Pi) > 0\) depends on the geometry of the coarse-graining. Since \(\sigma(P) = \Theta(\varepsilon^2)\), we have \(\varepsilon = \Theta(\sigma^{1/2})\), and substituting yields the central inequality.
For the two-cell micro-chain with optimally aligned driving,
\[ \lambda(P, \Pi) \;=\; \lambda_{\mathrm{eq}}(\Pi) \;-\; c(\Pi)\, \sqrt{\sigma(P)} \;+\; O(\sigma), \]
where \(\lambda_{\mathrm{eq}}(\Pi) := \lambda(P_{\mathrm{eq}}, \Pi)\) is the equilibrium lumpability error and \(c(\Pi) > 0\). Lumpability is a decreasing function of entropy production. It reaches zero at the critical dissipation rate
\[ \sigma_c \;=\; \big( \lambda_{\mathrm{eq}}(\Pi) / c(\Pi) \big)^2. \]
The result has a clean physical reading. Entropy production buys lumpability. A system with no dissipation has whatever lumpability its equilibrium structure happens to provide, which is typically poor, because detailed balance forces micro-states within a cell to maintain distinct escape tendencies. A driven system can sharpen its macro-boundaries at the cost of maintaining non-equilibrium currents. The improvement saturates: at \(\sigma = \sigma_c\), the macro-boundaries are perfectly sharp.
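The two-cell mechanism can be reproduced numerically. A sketch with an illustrative reversible four-state chain and an antisymmetric cycle current aligned with the partition; the driving equalises within-cell escape rates, so the lumpability error falls linearly in \(\varepsilon\) while entropy production grows as \(\varepsilon^2\), matching the \(\sqrt{\sigma}\) scaling:

```python
import numpy as np

# Micro-states {l1, l2, r1, r2}; cells L = {0,1}, R = {2,3}.
# P_eq is symmetric, hence doubly stochastic and reversible w.r.t. uniform pi.
P_eq = np.array([[0.70, 0.10, 0.15, 0.05],
                 [0.10, 0.75, 0.05, 0.10],
                 [0.15, 0.05, 0.70, 0.10],
                 [0.05, 0.10, 0.10, 0.75]])
# Antisymmetric cycle current l1 -> l2 -> r1 -> r2 -> l1; it equalises the
# within-cell escape rates without changing the stationary distribution.
D = np.array([[ 0.,  1.,  0., -1.],
              [-1.,  0.,  1.,  0.],
              [ 0., -1.,  0.,  1.],
              [ 1.,  0., -1.,  0.]])
cells = [[0, 1], [2, 3]]

def lumpability_error(P):
    proj = np.stack([P[:, c].sum(axis=1) for c in cells], axis=1)
    return max(0.5 * np.abs(proj[x] - proj[y]).sum()
               for c in cells for x in c for y in c)

def entropy_production(P):
    # Discrete-time Schnakenberg form with uniform stationary distribution.
    return sum(0.25 * (P[x, y] - P[y, x]) * np.log(P[x, y] / P[y, x])
               for x in range(4) for y in range(x + 1, 4))

for eps in (0.0, 0.01, 0.02, 0.025):
    P = P_eq + eps * D
    print(f"{eps:.3f}  sigma={entropy_production(P):.5f}  "
          f"lambda={lumpability_error(P):.5f}")
# lambda falls linearly in eps while sigma grows quadratically, so
# lambda ~ lambda_eq - c*sqrt(sigma); lambda reaches 0 at eps = 0.025.
```

At \(\varepsilon = 0.025\) the macro-boundaries are perfectly sharp: the dissipating current has bought exact lumpability from an equilibrium chain that had \(\lambda_{\mathrm{eq}} = 0.05\).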
Thermodynamic Computation Epochs
Plugging Theorem 2 into the computation epoch formula gives epoch length as an explicit function of dissipation. For a system with power budget \(\dot{W}\) held at temperature \(T\), the second law fixes the steady-state entropy production rate at \(\sigma = \dot{W}/(k_B T)\): all input power is ultimately dissipated as heat.
Define the power-dependent lumpability error
\[ \lambda(\dot{W}) \;:=\; \max\big\{ 0,\; \lambda_{\mathrm{eq}} - c \sqrt{\dot{W}/(k_B T)} \big\} \]
and the thermodynamic computation epoch
\[ \Tc(\dot{W}) \;:=\; \frac{\theta \Delta}{\lambda(\dot{W}) + 2 p_\varepsilon}. \]
Three regimes emerge. Subcritical (\(\dot{W} < \dot{W}_c\)): lumpability remains positive and the epoch is finite, growing as the budget increases. Critical (\(\dot{W} = \dot{W}_c\)): lumpability reaches zero; in the ideal case \(\rho = p_\varepsilon = 0\), the epoch diverges. Supercritical (\(\dot{W} > \dot{W}_c\)): additional driving begins to destabilise the computation by increasing inter-cell crossing rates, and the epoch shrinks again. There is therefore an optimal power \(\dot{W}^*\) near the critical point.
In the subcritical regime,
\[ \Tc(\dot{W}) \;=\; \frac{\theta \Delta}{\lambda_{\mathrm{eq}} - c \sqrt{\dot{W}/(k_B T)} + 2 p_\varepsilon}. \]
Epoch length diverges as \(\dot{W} \to \dot{W}_c\), and a system that cannot dissipate achieves only \(\Tc^{(0)} = \theta\Delta / (\lambda_{\mathrm{eq}} + 2p_\varepsilon)\).
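The subcritical behaviour can be tabulated directly. A sketch, assuming \(\lambda(\dot W) = \max\{0, \lambda_{\mathrm{eq}} - c\sqrt{\dot W/(k_B T)}\}\) and the epoch \(\theta\Delta/(\lambda(\dot W) + 2p_\varepsilon)\); all parameter values are illustrative, and the supercritical destabilisation is not modelled:

```python
import numpy as np

def epoch_length(W_dot, lam_eq=0.05, c=0.02, kBT=1.0, p_eps=0.005,
                 theta=0.1, delta=1.0):
    """Thermodynamic computation epoch: tolerance theta*delta divided by the
    residual per-step error, with lumpability bought down by dissipation.
    All parameter values are illustrative."""
    lam = max(0.0, lam_eq - c * np.sqrt(W_dot / kBT))
    return theta * delta / (lam + 2 * p_eps)

# The epoch grows with the power budget and is maximal once lambda hits zero
# at the critical budget W_c = kBT * (lam_eq / c)**2 = 6.25 here.
for W in (0.0, 1.0, 4.0, 6.25):
    print(W, epoch_length(W))
```

With these numbers the undriven system sustains its computation for less than a fifth of the epoch available at the critical budget.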
This is the first explicit relation between a physical energy budget and the temporal validity of an emergent computation. It is not an analogy: it is a bound, derivable from the second law together with the dissipation-lumpability inequality.
Causal Computation Gain and the Necessity of Non-Equilibrium
Theorem 1 asks whether the observable symbolic trajectory is close in total variation to a Markov chain. But closeness in distribution is a statistical criterion; it does not address whether the emergent macro-dynamics has genuine causal power at the macro scale, or whether it is merely a shadow cast by micro-level causes that happen to look Markovian when coarse-grained. We borrow Hoel's effective information (Hoel et al. 2013) to make this distinction rigorous.
For a Markov kernel \(K\) on \([k]\), the effective information under uniform intervention is
\[ \EI(K) \;:=\; \frac{1}{k} \sum_{i=1}^{k} D_{\mathrm{KL}}\big( K(\cdot \mid i) \,\big\|\, \overline{K} \big), \qquad \overline{K} \;:=\; \frac{1}{k} \sum_{i=1}^{k} K(\cdot \mid i). \]
For the micro-dynamics \(P\) on \(\Omega\) viewed through \(\Pi\), the projected effective information is
\[ \EI(P, \Pi) \;:=\; \frac{1}{|\Omega|} \sum_{x \in \Omega} D_{\mathrm{KL}}\big( (\Pi_* P)(\cdot \mid x) \,\big\|\, \overline{\Pi_* P} \big), \qquad (\Pi_* P)(j \mid x) \;:=\; \sum_{y \in \Pi^{-1}(j)} P_{xy}. \]
The causal computation gain of the coarse-graining \(\Pi\) under micro-dynamics \(P\) is
\[ \CCG(\Pi, P) \;:=\; \EI(K) \;-\; \EI(P, \Pi). \]
The system exhibits genuine causal computation when \(\CCG(\Pi, P) > 0\): knowing the macro-state and letting the macro-dynamics evolve gives more causal information about the next macro-state than tracing the full micro-dynamics to the macro level.
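Both quantities reduce to averaged KL divergences and are easy to compute for a finite chain. A sketch on an illustrative reversible (symmetric) micro-chain, using uniform within-cell weights for the emergent kernel; the gain comes out non-positive here, as expected at equilibrium:

```python
import numpy as np

def EI(rows):
    """Effective information under uniform intervention: the mean KL
    divergence (bits) of each outgoing row from the average row."""
    rows = np.asarray(rows, dtype=float)
    avg = rows.mean(axis=0)
    return float(np.mean([sum(p * np.log2(p / q) for p, q in zip(r, avg) if p > 0)
                          for r in rows]))

# Reversible (symmetric) micro-chain viewed through cells {0,1} | {2,3}.
P = np.array([[0.70, 0.10, 0.15, 0.05],
              [0.10, 0.75, 0.05, 0.10],
              [0.15, 0.05, 0.70, 0.10],
              [0.05, 0.10, 0.10, 0.75]])
cells = [[0, 1], [2, 3]]
proj = np.stack([P[:, c].sum(axis=1) for c in cells], axis=1)  # micro rows -> cells
K = np.stack([proj[c].mean(axis=0) for c in cells])            # emergent kernel

ccg = EI(K) - EI(proj)        # causal computation gain
print(EI(K), EI(proj), ccg)   # gain is slightly negative: no causal emergence
```

The macro kernel's rows are averages of the projected micro rows, so at equilibrium the averaging can only blur, never sharpen, the intervention distribution.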
If \(P\) satisfies detailed balance (\(\sigma(P) = 0\)), then for every coarse-graining \(\Pi\),
\[ \CCG(\Pi, P) \;\le\; 0. \]
Genuine causal computation requires strictly positive entropy production: \(\CCG(\Pi, P) > 0 \Rightarrow \sigma(P) > 0\).
For a reversible chain, the stationary measure \(\pi\) satisfies \(\pi_x P_{xy} = \pi_y P_{yx}\). The intra-cell variation in \(\pi\), measured by \(D_{\mathrm{KL}}(\pi_{|i} \| u_{|i})\), the divergence between the conditional stationary distribution within cell \(i\) and the uniform, contributes positively to \(\EI(P, \Pi)\) but not to \(\EI(K)\), because the emergent kernel \(K\) averages over intra-cell structure by definition. Detailed balance forces this intra-cell variation to track the local potential landscape, which prevents \(\EI(K)\) from exceeding \(\EI(P, \Pi)\); a data processing inequality applied to the intervention distribution closes the argument. Away from equilibrium, non-equilibrium currents within each cell can reduce the intra-cell variation, equalising the micro-states seen from outside, thereby allowing \(\EI(K) > \EI(P, \Pi)\). \(\square\)
Dissipation is not the cost of computation. It is the mechanism.
Theorem 3 is, we believe, the correct formal statement of the intuition that life computes because it dissipates. It is not a metaphor for thermodynamic irreversibility, nor an analogy to Landauer erasure. It is a precise causal claim: the only way a physical system can achieve a macro-level causal organisation that exceeds the micro-level causal organisation, viewed through the same coarse-graining, is to maintain nonzero entropy production.
Co-Dissipation: The Observer's Thermodynamic Share
The decodability condition \(H(A_n \mid R_n) \le \varepsilon\) requires the observer to maintain a record \(R_n\) with high mutual information with the symbolic state. This is a physical requirement: the record is a physical system, and sustaining it has a thermodynamic cost. By Landauer's principle (Landauer 1961; Bennett 1982), maintaining a record with decoding error \(p_\varepsilon\) over one time step requires the erasure of at least \(H_2(p_\varepsilon)\) bits, at thermodynamic cost
\[ W_{\mathrm{rec}} \;\ge\; k_B T \ln 2 \; H_2(p_\varepsilon) \]
per step.
The total thermodynamic budget for sustaining a computation epoch of length \(\Tc\) must include both the system's dissipation and the observer's recording cost:
\[ W_{\mathrm{total}} \;=\; \dot{W}\, \Tc \;+\; \frac{\Tc}{\Delta}\, k_B T \ln 2\; H_2(p_\varepsilon). \]
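The two contributions can be combined into a single budget. A sketch, assuming a per-step recording cost of \(k_B T \ln 2\, H_2(p_\varepsilon)\) and the subcritical epoch formula \(\theta\Delta/(\lambda(\dot W) + 2p_\varepsilon)\) with \(\lambda(\dot W) = \max\{0, \lambda_{\mathrm{eq}} - c\sqrt{\dot W/(k_B T)}\}\); all parameter values are illustrative:

```python
import numpy as np

def H2(p):
    return 0.0 if p in (0.0, 1.0) else -p*np.log2(p) - (1-p)*np.log2(1-p)

def total_budget(W_dot, p_eps, lam_eq=0.05, c=0.02, kBT=1.0,
                 theta=0.1, delta=1.0):
    """Epoch length and joint energy bill: the system pays W_dot per unit
    time, the observer pays kBT*ln(2)*H2(p_eps) per sampling step.
    Illustrative accounting of the co-dissipation picture."""
    lam = max(0.0, lam_eq - c * np.sqrt(W_dot / kBT))
    Tc = theta * delta / (lam + 2 * p_eps)          # epoch length
    return Tc, W_dot * Tc + (Tc / delta) * kBT * np.log(2) * H2(p_eps)

# More power lengthens the epoch but raises the joint bill: both the
# system's and the observer's expenditures appear in the same budget.
print(total_budget(0.0, 0.01), total_budget(6.25, 0.01))
```

Minimising the bill per unit of epoch over \((\dot W, p_\varepsilon)\) is the variational reading of the co-dissipation bound.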
For a computation epoch of length \(\Tc\) at tolerance \(\theta\) and sampling interval \(\Delta\), the minimum total energy expenditure satisfies
In the limit \(\theta \to 0\) (zero tolerance), \(W_{\mathrm{total}}^*\) diverges: error-free computation has infinite thermodynamic cost.
The co-dissipation picture changes what we should mean by a physical computer. A computation is not a property of a system in isolation; it is a property of the system-observer pair. A universe with no observers capable of maintaining records cannot compute, not because computation requires consciousness, but because decodability is a thermodynamic condition on the observer. The system dissipates to sharpen its macro-boundaries; the observer dissipates to read them. Both expenditures are necessary, and both appear in the epoch bound.
Part III
Synthesis and Open Questions
The Unified Picture
The four theorems together constitute a single coherent framework. Part I established that continuous dynamics can support emergent computation when decodability, stability, and lumpability coincide. Part II showed that these conditions are thermodynamically grounded: lumpability is purchased by dissipation, causal emergence requires it, and the observer's recording costs belong in the same budget. The spine of the framework runs as follows:
| Result | What it establishes | Prior art engaged |
|---|---|---|
| Thm 1 | Continuous process ≈ finite-state machine when three conditions hold | Kemeny & Snell; Buchholz; Shannon |
| §5 | Computation is temporally bounded; epoch length quantified | Freidlin-Wentzell; Landauer |
| Thm 2 | Dissipation reduces lumpability error as \(\sqrt{\sigma}\) | Schnakenberg; Crooks; Jarzynski |
| §9 | Epoch length increases with power budget; critical point exists | Crooks fluctuation theorem |
| Thm 3 | Equilibrium cannot exhibit causal emergence | Hoel et al.; Pearl do-calculus |
| Thm 4 | System and observer co-dissipate; joint bound on epoch cost | Landauer; Bennett; Maxwell's demon |
A physical computation is an emergent finite-state dynamics carved from continuous process by readable, persistent, and approximately lumpable structure, sustained against drift by dissipation, and bounded in duration by the joint thermodynamic budget of the system and its observer.
Discussion and Open Problems
Several open problems remain, each pointing toward a deeper theory.
Generalising the dissipation-lumpability inequality
Theorem 2 is proven in the two-cell perturbative regime. The inequality \(\lambda \ge \lambda_{\mathrm{eq}} - c\sqrt{\sigma}\) should hold generally, or at least for coarse-grainings whose boundaries align with the dominant non-equilibrium currents, but the proof in full generality requires tools from large-deviation theory and the geometry of Wasserstein spaces that go beyond the present sketch. The natural conjecture is that the inequality holds whenever the coarse-graining is a Lyapunov function for the deterministic drift, a condition that connects the problem to the theory of metastability developed by Freidlin and Wentzell (1984).
Rigorous causal emergence
Theorem 3 rests on an information-monotonicity argument for reversible chains. Making this fully rigorous requires careful treatment of the intervention distribution in the continuous-time limit and a precise definition of effective information for chains that are not finite or stationary. The connection to Pearl's do-calculus (Pearl 2009) is suggestive: causal computation gain measures whether the emergent macro-intervention distribution is further from the uniform than the projected micro-intervention distribution, and this framing may allow a cleaner proof via the properties of causal graphs.
Optimal coarse-graining
The framework takes \(\Pi\) as fixed throughout. But nothing forces this choice. Given a system and a power budget, one can ask: which coarse-graining maximises the causal computation gain \(\CCG(\Pi, P)\) subject to the co-dissipation budget? This is a well-posed variational problem. Its solution would define the natural computation supported by a given physical system at a given power level: the coarse-graining that matter, dissipating at rate \(\sigma\), most wants to compute. This reformulation is perhaps the most important consequence of the framework: rather than asking whether a system computes, we ask what it optimally computes, and the answer is determined by its thermodynamic situation.
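For a small state space the variational problem can be brute-forced. A sketch that searches the equal-size bipartitions of an illustrative four-state chain for the one maximising the gain (uniform within-cell weights assumed for the emergent kernel):

```python
import numpy as np

def EI(rows):
    """Effective information: mean KL (bits) of each row from the average row."""
    rows = np.asarray(rows, dtype=float)
    avg = rows.mean(axis=0)
    return float(np.mean([sum(p * np.log2(p / q) for p, q in zip(r, avg) if p > 0)
                          for r in rows]))

def ccg(P, cells):
    """Causal computation gain of a partition: EI of the emergent kernel
    (uniform within-cell weights, an assumption) minus projected micro EI."""
    proj = np.stack([P[:, c].sum(axis=1) for c in cells], axis=1)
    K = np.stack([proj[c].mean(axis=0) for c in cells])
    return EI(K) - EI(proj)

# Brute-force the best equal-size bipartition of a toy 4-state chain.
P = np.array([[0.70, 0.10, 0.15, 0.05],
              [0.10, 0.75, 0.05, 0.10],
              [0.15, 0.05, 0.70, 0.10],
              [0.05, 0.10, 0.10, 0.75]])
partitions = [([0, 1], [2, 3]), ([0, 2], [1, 3]), ([0, 3], [1, 2])]
best = max(partitions, key=lambda cells: ccg(P, cells))
print(best, ccg(P, best))   # the exactly lumpable split {0,2}|{1,3} wins
```

Even in this tiny instance the argmax is the partition whose cells are exactly lumpable, illustrating how the variational problem selects the computation the substrate best supports.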
Biological implications
Living systems maintain far-from-equilibrium states precisely by consuming free energy at rates tuned to sustain the metastable structures on which their computations depend. The framework suggests that the information processing capacity of a cell, a neural circuit, or an organism is bounded not merely by the speed or size of its components, but by its power budget and the geometry of its coarse-grainings. Whether the optimal coarse-graining problem, solved for biological systems, recovers known features of neural coding or metabolic organisation is an open empirical question, but one the framework makes precise enough to ask.
References
Bennett 1982. The thermodynamics of computation: a review. Int. J. Theor. Phys. 21, 905–940. Foundational treatment of Landauer's principle and reversible computation.
Buchholz 1994. Exact and ordinary lumpability in finite Markov chains. J. Appl. Probab. 31, 59–75. Structural characterisation of lumpability; no thermodynamic content.
Crooks 1999. Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences. Phys. Rev. E 60, 2721. Fluctuation theorem underlying the epoch-power derivation.
Dennett 1991. Real patterns. Journal of Philosophy 88(1), 27–51. The philosophical framing for observer-relative emergence.
Freidlin & Wentzell 1984. Random Perturbations of Dynamical Systems. Springer. Metastability theory for stochastic systems; covers the double-well in full generality.
Hoel, Albantakis, Tononi 2013. Quantifying causal emergence shows that macro can beat micro. PNAS 110(49), 19790–19795. Defines effective information and causal emergence; no connection to lumpability or thermodynamics.
Jarzynski 1997. Nonequilibrium equality for free energy differences. Phys. Rev. Lett. 78, 2690. Connects free energy dissipation to trajectory statistics.
Kemeny & Snell 1960. Finite Markov Chains. Van Nostrand. Classical lumpability; the algebraic starting point for the present work.
Landauer 1961. Irreversibility and heat generation in the computing process. IBM J. Res. Dev. 5, 183–191. The erasure bound underlying the co-dissipation calculation.
Pearl 2009. Causality: Models, Reasoning and Inference. Cambridge. The do-calculus; used in the causal emergence definitions.
Schnakenberg 1976. Network theory of microscopic and macroscopic behavior of master equation systems. Rev. Mod. Phys. 48, 571. Defines entropy production for Markov chains; the thermodynamic backbone of Part II.
Shannon 1948. A mathematical theory of communication. Bell System Technical Journal 27(3), 379–423; 27(4), 623–656. Foundational source for entropy, mutual information, and coding-theoretic decodability.
Deuflhard & Weber 2005. Robust Perron cluster analysis in conformation dynamics. Linear Algebra and its Applications 398, 161–184. Spectral and transfer-operator perspective on metastable coarse-grainings.
Schuette & Sarich 2013. Metastability and Markov State Models in Molecular Dynamics. American Mathematical Society. A modern account of metastability, transfer operators, and spectral gaps.