Conservation of Information: The History of an Idea
Conservation of information is a powerful idea in both physics and computer science. This article provides a historical overview of the concept.
Over the last two months, I’ve been writing and revising a paper titled “The Law of Conservation of Information: Search Processes Only Redistribute Existing Information.” I submitted an earlier draft for publication to the journal Bio-Complexity, but feedback I received on it asked about the appropriateness of the term “conservation of information” for the aspect of search that the paper is examining. The history of the term convinced me that “conservation of information” is the right way to designate what’s at stake here in search. This paper forms an appendix to the paper I submitted to Bio-Complexity. Because the history recounted here largely stands on its own feet, I’m making it available through my Substack column.
“Conservation of information” is a term that appears in both the physics and the computer science literature. In physics, it is concerned with physical processes that are in some sense reversible, allowing one physical state to be recovered from another and vice versa. Information is thereby conserved between states. In computer science, conservation of information is concerned with search processes for which information output cannot exceed prior information input. Information is thereby conserved when output matches rather than falls short of input.
These two uses of the term intersect insofar as physical processes can be involved in performing search. Nonetheless, how conservation of information is understood in physics is quite different from how it is understood in computer science. In physics, whenever the word “conservation” is used, there’s a heavy emphasis on whatever is being conserved as remaining strictly constant—no gain, no loss. Conservation of energy in a closed system, for instance, means that the total energy of that system stays strictly constant, though the forms that the energy in the system takes can change (potential, kinetic, chemical, electrical, nuclear, etc.).[i] By contrast, with respect to search, “conservation” denotes an ideal limit (upper bound) that search processes may achieve but also in general may fall short of.
Conservation of information in physics arises in various situations. Stephen Hawking, for instance, memorably proposed that information represented in matter may be irretrievably lost once sucked into a black hole that then evaporates in what has come to be called Hawking radiation. Under these circumstances, Hawking saw conservation of information as violated. Leonard Susskind, by contrast, proposed a holographic shifting of information from higher-dimensional spaces to lower-dimensional boundaries along with the idea of black hole complementarity. In consequence, he theorized that information encoded on the event horizon could be recovered from Hawking radiation in highly scrambled form, preserving information rather than destroying it. In this way, Susskind saw even black holes as obeying conservation of information.[ii]
In classical thermodynamics, conservation of information is tied to the second law of thermodynamics and the concept of entropy. While entropy has an inherent tendency to increase, suggesting a loss of order or information in macroscopic systems, statistical mechanics treats the underlying microstates of a system as unfolding deterministically and therefore keeping information intact. Accordingly, conservation of information becomes a consequence of ergodic theory, in which all states of a suitably structured dynamical system will recur. In statistical mechanics, the relevant result from ergodic theory is Liouville’s theorem, which states that the phase-space density of a closed system remains constant over time, implying that information about initial conditions is conserved, even if it appears jumbled at a macroscopic level.[iii]
Other examples of conservation of information in physics can readily be found. In quantum mechanics, the unitary evolution of quantum states as governed by the Schrödinger equation preserves the total information encoded in a quantum state. This conservation of quantum information was also underscored in the more recently proven no-hiding theorem.[iv] Classical physics, which is deterministic and whose equations underwrite the time-reversibility of certain physical processes, implies that information about a system at any one point in time can, in principle, be reconstructed from any other point in time, thereby conserving information. Laplace, for instance, argued that conservation of information (without using this terminology) applied across the entire universe. Thus, over 200 years ago in his Celestial Mechanics, he argued that from Newton’s physics it followed that knowing the positions and momenta of all particles at a given moment in time could be used to predict and retrodict every aspect of the universe’s unfolding, down to the tiniest details.[v]
I recount these examples of conservation of information in physics not only to indicate that conservation of information is a respectable notion within physics but also to distinguish its role in physics from its role in search. In physics, conservation of information refers to the precise preservation of the same quantity of information through various physical transformations, with any one instance of information being recoverable from any other. Yet when it comes to search, conservation of information is concerned with information outputs never exceeding information inputs. Conservation of information in search says you can’t get something from nothing, though it allows that starting with something you might end up with less or even nothing.
Conservation of information in search is more practical than conservation of information in physics. From the vantage of conservation of information in physics, the destruction of the Library of Alexandria by fire did nothing to destroy the information in its volumes—all the information still exists and is even recoverable if we could but reverse in time the physical processes at play since the library’s destruction. A super-physicist could thus in principle reconstruct the entire Library of Alexandria. Conservation of information in search, however, recognizes “frictional forces” that lead to inefficiencies in information transfer, treating information loss as the norm and full preservation as a seldom-attained ideal. From the vantage of conservation of information in search, the Library of Alexandria is indeed lost.
The actual term “conservation of information” makes more than a passing appearance first in the writings of biologist and Nobel laureate Peter Medawar in a slender volume titled The Limits of Science, which was published in 1984. In fact, repeating the title of the larger paper from which this article is drawn, Medawar introduced the concept in a section of the book titled “The Law of Conservation of Information.”[vi] He formulated this law as follows: “No process of logical reasoning—no mere act of mind or computer-programmable operation—can enlarge the information content of the axioms and premises or observation statements from which it proceeds.”[vii] Rather than defend this law, Medawar treats it as self-evident: “I attempt no demonstration of the validity of this law other than to challenge anyone to find an exception to it—to find a logical operation that will add to the information content of any utterance whatsoever.”[viii]
Medawar explains in the preface to The Limits of Science that he drew inspiration for this law and the terminology of “conservation of information” from physicist H. A. Rowland, who in 1899 presented a paper to the American Physical Society titled “The Highest Aim of the Physicist.” In that paper, Rowland formulated “the law of the conservation of knowledge” as follows: “A mathematical investigation always obeys the law of the conservation of knowledge: we never get out more from it than we put in. The knowledge may be changed in form, it may be clearer and more exactly stated, but the total amount of the knowledge of nature given out by the investigation is the same as we started with.”[ix] Medawar’s law and Rowland’s law are entirely parallel.
“The Law of Conservation of information” conveniently characterizes the consistent finding that for search processes, subsequent information output never exceeds prior information input. Medawar and Rowland expressed this idea in terms of conservation, but so have others with or without that terminology. In a book on science and information published in 1962, the French physicist Léon Brillouin made exactly the same point:
It has been often hinted that a computing machine actually manufactures new information. This is not really the case. Machines are able to process information; they take the raw material and they give out a finished product, but the total amount of information has not been increased. In the best ideal circumstances, information may be kept constant during the computing, but under normal conditions there will be some loss and the final information will be smaller than the input.[x]
And what is the ideal circumstance in which no information is lost? According to Brillouin,
If the machine [for computing firing tables of projectiles] works perfectly it must be reversible: given the impact [of the projectiles] it may compute backward and obtain the initial conditions. In a similar way, a translator may translate back into Japanese, and should recover the original paper or its equivalent. Ideal operating conditions correspond to reversibility, hence to conservation of information.[xi]
This passing reference to conservation of information is the earliest I have been able to find in the literature. Brillouin sums up the matter thus: “The [computing] machine does not create any new information, but it performs a very valuable transformation of known information.”[xii]
This idea that machines can only output what they first received as input goes well back to the first half of the nineteenth century to Charles Babbage’s Analytical Engine. In its design, this was first full-fledged digital computer, though given the materials of the time Babbage was unable to make it fully functional. Babbage’s Analytical Engine was designed to be a mechanical digital computer, not an electronic digital computer as we have in our day and that has proven itself in practice to be workable. Yet Babbage and his colleague Ada Lovelace understood the theoretical implications of their computing machine, even if they were never able to get a full working model. As Lovelace put it in her 1843 notes about Luigi Menabrea’s memoir on Babbage’s Analytical Engine,
The Analytical Engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform. It can follow analysis; but it has no power of anticipating any analytical relations or truths. Its province is to assist us in making available what we are already acquainted with.[xiii]
The idiom here is slightly different, but the point is exactly the one made by Medawar and Brillouin.
A few years earlier, in 1836, Edgar Allen Poe likewise made the same point, and again in addressing Babbage’s Analytical Engine. Poe, in an essay titled “Maelzel’s Chess Player,” investigated the claim that this supposed chess playing machine was an automaton that could play chess without human intervention. As he saw it, Maelzel’s chess player was a subterfuge: it was made to appear as though a machine was playing chess all by itself even though a human hidden inside it was in fact playing chess. Thus, while Maelzel would make a show of letting the audience see the interior of his chess player, in fact the full workings of the interior were always occluded, allowing a small human to lurk and scurry about inside the so-called machine. As Poe put it,
The man within is now at liberty to move about. He gets up into the body of the Turk [i.e., the visible puppet that moves the chess pieces] just so high as to bring his eyes above the level of the chess-board. It is very probable that he seats himself upon the little square block or protuberance which is seen in a corner of the main compartment when the doors are open.[xiv]
Poe then contrasted this pseudo-machine with Babbage’s Analytical Engine, whose initial conditions, once specified, allow it to proceed automatically without human intervention, thus making it a real machine:
Arithmetical or algebraical calculations are, from their very nature, fixed and determinate. Certain data being given, certain results necessarily and inevitably follow. These results have dependence upon nothing, and are influenced by nothing but the data originally given. And the question to be solved proceeds, or should proceed, to its final determination, by a succession of unerring steps liable to no change, and subject to no modification. This being the case, we can without difficulty conceive the possibility of so arranging a piece of mechanism, that upon starting it in accordance with the data of the question to be solved, it should continue its movements regularly, progressively, and undeviatingly towards the required solution, since these movements, however complex, are never imagined to be otherwise than finite and determinate. But the case is widely different with the Chess-Player…[xv]
Like Medawar, Brillouin, and Lovelace, Poe is here making the point that information output by a mechanical process cannot exceed or otherwise transcend prior information input.
I’ve reviewed this history of conservation of information for search, whether called by that name or described in equivalent terms, to show just how deep-seated this idea is. Early accounts of conservation of information applied to deterministic searches based on fully specified data (“axioms” or “premises”) and acted upon by fully specified operations (“logical” or “computer-programmable”). Notably lacking from these early accounts were stochastic elements. Accordingly, the outputs of these deterministic searches, if they were assigned probabilities at all, would have probability 0 or 1. The goal of these searches was therefore not to resolve needle-in-a-haystack problems but rather to confirm that certain outputs followed necessarily from certain inputs or else were entirely precluded by those inputs. I want therefore next to fast-forward to the 1990s, when conservation of information was generalized to cover the full range of search.
In the 1990s, the biggest result in conservation of information for search was the no-free-lunch theorems of computer scientists David Wolpert and William Macready. Their first paper on the subject was a Santa Fe Institute technical report titled “No Free Lunch Theorems for Search,”[xvi] positioning these theorems squarely on the side of search (as opposed to physics), and focused on search in general (both deterministic and stochastic). This paper appeared in 1996 as a typescript for limited dissemination. Its full formal version appeared subsequently in 1997 in an IEEE journal, where it received a new title: “No Free Lunch Theorems for Optimization.”[xvii] But optimization is simply a search for something optimal, so the focus of this latter paper was still on search.
The formulation and proof of the no-free-lunch theorems requires some technical firepower from mathematics and computer science, and so a detailed account is not readily comprehensible to non-technical readers. Yet the underlying idea of no free lunch is straightforward and intuitively convincing. As a helpful example that Robert Marks and I have used to illustrate the idea of no free lunch, imagine 52 cards from a standard deck have been arranged at random face down on a table. The challenge is to find the ace of spades (A♠). The probability of flipping over A♠ in five or fewer card flips does not depend on any method of flipping the cards. For instance, noting that a previously flipped card is, say, K♣ provides no guidance for the next card to flip over and does not help increase the probability of successfully locating A♠. After five card flips, no matter what method is used, the probability of find A♠ remains the same, namely, 5/52.[xviii]
While this example makes good intuitive sense, it also oversimplifies the no-free-lunch theorems. What we are calling a method for finding the target (the target in this example being A♠) is, in the no-free-lunch theorems, a search algorithm. Search algorithms in computer science consist of four main components: (1) an initialization, defining the starting point of the search and often chosen at random; (2) a feedback mechanism, providing information on the degree of optimality of each item queried (represented as a fitness function in evolutionary computing and as a parameterization in neural networks[xix]); (3) an update rule, which uses feedback to decide the next location(s) in the search space to be queried; and (4) a stop criterion, specifying when to terminate the search (ideally, the search terminates once it successfully locates the target, but it may also terminate when the search has executed a fixed number of queries or even when it runs out of steam for having exhausted its computational or other scarce resources). Note that biological evolution, inspired by this conception of search, sees points (2) and (3) as corresponding to differential survival and reproduction.
The no-free-lunch theorems involve technical details, but in terms of the framework for search algorithms outlined in the previous paragraph, the essence of the no-free-lunch theorems is captured in the interplay between points (2) and (3). Imagine two algorithms A and B searching a given search space. The no-free-lunch theorems assume feedback information for A and B is sufficiently varied and extensive to allow for effective search of the entire search space by either algorithm. In other words, any target is searchable anywhere in the search space because by assumption feedback information is available that enables it to be found.[xx] No free lunch then states that as the update rule manages this feedback information, search performance from algorithm A to algorithm B, when averaged across all items of feedback information for each, is constant and equivalent to blind search. What this means is that no search algorithm has an across-the-board advantage over any other search algorithm, and so, in the absence of specific feedback information, any search algorithm is no better than blind search, which by definition is the least efficient form of search.[xxi]
I distinctly remember around 2000 having a conversation with an MIT computer scientist about the predilection of her students to throw an evolutionary algorithm at any problem they might encounter, as though evolutionary algorithms were universal problem solvers having special advantages over other types of algorithms. She saw this as naïve, out of step with the then recently proven no-free-lunch theorems. To her mind, there was nothing magical about these algorithms. Instead, their success was mundane, dependent on effectively harnessing them via information feedback mechanisms to the problems at hand (for evolutionary algorithms, feedback took the form of fitness functions).
These days, with the success of large language models (LLMs), the search algorithm of choice to throw at problems is neural networks. Granted, neural networks are remarkably versatile. Yet they are useless unless judiciously updated through their feedback mechanisms. No free lunch makes precisely that point: all the interesting work in search algorithms is contained in their feedback mechanisms and, specifically, in the particular feedback information that is supplied to the algorithm. Absent appropriate feedback information, no algorithm outperforms blind search. And since blind search is always feasible, blind search sets the baseline for all search.
An equivalent way to understand no free lunch is in terms biasing. Successful search always requires biasing a search in the direction of a successful outcome, implying that biasing can also occur in the direction of unsuccessful outcomes. And so, when all this biasing is averaged out, no advantage is obtained in finding the object of interest (think of an easter egg hunt guided by good and bad directions—good directions guarantee you’ll find the egg, bad directions that you won’t). Because no-free-lunch theorems assert that average performance of certain classes of search algorithms remain constant at the level of blind search, these theorems have very much a conservation of information feel in the sense of that conservation is strictly maintained and not merely that conservation is the best that can be achieved, with loss of conservation also a possibility.
Computer scientist Thomas English, writing about the no-free-lunch theorems of Wolpert and Macready soon after their groundbreaking work first appeared, thus went so far as to redub their results as “conservation of information” results:
The recent “no free lunch” theorems of Wolpert and Macready indicate the need to reassess empirical methods for evaluation of evolutionary and genetic optimizers. Their main theorem states, loosely, that the average performance of all optimizers is identical if the distribution of functions is average. The present work [is about] the conservation of information as an optimizer evaluates points. It is shown that the information an optimizer gains about unobserved values is ultimately due to its prior information of value distributions. Inasmuch as information about one distribution is misinformation about another, there is no generally superior function optimizer. Empirical studies are best regarded as attempts to infer the prior information optimizers have about distributions—i.e., to determine which tools are good for which tasks.[xxii]
With the no-free-lunch theorems, something is clearly being conserved in that performance of different search algorithms, when averaged across the range of feedback information, is constant and equivalent to performance of blind search. The question then arises how no free lunch relates to the consistent claim in the earlier conservation-of-information literature about output information not exceeding input information. In fact, the connection is straightforward. The only reason to average performance of algorithms across feedback information is if we don’t have any domain-specific information to help us find the target in question.
Consequently, no free lunch tells us that without such domain-specific information, we have no special input information to improve the search, and thus no way to achieve output information that exceeds the capacities of blind search. When it comes to search, blind search is always the lowest common denominator—any search under consideration must always be able to do at least as good as blind search because we can always execute a blind search. With no free lunch, it is blind search as input and blind search as output. The averaging of feedback information treated as input acts as a leveler, ensuring parity between information input and output. No free lunch preserves strict conservation precisely because it sets the bar so low at blind search.
I’ve been pondering the no free lunch theorems since the late 1990s. In that time, I’ve always found these theorems to be intriguing but also unsatisfying. It’s fine and well that search algorithms averaged across a comprehensive breadth of different items of feedback information cannot rise in performance above blind search. The fact is, however, what makes search algorithms interesting is their ability to perform better than blind search. Blind search sets a low bar for search performance that is meant to be surpassed.
Classical conservation of information, as developed by Medawar, Brillouin, and others, required that any increase in information output always had to be repaid by information input at least as costly. But classical conservation of information didn’t put precise numbers on the relation between information input and information output. In fact, it didn’t have to because its searches depended entirely on logical consequence relations (logical inference rules or logical programming steps) that proceeded deterministically. Its searches were not so much blind as exhaustive. For purely deterministic exhaustive search, the probabilities are all 0 or 1, and success is all or nothing. Classical conservation of information therefore didn’t require fine gradations for measuring information.
But as soon as searches also incorporated stochastic elements, and thus were no longer deterministic, precise quantitative relations between information input and information output were required for any causally adequate theory of search. A needle-in-a-haystack search with probability p close to 0 for successfully finding the needle requires additional information if that probability is to be raised to a probability q close to 1 sufficient for successfully finding the needle. Conservation of information as developed in the work of Robert Marks, Winston Ewert, George Montañez, and me then established that raising the probability of successful search from p to q incurred a probability cost of p/q, thereby providing a precise numerical accounting of how information output in search cannot exceed prior information input.[xxiii]
This earlier work on conservation-of-information showed that attempts to enhance search success (such as turning a near-impossible search into a feasible one) merely shifted the informational burden rather than eliminated it. But this earlier work was adapted on a case-by-case basis to specific types of search and therefore lacked full generality. The larger paper for which this article forms a historical appendix establishes a unifying probabilistic framework that generalizes all the previous work on conservation of information. By distilling its findings into a single fundamental relation of probability theory, this work provides a definitive, fully developed, general formulation of the Law of Conservation of Information, showing that information that facilitates search cannot magically materialize out of nothing but instead must be derived from pre-existing sources.
[i] The term “conservation of energy” goes back to 1851 to William Thomson (Lord Kelvin). For a brief history of conservation of information, see Ricardo Lopes Coelho, “On the Concept of Energy: How Understanding Its History Can Improve Physics Teaching,” Science & Education 18(8) (2007): 961–983.
[ii] Leonard Susskind and James Lindesay, An Introduction to Black Holes, Information, and the String Theory Revolution: The Holographic Universe (Singapore: World Scientific, 2005), ch. 9. The title of this chapter is, tellingly, “The Puzzle of Information Conservation in Black Hole Environments.”
[iii] I. P. Cornfeld, S. V. Fomin, and Ya. G. Sinai, Ergodic Theory (New York: Springer, 1982), sec. 2.2.
[iv] Samuel L. Braunstein and Arun K. Pati, “Quantum Information Cannot Be Completely Hidden in Correlations: Implications for the Black-Hole Information Paradox,” Physical Review Letters 98(8) (2007-02-23): 080502.
[v] As Laplace put it:
We ought then to regard the present state of the universe as the effect of its anterior state and as the cause of the one which is to follow. Given for one instant an intelligence which could comprehend all the forces by which nature is animated and the respective situation of the beings who compose it—an intelligence sufficiently vast to submit these data to analysis—it would embrace in the same formula the movements of the greatest bodies of the universe and those of the lightest atom; for it, nothing would be uncertain and the future, as the past, would be present to its eyes. The human mind offers, in the perfection which it has been able to give to astronomy, a feeble idea of this intelligence. Its discoveries in mechanics and geometry added to that of universal gravity, have enabled it to comprehend in the same analytical expressions the past and future states of the systems of the world. Applying the same method to some other objects of its knowledge, it has succeeded in referring to general laws observed phenomena and in foreseeing those which given circumstances ought to produce. All these efforts in the search for truth tend to lead it back continually to the vast intelligence which we have just mentioned, but from which it will always remain infinitely removed. This tendency, peculiar to the human race, is that which renders it superior to animals; and their progress in this respect distinguishes nations and ages and constitutes their true glory.
Quoted from Patrick Suppes, Probabilistic Metaphysics (Oxford: Basil Blackwell, 1984), 17–18.
[vi] Peter Medawar, The Limits of Science (New York: Harper & Row, 1984), 78–82.
[vii] Ibid., 79.
[viii] Ibid.
[ix] H. A. Rowland, “The Highest Aim of the Physicist,” Presidential Address to the American Physical Society (October 28, 1899), 11, available online at https://history.aip.org/exhibits/gap/PDF/rowland_aim.pdf (last accessed January 10, 2025), emphasis added.
[x] Léon Brillouin, Science and Information Theory, 2nd ed. (New York: Academic Press, 1962), 267.
[xi] Ibid., 268, emphasis added.
[xii] Ibid., 269.
[xiii] Ada Lovelace, “Translator’s Notes on Menebrea’s Memoir,” in R. Taylor, ed., Scientific Memoirs, Selected from the Transactions of Foreign Academies of Science and Learned Societies, and from Foreign Journals, vol. 3, 691–731 (London: Richard and John E. Taylor, 1843), 722. Emphasis in the original. Interestingly, Alan Turing spends much precious space in his famous article “Computing Machinery and Intelligence” (Mind 59(236) (1950): 433–460) responding to this quote by Lovelace, arguing that machines such as Babbage’s Analytical Engine could nonetheless be said to think provided they reached a certain critical threshold of complexity (presumably, then, as evidenced by them in passing the imitation game, convincing humans that they are themselves human). But even the claim that machines can think in the same way that humans think (i.e., computational reductionism) does not contradict Lovelace’s point, which is that machines in their outputs cannot exceed their inputs. Though I reject that human minds are equivalent to computing machines (which is to say I reject computational reductionism), it seems that a charitable reading of Lovelace credits her with making the same point as Medawar and Brillouin. It’s a question of whether machines are limited in their information output by their prior information input. Whether machines can think and what it would mean for them to think are separate questions.
[xiv] Edgar Allen Poe, “Maelzel's Chess-Player,” Southern Literary Messenger 2 (April 1836): 318–326, 322. Available online at https://www.eapoe.org/works/essays/maelzel.htm (last accessed January 13, 2025).
[xv] Ibid., 319. Emphasis in the original.
[xvi] David Wolpert and William Macready, “No Free Lunch Theorems for Search,” SFI-TR-95-02-010 (Santa Fe Institute), February 23, 1996.
[xvii] David H. Wolpert and William G. Macready, “No Free Lunch Theorems for Optimization,” IEEE Transactions on Evolutionary Computation 1(1) (1997): 67–82.
[xviii] William A. Dembski and Robert J. Marks II, “Bernoulli’s Principle of Insufficient Reason and Conservation of Information in Computer Search,” Proceedings of the 2009 IEEE International Conference on Systems, Man, and Cybernetics San Antonio, TX, USA (October 2009): 2647–2652, 2648.
[xix] A fitness function in evolutionary computing and a parameterization in neural networks both guide optimization, but in different ways. A fitness function evaluates candidate solutions in evolutionary algorithms, determining their optimality. The evolutionary algorithm then selects the best-performing solutions for the next round of fitness evaluation. In contrast, parameterization in neural networks refers to the weights and biases that define the model’s behavior. These parameters are iteratively refined through gradient-based optimization like backpropagation, adjusting continuously based on the gradients of a loss function. While evolutionary computing relies on discrete selection driven by fitness evaluation, neural networks optimize through continuous parameter updates. Both approaches seek improved performance over iterations, but fitness functions operate on whole solutions, whereas parameterizations refine network infrastructure. Evolutionary computing typically explores a discrete or combinatorial solution space, whereas neural networks adjust continuous-valued parameter weights within high-dimensional spaces.
[xx] This ability of such a fully varied range of feedback information to allow a search algorithm to explore the entirety of the search space is sometimes expressed in terms of the algorithm being closed under permutation, which means it doesn’t matter how you relabel the points in the search space, any such re-labeling leaves unaffected the effectiveness of a search algorithm in reaching a target. For details about this point, see Christian Igel and Marc Toussaint, “A No-Free-Lunch Theorem for Non-Uniform Distributions of Target Functions,” Journal of Mathematical Modelling and Algorithms 3 (2004): 313–322.
[xxi] Blind search explores a search space without domain-specific heuristics or other information to guide the search to its target. Common types of blind search include uniform random sampling (with and without memory) and exhaustive search (traversing the entire search space, or as much of it as is feasible, in no special order). Blind search guarantees finding a solution if enough locations in the search space can be queried, but this is not feasible if the search space is excessively large. Blind search tends to be inefficient in that it is wont to explore large portions of the search space that have no bearing on the target being searched. Unlike heuristic search, which uses problem-specific knowledge/information to direct exploration, blind search treats all possible ways of traversing the search space as equivalent, making it computationally expensive for large-scale problems. While often used as a baseline for comparison, as in the no-free-lunch theorems, blind search works successfully in practice only for search spaces that are not overly large and where little or no information is available to guide search.
[xxii] From the abstract to Thomas M. English, “Evaluation of Evolutionary and Genetic Optimizers: No Free Lunch,” in L. J. Fogel, P. J. Angeline, and T Bäck, eds., Evolutionary Programming V: Proceedings of the Fifth Annual Conference on Evolutionary Programming, 163-169 (Cambridge, Mass: MIT Press, 1996). Note the date, 1996, which means that English was responding to Wolpert and Macready’s original Santa Fe Institute paper on no free lunch (1996) and not their subsequent more polished paper for the IEEE (1997). English also expanded on the underlying information-theoretic rationale for using the language of conservation of information to characterize no free lunch in Thomas M. English, “Some Information Theoretic Results on Evolutionary Optimization,” Proceedings of the 1999 Congress on Evolutionary Computation-CEC99, 788–795 and also in Thomas M. English, “No More Lunch: Analysis of Sequential Search,” Proceedings of the 2004 Congress on Evolutionary Computation (IEEE Cat. No.04TH8753), 227–234.
For completeness, let me mention a groundbreaking paper on conservation of information (broadly construed) by Cullen Schaffer that appeared in 1994, before all the hubbub surrounding the no-free-lunch theorems in 1996: Cullen Schaffer, “A Conservation Law for Generalization Performance,” in W. W. Cohen and H. Hirsh, eds., Machine Learning: Proceedings of the Eleventh International Conference (San Francisco: Morgan Kaufmann, 1994), 259–265. In this paper, Schaffer proves a fundamental result about machine learning, showing that generalization performance—how well a model predicts on unseen data—is inherently a zero-sum game: total generalization performance of a learner, averaged over all learning situations, is zero. Thus, for any machine learning model, performance improvements in one subset of learning problems are exactly offset by performance degradation in others. Schaffer’s conservation law imposes strict limits on the ability of any learner to perform well universally, emphasizing that gains in generalization are always balanced by losses elsewhere in the problem space. Schaffer explores the implications of this result for the design, evaluation, and enhancement of machine learning models. Even though Schaffer shifts the idiom from search to learning, the two are in fact equivalent (learning is a search for knowledge). Arguably, all the core elements in Wolpert and Macready’s no-free-lunch theorems are prefigured in Schaffer’s Conservation Law for Generalization Performance.
[xxiii] See the publications page of the Evolutionary Informatics Lab: https://www.evoinfo.org/publications.html (last accessed January 16, 2025).
Thanks Bill for this historical overview. I know your definition of information is based on ruling out possibilities, so I think to help the lay person (probably familiar with other ideas of ‘information’) the terms information, knowledge and learning should remain separate. Maybe it is just semantics that I am using, but I do struggle with the idea (my interpretation at least) that, say, to synthesize/analyse ‘information’ does not necessarily provide new information. Wouldn’t new insights from say, synthesizing previous decisions (where possibilities were eliminated), provide further ‘information’, i.e. to possibly rule out/in other/previous possibilities that were not apparent before, or is this just me using hindsight? Whatever the case, I think these types of possible interpretations need to be ‘tightened-up’ moving forward.