The Existence of God and the Principle of Sufficient Reason

On arguments for the existence of God from the principle of sufficient reason. The principle of sufficient reason is the principle that everything must have a reason, cause, or ground. This principle has been applied to argue for the existence of God as the ultimate reason behind all things.

In previous episodes I have discussed a couple of arguments for the existence of God: the argument for “the One” and the argument from eternal truths. Both are kinds of cosmological arguments, characteristic of the thought of Plotinus and Augustine respectively. The central notion of a cosmological argument is that everything depends on something else for its existence and nature, except for one thing that is the ultimate source for everything else, one thing that is absolutely independent and necessary. With this episode I’d like to talk about another kind of cosmological argument that attends to this same central notion but in a slightly different way. This is the argument from the principle of sufficient reason. This argument was given its classical form by the German philosopher and mathematician Gottfried Wilhelm Leibniz (1646 – 1716). Leibniz was also the first to use the term “sufficient reason” for the principle, though the principle itself had certainly been expressed by many people previously.

The principle of sufficient reason, often abbreviated as PSR, is that “everything must have a reason, cause, or ground” (Stanford Encyclopedia of Philosophy). Leibniz said in his Monadology:

“Our reasonings are grounded upon two great principles, that of contradiction, in virtue of which we judge false that which involves a contradiction, and true that which is opposed or contradictory to the false; And that of sufficient reason, in virtue of which we hold that there can be no fact real or existing, no statement true, unless there be a sufficient reason, why it should be so and not otherwise, although these reasons usually cannot be known by us.”

Leibniz expands on this, leading to an argument for God, in a passage worth quoting extensively:

“In short, there are simple ideas, of which no definition can be given; there are also axioms and postulates, in a word, primary principles, which cannot be proved, and indeed have no need of proof; and these are identical propositions, whose opposite involves an express contradiction. But there must also be a sufficient reason for contingent truths or truths of fact, that is to say, for the sequence or connexion of the things which are dispersed throughout the universe of created beings, in which the analyzing into particular reasons might go on into endless detail, because of the immense variety of things in nature and the infinite division of bodies. There is an infinity of present and past forms and motions which go to make up the efficient cause of my present writing; and there is an infinity of minute tendencies and dispositions of my soul, which go to make its final cause. And as all this detail again involves other prior or more detailed contingent things, each of which still needs a similar analysis to yield its reason, we are no further forward: and the sufficient or final reason must be outside of the sequence or series of particular contingent things, however infinite this series may be. Thus the final reason of things must be in a necessary substance, in which the variety of particular changes exists only eminently, as in its source; and this substance we call God.”

Alexander Pruss picked out the key ideas from Leibniz’s argument and put them in the following succinct form (The Blackwell Companion to Natural Theology, “The Leibnizian Cosmological Argument,” by Alexander R. Pruss, pp. 25-26):

  1. Every contingent fact has an explanation.
  2. There is a contingent fact that includes all other contingent facts.
  3. Therefore, there is an explanation of this fact.
  4. This explanation must involve a necessary being.
  5. This necessary being is God.

Leibniz speaks of “an infinity” of “present and past forms and motions” and of “minute tendencies and dispositions”. Basically an infinity of contingent facts. We can lump all these contingent facts into one, the logical conjunction of all contingent facts. This conjunction of all contingent facts has been called the Big Conjunctive Contingent Fact (abbreviated BCCF). Because all these facts are contingent, and because their sum total is also contingent, the whole requires explanation. But the explanation for all contingent facts cannot itself be contingent, otherwise it would be among the very set of facts in need of explanation. As Leibniz says, “the sufficient or final reason must be outside of the sequence or series of particular contingent things”. So this explanation must be the opposite of contingent, i.e. necessary. And Leibniz proposes that we call this God.

Like most arguments for the existence of God what is demonstrated, while important and significant, is also limited. There’s nothing demonstrated here about God’s activity in history or as revealed in scripture. It doesn’t tell us which religion or which sacred scripture, if any, is correct. It doesn’t tell us what kinds of ethical demands God might make of us. Such things might be demonstrated by other means and I think they very well can be. But that’s not where the argument has taken us so far. I think this is important because one actually doesn’t need to be religious to be a convinced theist. Being a Christian involves a lot more than this. For one thing it usually, and maybe always, involves transformative spiritual experience. Or one might be convinced just intellectually but in a way that would necessarily involve a great deal of familiarity with history, scriptural texts, and probably ancient languages. In modernity the reasonableness of theism itself is somewhat obscured by a lot of the cultural barriers, negative perceptions, and aversion to organized religion. But simple theism itself is fairly straightforward. I think theism is very rational and something that most people could easily accept, if not otherwise conditioned.

More later on the way that the principle of sufficient reason leads to an argument for the existence of God. Let’s first spend some time on the principle itself. Why accept it? What are some possible objections to it?

The best reasons for accepting the principle would seem to be indirect ones through arguments with a reductio ad absurdum form. Such arguments ask, what would follow if we rejected the principle? What would we expect the world to be like if everything did not have a reason? One thing we might expect is that it would be very common to find things and events that didn’t have any evident explanation or that were completely unintelligible. This would be very different from what we observe scientifically. It’s also just very different from the experience of regular life which is, well, quite regular. We just don’t see things happening or being certain ways for no reason at all. That’s what we could expect, let’s say, in the physical world. But it would go even deeper than this, into our minds and thoughts. The principle of sufficient reason pertains to connections between thoughts and ideas. We think one thing to be so by reason of some other thing and so on. But absent the principle of sufficient reason all of this is gone. We’d just have a bundle of thoughts and ideas without any way of structuring them to give support to one another and to know which ones to think are true and which are false. In other words, we wouldn’t be able to trust our own cognitive faculties.

Arthur Schopenhauer (1788 – 1860) distinguished between four forms of sufficient reason. Regardless of whether the four forms he picked out are the right ones or the only forms possible to pick out, I think that distinguishing between the different forms that sufficient reason can take is a good idea. Schopenhauer believed that philosophers throughout history had failed to make proper distinctions between various forms of the principle. In particular he thought most philosophers had failed to distinguish the other forms of PSR from the principle of causality, cause and effect in nature, which is only one form that the principle can take. Schopenhauer’s four forms were the principle of sufficient reason of becoming, knowing, being, and willing. These correspond to causality, rules of logic, mathematics, and motivations.

Causality is a major topic in philosophy with a whole host of objections and responses. Those are pertinent to PSR but, since causality is only one form of PSR, not to all forms of it. There are also objections to causality that are more properly objections to determinism rather than causality as such. For example, after the development of quantum mechanics we might think that many things happen without a cause. And we might similarly suppose that many things happen for no reason at all. For example, in radioactive decay the precise moment that any particular radioactive atom will decay is unpredictable. However, the decay rate of many atoms of the same type over time is actually highly predictable, so that a given material has a characteristic half-life. A material’s half-life is consistent and related to its other properties. In more general terms the quantum state of a quantum system is characterized by a wave function. Under the Copenhagen interpretation of quantum mechanics the square modulus of the wave function is a probability density. This gives the probability of different states being experimentally observed. Upon observation the wave function “collapses” into a single state. And we can’t predict with certainty which state will be observed.
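
To make the contrast concrete, here is a minimal simulation sketch (my own illustration, not anything from the episode; the half-life value is arbitrary). No individual decay time can be predicted, yet the ensemble reliably halves over each half-life:

```python
# Minimal sketch: individual decay times are unpredictable, but the aggregate
# behavior of many atoms follows a predictable half-life.
import numpy as np

rng = np.random.default_rng(0)

half_life = 10.0                      # arbitrary illustrative half-life (e.g. years)
tau = half_life / np.log(2)           # mean lifetime corresponding to that half-life
n_atoms = 1_000_000

# Each atom's decay time is drawn at random from an exponential distribution.
decay_times = rng.exponential(scale=tau, size=n_atoms)

print(decay_times[:3])                       # individual decay times: effectively unpredictable
print(np.mean(decay_times > half_life))      # fraction still intact at t = half_life: ~0.5
```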

Now this is certainly a different way of understanding how things work than we would otherwise have thought. But does it mean that things happen for no reason at all? I maintain that this is not the case because these events are still highly constrained and highly ordered. In quantum chemistry, for example, it is wave functions called orbitals that characterize the behavior of electrons in atoms and molecules. And yes, wave functions are inherently probabilistic. But the structure of these orbitals imparts tremendous explanatory power to chemistry at the level of atomic and molecular bonds, and consequently also to chemical reactions. I propose that it’s not the case that with the development of quantum mechanics we have found more things that happen for no reason at all. I think it’s the opposite. Before quantum mechanics chemistry was more dependent on macro-scale empirical observations of regularities. Although we could observe that certain chemical phenomena occurred regularly in certain ways we didn’t have as much understanding about why they occurred in the ways that they did. There was more arbitrariness in our explanations. But with quantum chemistry we have a much more developed understanding and we actually know more of the reasons why things happen the way they do.

Another important point is that although wave functions are inherently probabilistic in quantum mechanics this does not mean that the quantum states that are observed occur for no reason at all. That would only be the case if there were no laws of quantum mechanics. Then anything at all really could happen, without any kind of pattern. But such quantum phenomena do have a reason and that reason is the laws of quantum mechanics themselves.

Now returning to the way the principle of sufficient reason connects to the existence of God. A short description of the logic is that everything has to have a reason. Most things have their reason in something else. But ultimately all reasons have to lead back to one thing. And this one thing has to have unique properties in order to be the reason for everything, both for everything else and even for itself. The unique properties that this ultimate reason would have to have are those of God. Now to get into more detail.

The first important concepts that follow from PSR are contingency and necessity. If we grant that everything has to have a reason, cause, or ground, the next step is to categorize the ways that things have these reasons. And there are two: contingency and necessity. A thing can have its reason in something else, which means it’s contingent. Or a thing can have its reason in its own nature, which means it’s necessary. And if we grant PSR there’s no third alternative.

Clearly almost everything is contingent. If you think of almost anything you can see how something else is a reason for it. And there are chains of reasons, as we see in a child’s relentless “Why?” game, where a child asks some “Why?” question, follows up the answer with another “Why?”, and follows up the next answer with yet another “Why?”, over and over again. Eventually you give up answering, not because there is no reason but because you don’t know what the reason is. This is a great illustration of contingency.

With these kinds of chains of reasons an issue that comes up is whether you can have an infinite regress. There are different opinions and arguments on that point but I think an infinite regress is untenable. William Lane Craig has done a lot of work on this subject. I also like what David Bentley Hart has called the “pleonastic fallacy”, which he defines as “the belief that an absolute qualitative difference can be overcome by a successive accumulation of extremely small and entirely relative quantitative steps.” (The Experience of God, 98) As it pertains to the case at hand, the difference between an infinite regress and a finite regress is an absolute qualitative difference. A finite regress terminates at some determinate point. An infinite regress does not. That’s an absolute qualitative difference. An infinite regress of reasons still lacks an ultimate reason. It doesn’t matter that there’s an infinite number of them. That’s the idea behind the joke that “it’s turtles all the way down.” If the world rests on the back of a turtle that rests on the back of another turtle and so on, you can always ask what the next turtle is resting on. And it doesn’t help to say that it’s turtles all the way down. The stack of turtles is still unsupported. It doesn’t matter that there’s an infinite number of them. There has to be a termination in the chain. And that termination point has to be something with a unique set of qualifying properties.

So much for contingent things. What about necessary things? What would a necessary entity have to be like? There are reasons to think that a necessary being would have to be:

  1. purely actual
  2. absolutely simple or noncomposite, and 
  3. something which just is subsistent existence itself

A word on actuality. Actuality and potentiality are concepts going back to Aristotle and the Medieval Scholastics. An entity can have some attribute actually or potentially. Actuality is an entity’s already having an attribute. Potentiality is an entity’s capacity to have an attribute. Almost everything has both actuality and potentiality for different things. Since things are certain ways they have actuality for those ways that they are. But they also have potentiality for all the ways that they are not yet but could be. There’s a connection here to contingency. If a thing could be many different ways but happens to be only certain of those ways, the ways that it happens to be are contingent, because it could have been otherwise. When an entity has potentiality for an attribute that attribute can only become actual if it is actualized. And it can only be actualized by another entity that has that attribute already, i.e. has actuality for it. Heat is an illustrative example. All materials have a certain heat capacity, which is the amount of heat they can absorb for a given increase in temperature. Materials have this capacity even when they are not increasing in temperature. In order to increase in temperature the material has to receive an energy input from some heat source. The heat source has actuality that it imparts to the material, actualizing its potential for the higher temperature. This is a physical example but the principles of actuality and potentiality also apply to ideas.
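
As a purely illustrative aside (my own example, using textbook values for water rather than anything from the episode), the “capacity” in heat capacity can be written as a simple formula relating the heat absorbed to the temperature rise it actualizes:

```latex
% Illustrative only: heat Q needed to actualize a temperature rise \Delta T in a
% material of mass m with specific heat capacity c (values here are roughly those of water).
Q = m\,c\,\Delta T
  \approx (0.1\ \mathrm{kg}) \times (4184\ \mathrm{J\,kg^{-1}\,K^{-1}}) \times (10\ \mathrm{K})
  \approx 4.2\ \mathrm{kJ}
```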

What are the reasons to suppose that a necessary being would have to be purely actual, noncomposite, and self-subsistent? A necessary being is one that cannot not exist. A contingent being is the opposite because it is possible for it not to exist and for it to have been otherwise than it is. Because a contingent being could be otherwise than it actually is it has unactualized potentiality. A necessary being cannot have unactualized potentiality. It has to be purely actual. The way it is is the only way that it could be. Furthermore, all other things ultimately trace their source of actualization back to this necessary being. A necessary being is purely actual and also the entity that actualizes everything else. A necessary being has to be noncomposite because anything composite could have been composed differently. Anything composite is composed of parts. These could be physical parts or abstract parts. Anything composed of parts cannot be necessary because it is possible for its parts to be put together in different ways or not at all. Finally, a necessary being has to be self-subsistent in its existence because if it depended on some other entity for its existence it would be contingent rather than necessary.

In a previous episode, An Argument for the One, I shared a Neo-Platonic argument for why an entity possessing these attributes would have to be unique. There can in principle be only one thing which is purely actual, absolutely simple or noncomposite, and something which just is subsistent existence itself. Why is that? If there were more than one necessary being each would have to have some differentiating feature that the others lacked. Otherwise they would just be the same entity. But to have any differentiating features they would have to have potentialities. The potentialities of each would be whatever features the others did not have. But since a necessary being is purely actual it cannot have any such potentialities and so no differentiating features. This would preclude there being any more than one. So there can be only one necessary being.

It’s worth noting here for a moment that these attributes that the one necessary entity would have to have preclude certain candidates that might naturally occur to us. The big one, I think, is the universe itself. Can’t the universe itself just be the one explanation for everything? But this won’t do because the universe lacks the qualities that a necessary entity has to have. The universe is not necessary; it’s contingent. It doesn’t have to exist. The universe is not purely actual, noncomposite, or self-subsistent. The universe has many unactualized potentialities, potentialities that we know a lot about now thanks to the science of cosmology. The universe is certainly not noncomposite. The observable universe is thought to contain 10^80 particles. So the universe does not qualify as the kind of thing that could be the one necessary entity.

If we grant the foregoing there are also reasons to think that a necessary being would also have to be:

  1. Immutable
  2. Eternal
  3. Immaterial
  4. Incorporeal
  5. Perfect
  6. Omnipotent
  7. Fully good
  8. Intelligent, and 
  9. Omniscient

These are clearly attributes associated with God. At this stage we’re looking at identifying the one necessary being with the attributes of God. Why would a necessary being have all of these attributes? These trace back to its being purely actual, noncomposite, and self-subsistent. Immutability is changelessness, which relates to actuality and potentiality. Things with potentiality have the capacity to change. But something that is purely actual is already fully actualized. It doesn’t have any potentialities that need to be actualized. Such changelessness also applies across time. Because God is the same across time he is eternal, the same at all moments. God is also immaterial and incorporeal because he is noncomposite. Matter and bodies are essentially composite, both because they are composed of particles and because matter is a plurality; there are many different material and bodily entities, each with distinguishing features. God, being noncomposite, cannot be like that.

Because God is pure actuality and doesn’t have any potentialities that need to be actualized he is already perfect. He’s already everything that he can be. This perfection includes moral perfection. Full goodness is actually the attribute that is, on its face, least obvious to me, and also the one of greatest existential concern. Apart from all the foregoing, it would be easy for me to imagine that the ultimate source of all things with all power might not be morally good but might actually be amoral. And that would be rather distressing. Maybe morality is a human invention and not pertinent to the one necessary being behind all things. But in relation to everything else we can reason about God, there is good reason to think that the one necessary being is also fully good. The goodness of the necessary being relates to his pure actuality. To see the relation requires a certain understanding of goodness. In this understanding goodness is the actualization of an entity’s potentialities. This is the understanding of goodness expounded by Aristotle and articulated in modern times by Alasdair MacIntyre. The good is that at which things aim. Living things have natures with potentialities to become the kinds of things that they are meant to be. Goodness is the actualization of these potentialities. It’s essentially creative and fruitful. Its opposite, evil, is essentially destructive and privative. We think of something like the Holocaust as the ultimate evil, and very rightly so. This was supremely destructive and the complete opposite of creation, multiplying, and replenishing. Other evils may be much less total in their destructive force but also work against growth and realization of our potentialities. The one necessary being is decidedly on the side of creation and goodness. As pure actuality God is the very source of all creation and growth that empowers all entities to move toward the things for which they aim.

Omnipotence is another way of understanding God’s pure actuality. To actualize is to make things happen, which is essentially what power is. Everything that happens and that can happen is dependent on being actualized by the ultimate actualizer, and so God is the source of all power and is all-powerful.

Intelligence and omniscience are probably the boldest assertions about God’s nature. We might imagine a single source for all things but still resist the idea that this ultimate source itself possesses human-like consciousness. Why should we suppose this to be the case? The reason for this relates to an important Platonic insight about the nature of reality. And this is the existence of abstractions. I discussed this in another episode about an argument for the existence of God from eternal truths. Examples of abstractions include mathematical concepts and theorems that would seem to hold independently of anything physical. Abstractions have the character of ideas. They can certainly subsist in our minds. But they would also seem to transcend any particular, finite mind, like the minds of human beings. These abstract forms can be actualized in the physical universe, as in the form of physical laws or in the form of created entities. As Edward Feser has stated: “To cause something to exist is just to cause something having a certain form or fitting a certain pattern.” (Five Proofs of the Existence of God, 33). If these abstractions have real existence they have to exist somewhere. The various modes of subsistence they might have are a huge topic, but for our purposes here we’ll just note that, as entities of a mental character, the most reasonable way for them to exist is as ideas in God. It is the intellectual and mental nature of these abstractions, existing in God, that gives reason to think that God must have intelligence. In fact, his intelligence must be very great indeed because it comprises all abstractions. And because all actualization, including actualization involving these sorts of mental abstractions, originates from God, God’s intelligence must be all-encompassing; in other words, omniscient.

Let’s return to the expression of all these ideas in the forms of arguments for the existence of God. I shared earlier the argument from Leibniz, as re-expressed by Alexander Pruss:

  1. Every contingent fact has an explanation.
  2. There is a contingent fact that includes all other contingent facts.
  3. Therefore, there is an explanation of this fact.
  4. This explanation must involve a necessary being.
  5. This necessary being is God.

This is a nice, concise argument. But a longer argument has the benefit of explaining a little more that is taken for granted here. For example, why we should understand a necessary being to be God and to have the attributes traditionally associated with God. The kinds of reasons I’ve been discussing. A longer version of the argument that lays all this out is given by Edward Feser in his book Five Proofs of the Existence of God, in his fifth proof which he calls the Rationalist Proof, since Leibniz was a rationalist. It proceeds in 27 steps. The terms and ideas should be familiar now after everything discussed so far. The argument is the following:

  1. The principle of sufficient reason (PSR) holds that there is an explanation for the existence of anything that does exist and for its having the attributes that it has.
  2. If PSR were not true, then things and events without evident explanation or intelligibility would be extremely common.
  3. But this is the opposite of what common sense and science alike find to be the case.
  4. If PSR were not true, then we would be unable to trust our own cognitive faculties.
  5. But in fact we are able to trust those faculties.
  6. Furthermore, there is no principled way to deny the truth of PSR while generally accepting that there are genuine explanations in science and philosophy.
  7. But there are many genuine explanations to be found in science and philosophy.
  8. So, PSR is true.
  9. The explanation of existence of anything is to be found either in some other thing which causes it, in which case it is contingent, or in its own nature, in which case it is necessary; PSR rules out any purported third alternative on which a thing’s existence is explained by nothing.
  10. There are contingent things.
  11. Even if the existence of an individual contingent thing could be explained by reference to some previously existing contingent thing, which in turn could be explained by a previous member, and so on to infinity, that the infinite series as a whole exists at all would remain to be explained.
  12. To explain this series by reference to some further contingent cause outside the series, and then explain this cause in terms of some yet further contingent thing, and so on to infinity, would merely yield another series whose existence would remain to be explained; and to posit yet another contingent thing outside this second series would merely generate the same problem yet again.
  13. So, no contingent thing or series of contingent things can explain why there are any contingent things at all.
  14. But that there are any contingent things at all must have some explanation, given PSR; and the only remaining explanation is in terms of a necessary being as cause.
  15. Furthermore, that an individual contingent thing persists in existence at any moment requires an explanation; and since it is contingent, that explanation must lie in some simultaneous cause distinct from it.
  16. If this cause is itself contingent, then even if it has yet another contingent thing as its own simultaneous cause, and that cause yet another contingent thing as its simultaneous cause, and so on to infinity, then once again we have an infinite series of contingent things the existence of which has yet to be explained.
  17. So, no contingent thing or series of contingent things can explain why any particular contingent thing persists in existence at any moment; and the only remaining explanation is in terms of a necessary being as its simultaneous cause.
  18. So, there must be at least one necessary being, to explain why any contingent things exist at all and how any particular contingent thing persists in existence at any moment.
  19. A necessary being would have to be purely actual, absolutely simple or noncomposite, and something which just is subsistent existence itself.
  20. But there can in principle be only one thing which is purely actual, absolutely simple or noncomposite, and something which just is subsistent existence itself.
  21. So, there is only one necessary being.
  22. So, it is this same one necessary being which is the explanation of why any contingent things exist at all and which is the cause of every particular contingent thing’s existing at any moment.
  23. So, this necessary being is the cause of everything other than itself.
  24. Something which is purely actual, absolutely simple or non-composite, and something which just is subsistent existence itself must also be immutable, eternal, immaterial, incorporeal, perfect, omnipotent, fully good, intelligent, and omniscient.
  25. So, there is a necessary being which is one, purely actual, absolutely simple, subsistent existence itself, cause of everything other than itself, immutable, eternal, immaterial, incorporeal, perfect, omnipotent, fully good, intelligent, and omniscient.
  26. But for there to be such a thing is for God to exist.
  27. So, God exists.

Feser’s argument covers the core points of the argument from the principle of sufficient reason as well as all related issues, tying this not only to God as the one necessary being but also to God with all of his classical divine attributes.

Now to speak more reflectively on all of this, I sometimes feel like we are all far too incurious and complacent about our existence. We’re all just thrown into life as infants without the ability to reflect on it and ask what should be a pretty obvious question: “What’s going on here?!” By the time we’re old enough to speak and reason we settle in and just go along with things. But we can still go back to the beginning, before we’ve taken everything for granted, and ask: “Where does all of this come from?” And there are important related questions like, “What’s our part in all of this?” “What are we supposed to be doing here?”

Maybe these are unreasonable questions to ask. But I don’t think so. Things don’t just happen for no reason at all. It may be practical to ignore these questions in order to just get along with the daily business of our lives. But we shouldn’t push them off forever. We are made for more than just our particular day to day affairs. The big questions are also the ones that give intelligibility and meaning to our life’s details.

The reason to believe in God is also the reason for asking for reasons for everything and anything at all.

Human Language and Artificial Neural Networks

The recent developments in AI are quite impressive. If someone had told me a couple years ago about the capabilities of something like ChatGPT I wouldn’t have believed them. AI certainly has enormous practical benefit. But since artificial neural networks were inspired by biological neural networks they can also be useful models for them. In this episode I share some recent studies investigating the behavior of the brain using AI models and evaluating their possible underlying computational similarities.

This is a follow-up to some things discussed in our last group episode on artificial intelligence. Since that conversation I’ve been digging more into the subject and wanted to share some ideas about it. I’ve been interested in artificial intelligence for a number of years. Part of that interest is because of its practical usefulness, which we’re really seeing explode now, with ChatGPT in particular. But I’m also interested in artificial intelligence as a model that could give us insights about human intelligence.

I have to say that the performance of these most recent models, like GPT-3.5 and especially GPT-4, the models behind ChatGPT, is something that has really surprised me. If someone had told me a couple years ago that in 2022 & 2023 a deep learning model would be able to perform as well as these do I wouldn’t have believed it. I’m typically predisposed to doubt or at least be very critical about the capabilities of artificial intelligence. But in this case I think I was wrong and I’m happy to have been wrong about that. I don’t mean to swing too far to the other extreme and get too exuberant about it and overstate the capabilities of these models. But just a little bit of excess excitement might be excusable for the moment.

One claim that would be too extreme would be that these deep learning models are actually self-conscious already. Now I have no philosophical reason to suppose that an artificial device could not be self-conscious. I just don’t think we’re there yet. Another, less extreme claim, but one that would still go too far would be that deep learning models actually replicate the way human speech is produced in the brain. I think the implementations are still distinct. But that being said, I think there are enough similarities to be useful and interesting.

For comparison, there are insights we can gain into sight and hearing from cameras and audio recorders. Obviously they are not the same as our own sense organs but there are some similar principles that can help us think about how our senses work. The comparisons work both at the level of physical mechanisms and at the level of data processing. For example, I think there are some interesting insights about human senses from perceptual coding. Perceptual coding is a method used in digital signal processing that leverages the limitations and characteristics of the human sensory systems (auditory and visual) to provide data compression. For example, in audio, certain sounds are inaudible if they’re masked by louder sounds at a similar frequency. Similarly, in an image, subtle color differences in areas with high spatial detail are less noticeable than in smooth areas. Perceptual coding takes advantage of this by selectively removing the less noticeable information to reduce the data size, without significantly impacting perceived quality. This is done with MP3s and JPEGs. Extending this comparison to large language models, I’d propose that models like ChatGPT might be to human language production what cameras, JPEGs, audio recorders, and MP3s are to sight and sound. They aren’t the same but there are some parallels. ChatGPT is not the same as a human brain any more than a camera is an eye or an audio recorder is an ear. But, more modestly, ChatGPT may have some interesting similarities to human language production.
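
To make the compression idea a bit more concrete, here is a minimal transform-coding sketch (my own toy illustration: it keeps only the largest DCT coefficients of a smooth signal, which is the same basic move JPEG and MP3 make, but without the psychovisual and psychoacoustic masking models that make real perceptual coders perceptual):

```python
# Toy transform coding: keep only the largest DCT coefficients of a signal block
# and reconstruct. Real perceptual coders (MP3, JPEG) go further, using models
# of human hearing/vision to decide which details are least noticeable.
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(0)
n = np.arange(64)
block = np.exp(-((n - 32) / 12.0) ** 2) + 0.02 * rng.standard_normal(64)  # smooth "signal" + slight noise

coeffs = dct(block, norm="ortho")            # transform to the frequency domain
keep = np.argsort(np.abs(coeffs))[-8:]       # indices of the 8 largest coefficients
compressed = np.zeros_like(coeffs)
compressed[keep] = coeffs[keep]              # discard the rest (the "less noticeable" detail)

reconstructed = idct(compressed, norm="ortho")
rms_error = np.sqrt(np.mean((reconstructed - block) ** 2))
print(round(rms_error, 3))   # small, even though 7/8 of the coefficients were thrown away
```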

The most significant developments in this technology are so recent that the most useful reading material I’ve had to go to on the subject is peer-reviewed literature from the past year. Even there a lot of the research was done with GPT-2, which was a much less advanced model than we have available today. So it will be interesting to see what studies come out in the next year and beyond. The papers I want to focus on are 3 research papers from 2022 that present the results of experiments and 2 (just slightly older) perspective papers that offer some broad reflections and theoretical considerations.

In what follows, I’ll proceed in three parts: (1) philosophical background, (2) an overview of neural networks: biological and artificial, and (3) recent scientific literature.

Philosophical Background

Most of the philosophy I’d like to discuss is from the 20th century, in which there was considerable philosophical interest in language in what has been called the “linguistic turn”. But first something from the 18th century.

Something that stood out to me in all the research articles was the issue of interpretability. Artificial neural networks have been shown to have remarkable parallels to brain patterns in human language production. That’s nice because the brain is so complex and so little understood. The only problem is that ANNs themselves are also extremely complex and opaque to human comprehension. This challenges a notion going back to the 18th-century Italian philosopher Giambattista Vico: the verum factum principle.

The phrase “verum factum” means “the true is the made,” which refers to the notion that truth is verified through creation or invention. In other words, we can only know with certainty that which we have created ourselves, because we understand its origins, structure, and purpose. Vico developed this principle as a critique of the Cartesian method of knowing, which, in Vico’s view, emphasized the abstract and ignored the concrete, humanistic dimensions of knowledge. By asserting that true knowledge comes from what humans create, Vico highlighted the role of human agency, creativity, and historical development in the creation of knowledge.

However, applying the verum factum principle to complex human creations like modern industrial economies, social organizations, big data, and artificial neural networks poses some interesting challenges. These creations certainly reflect human ingenuity and creativity, but they also possess a complexity that can make them difficult to fully comprehend, even for those directly involved in their creation. Artificial neural networks are inspired by our understanding of the human brain, but their function, especially in deep learning models, can be incredibly complex. It’s often said that these networks function as “black boxes,” as the pathway to a certain output given a certain input can be labyrinthine and largely inexplicable to humans, including their creators. So, while the verum factum principle encapsulates the role of human agency and creativity in the construction of knowledge, artificial neural networks illustrate that our creations can reach a level of complexity that challenges our ability to fully comprehend them.

Now turning to the 20th century I think four philosophers are especially relevant to the subject. These are Martin Heidegger, Ludwig Wittgenstein, Ferdinand de Saussure, and Hubert Dreyfus. Of these four Hubert Dreyfus was the one who most directly commented on artificial intelligence. But Dreyfus was also using ideas from Heidegger in his analysis of AI.

Let’s start with Dreyfus and Heidegger. Dreyfus’s main arguments were outlined in his influential 1972 book, What Computers Can’t Do. The core of his critique lies in what he sees as AI’s misguided reliance on formal symbolic reasoning and the assumption that all knowledge can be explicitly encoded. Dreyfus argued that human intelligence and understanding aren’t primarily about manipulating symbolic representations, as early AI research assumed. Instead, he believed that much of human knowledge is tacit, implicit, and tied to our embodied experience of “being in the world”, an important Heideggerian concept. These are aspects that computers, at least during that time, couldn’t easily replicate.

Dreyfus drew heavily on the philosophy of Martin Heidegger to make his arguments. Heidegger’s existential phenomenology, as expressed in his 1927 book Being and Time, describes human existence (“Dasein”) as being-in-the-world—a complex, pre-reflective involvement with our surroundings. This contrasts with the traditional view of humans as subjects who perceive and act upon separate objects. According to Heidegger, we don’t usually encounter things in the world by intellectually representing them to ourselves; instead, we deal with them more directly.

Dreyfus related this to AI by arguing that human expertise often works in a similar way. When we become skilled at something, we don’t typically follow explicit rules or representations—we just act. This aligns with Heidegger’s notion of ‘ready-to-hand’—the way we normally deal with tools or equipment, not by observing them as separate objects (‘present-at-hand’), but by using them directly and transparently in our activities.

Another philosopher relevant to this topic is Ludwig Wittgenstein. He was one of the most influential philosophers of the 20th century. He is considered to have had two major phases that were quite different from each other.  His early work, primarily represented in Tractatus Logico-Philosophicus, proposed that language is a logical structure that represents the structure of reality. But in his later work, chiefly Philosophical Investigations, Wittgenstein advanced a very different view.

In Philosophical Investigations, Wittgenstein introduces the concept of language as a form of social activity, what he called “language games.” He argues that language does not have a single, universal function (as he had previously believed) but is instead used in many different ways for many different purposes.

Language, Wittgenstein claims, should be seen as a myriad of language games embedded in what he called ‘forms of life’, which are shared human practices or cultural activities. Different language games have different rules, and they can vary widely from commands, to questions, to descriptions, to expressions of feelings, and more. These language games are not separate from our life but constitute our life.

Wittgenstein also introduced the idea of ‘family resemblances’ to discuss the way words and concepts gain their meanings not from having one thing in common, but from a series of overlapping similarities, just like members of a family might resemble each other.

He also challenged the idea that every word needs to have a corresponding object in the world. He argued that trying to find a definitive reference for each word leads to philosophical confusions and that words acquire meaning through their use in specific language games, not through a one-to-one correspondence with objects in the world. So, for Wittgenstein, the meaning of a word is not something that is attached to it, like an object to a label. Instead, the meaning of a word is its use within the language game. This was a notion similar to a theory of language called structuralism. 

The leading figure of structuralism was the Swiss linguist Ferdinand de Saussure. His ideas laid the groundwork for much of the development in linguistics in the 20th century and provided the basis for structuralism. In his Course in General Linguistics, compiled from notes taken by his students, Saussure proposed a radical shift in the understanding of language. He proposed that language should be studied synchronically (as a whole system at a particular point in time) rather than diachronically (as a historical or evolutionary development). According to Saussure, language is a system of signs, each sign being a combination of a concept (the ‘signified’) and a sound-image (the ‘signifier’). Importantly, he emphasized that the relationship between the signifier and the signified is arbitrary – there is no inherent or natural reason why a particular sound-image should relate to a particular concept.

Regarding the creation of meaning, Saussure proposed that signs do not derive their meaning from a connection to a real object or idea in the world. Instead, the meaning of a sign comes from its place within the overall system of language and its differences from other signs; it lies within the structure of the language. That is, signs are defined not positively, by their content, but negatively, by their relations with other signs. For example, the word “cat” doesn’t mean what it does because of some inherent ‘cat-ness’ of the sound. Instead, it gains meaning because it’s different from “bat,” “cap,” “car,” etc. Moreover, it signifies a particular type of animal, different from a “dog” or a “rat”. Thus, a sign’s meaning is not about a direct link to a thing in the world but is about differential relations within the language system.

Saussure’s ideas about language and the generation of meaning can be interestingly compared to the techniques used in modern natural language processing (NLP) models, such as word2vec, and to measures like cosine similarity. For example, in word2vec, an algorithm developed by researchers at Google, words are understood in relation to other words.

Word2vec is a neural network model that learns to represent words as high-dimensional vectors (hence “word to vector”) based on their usage in large amounts of text data. Each word is assigned a position in a multi-dimensional space such that words used in similar contexts are positioned closer together. This spatial arrangement creates ‘semantic’ relationships: words with similar meanings are located near each other, and the differences between word vectors can capture meaningful relationships.

Cosine similarity is a measure of the similarity between two vectors. In the context of NLP, it’s often used to measure the semantic similarity between two words (or word vectors). If the word vectors are close in the multi-dimensional space (meaning the angle between them is small), their cosine similarity will be high, indicating that the words are used in similar contexts and likely have similar meanings. There are some interesting parallels between Saussure’s linguistics and AI language models. Both approaches stress that words do not have meaning in isolation but gain their meaning through their relations to other words within the system: the language for Saussure, the trained model for word2vec.
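
As a minimal sketch of the arithmetic (the four-dimensional vectors below are made-up toy embeddings, not real word2vec output, which typically has hundreds of dimensions):

```python
# Cosine similarity between word vectors: close to 1.0 means "pointing the same
# direction", close to 0.0 means unrelated (orthogonal). Toy vectors only.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

cat = np.array([0.8, 0.1, 0.6, 0.0])   # hypothetical embedding for "cat"
dog = np.array([0.7, 0.2, 0.5, 0.1])   # hypothetical embedding for "dog"
car = np.array([0.0, 0.9, 0.1, 0.7])   # hypothetical embedding for "car"

print(cosine_similarity(cat, dog))  # relatively high: used in similar contexts
print(cosine_similarity(cat, car))  # lower: used in different contexts
```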

Neural Networks: Biological and Artificial

Recall Hubert Dreyfus’s critique of formal symbolic reasoning in artificial intelligence and the assumption that all knowledge can be explicitly encoded. His critique is most relevant to traditional programming in which explicit program instructions are given. In machine learning, however, and in artificial neural networks program rules are developed in response to data. To whatever degree this is similar to the human mind, biology is at least the inspiration for artificial neural networks.

What is the structure of biological neural networks (BNNs)? In the brain, connections between neurons are called synapses: tiny gaps at the junctions between neurons where communication occurs. They play a vital role in the transmission of information in the brain. Each neuron can be connected to many others through synapses, forming a complex network of communicating cells.

Neurons communicate across the synapse using chemicals called neurotransmitters. When an electrical signal (an action potential) reaches the end of a neuron (the presynaptic neuron), it triggers the release of neurotransmitters into the synapse. These chemicals cross the synapse and bind to receptors on the receiving neuron (the postsynaptic neuron), which can result in a new electrical signal in that neuron. This is how neurons interact with each other and transmit information around the brain.

Synapses form during development and continue to form throughout life as part of learning and memory processes. The creation of new synapses is called synaptogenesis. This happens when a neuron extends a structure called an axon toward another neuron. When the axon of one neuron comes into close enough proximity with the dendrite of another neuron, a synapse can be formed.

The strength of synapses in the brain can change, a phenomenon known as synaptic plasticity. This is thought to be the basis of learning and memory. When two neurons are activated together frequently, the synapse between them can become stronger, a concept known as long-term potentiation (LTP). This is often summarized by the phrase “neurons that fire together, wire together”.

On the other hand, if two neurons aren’t activated together for a while, or the activation is uncorrelated, the synapse between them can become weaker, a process known as long-term depression (LTD).

Multiple factors contribute to these changes in synaptic strength, including the amount of neurotransmitter released, the sensitivity of the postsynaptic neuron, and structural changes such as the growth or retraction of synaptic connections. By adjusting the strength of synaptic connections, the brain can adapt to new experiences, form new memories, and continually rewire itself. This is a dynamic and ongoing process that underlies the brain’s remarkable plasticity.
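
A toy numerical caricature of the “fire together, wire together” idea might look like the following (my own schematic Hebbian-style rule, not a model of the actual biochemistry):

```python
# Schematic Hebbian-style plasticity: a synaptic "weight" strengthens when pre-
# and postsynaptic firing are correlated (LTP-like) and weakens when they are
# not (LTD-like). A caricature for illustration, not a biophysical model.
import numpy as np

rng = np.random.default_rng(0)
w = 0.5       # initial synaptic strength
eta = 0.01    # learning rate

for step in range(1000):
    pre = rng.random() < 0.5                        # presynaptic neuron fires?
    post = rng.random() < (0.9 if pre else 0.1)     # postsynaptic firing correlated with pre
    if pre and post:
        w += eta * (1.0 - w)    # correlated firing strengthens the synapse
    elif pre and not post:
        w -= eta * w            # uncorrelated firing weakens it
    # (no change when the presynaptic neuron is silent)

print(round(w, 2))   # drifts toward a high value because the activity is correlated
```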

How then do biological neural networks compare to artificial neural networks? In an artificial neural network, each connection between artificial neurons (also called nodes or units) has an associated weight. These weights play a role somewhat analogous to the strength of synaptic connections in a biological brain. A weight in an ANN determines the influence or importance of an input to the artificial neuron. When the network is being trained, these weights are iteratively adjusted in response to the input the network receives and the error in the network’s output. The goal of the training is to minimize this error, usually defined by a loss function.

The process of adjusting weights in an ANN is a bit like the changes in synaptic strength observed in biological neurons through processes like long-term potentiation (LTP) and long-term depression (LTD). In both cases, the changes are driven by the activity in the network (biological or artificial) and serve to improve the network’s performance – either in terms of survival and behavior for a biological organism, or in terms of prediction or classification accuracy for an ANN.
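
For comparison with the Hebbian caricature above, here is an equally minimal sketch of how a weight in an ANN gets adjusted: a single linear “neuron” trained by gradient descent to drive down a squared-error loss (all values illustrative):

```python
# Minimal ANN weight update: gradient descent on a mean squared error loss.
# Unlike the Hebbian sketch, the change is driven by the error in the output.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(200)                     # inputs
y = 3.0 * x + 0.1 * rng.standard_normal(200)     # targets (underlying weight is 3.0)

w = 0.0      # initial weight
lr = 0.1     # learning rate

for epoch in range(100):
    y_hat = w * x                        # forward pass
    grad = np.mean(2 * (y_hat - y) * x)  # gradient of the loss with respect to w
    w -= lr * grad                       # adjust the weight to reduce the loss

print(round(w, 2))   # converges near 3.0, the weight that minimizes the loss
```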

Of course, there are still multiple differences between biological neural networks and artificial neural networks. ANNs usually involve much simpler learning rules and lack many of the complex dynamics found in biological brains, such as the various types of neurons and synapses, detailed temporal dynamics, and biochemical processes. Biological synaptic plasticity is a much richer and more complex process than the adjustment of weights in an ANN. Also, in most ANNs, once training is complete, the weights remain static, while in biological brains, synaptic strength is continually adapting throughout an organism’s life. Biological and artificial neural networks share computational principles but they certainly don’t implement these computations in the same way. Brains and computers are simply very different physical things, right down to the materials that compose them.

Artificial neural networks have been in development for several decades. But it is in very recent years that we’ve seen some especially remarkable advances, to which we’ll turn now.

Recent Scientific Literature

I’d like to share 5 papers that I’ve found useful on this subject. Three are research papers with experimental data and two are perspective papers that offer some broad reflections and theoretical considerations.

The 3 research papers are:

“Brains and algorithms partially converge in natural language processing”, published in Communications Biology in 2022 by Caucheteux & King.

“Shared computational principles for language processing in humans and deep language models”, published in Nature Neuroscience in 2022 by Goldstein et al.

“Explaining neural activity in human listeners with deep learning via natural language processing of narrative text”, published in Scientific Reports in 2022 by Russo et al.

And the 2 perspective articles are:

“Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks”, published in Neuron in 2020 by Hasson et al.

“A deep learning framework for neuroscience”, published in Nature Neuroscience in 2019 by Richards et al.

In each of the three research papers human participants read or listened to certain passages while their brain signals for specific brain regions were measured. Deep learning models were trained on this data to predict the brain signals that would result from the text. Researchers looked for instances of high correlation between actual brain patterns and the brain patterns predicted by the model and mapped where in the brain these signals occurred at various points in time before and after word onset. In particular, they noted whether the brain regions activated corresponded to those regions that would be expected from neuroscience to activate in the various stages of language processing.
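
The core comparison in these studies can be sketched very simply (the arrays below are random stand-ins for real recordings; the actual studies used fMRI, MEG, or ECoG data and model-derived predictions):

```python
# Sketch of the basic comparison: Pearson correlation between the brain signal a
# model predicts and the signal actually recorded. Random stand-in data only.
import numpy as np

rng = np.random.default_rng(0)
actual_signal = rng.standard_normal(500)                            # measured response over time
predicted_signal = actual_signal + 0.8 * rng.standard_normal(500)   # model prediction (noisy)

r = np.corrcoef(predicted_signal, actual_signal)[0, 1]
print(round(r, 2))   # high r would suggest the model captures brain-like structure
```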

In the first article, “Brains and algorithms partially converge in natural language processing”, published in Communications Biology in 2022 by Caucheteux & King, the researchers used deep learning models to predict brain responses to certain sentences. Then the actual brain responses of human subjects were used as training data for the models. They used a variety of models that they classified as visual, lexical, and compositional. Then they evaluated how well these different types of models matched brain responses in different brain regions. The brain responses in the human subjects were measured using functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG).

Regarding the 3 different types of models:

Visual models are deep learning models that are primarily used for tasks involving images or videos. They are trained to recognize patterns in visual data, which can then be used to perform tasks such as image classification, object detection, image generation, and more. The most common types of visual deep learning models are convolutional neural networks (CNNs). CNNs are specifically designed to process pixel data and have their architecture inspired by the human visual cortex. 

Lexical models are models that focus on the processing of words or “lexemes” in a language. They work with individual words or groups of words (n-grams), treating them as atomic units. Lexical models can learn word representations (often called “embeddings”) that capture the semantic meanings of words, and their relationships with each other. They are often used in natural language processing (NLP) tasks such as text classification, sentiment analysis, and named entity recognition. Examples of lexical models include traditional word2vec or GloVe models, which map words into a high-dimensional vector space.

Compositional models handle sequences of data where the order of the data points is important, such as sentences, time-series data, etc. Recurrent models do this by processing one part of the sequence at a time and maintaining a kind of memory (in the form of hidden states) of what has been seen so far, while causal language transformers (CLTs) like GPT instead attend over the whole preceding context. Either way, these models capture patterns across the sequence and use that information to make predictions about upcoming data points, such as the next word.

Interestingly enough, the accuracy of the different types of models was observed to vary with time from the word onset. And the moments of high correlation of each model type corresponded with the activation of certain brain regions.

In early visual responses – less than 150 ms, when subjects would first see a word – brain activations were in the primary visual cortex and correlated best with activations in visual models, convolutional neural networks (CNNs).

At around 200 ms these brain activations were conveyed to the posterior fusiform gyrus. At the same time lexical models like Word2Vec started to correlate better than CNNs. This tracks with the hypothesis that the fusiform gyrus is responsible for orthographic and morphemic computations.

Around 400 ms brain activations were present in a broad fronto-temporo-parietal network that peaked in the left temporal gyrus. At this point lexical models like Word2Vec also correlated with the entire language network. These word representations were then sustained for several seconds, suggesting a widespread distribution of meaning in the brain.

Around 500-600 ms there were complex recurrent dynamics dominated by both visual and lexical representations.

After 800 ms, brain activations were present in the prefrontal, parietal, and temporal lobes. At the same time compositional models like causal language transformers (CLTs) correlated better than lexical models. The team speculated that these late responses might be due to the complexity of the sentences used in this study, potentially delaying compositional computations.

The researchers concluded from their experiment that the results show that deep learning algorithms partially converge toward brain-like solutions.

In “Shared computational principles for language processing in humans and deep language models”, published in Nature Neuroscience in 2022 by Goldstein et al., the researchers compared the responses of human participants and autoregressive deep language models (DLMs) to the text of a 30-minute podcast.

The authors note that human language has traditionally been explained by psycholinguistic approaches using interpretable models that combine symbolic elements, such as nouns, verbs, adjectives, and adverbs, with rule-based operations. This is similar to the kind of traditional programming that Hubert Dreyfus argued would not be viable for AI. In contrast, autoregressive Deep Language Models (DLMs) learn language from real-world textual examples, with minimal or no explicit prior knowledge about language structure. They do not parse words into parts of speech or apply explicit syntactic transformations. Instead, these models learn to encode a sequence of words into a numerical vector, termed a contextual embedding, from which the model decodes the next word. Autoregressive DLMs, such as GPT-2, have demonstrated effectiveness in capturing the structure of language. But the open question is whether the core computational principles of these models relate to how the human brain processes language. The authors present their experimental findings as evidence that human brains process incoming speech in a manner similar to an autoregressive DLM.

In the first experimental setup, participants proceeded word by word through a 30-minute transcribed podcast, providing a prediction of each upcoming word. Both the human participants and GPT-2 were able to predict words well above chance. And there was high overlap in the accuracy of the predictions of human subjects and GPT-2 for individual words, i.e. words that human subjects predicted well, GPT-2 also predicted well. The researchers took this experiment to demonstrate that listeners can accurately predict upcoming words when explicitly instructed to do so, and that human predictions and autoregressive DLM predictions are matched in this context. Next the researchers wanted to determine whether the human brain, like an autoregressive DLM, is continuously engaged in spontaneous next-word prediction without such explicit instruction, and whether neural signals actually contain information about the words being predicted.
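For a sense of how GPT-2’s side of this comparison can be probed, here is a hedged sketch (not the authors’ code) that asks GPT-2 for its most probable continuations at an arbitrary point in a transcript, using the Hugging Face transformers library. The context sentence is made up.

```python
# Illustrative next-word prediction with GPT-2 (made-up context sentence).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

context = "So we decided to take the dog for a"
inputs = tokenizer(context, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits                  # (1, sequence_length, vocabulary_size)

next_token_probs = logits[0, -1].softmax(dim=-1)     # distribution over the next token
top = torch.topk(next_token_probs, k=5)
print([tokenizer.decode(int(i)) for i in top.indices])  # GPT-2's top guesses for the next word
```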

In the next experimental setup, the researchers used electrocorticography (ECoG) to measure neural responses of human participants before and after word-onset. Subjects engaged in free listening, without being given any explicit instruction to predict upcoming words. The goal was to see if our brains engage in such prediction all the time as simply a natural part of language comprehension.

The results from human subjects in this experiment were also compared to models. The first model used was a static word embedding model, GloVe. The model was used to localize electrodes containing reliable responses to single words in the narrative. The words were aligned with neural signals and then the model would be trained to predict neural signals from word embeddings. A series of coefficients corresponding to features of the word embedding was learned using linear regression to predict the neural signal across words from the assigned embeddings. “The model was evaluated by computing the correlation between the reconstructed signal and the actual signal” for the word.
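Here is a minimal sketch of that encoding analysis: learn a linear map from word embeddings to the signal at one electrode and evaluate it by correlating predicted with actual signals on held-out words. The data below are random placeholders standing in for GloVe embeddings and ECoG recordings, and I’ve used ridge regression as the regularized linear model; the authors’ actual pipeline differs in its details.

```python
# A toy encoding model: linear map from word embeddings to one electrode's signal,
# evaluated by correlating predicted with actual signals on held-out words.
# Random placeholder data stand in for GloVe embeddings and ECoG recordings.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_words, embed_dim = 1000, 50
X = rng.normal(size=(n_words, embed_dim))                   # one embedding per word in the narrative
true_weights = rng.normal(size=embed_dim)
y = X @ true_weights + rng.normal(scale=0.5, size=n_words)  # simulated signal at one electrode

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
encoder = Ridge(alpha=1.0).fit(X_train, y_train)            # coefficients over embedding features
y_pred = encoder.predict(X_test)

r = np.corrcoef(y_pred, y_test)[0, 1]                       # reconstructed vs. actual signal
print(f"encoding correlation: {r:.2f}")
```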

In the results of this experiment, there was indeed found to be a neural signal before word onset. But what the model also enabled the researchers to do was ascertain some kind of semantic content from that signal, since the model had been trained to predict certain neural signals for given words. What was observed was that “the neural responses before word onset contained information about human predictions regarding the identity of the next word. Crucially, the encoding was high for both correct and incorrect predictions. This demonstrated that pre-word-onset neural activity contains information about what listeners actually predicted, irrespective of what they subsequently perceived.” Of course, sometimes the subject’s predictions were wrong. So what happened in those cases? “The neural responses after word onset contained information about the words that were actually perceived.” So “the encoding before word onset was aligned with the content of the predicted words” and “the encoding after word onset was aligned with the content of the perceived words.” This all aligns with what we would expect under a predictive processing (PP) model of the brain.

The next level of analysis was to replace the static embedding model (GloVe) with a contextual embedding model (GPT-2) to determine if this would improve the ability to predict the neural signals to each word. It did; an indication that contextual embedding is a closer approximation to the computational principles underlying human language. And the improved correlation from contextual embedding was found to be localized to specific brain regions. “Encoding based on contextual embeddings resulted in statistically significant correlations” in electrodes that “were not significantly predicted by static embedding. The additional electrodes revealed by contextual embedding were mainly located in higher-order language areas with long processing timescales along the inferior frontal gyrus, temporal pole, posterior superior temporal gyrus, parietal lobe and angular gyrus.” The authors concluded from this that “the brain is coding for the semantic relationship among words contained in static embeddings while also being tuned to the unique contextual relationship between the specific word and the preceding words in the sequence.”
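To illustrate what a contextual embedding is, here is a small sketch (again, not the authors’ pipeline) that pulls a word’s hidden state out of GPT-2: the same word gets a different vector depending on the words that precede it, which is exactly what a static model like GloVe cannot do. The example phrases are made up.

```python
# Illustrative contextual embeddings from GPT-2: the same word, different vectors
# depending on its context.
import torch
from transformers import GPT2Model, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

def contextual_embedding(text: str) -> torch.Tensor:
    """Hidden state of the final token of `text` (a 768-dimensional vector)."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # (1, sequence_length, 768)
    return hidden[0, -1]

river = contextual_embedding("They sat down on the river bank")
money = contextual_embedding("She deposited the check at the bank")
similarity = torch.nn.functional.cosine_similarity(river, money, dim=0)
print(f"cosine similarity between the two 'bank' vectors: {similarity.item():.2f}")
```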

The authors submit that DLMs provide a new modeling framework that drastically departs from classical psycholinguistic models. They are not designed to learn a concise set of interpretable syntactic rules to be implemented in novel situations, nor do they rely on part of speech concepts or other linguistic terms. Instead, they learn from surface-level linguistic behavior to predict and generate the contextually appropriate linguistic outputs. And they propose that their experiments provide compelling behavioral and neural evidence for shared computational principles between the way the human brain and autoregressive DLMs process natural language.

In “Explaining neural activity in human listeners with deep learning via natural language processing of narrative text”, published in Scientific Reports in 2022 by Russo et al., human participants listened to a short story, both forward and backward. Their brain responses were measured by functional MRI. Text versions of the same story were tokenized and submitted to GPT-2. Both the brain signal data and GPT-2 outputs were fed into a general linear model to encode the fMRI signals.

The two outputs the researchers looked at from GPT-2 were surprisal and saliency. Surprisal is a measure of the information content associated with an event, in terms of its unexpectedness or rarity. The more unlikely an event, the higher its surprisal. It is defined mathematically as the negative logarithm of the probability of the event. Saliency refers to the quality by which an object stands out relative to its neighbors. In a text it’s the importance or prominence of certain words, phrases, or topics, a measure of how much a particular text element stands out relative to others in the same context.
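Surprisal, at least, is easy to make concrete. Here is a hedged sketch that scores each word of a made-up sentence with GPT-2, computing the negative log (base 2) of the probability the model assigned to that word given its left context; the paper’s exact model settings and tokenization may differ.

```python
# Illustrative per-word surprisal from GPT-2, in bits: -log2 p(word | preceding words).
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

text = "The old man opened the door and saw a penguin"
ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    log_probs = model(ids).logits.log_softmax(dim=-1)   # log p(next token) at each position

for pos in range(1, ids.shape[1]):                      # skip the first token (no left context)
    token_id = int(ids[0, pos])
    surprisal = -log_probs[0, pos - 1, token_id].item() / math.log(2)  # convert nats to bits
    print(f"{tokenizer.decode(token_id):>10s}  {surprisal:5.2f} bits")
```

Rare or unexpected continuations (like “penguin” above) should come out with higher surprisal than highly predictable function words.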

What they found in their results was that the surprisal from GPT-2 correlated with the neural signals in the superior and middle temporal gyri, in the anterior and posterior cingulate cortices, and in the left prefrontal cortex. Saliency from GPT-2 correlated with the neural signals for longer segments in the left superior and middle temporal gyri.

The authors proposed that their results corroborated the idea that word-level prediction is accurately indexed by the surprisal metric and that the neural activation observed from the saliency scores suggests the co-occurrence of a weighing mechanism operating on the context words. This was something previously hypothesized as necessary to language comprehension.

The involvement of areas in the middle and the superior temporal gyrus aligns with previous studies supporting that core aspects of language comprehension, such as maintaining intermediate representations active in working memory and predicting upcoming words, do not necessarily engage areas in the executive control network but are instead performed by language-selective brain areas that, in this case, are the ones relatively early in the processing hierarchy.

I found the following comment in the discussion section of the paper quite interesting: “In general, considering that the architecture of artificial neural networks was originally inspired by the same principles of biological neural networks, it might be not at all surprising that some specific dynamics observed in the former are somehow reflected in the functioning of the latter.” I think that’s an interesting point. The whole idea of artificial neural networks came from biological neural networks. We were basically trying to do something similar to what neurons do. We don’t know exhaustively how biological neural networks work but we do know that they work very well. When we are finally able to make artificial networks that work quite well, it’s perhaps to be expected that they would have characteristics similar to biological neural networks.

The other two papers were perspective papers. These didn’t present the results of experiments but discussed what I thought were some interesting ideas relating to the whole interchange between language processing in the human brain and in deep learning models.

In “Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks”, published in Neuron in 2020 by Hasson et al., the authors covered several topics. One thing they addressed that I found interesting was a challenge to three basic assumptions of cognitive psychology. These assumptions are:

1. The brain’s computational resources are limited and the underlying neural code must be optimized for particular functions. They attribute this to Noam Chomsky and Jerry Fodor.

2. The brain’s inputs are ambiguous and too impoverished for learning without built-in knowledge. They attribute this to Noam Chomsky.

3. Shallow, externally supervised and self-supervised methods are not sufficient for learning. They attribute this to Steven Pinker.

In response to the first assumption the authors argue that the brain’s computational resources are actually not scarce. “Each cubic millimeter of cortex contains hundreds of thousands of neurons with millions of adjustable synaptic weights, and BNNs utilize complex circuit motifs hierarchically organized across many poorly understood cortical areas. Thus, relative to BNNs, ANNs are simplistic and minuscule.” Artificial neural networks are indeed trained on huge amounts of data. GPT-4 is essentially trained on the whole internet. Human children don’t learn to talk by reading the whole internet; that’s true. But the human brain is also a lot more complex than even the most sophisticated artificial neural networks; so far at least. So if GPT-4 is able to perform so impressively with a structure that’s less sophisticated than the human brain we can expect that the human brain’s computational resources are hardly scarce.

In response to the second assumption the authors argue that the brain’s input is not impoverished. Noam Chomsky, arguably the most important linguist of the 20th century, argued for what he called the “poverty of the stimulus,” meaning that the linguistic input children receive is often incomplete, ungrammatical, or otherwise imperfect. But they still manage to learn their native language effectively. How? Chomsky proposed that there is a “Language Acquisition Device” (LAD) within the human brain. This hypothetical module is thought to be equipped with knowledge of a “Universal Grammar,” which encapsulates the structural rules common to all human languages. But Hasson et al. argue that there is no poverty of the stimulus because deep learning models can produce direct fit with reliable interpretations using dense and broad sampling of the parameter space. The model is casting a very wide net. They state: “One of our main insights is that dense sampling changes the nature of the problem and exposes the power of direct-fit interpolation-based learning… The unexpected power of ANNs to discover unintuitive structure in the world suggests that our attempts to intuitively quantify the statistical structure in the world may fall short. How confident are we that multimodal inputs are in fact not so rich?” By the way, I was sharing a draft of this with a friend who shared another recent paper with me by UC Berkeley professor Steven Piantadosi, titled “Modern language models refute Chomsky’s approach to language”. I’m not going to get into that now but just thought I’d mention it.

In response to the third assumption the authors argue that shallow self-supervision and external-supervision are sufficient for learning. The authors cite Pinker’s book The Language Instinct: How the Mind Creates Language as an example of the view that they are challenging. Pinker’s views are very similar to Chomsky’s. Pinker argues that language learning is not just about imitation or conditioning. Instead, he believes that the human brain has an inherent structure for understanding language, which is why children are able to learn languages so rapidly and effortlessly, often making grammatical leaps that aren’t explicitly taught or present in their environment. But Hasson et al. argue that humans have a great deal of external supervision from our environment, both social and physical. They refer to the importance of embodiment to predictive processing, referring to the ideas of Andy Clark and Karl Friston, among others.

Another subject the authors address is the issue of interpretability. This goes back to the Verum factum principle from Vico. Scientific models, including those in neuroscience, are often evaluated based on two desirable features: (1) interpretability and (2) generalization. We want explanations to have good predictive power but we also want to be able to understand them, not just verify that they work. If it’s an equation, we like to be able to look at it and intuit how it works. And this means that the equation can’t be too long or have too many parameters. However, interpretability and generalization are often in conflict. Models with good interpretability may have strong explanatory appeal but poor predictive power, and vice versa.

The authors suggest that the brain is an exceptionally over-parameterized modeling organ. Interpretability in the brain is intractable for the same reason interpretability of deep learning models is intractable: they work with a huge number of parameters. There’s quantification occurring, but it’s not like a concise equation that you can look at and grasp intellectually. The authors propose that neural computation relies on brute-force direct fitting, which uses over-parameterized optimization algorithms to enhance predictive power, i.e. generalization, without explicitly modeling the underlying generative structure of the world.

One thing that’s really nice about this paper (and I highly recommend it by the way, it’s a delightful read) is its 3 “boxes” that touch on some key concepts. One box covers the biomimicry of biological neural networks by artificial neural networks. The authors state that artificial neural networks (ANNs) are learning models that draw inspiration from the biological neural networks (BNNs) present in living brains, but that ANNs are a highly abstracted version of BNNs. Some biological nervous systems include functionally specialized system-level components like the hippocampus, striatum, thalamus, and hypothalamus, elements not included in contemporary ANNs. ANNs are also disembodied and do not closely interact with the environment in a closed-loop manner. While the authors concede that ANNs are indeed highly simplified models of BNNs, they propose that there exist some essential similarities: they both belong to the same group of over-parameterized, direct-fit models that depend on dense sampling for learning task-relevant structures in data. And, crucially, ANNs are currently the only models that achieve human-like behavioral performance in many domains and can offer unanticipated insights into both the strengths and limitations of the direct-fit approach. Like BNNs, ANNs are founded on a collection of connected nodes known as artificial neurons or units that loosely resemble neurons in a biological nervous system. Each connection, akin to synapses in BNNs, links one artificial neuron to another, and the strength of these connections can be adjusted through learning. The connections between artificial neurons have weights that are adjusted during the learning process based on supervised feedback or reward signals. The weight amplifies or reduces the strength of a connection. And much like BNNs, ANNs are sometimes organized into layers.

Another “box” addresses embodiment. This is something the philosopher Andy Clark has addressed a lot in his work. Not to mention, going further back, the philosopher Maurice Merleau-Ponty. At present, ANNs are disembodied and unable to actively sample or modify their world. The brain does not operate with strictly defined training and test regimes as found in machine learning. Objective functions in BNNs must satisfy certain body-imposed constraints to behave adaptively when interacting with the world. The authors suggest that adding a body to current ANNs, capable of actively sampling and interacting with the world, along with ways to directly interact with other networks, could increase the network’s learning capacity and reduce the gaps between BNNs and ANNs. Interestingly enough, they cite Wittgenstein’s “Philosophical Investigations” when addressing the way social others direct our learning processes.

One other topic in the paper that I found interesting was a discussion of “System 1” and “System 2”. This model was made most famous by Daniel Kahneman in his 2011 book Thinking Fast and Slow. The authors cite Jonathan St B. T. Evans’s 1984 paper “Heuristic and analytic processes in reasoning”. And there are earlier precedents for the general idea going back further in history. System 1 represents fast, automatic, and intuitive thinking, what Evans called heuristic processes. And System 2 represents slow, effortful, and deliberate thinking, what Evans called analytic processes. Hasson et al. propose that we can understand System 1 to be a kind of substrate from which System 2 can arise. System 2 is where things get really interesting. That’s where we find some of the most impressive capacities of the human mind. But they maintain that we have to start with System 1 and build from there. They state: “Although the human mind inspires us to touch the stars, it is grounded in the mindless billions of direct-fit parameters of System 1.” They see artificial neural networks as having the most relevance toward explaining System 1 processes. And the thing is we seem to be continually finding that System 1 includes more than we might have thought. “Every day, new ANN architectures are developed using direct-fit procedures to learn and perform more complex cognitive functions, such as driving, translating languages, learning calculus, or making a restaurant reservation–functions that were historically assumed to be under the jurisdiction of System 2.” 

In “A deep learning framework for neuroscience”, published in Nature Neuroscience in 2019 by Richards et al., the authors focus on three key features of artificial neural network design – (1) objective functions, (2) learning rules, and (3) architectures – and address how these design components can impact neuroscience.

The authors observe that when the traditional framework for systems neuroscience was formulated, they could only collect data from a small selection of neurons. Under this framework, a scientist observes neural activity, formulates a theory of what individual neurons compute, and then constructs a circuit-level theory of how these neurons integrate their operations. However, the question arises as to whether this traditional framework can scale up to accommodate recordings from thousands of neurons and all of the behaviors that one might want to explain. It’s arguable that the classical approach hasn’t seen as much success when applied to large neural circuits that perform a variety of functions, such as the neocortex or hippocampus. These limitations of the classical framework suggest that new methodologies are necessary to capitalize on experimental advancements.

At their fundamental level, ANNs model neural computation using simplified units that loosely emulate the integration and activation properties of real neurons. The specific computations performed by ANNs are not designed but learned. When setting up ANNs, scientists don’t shape the specific computations performed by the network. Instead, they establish the three components mentioned previously: objective functions, learning rules, and architecture. Objective functions, often referred to as ‘loss’ or ‘cost’ functions, measure the network’s performance on a task, and learning involves finding synaptic weights that maximize or minimize this objective function. Learning rules offer a guide for updating the synaptic weights. And architectures dictate the arrangement of units in the network and determine the flow of information, as well as the computations the network can or cannot learn.
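A toy training loop makes the three components easy to see side by side. Everything here is an arbitrary illustration of the framing, not anything from the paper: a small feed-forward architecture, mean squared error as the objective function, and stochastic gradient descent as the learning rule.

```python
# Toy illustration of the three components: architecture, objective function, learning rule.
import torch
import torch.nn as nn

architecture = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))  # the architecture
objective = nn.MSELoss()                                                     # the objective ("loss") function
learning_rule = torch.optim.SGD(architecture.parameters(), lr=0.01)          # the learning rule

x = torch.randn(64, 4)
y = x.sum(dim=1, keepdim=True)              # an arbitrary target the network must learn

for step in range(200):
    loss = objective(architecture(x), y)    # measure performance on the task
    learning_rule.zero_grad()
    loss.backward()                         # compute how each weight should change
    learning_rule.step()                    # update the "synaptic" weights

print(f"final loss: {loss.item():.4f}")
```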

Richards et al. make an observation about interpretability similar to that made by Hasson et al. The computations that emerge in large-scale ANNs trained on high-dimensional datasets can be hard to interpret. An ANN can be constructed with a few lines of code, and for each unit in an ANN, the equations determining their responses to stimuli or relationships to behavior can be specified. But after training, a network is characterized by millions of weights that collectively encode what the network has learned, and it is difficult to envision how such a system could be described with only a few parameters, let alone in words. They suggest that we think about this in the following way. Theories can have a compact explanation that can be expressed in relatively few words that can then be used to develop more complex, non-compact models. They give the theory of evolution by natural selection as a comparative example. The underlying principle is fairly simple and comprehensible, even if the actual mechanics that emerge from it are very complex. For systems neuroscience we can start with these three relatively simple and comprehensible principles: objective functions, learning rules, and architecture. Then even though the system that emerges from that is too complex to comprehend at least the underlying principles are comprehensible and give some degree of intuitive understanding.

Conclusion

Something that I find exciting about all this is that it’s an interesting interface between philosophy of mind, neuroscience, and programming. I think that some of the most interesting problems out there are philosophical problems. Even many scientific problems transition into philosophical problems eventually. But our philosophy needs periodic grounding in the world of empirical observations. What we might call armchair philosophy runs the danger of getting untethered from reality. In the philosophy of mind we can speculate about a lot of things that don’t work out very well in neuroscience. That’s not to say that philosophy of mind has to be entirely bounded by neuroscience. Just because human minds work in a certain way doesn’t mean that minds of any kind would have to be constrained in the same way. There could be many different ways for minds to work. But if we’re theorizing about ways other types of minds might work we don’t, at present, have ways to verify that they actually would work. With theories about human minds we can at least try to verify them. Even that’s kind of challenging though because the brain is so complex and difficult to observe directly at high resolution.

Still, there’s a lot about our brains that we do know that we can take into account in our theories of the mind. We know that our brains have neurons and that neurons make synaptic connections. And we know that those synaptic connections can strengthen or weaken. We can at least account for that in our theories. Artificial neural networks patterned after biological neural networks are useful tools to model our brains. We can’t go into every synaptic cleft in the brain to sample its flux of neurotransmitters. Or record the firing frequency of every neuron in the brain. That would be great but we just don’t have that capability. With artificial neural networks, as imperfect approximations as they are, we at least have recorded information for billions of parameters, even if their sheer quantity defies comprehension. And we can try out different configurations to see how well they work.

Another subtopic that’s interested me for a while is the possibility of what I call a general theory of mind. “General” in the sense of applying beyond just the special case of human minds, a theory of the human mind being a “special” theory of mind. What other kinds of minds might there be? What are all the different ways that a mind can work? AI might give us the ability to simulate and test more general and exotic possibilities and to extract the general principles they all hold in common.

I think the recent success of these large language models is quite exciting. Maybe a little bit frightening. But I’m mostly excited to see what we can learn.

The AI Revolution

Jared, Mike, and Todd talk about ChatGPT and the AI revolution, particularly the remarkable developments of the last year and a half (2022-2023). We talk briefly about how neural networks and large language models work. Then we get into the possible social, economic, aesthetic, and even existential implications for humanity. We talk about the possibility of this being a major positive breakthrough of the first order, with revolutionary implications for human prosperity and standard of living. But we also talk about the risks, ranging from an eclipsing of human distinctiveness and creativity to a threat to humanity’s very survival. Overall, we have a hunch that something big is happening and we want to talk about it!

The Practice of Prayer

There is a condition of looking for something without knowing what we are looking for, or even that we are looking for anything at all. Augustine called it restlessness. Jesus described it as a thing that we would ask for if we knew to ask for it. It is a thirst for living water that will quench all thirst. All religions give witness to this act of reaching out. Jesus taught us to reach out by calling upon God in prayer. Prayer is not just one act among many. It works directly on that essential thirst that can only be satisfied in God.

With this episode I’d like to talk about some things I’ve been studying about prayer. This may be one of the most practical topics I’ve ever gotten into since it’s essentially about a practice, something that you do. We can talk about it and reflect on it, which is what I’ll be doing here. But prayer is ultimately a spiritual practice. Theology can certainly be theoretical and intellectual. And that’s something that I really like about it. But I always try to remember something that Evagrius Ponticus (345 – 399) said about theology: “A theologian is one who prays, and one who prays is a theologian.”

I try to live my life in imitation of Christ and one thing that stands out to me in the scriptures is that Jesus prayed. And I think this is very significant. In his book Jesus of Nazareth Pope Benedict XVI said:

“Again and again the Gospels note that Jesus withdrew ‘to the mountain’ to spend nights in prayer ‘alone’ with his Father. These short passages are fundamental for our understanding of Jesus; they lift the veil of mystery just a little; they give us a glimpse into Jesus’ filial existence, into the source from which his action and teaching and suffering sprang. This ‘praying’ of Jesus is the Son conversing with the Father; Jesus’ human consciousness and will, his human soul, is taken up into that exchange, and in this way human ‘praying’ is able to become a participation in this filial communion with the Father.” (7)

As is typical with Benedict, he packs a lot into very condensed passages. Three points stand out to me here about Jesus’ practice of prayer.

1. It is fundamental for our understanding of him.

2. It is the source from which his action and teaching and suffering sprang.

3. Our prayer is a way of participating in the communion that Jesus has with the Father.

That prayer was something fundamental to Jesus’ behavior and identity was apparently something that his disciples noticed as well. On one occasion after he returned from prayer they asked him to instruct them.

“And it came to pass, that, as he was praying in a certain place, when he ceased, one of his disciples said unto him, Lord, teach us to pray, as John also taught his disciples.” (Luke 11:1, KJV)

And we have many examples in the Gospels of Jesus teaching about prayer and how to pray, especially in Luke.

As I’ve reflected on prayer I keep sensing its great importance. It’s such a simple thing. And we even tend to dismiss it as insignificant. Like many things, the phrase “thoughts and prayers” has become politicized, and maybe that’s an apt indicator of our attitudes about prayer: that it’s something empty and ineffectual. And it’s certainly true that prayer can be empty and vain. Jesus even said as much (Matthew 6:5-8). But I believe that sincere prayer, far from being empty and ineffectual, is actually the most important thing that we can do. If we want to change the world, starting especially with changing ourselves, we must pray.

Prayer touches on the fundamental issues of who we are and what we exist for. Augustine of Hippo (354 – 430) said to God in his Confessions, “You have made us for Yourself.” Why do we exist? We exist for God. That’s not what most of us think. We may think we exist for any number of other things, or nothing at all. We could say, as Jesus said to Martha, that we “are worried and troubled about many things” (Luke 10:41, NKJV). Ultimately all of these things, all our desires, interests, projects, and concerns are imperfect reflections of the most fundamental and innate desire for our creator and sustainer. But we often don’t know that that’s what we’re looking for, or even that we’re looking for anything at all.

Each of us is, in many ways, the Samaritan woman at the well to whom Jesus said:

“If you knew the gift of God, and who it is who says to you, ‘Give Me a drink,’ you would have asked Him, and He would have given you living water.” (John 4:10, NKJV)

What an interesting hypothetical. You would be asking for something. You’re not asking for it now. But you would ask for it if you knew about it. It’s this fascinating situation where we’re looking for something without knowing what we are looking for or even that we are looking for anything at all.

I think this is an apt description of the human condition generally. There’s this kind of generalized discontent and incompleteness to our existence. Augustine called it restlessness.

“You have made us for Yourself, and our hearts are restless until they rest in You.” 

I think a scripturally appropriate term would be thirst. Jesus described the object of this thirst as “living water”:

“Whoever drinks of this water [meaning literal, physical water] will thirst again, but whoever drinks of the water that I shall give him will never thirst. But the water that I shall give him will become in him a fountain of water springing up into everlasting life.” (John 4:13-14, NKJV)

This living water is both our source and purpose. It’s the culmination of all our longing, but we know, both from scripture and just from experience, that the challenges of finding it are significant. Paul said we seek in the hope that we might grope for and find the Unknown God, even “though He is not far from each one of us” (Acts 17:27, NKJV). Paul also acknowledged that prayer itself is difficult: “We do not know what we should pray for as we ought” (Romans 8:26, NKJV).

In my conversations with fellow Christians we’ve shared this experience that prayer can be difficult. We don’t feel like we’re doing it right or that we’re making that spiritual connection with God. That’s a common experience and has been from the beginning. But we have help. Paul said:

“Likewise the Spirit also helps in our weaknesses. For we do not know what we should pray for as we ought, but the Spirit Himself makes intercession for us with groanings which cannot be uttered.” (Romans 8:26, NKJV)

It seems appropriate and perfect to me that the Spirit would intercede for our nondescript, generalized restlessness for the Unknown God with unutterable groanings. Even if we don’t know what we’re looking for or that we’re looking for anything at all the Spirit can intercede and act on this most vague longing with groanings which cannot be uttered.

Something that I’ve found helpful in the practice of prayer is making use of the different forms of prayer from the Christian tradition. The Catechism of the Catholic Church identifies three major expressions of the life of prayer in the Christian tradition (2699, 2721):

Vocal Prayer

Meditation

Contemplative Prayer

I find that one or the other of these three expressions of prayer is often most suitable at certain times. I think that sometimes we find prayer difficult because we only know of one form. And even though that one form may be very suitable in many situations it might not be most suitable in others. I’ve found it helpful to weave these three forms together in my practice of prayer.

I tend to think of these three expressions of prayer as sitting on a spectrum of expressibility and expressive content. Vocal prayer is most characterized by expressible content in the sentences that we speak. Contemplative prayer mostly transcends anything that can be expressed in words. And meditation, centering mostly on the words of scripture and the life of Christ, sits between vocal prayer and contemplative prayer in its degree of expressibility.

The first major expression of prayer in the Christian tradition is vocal prayer. There are a couple things that strike me about Jesus’s teachings about vocal prayer. And I think they’re related. The first is that in our petitions we must have faith. The second is that we should be relentless in our petitions. I think those two things are related. And they strike me because I don’t feel like I live in an age and culture where we really believe in miracles, especially not to a degree that we would pursue them relentlessly in our prayers. Part of that may be our secularism. And part of it may be a concern that relentlessness would be irreverently presumptuous. But Jesus seemed to have precious little concern about presumptuousness. Consider the following parable:

“Then He spoke a parable to them, that men always ought to pray and not lose heart, saying: There was in a certain city a judge who did not fear God nor regard man. Now there was a widow in that city; and she came to him, saying, Get justice for me from my adversary. And he would not for a while; but afterward he said within himself, Though I do not fear God nor regard man, yet because this widow troubles me I will avenge her, lest by her continual coming she weary me. Then the Lord said, Hear what the unjust judge said. And shall God not avenge His own elect who cry out day and night to Him, though He bears long with them? I tell you that He will avenge them speedily. Nevertheless, when the Son of Man comes, will He really find faith on the earth?” (Luke 18:1-8, NKJV)

Jesus was insistent that God is the most disposed to grant petitions for those who seek after them.

“Ask, and it will be given to you; seek, and you will find; knock, and it will be opened to you. For everyone who asks receives, and he who seeks finds, and to him who knocks it will be opened. Or what man is there among you who, if his son asks for bread, will give him a stone? Or if he asks for a fish, will he give him a serpent? If you then, being evil, know how to give good gifts to your children, how much more will your Father who is in heaven give good things to those who ask Him!” (Matthew 7:7-11, NKJV)

I’m struck by the directness and complete lack of qualification in these teachings. But if you’re like me you have doubts that it can really be so straightforward. Why? Because we’ve all had the experience that Jesus’s disciples had, where we pursued a miracle that didn’t come:

“Then the disciples came to Jesus privately and said, Why could we not cast it [the demon] out? So Jesus said to them, Because of your unbelief; for assuredly, I say to you, if you have faith as a mustard seed, you will say to this mountain, Move from here to there, and it will move; and nothing will be impossible for you. However, this kind does not go out except by prayer and fasting.” (Matthew 17:19-21, NKJV)

We’ve all had this experience. We pray for something and we don’t get it. I’ve even considered this an important spiritual developmental step, moving from a more naive conception of God to one that’s more sophisticated, where we can appreciate the various reasons that our petitions in prayer might not be granted. But I’m coming around to question that. I wonder if we’re too quick in our sophistication to enable underdeveloped faith.

This is why I think prayer, far from being vain and ineffectual, is the most important thing we can do. We need, as individuals and as societies and nations, things that we cannot produce on our own. We need God to intervene. There are societies and sub-cultures where these things do happen, where people expect, pursue, and receive miracles. God knows how to give good gifts to his children.

The second major expression of prayer in Christian tradition is meditation. Meditation might not be something we popularly associate with Christianity but it’s definitely part of the tradition. It’s often facilitated by texts of scripture and devotional writings. Also visual arts like icons. Lectio divina is one venerable practice of reading scripture for the special purpose of focusing and meditating on it in prayer. I often use one of the Psalms for this purpose. Events from the life of Christ are also very powerful. 

The Rosary is a classic example of a practice of prayer that is focused on events from the life of Christ. Each cycle of the Rosary goes through five “Mysteries” from the life of Christ.

The Joyful Mysteries are:

  • The Annunciation
  • The Visitation
  • The Nativity
  • The Presentation in the Temple
  • The Finding in the Temple

The Sorrowful Mysteries are:

  • The Agony in the Garden
  • The Scourging at the Pillar
  • The Crowning with Thorns
  • The Carrying of the Cross
  • The Crucifixion and Death

The Glorious Mysteries are:

  • The Resurrection
  • The Ascension
  • The Descent of the Holy Spirit
  • The Assumption
  • The Coronation of Mary

The Luminous Mysteries are:

  • The Baptism of Christ in the Jordan
  • The Wedding Feast at Cana
  • Jesus’ Proclamation of the Coming of the Kingdom of God
  • The Transfiguration
  • The Institution of the Eucharist

We can read the accounts of these events in scripture and learn about their contents. But in meditative prayer we can go deeper into them to be moved and edified by them. As an example, concerning the mystery of the Carrying of the Cross, Bishop Robert Barron remarked that, “Carrying the cross must become the very structure of the Christian life.” This idea has had a profound impact on me as I’ve meditated on it.

Something I enjoy about scripture is that it’s very intellectually challenging and stimulating. And interdisciplinary. It involves topics of history, philosophy, and linguistics. I think that’s wonderful. But I think there’s sometimes a temptation to compete over who can be the most knowledgeable about the content of scripture. I don’t think that serves the purposes of scripture at all. In The Imitation of Christ Thomas à Kempis (1380 – 1471) warned: “If you wish to derive profit from your reading of Scripture, do it with humility, simplicity, and faith; at no time use it to gain a reputation for being one who is learned.” (Book I, Chapter V) Rather, Thomas said: “Let it then be our main concern to meditate on the life of Jesus Christ.” (Book I, Chapter I)

In addition to meditation on the life of Christ, I cannot speak highly enough about the edifying influence of the Psalms. I’ve said at times, and I still think it’s true, that the fastest way to learn about the narrative arc of the Old Testament is to read 1 Samuel through 2 Kings. And of course those four books are books of holy scripture, so well worth reading. But I think now that the most direct path into the spiritual world of the Old Testament is in the Psalms. I admit that I didn’t always appreciate them and couldn’t get into them. Maybe I wasn’t ready for them. But I really appreciate them now. Sometimes if I find it difficult to get into prayer, the Psalms are a great way to get started, to get into the right frame of mind.

To paraphrase Ecclesiastes (3:1), there is a Psalm for every season.

Psalms of joy:

“O how love I thy law! it is my meditation all the day.” (Psalm 119:97, KJV)

“How sweet are thy words unto my taste! yea, sweeter than honey to my mouth!” (Psalm 119: 103, KJV)

“Thy word is a lamp unto my feet, and a light unto my path.” (Psalm 119:105, KJV)

“Let every thing that hath breath praise the Lord. Praise ye the Lord.” (Psalm 150:6, KJV)

Psalms of grief and frustration:

“How long wilt thou forget me, O Lord? for ever? how long wilt thou hide thy face from me?” (Psalm 13:1, KJV)

“My God, my God, why hast thou forsaken me?” (Psalm 22:1)

And Psalms of reflection:

“When I consider thy heavens, the work of thy fingers, the moon and the stars, which thou hast ordained; What is man, that thou art mindful of him? and the son of man, that thou visitest him?” (Psalm 8:3-4, KJV)

“One thing have I desired of the Lord, that will I seek after; that I may dwell in the house of the Lord all the days of my life, to behold the beauty of the Lord, and to enquire in his temple.” (Psalm 27:4, KJV)

In his book Jesus of Nazareth, Pope Benedict XVI said: “The Psalms are words that the Holy Spirit has given to men; they are God’s Spirit become word.” (131) Speaking about the Psalms and the Lord’s Prayer he remarks that certain formulaic prayers like these can help us to get started in prayer and in approaching God.

“Our prayer can and should be a wholly personal prayer. But we also constantly need to make use of those prayers that express in words the encounter with God experienced both by the Church as a whole and by individual members of the Church… In the formulaic prayers that arose first from the faith of Israel and then from the faith of praying members of the Church, we get to know God and ourselves as well. They are a ‘school of prayer’ that transforms and opens up our life… Normally, thought precedes word; it seems to formulate the word. But praying the Psalms and liturgical prayer in general is exactly the other way around: The word, the voice, goes ahead of us, and our mind must adapt to it. For on our own we human beings do not ‘know how to pray as we ought’ (Rom 8:26) – we  are too far removed from God, he is too mysterious and too great for us. And so God has come to our aid: He himself provides the words of our prayer and teaches us to pray. Through the prayers that come from him, he enables us to set out toward him; by praying together with the brothers and sisters he has given us, we gradually come to know him and draw closer to him.” (130-131)

The third major expression of prayer in the Christian tradition is contemplative prayer. This is the form of prayer that I think of as being the furthest on the spectrum away from expressibility and expressive content. In the Eastern Christian tradition it’s sometimes called “hesychasm”, derived from the Greek hesychia (ἡσυχία), meaning “stillness, rest, quiet, or silence”. Another descriptive term is “apophatic”, from the Greek apophēmi (ἀπόφημι), meaning “to deny”; apophatic prayer is characterized by negative content rather than positive content. I sometimes think of it as empty space into which the Spirit can freely enter.

Perhaps appropriately some of the greatest spiritual writers in this tradition are anonymous (or pseudonymous). One lived sometime in the 5th or 6th century, writing under the pseudonym Dionysius, whose major work was On The Divine Names. Another was an English writer living sometime in the 14th century, whose major work was The Cloud of Unknowing.

Contemplative prayer is the least expressible form of prayer, but it often still involves single words or phrases, similar to a mantra in Indian religious traditions. In The Cloud of Unknowing the author instructs that we should use one word of just one syllable in which to enfold our intent:

“If you like, you can have this reaching out, wrapped up and enfolded in a single word. So as to have a better grasp of it, take just a little word, of one syllable rather than of two; for the shorter it is the better it is in agreement with this exercise of the spirit. Such a one is the word ‘God’ or the word ‘love.’ Choose which one you prefer, or any other according to your liking – the word of one syllable that you like the best. Fasten this word to your heart, so that whatever happens it will never go away. This word is to be your shield and your spear, whether you are riding in peace or in war. With this word you are to beat upon this cloud and this darkness about you. With this word you are to strike down every kind of thought under the cloud of forgetting.” (Chapter VII, James Walsh edition)

Other contemplatives haven’t necessarily restricted themselves to one word alone but have also used phrases. The most notable example, especially in Eastern Christianity, is the Jesus Prayer. The Jesus Prayer is this:

“Lord Jesus Christ, Son of God, have mercy on me, a sinner.”

The scriptural roots of this prayer are in the Parable of the Pharisee and the Publican in Luke 18:9-14.

“And the tax collector, standing afar off, would not so much as raise his eyes to heaven, but beat his breast, saying, ‘God, be merciful to me a sinner!’” (NKJV)

Paul, in his first letter to the Thessalonians, counseled us to “pray without ceasing” (1 Thessalonians 5:17). The Jesus Prayer is traditionally thought to be a prayer that a person can eventually learn to pray continually at every moment. In the 19th century Russian text, The Way of a Pilgrim, the pilgrim learns to pray without ceasing by incorporating the Jesus Prayer into his very breath.

“Begin bringing the whole prayer of Jesus into and out of your heart in time with your breathing, as the Fathers taught. Thus, as you draw your breath in, say, or imagine yourself saying, ‘Lord Jesus Christ,’ and as you breathe again, ‘have mercy on me.’ Do this as often and as much as you can, and in a short space of time you will feel a slight and not unpleasant pain in your heart, followed by a warmth. Thus by God’s help you will get the joy of self-acting inward prayer of the heart.”

I have found the Jesus Prayer to be the most powerful prayer for my practice of contemplation.

The Cloud of Unknowing invites what I interpret to be an inversion in perspective and attitude toward the experience of unknowing. Usually we want to know things but when we approach God in his infinity we find ourselves unable to comprehend him because he exceeds our comprehension. But this very experience of unknowability is itself a form of knowledge. It is in this cloud of unknowing that we must dwell.

“This darkness and cloud is always between you and your God, no matter what you do, and it prevents you from seeing him clearly by the light of understanding in your reason, and from experiencing him in sweetness of love in your affection. So set yourself to rest in this darkness as long as you can, always crying out after him whom you love. For if you are to experience him or to see him at all, insofar as it is possible here, it must always be in this cloud and in this darkness. So if you labour at it with all your attention as I bid you, I trust, in his mercy, that you will reach this point.” (Chapter III)

In scripture the cloud is often where we find and hear the voice of God.

“While he was still speaking, behold, a bright cloud overshadowed them; and suddenly a voice came out of the cloud.” (Matthew 17:5, NKJV)

“Now the glory of the Lord rested on Mount Sinai, and the cloud covered it six days. And on the seventh day He called to Moses out of the midst of the cloud. The sight of the glory of the Lord was like a consuming fire on the top of the mountain in the eyes of the children of Israel. So Moses went into the midst of the cloud and went up into the mountain.” (Exodus 24:16-18, NKJV)

The cloud is not an easy place to be. It requires practice and conditioning. As the author says, “So set yourself to rest in this darkness as long as you can”.

The author also counsels that such contemplation is the one act that it is not possible to pursue to excess.

“If you ask me the further question, how you are to apply discretion in this exercise, I answer and say, ‘none at all!’ In all your other activities you are to have discretion, in eating and drinking, in sleeping, and in protecting your body from the extremes of heat and cold, in the length of time you give to prayer or reading or to conversation with your fellow-Christians. In all these things you are to observe moderation, avoiding excess and defect. But in this exercise there is no question of moderation; I would prefer that you should never leave off as long as you live.” (Chapter 41)

Not only is excess of contemplation not a possibility or a problem; unrestrained indulgence in contemplation also rightly orders the soul in regard to all other things, such that they are taken not to excess but in proper measure.

“Now perhaps you will ask how you shall observe prudence in eating and sleeping and everything else. My answer to this is brief enough: ‘Understand it as best you can.’ Work at this exercise without ceasing and without moderation, and you will know where to begin and to end all your other activities with great discretion. I cannot believe that a soul who perseveres in this exercise night and day without moderation should ever make a mistake in any of his external activities.” (Chapter 42)

Why might this be? The Catechism says, “Contemplation is a gaze of faith, fixed on Jesus.” (2715) With a gaze fixed on Christ all other things become rightly ordered and proportioned. As Jesus said:

“But seek ye first the kingdom of God, and his righteousness; and all these things shall be added unto you.” (Matthew 6:33, KJV)

I think this coheres with what I said earlier about how I believe prayer is the most important thing we can do. Because prayer, especially prayer of contemplation, focuses our gaze singly on Christ. Jesus said to Martha: “You are worried and troubled about many things.” (Luke 10:41, NKJV) That’s all of us. The Greek word merimnao (μεριμνάω), to be anxious, is a word I always pay close attention to in the New Testament when I see it. It occurs a number of times in the Sermon on the Mount in Matthew 6: me merimnate (μὴ μεριμνᾶτε), do not worry. “Do not worry about your life, what you will eat or what you will drink; nor about your body, what you will put on… For your heavenly Father knows that you need all these things. But seek first the kingdom of God and His righteousness, and all these things shall be added to you.” (Matthew 6:25; 32-33, NKJV) As Jesus said to Martha: “One thing is needed.” (Luke 10:42, NKJV) That one thing is the gaze of faith, fixed on Jesus in prayer.