Causal and Emergent Models

Models are critical tools that enable us to think about, qualify, and quantify features of many processes. And as with any kind of tool, different kinds of models are better suited to different circumstances. Here we look at two kinds of models for understanding transport phenomena: causal and emergent models. In a causal process there is some kind of distinct, sequential, goal-oriented event with an observable beginning and end. In an emergent process there are uniform, parallel, independent events with no beginning or end but in which observable patterns eventually emerge.

The video version of this episode, which includes some visual aids, is available on YouTube.

Since my university studies I’ve been fascinated by the ways we use models to understand and even make quantitative descriptions and predictions about the world. I don’t remember when exactly, but at some point I really began to appreciate how the pictures of chemical and physical processes I had in my head were not the way things “really” were (exactly) but were useful models for thinking about things and solving problems.

Conceptual models in science, engineering, economics, etc. are similar to toy models like model cars or model airplanes in that they aren’t the things themselves but have enough in common with the things they are modeling to still perform in similar ways. As long as a model enables you to get the information and understanding you need it is useful, at least for the scale and circumstances you’re interested in. Models are ubiquitous in the sciences and one of the major activities in the sciences is to improve models, generate new models, and create more models to apply to more conditions.

Something to bear in mind when working with a model is the set of conditions in which it works well. That’s important because a model may work very well under a certain set of conditions but then break down outside those conditions. Outside those conditions it may give less accurate results or just not describe well qualitatively what’s going on in the system we’re trying to understand. This could be something like being outside a temperature or pressure range, extremes in velocity or gravitational field strength, etc. And often it’s a matter of geometric scale, like whether we’re dealing in meters or nanometers. The world looks different at the microscopic and molecular scale than at the macroscopic scale of daily life.

I’m really a pluralist when it comes to models. I’m in favor of several types to meet the tasks at hand. Is a classical, Newtonian model for gravity superior to a relativistic model for gravity? I don’t think so. Yeah, a Newtonian model breaks down under certain conditions. But it’s much easier and intuitive to work with under most conditions. It doesn’t make sense to just throw away a Newtonian model after relativity. And we don’t. We can’t. It would be absurdly impractical. And practicality is a major virtue of models. That’s not to say there’s no such thing as better or worse models. A Newtonian model of planetary motion is better than a Ptolemaic one because it’s both more accurate and simpler to understand. So I don’t embrace pluralism without standards of evaluation. I suppose there’d be an infinite number of really bad models in the set of all possible models. Even so, there are still multiple that do work well, that overlap and cover similar systems.

I studied chemical engineering in the university and one of my textbooks was Transport Phenomena by Bird, Stewart, and Lightfoot, sort of a holy trinity of the discipline. Transport phenomena covers fluids, heat, and diffusion, which all share many features and whose models share a very similar structure. One of the ideas I liked in that book is its systematic study of processes at three scales: macroscopic, microscopic, and molecular. I’ll quote from the book for their explanations of these different scales.

“At the macroscopic level we write down a set of equations called the ‘macroscopic balances,’ which describe how the mass, momentum, energy, and angular momentum in the system change because of the introduction and removal of these entities via the entering and leaving streams, and because of various other inputs to the system from the surroundings. No attempt is made to understand all the details of the system.”

“At the microscopic level we examine what is happening to the fluid mixture in a small region within the equipment. We write down a set of equations called the ‘equations of change,’ which describe how the mass, momentum, energy, and angular moment change within this small region. The aim here is to get information about velocity, temperature, pressure, and concentration profiles within the system. This more detailed information may be required for the understanding of some processes.”

“At the molecular level we seek a fundamental understanding of the mechanisms of mass, momentum, energy, and angular momentum transport in terms of molecular structure and intermolecular forces. Generally this is the realm of the theoretical physicist or physical chemist, but occasionally engineers and applied scientists have to get involved at this level.”

I came across an interesting paper recently from a 2002 engineering education conference titled How Chemical Engineering Seniors Think about Mechanisms of Momentum Transport by Ronald L. Miller, Ruth A. Streveler, and Barbara M. Olds. It caught my attention since I’ve been a chemical engineering senior so I wanted to see how it compared to my experience. And it tracked it pretty well actually. Their idea is that one of the things that starts to click for seniors in their studies, something that often hadn’t clicked before, is a conceptual understanding of many fundamental molecular-level and atomic-level phenomena including heat, light, diffusion, chemical reactions, and electricity. I’ll refer mostly to the examples from this paper by Miller, Streveler, and Olds but I’ll mention that they base much of their presentation on the work of Michelene Chi, who is a cognitive and learning scientist. In particular they refer to her work on causal versus emergent conceptual models for these physical processes. Her paper on this is titled Misconceived Causal Explanations for Emergent Processes. Miller, Streveler, and Olds propose that chemical engineering students start out using causal models to understand many of these processes but then move to more advanced, emergent models later in their studies.

In a causal process there is some kind of distinct, sequential, goal-oriented event with an observable beginning and end. In an elastic collision for instance, a moving object collides with a previously stationary object and transfers its momentum to it. In an emergent process there are uniform, parallel, independent events with no beginning or end but in which observable patterns eventually emerge. Electricity, fluid flow, heat transfer and molecular equilibrium are examples of emergent processes. Miller, Streveler, and Olds correlate causal and emergent explanations with macroscopic and molecular models respectively. As Bird, Stewart, and Lightfoot had said in their descriptions of their three scales, it’s at the molecular level that “we seek a fundamental understanding of the mechanisms.” But at the macroscopic scales we aren’t looking at so fundamental an explanation.      

Miller, Streveler, and Olds use diffusion, i.e. mass transport, as an example to show the difference between causal and emergent explanations. Say we have a glass of water and we add a drop of color dye to it. The water is a solvent and the color dye is a solute. This color dye solute starts to diffuse, or spread, into the water solvent and we can explain this diffusion process in both causal and emergent ways; or we could also say in macroscopic and molecular ways.

First, a quick overview of diffusion. The mathematical model for diffusion is Fick’s Law of Diffusion. The equation for this is:       

J = -D(dC/dx)

Where,
J is the diffusion flux
C is concentration
x is position
D is diffusivity, the applicable constant of proportionality in this case

The basic logic of this equation is that the diffusion of a solute is proportional to the gradient of the concentration of that solute in a solvent. If the solute is evenly distributed in the solution the concentration is the same everywhere in the solution, so there is no concentration gradient and no diffusion. But there is a gradient if the solute concentration is different at different positions in the space, for example, if it is highly concentrated at one point and less concentrated as you move away from that point. The diffusion flux is proportional to the steepness of that decrease, that gradient. If a drop of dye has just been placed in a glass of water the flux of diffusion is going to be very high at the boundary between that drop and the surrounding water because there is a huge difference in the concentration of the dye there.
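To make that logic concrete, here is a minimal numerical sketch in Python (my own toy example with made-up numbers, not anything from the paper) of a one-dimensional concentration profile evolving under Fick's Law, where the flux at each point is just -D times the local concentration gradient:

# Toy 1D diffusion: flux proportional to the local concentration gradient (Fick's Law).
# All numbers are illustrative, not physical.
D = 0.1          # diffusivity (arbitrary units)
dx = 1.0         # grid spacing
dt = 1.0         # time step (small enough for stability with these values)

# Initial concentration: a "drop of dye" concentrated in the middle cells.
C = [0.0] * 20
C[9] = C[10] = 1.0

for step in range(100):
    # Flux between neighboring cells: J = -D * dC/dx
    J = [-D * (C[i + 1] - C[i]) / dx for i in range(len(C) - 1)]
    # Update each cell from the net flux in and out of it.
    for i in range(1, len(C) - 1):
        C[i] += dt * (J[i - 1] - J[i]) / dx
    C[0] += dt * (-J[0]) / dx
    C[-1] += dt * (J[-1]) / dx

print([round(c, 3) for c in C])  # the sharp drop in concentration has spread out and flattened

Run long enough, the profile flattens toward a uniform concentration, which is just the macroscopic statement that diffusion stops once the gradient disappears.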

So that’s the logic of Fick’s Law of Diffusion. But why does this happen? And here we can look at the two different kinds of explanations, causal and emergent explanations.         

Here are a few examples of both:

Causal Explanation: “Dye molecules move towards water molecules.”
Emergent Explanation: “All molecules exercise Brownian motion.”

Causal Explanation: “Dye molecules flow from areas of high concentration to areas of low concentration.”
Emergent Explanation: “All molecules move at the same time.”

Causal Explanation: “Dye molecules are ‘pushed’ into the water by other dye molecules.”
Emergent Explanation: “Molecules collide independently of prior collisions. What happens to one molecule doesn’t affect interactions with other molecules.”

Causal Explanation: “Dye molecules want to mix with water molecules.”
Emergent Explanation: “The local conditions around each molecule affect where it moves and at what velocity.”

Causal Explanation: “Dye molecules stop moving when dye and water become mixed.”
Emergent Explanation: “Molecular interactions continue when equilibrium is reached.”

This gives something of a flavor of the two different kinds of explanations. Causal explanations have more of a top-down approach, looking for the big forces that make things happen, and may even speak in metaphorical terms of volition, like what a molecule “wants” to do. Emergent explanations have more of a bottom-up approach, looking at all the things going on independently in a system and how that results in the patterns we observe.

I remember Brownian motion being something that really started pushing me to think of diffusion in a more emergent way. Brownian motion is the random motion of particles suspended in a medium, like a liquid or a gas. If you just set a glass of water on a table it may look stationary, but at the molecular scale there’s still a lot of movement. The water molecules are moving around in random directions. If you add a drop of color dye to the water the molecules in the dye also have Brownian motion, with all those molecules moving in random directions. So what’s going to happen in this situation? Well, things aren’t just going to stay put. The water molecules are going to keep moving around in random directions and the dye molecules are going to keep moving around in random directions. What kind of pattern should we expect to see emerge from this?

Let’s imagine imposing a three-dimensional grid onto this space, dividing the glass up into cube volumes or voxels. Far away from the drop of dye, water molecules will still be moving around randomly between voxels but those voxels will continue to look about the same. Looking at the space around the dye, voxels in the middle of the drop will be all dye. Voxels on the boundary will have some dye molecules and some water molecules. And voxels with a lot of dye molecules will be next to voxels with few dye molecules. As water molecules and dye molecules continue their random motion we’re going to see the most state changes in the voxels that are different from each other. Dye molecules near a voxel with mostly water molecules can very likely move into one of those voxels and change its state from one with few or no dye molecules to one with some or more dye molecules. And the biggest state changes will occur in regions where voxels near to each other are most different, just because they can be so easily (albeit randomly) changed.

This is a very different way of looking at the process of diffusion. Rather than there being some rule imposed from above, telling dye molecules that they should move from areas of high concentration to areas of low concentration, all these molecules are moving around randomly. And over time areas with sharp differences tend to even out, just by random motion. From above and from a distance this even looks well-ordered and like it could be directed. The random motion of all the components results in an emergent macro-level pattern that can be modeled and predicted by a fairly simple mathematical expression. The movement of each individual molecule is random and unpredictable, but the resulting behavior of the system, the aggregate of all those random motions, is ordered and highly predictable. I just think that’s quite elegant!
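Here is a small random-walk sketch of that emergent picture (again my own toy example): each "dye molecule" takes independent, unbiased random steps, and nowhere in the code is there any rule about concentrations or gradients, yet a smooth concentration profile emerges:

import random

# Each dye particle starts at position 0 (the "drop") and takes unbiased random steps.
# No particle knows anything about concentration or gradients.
random.seed(1)
particles = [0] * 10000

for step in range(500):
    particles = [p + random.choice((-1, 1)) for p in particles]

# Count how many particles ended up in each band of positions.
bands = {}
for p in particles:
    band = p // 10 * 10
    bands[band] = bands.get(band, 0) + 1

for band in sorted(bands):
    print(f"{band:>4} to {band + 9:>4}: {bands[band]}")
# The counts fall off smoothly away from the origin: a macroscopic
# concentration profile emerging from purely random, independent motion.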

Miller, Streveler, and Olds give another example that neatly illustrates different ways of understanding a physical process at the three different scales: macroscopic, microscopic, and molecular. Their second example is of momentum transport. An example of momentum transport is pumping a fluid through a pipe. As a brief overview, when a fluid like water is moved through a pipe under pressure the velocity of the fluid is highest at the center of the pipe and lowest near the walls. This is a velocity gradient, often called a “velocity profile”, where you have this cross-sectional view of a pipe showing the velocity vectors of different magnitudes at different positions along the radius of the pipe. When you have this velocity gradient there is also a transfer of momentum from areas of high momentum to areas of low momentum. So in this case momentum will transfer from the center of the pipe toward the walls of the pipe.

The model for momentum transport has a similar structure to the model for mass transport. Recall that in Fick’s Law of Diffusion, mass transport, i.e. diffusion, was proportional to the concentration gradient and the constant of proportionality was this property called diffusivity. The equation was:

J = -D(dC/dx)

The corresponding model for momentum transport is Newton’s law of viscosity (Newton had a lot of laws). The equation for that is:

τ = -μ(dv/dx)

Where

τ is shear stress, the flux of momentum transport
v is velocity
x is position
μ is viscosity, the applicable constant of proportionality in this case
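To put rough numbers on the equation (my own illustrative values, not from the paper): water at room temperature has a viscosity of about 1.0 × 10^-3 Pa·s, so a velocity gradient of 100 s^-1 across the flow corresponds to a momentum flux of about 0.1 Pa:

# Illustrative calculation with Newton's law of viscosity, tau = -mu * (dv/dx).
# The viscosity is roughly that of water at room temperature; the gradient is made up.
mu = 1.0e-3      # viscosity, Pa·s
dv_dx = 100.0    # velocity gradient, 1/s
tau = -mu * dv_dx
print(tau)       # -0.1 Pa: momentum flux directed down the velocity gradient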

So in Newton’s law of viscosity the momentum transport, i.e. shear stress, is proportional to the velocity gradient and the constant of proportionality is viscosity. You have higher momentum transport with a higher gradient, i.e. change, in velocity along the radius of the pipe. Why does that happen?

So they actually asked some students to explain this in their own words to see on what geometric scales they would make their descriptions. The prompt was: “Explain in your own words (no equations) how momentum is transferred through a fluid via viscous action.” And they evaluated the descriptions as being at one of the three scales (or a mixture of them) using this rubric. So here are examples from the rubric of explanations at each of those scales:

Macroscopic explanation: The pressure at the pipe inlet is increased (usually by pumping) which causes the fluid to move through the pipe. Friction between fluid and pipe wall results in a pressure drop in the direction of flow along the pipe length. The fluid at the wall does not move (no-slip condition) while fluid furthest away from the wall (at the pipe centerline) flows the fastest, so momentum is transferred from the center (high velocity and high momentum) to the wall (no velocity and no momentum).

Microscopic explanation: Fluid in laminar flow moves as a result of an overall pressure drop causing a velocity profile to develop (no velocity at the wall, maximum velocity at the pipe centerline). Therefore, at each pipe radius, layers of fluid flow past each other at different velocities. Faster flowing layers tend to speed up [and move] slower layers along resulting in momentum transfer from faster layers in the middle of the pipe to slower layers closer to the pipe walls.

Molecular explanation: Fluid molecules are moving in random Brownian motion until a pressure is applied at the pipe inlet causing the formation of a velocity gradient from centerline to pipe wall. Once the gradient is established, molecules that randomly migrate from an area of high momentum to low momentum will take along the momentum they possess and will transfer some of it to other molecules as they collide (increasing the momentum of the slower molecules). Molecules that randomly migrate from low to high momentum will absorb some momentum during collisions. As long as the overall velocity gradient is maintained, the net result is that momentum is transferred by molecular motion from areas of high momentum to areas of low momentum and ultimately to thermal dissipation at the pipe wall.

With these different descriptions as we move from larger to smaller scales we also move from causal to emergent explanations. At the macroscopic level we’re looking at bulk motion of fluid. At the microscopic scale it’s getting a little more refined. We’re thinking in terms of multiple layers of fluid flow. We’re seeing the gradient at a higher resolution. And we can think of these layers of fluid rubbing past each other, with faster layers dragging slower layers along, and slower layers slowing faster layers down. It’s like spreading out a deck of cards. In these explanations momentum moves along the velocity gradient because of a kind of drag along the radial direction.

But with the molecular description we leave behind that causal explanation of things being dragged along. There’s only one major top-down, causal force in this system and that’s the pressure or force that’s being applied in the direction of the length of the pipe. With a horizontal pipe we can think of this force being applied along its horizontal axis. But there’s not a top-down, external force being applied along the vertical or radial axis of the pipe. So why does momentum move from the high-momentum region in the center of the pipe to the low-momentum region near the pipe wall? It’s because there’s still random motion along the radial or vertical axis, which is perpendicular to the direction of the applied pressure. So molecules are still moving randomly between regions with different momentum. So if we think of these layers, these cylindrical sheets that are dividing up the sections of the pipe at different radii, these correspond to our cube voxels in the diffusion example. Molecules are moving randomly between these sheets. The state of each layer is characterized by the momentum of the molecules in it. As molecules move between layers and collide with other molecules they transfer momentum. As in the diffusion example the overall pattern that emerges here is the result of random motion of the individual molecular components.

So, does this matter? My answer to that question is usually that “it”, whatever it may be, matters when and where it matters. Miller, Streveler, and Olds say: “If the macroscopic and microscopic models are successful in describing the global behavior of simple systems, why should we care if students persist in incorrectly applying causal models to processes such as dye diffusion into water? The answer is simple – the causal models can predict some but not all important behavioral characteristics of molecular diffusional processes.” And I think that’s a good criterion for evaluation. I actually wouldn’t say, as they do, that the application of causal models is strictly “incorrect”. But I take their broader point. Certainly macroscopic and causal models have their utility. For one thing, I think they’re easier to understand starting off. But as with all models, you have to keep in mind their conditions of applicability. Some apply more broadly than others.

One thing to notice about these transport models is that they have proportionality constants. And whenever you see a constant like that in a model it’s important to consider what all might be wrapped up into it because it may involve a lot of complexity. And that is the case with both the diffusion coefficient and viscosity. Both are heavily dependent on specific properties of the system. For the value of viscosity you have to look it up for a specific substance and then also for the right temperature range. Viscosity varies widely between different substances. And even for a single substance it can still vary widely with temperature. For diffusivity you have to consider not only one substance but two, at least. If you look up a coefficient of diffusivity in a table it’s going to be for a pair of substances. And that will also depend on temperature.

At a macroscopic scale it’s not clear why the rates of mass transport and momentum transport would depend on temperature or the type of substances involved. But at a microscopic scale you can appreciate how different types of molecules would have different sizes and would move around at different velocities at different temperatures and how that would all play into the random movements of particles and the interactions between particles that produce, from that molecular scale, the emergent processes of diffusion and momentum transport that we observe at the macroscopic scale.

Once you open up that box, to see what is going on behind these proportionality constants, it opens up a whole new field of scientific work to develop – you guessed it – more and better models to qualify and quantify these phenomena.

Category Theory

Jakob and Todd discuss category theory, an important field in modern mathematics that focuses on the relations (morphisms) between mathematical objects. We discuss the importance of abstraction and the development in the history of mathematics beyond solving particular problems to studying the general nature of mathematical structures as such, the kinds of problems that can and can’t be solved, their properties, etc. We also consider the significance of a relation-centered approach to other fields, how things like languages, theories, and beliefs can be analyzed by the relations between their constituent elements.

For the visual aids referred to in the discussion see the video version on YouTube.

Stokes’ Beautiful Theorem: Differential Forms, Boundaries, Exterior Derivatives, and Manifolds

Stokes’ Theorem in its general form is a remarkable theorem with many applications in calculus, starting with the Fundamental Theorem of Calculus. The pattern in each case is that the integral of a function over a region is equal to the integral of a related function over the boundary of the region. We can use information about the boundary of a region to get information about the entire region, which is both useful and mathematically elegant.

For the visual aids that go with this episode see the video version on YouTube.

I’ve been reviewing some calculus recently but I’ve been approaching it in a different way than I have before, like when I learned it in school to be an engineer. Instead of studying it for the purpose of solving specific problems I’m looking at it more from a high-level, trying to understand it conceptually to see the general structure and patterns. It’s in line with my philosophical penchant to take things up a level or look behind things at another level of abstraction. One of the patterns I’ve seen across calculus has been something that falls under the generalized version of Stokes’ Theorem. I find the generalized Stokes’ Theorem quite beautiful, in that way that mathematics can be beautiful. It can express concisely and compactly a concept that has broad and varied applications in its more particular forms.

I’ll go through the generalized Stokes’ Theorem and some of its special applications. Since a lot of this is better understood visually I’ve made the YouTube video to go along with this so those listening to the podcast might want to check it out as well if this stuff is hard to picture.

The generalized Stokes’ Theorem states that the integral of some differential form of dimension k-1 over the boundary of some orientable manifold of dimension k is equal to the integral of that differential form’s exterior derivative over the whole of that orientable manifold.
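Written compactly in the standard notation, with M for the orientable manifold, ∂M for its boundary, ω for the (k-1)-dimensional differential form, and dω for its exterior derivative, the theorem is just:

∫∂M ω = ∫M dω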

The four key concepts here (the ones listed in the subtitle of this episode) are:

1. Differential forms
2. Boundaries
3. Exterior derivatives
4. Manifolds

That’s very abstract, not that there’s anything wrong with that. We need to be abstract to be general. The key concept, the idea I want to drive home with this episode, is that under certain conditions information about a boundary can give you information about the entire region that it bounds. In the generalized form we’re getting information about a whole manifold from the boundary of that manifold. That will be the case in all these examples. But now let’s look at examples to illustrate. Interestingly enough this comes into play from the very beginning of calculus, with the Fundamental Theorem of Calculus.

The Fundamental Theorem of Calculus

As a very quick crash course in calculus for the uninitiated, in calculus the two most important operations are derivatives and integrals. When you take the derivative of a function it produces another function that tells you the instantaneous rate of change of the original function at any point. So for example, if you have an equation for distance from some starting point with respect to time, the derivative of that function will tell you what the velocity is at any point in time. Very useful. You can also do the opposite of that, which is an antiderivative, or integral. Say you were starting with the function of velocity with respect to time. You could take the antiderivative of that function and get a function for the position at any point in time. You’d just need to know what your starting point was and add that to it. One of the most important applications of these operations uses the Fundamental Theorem of Calculus.

The Fundamental Theorem of Calculus is probably the most recognizable thing to calculus students, even if they don’t remember that that’s what it was called. It’s the principle behind finding the area under a curve. For example, if you have a function for the velocity with respect to time plotted on a graph, the area under that curve, between the curve and the horizontal axis, sweeps out an area that will give you the value of the distance traveled between any two points in time you choose. The cool thing is that you only need to know the values of the antiderivative at your starting time and at your ending time. You don’t need to know the values in between them.

So finding the area under a curve, finding it analytically as opposed to numerically, is really quite a remarkable thing if you think about it. Because you’re basically taking some function, doing an operation on it, and then applying the resulting function only to the two points bounding the region you’re interested in. If you’re integrating from point a to point b you’re only paying attention to points a and b, not to any of the points in between them. But you’re still getting information about the whole region. You’re getting the area under all those points in between a and b. I think that’s quite remarkable. And that’s what happens in each particular version of Stokes’ Theorem. You’re able to get information about an entire region from its boundaries.
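In symbols, with F an antiderivative of f, this is the familiar statement:

∫ₐᵇ f(x) dx = F(b) − F(a)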

We might forget this sometimes if we’re using numerical methods more than analytical methods of integration. Using numerical methods like the trapezoidal or the rectangle method we actually do go in and add up all the regions between the boundaries that approximate the total area under the curve. But for analytical integration we don’t need to do that. We only need the antiderivative and the boundary points. And that tells us everything about the region in between those points. It seems almost magical.
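Here is a quick sketch of that contrast (my own toy example): a trapezoidal sum that grinds through a hundred thousand interior points versus the antiderivative evaluated only at the two boundary points.

# Integrate f(x) = 3x^2 from a to b two ways.
# Analytically the antiderivative is F(x) = x^3, so the answer is F(b) - F(a).
def f(x):
    return 3 * x**2

def F(x):
    return x**3

a, b, n = 1.0, 4.0, 100000

# Numerical: add up thin trapezoids across the whole interior of [a, b].
h = (b - a) / n
trapezoid = sum((f(a + i * h) + f(a + (i + 1) * h)) * h / 2 for i in range(n))

# Analytical: only the boundary points a and b are needed.
analytic = F(b) - F(a)

print(trapezoid, analytic)  # both are approximately 63.0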

Circling back to those four key concepts in the generalized Stokes’ Theorem, with the Fundamental Theorem of Calculus what we have is a differential form of dimension 0 and a manifold of dimension 1. For a function, lower case f(x), the antiderivative, upper case F, is the (0-form) differential form. The closed interval from a to b is a 1-dimensional manifold. (For the sake of simplicity think of a manifold as a surface of some dimension. A 1-dimensional manifold here being a line.) The boundaries a and b are 0-dimensional; they’re points. And the exterior derivative is lower case f(x)dx.

Green’s Theorem

What other applications are there of this generalized Stokes’ Theorem? There’s also Green’s Theorem. Green’s Theorem has a similar form to the Fundamental Theorem of Calculus but instead of looking at a curve bounded by points we’re looking at a plane region bounded by a curve. With the Fundamental Theorem of Calculus we have the integral of a 0-dimensional differential form over the boundary of a 1-dimensional manifold. With Green’s Theorem we have the integral of a 1-dimensional differential form over the boundary of a 2-dimensional manifold.

As with all these theorems we’re looking at, in Green’s Theorem we are able to determine features of a region by looking at its boundaries. If we have a closed curve C that surrounds a region D we can figure out the area of D from the closed curve C. This is actually how planimeters work. And planimeters are pretty cool. A planimeter is a device that you can use to trace out, with a mechanical arm, a curve of any shape, and when you return to the position you started at it calculates the area that that shape encloses. This is exactly what Green’s theorem does.

So now to state Green’s Theorem: Say we have a curve C and functions L and M defined on a region containing D, the region enclosed by C. Green’s Theorem states that the double integral of the partial derivative of M with respect to x minus the partial derivative of L with respect to y, ∂M/∂x − ∂L/∂y, over the region D is equal to the line integral of Ldx + Mdy over the curve C. This is that perimeter to area connection, where we can get an area, normally found using a double integral, from a perimeter, found using a line integral. The differential form is Ldx + Mdy. The region D is the two-dimensional manifold. The curve C is the 1-dimensional boundary of the manifold. And the exterior derivative is (∂M/∂x − ∂L/∂y)dx∧dy.
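Here is a small sketch of that perimeter-to-area idea (my own toy example, not from the episode). Choosing L = −y/2 and M = x/2 makes ∂M/∂x − ∂L/∂y equal to 1, so the line integral around the boundary returns the enclosed area, which is essentially the calculation a planimeter performs; for a polygon that line integral reduces to the shoelace formula:

# Area of a region from its boundary alone, via Green's theorem with
# L = -y/2, M = x/2 (so the double integrand is 1 and the result is the area).
# For a polygon the line integral around the boundary becomes the shoelace formula.
def area_from_boundary(vertices):
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += (x1 * y2 - x2 * y1) / 2  # contribution of one boundary segment
    return abs(total)

# A 3 x 2 rectangle traced counterclockwise: the area should be 6.
print(area_from_boundary([(0, 0), (3, 0), (3, 2), (0, 2)]))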

The Divergence Theorem

We can bump all this up another dimension to get Gauss’s Theorem, also known as the Divergence Theorem. With Green’s Theorem we have the integral of a 1-dimensional differential form over the boundary of a 2-dimensional manifold. With the Divergence Theorem we have the integral of a 2-dimensional differential form over the boundary of a 3-dimensional manifold.

With the Divergence Theorem we’re able to get information about a 3-dimensional region from its 2-dimensional boundary. Call the 3-dimensional region V and the 2-dimensional surface S. Then we have a vector field F. The Divergence Theorem states that the triple integral or volume integral of the divergence of vector field F over the 3-dimensional region V is equal to the surface integral of F over the surface S.
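In symbols, with dV a volume element of V and dS an outward-oriented element of the surface S, the theorem reads:

∭V (∇·F) dV = ∯S F·dS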

In terms of the generalized Stokes’ Theorem, with the Divergence Theorem the differential form is F·dS. The region V is the 3-dimensional manifold. The surface S is the 2-dimensional boundary of the manifold. And the exterior derivative is the divergence div F dV.

One way to understand this is that the net flux out of the region gives the sum of all sources of the field in a region. And this has many physical applications. For example, the Divergence Theorem has application to the first two of Maxwell’s four equations in physics. All four of Maxwell’s Equations have an integral form and a differential form. But the integral and differential forms are really equivalent. For the first two of Maxwell’s Equations the Divergence Theorem shows the equivalence between these two forms.

The first of Maxwell’s Equations, Gauss’s Law, relates an electric field to its source charge. This is a perfect application for the Divergence Theorem because the divergence operator gives information about sources and sinks. And an electric charge is a source. Gauss’s Law states that the net outflow of the electric field through any closed surface is proportional to the charge enclosed by the surface. In the integral form the way this is expressed is that the surface integral of electric field E over a closed surface is equal to the enclosed charge divided by the permittivity of free space. In the differential form the way this is expressed is that the divergence of the electric field is equal to the charge density divided by the permittivity of free space.
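In symbols, with Q the enclosed charge, ρ the charge density, and ε₀ the permittivity of free space, the two forms are:

∯S E·dS = Q/ε₀ (integral form)
∇·E = ρ/ε₀ (differential form)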

The Divergence Theorem tells us that the triple integral of the divergence of electric field E over a volume is equal to the surface integral of the electric field E over the surface boundary. Since by the differential form of Gauss’s Law the divergence of the electric field is equal to the charge density divided by the permittivity of free space, if we take the triple integral of both sides we see that the triple integral of the divergence is equal to the charge divided by the permittivity of free space. By the integral form of Gauss’s Law the surface integral of electric field E is also equal to the charge divided by the permittivity of free space. So both the surface integral of the electric field E over the surface boundary and the triple integral of the divergence of the electric field E are equal to the charge divided by the permittivity of free space, and so they are equal to each other, which is exactly what the Divergence Theorem says. So these two forms are actually equivalent. 

The second of Maxwell’s Equations, Gauss’s Law for Magnetism, has a similar form but demonstrates that there are no magnetic monopoles. The surface integral of a magnetic field B over any closed surface S is always equal to 0. Magnetic field lines neither begin nor end but make loops or extend to infinity and back. Any magnetic field line that enters a given volume must somewhere exit that volume. In the integral form the way this is expressed is that the surface integral of magnetic field B over a closed surface is equal to 0. In the differential form the way this is expressed is that the divergence of the magnetic field is equal to zero. The Divergence Theorem tells us that the triple integral of the divergence of magnetic field B over a volume is equal to the surface integral of the magnetic field B over the surface boundary. If we take the triple integral of the divergence of the magnetic field this is still equal to zero, as is the surface integral of the magnetic field over the closed surface. So again these two forms are also equivalent.

Kelvin-Stokes’ Theorem

The last of the particular applications of the generalized Stokes’ Theorem is also called Stokes’ Theorem or Kelvin-Stokes’ Theorem. With Kelvin-Stokes’ Theorem as with Green’s Theorem we have the integral of a 1-dimensional differential form over the boundary of a 2-dimensional manifold, but in R3, i.e. 3-dimensional space. Given a vector field F the theorem states that the double integral or surface integral of the curl of the vector field over some surface is equal to the line integral of the vector field around the boundary of that surface. Here again, the boundary gives us information about the region inside it. 
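In symbols, with S the surface, C its boundary curve, dS an oriented surface element, and dr a line element along C:

∬S (∇×F)·dS = ∮C F·dr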

In terms of the generalized Stokes’ Theorem, with Kelvin-Stokes’ Theorem the differential form is F·dr. The surface S is the 2-dimensional manifold. The curve C is the 1-dimensional boundary of the manifold. And the exterior derivative is curl F·dS.

Curl is another vector operator, like divergence, and it’s easier to get the gist of it from physical examples, which we can get from the other two of Maxwell’s Equations.

The third of Maxwell’s Equations is also known as Faraday’s Law of Induction. Faraday’s Law describes how a time varying magnetic field creates, or induces, an electric field, which is the reason we’re able to generate electricity from turbines. In the integral form the way this is expressed is that the line integral of electric field E is equal to the negative derivative with respect to time of the surface integral of the magnetic field B. In the differential form the way this is expressed is that the curl of the electric field E is equal to the negative derivative of the magnetic field B with respect to time. Kelvin-Stokes’ Theorem tells us that the surface integral of the curl of the electric field E over some surface is equal to the line integral of the electric field E around the boundary of that surface. If we take the surface integral of the curl of the electric field E this is equal to the surface integral of the negative partial derivative of the magnetic field B with respect to time. And by the integral form of Faraday’s Law this is also equal to the line integral of the electric field around the surface boundary. So these two forms are also equivalent.
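In symbols, the two forms of Faraday’s Law are:

∮C E·dr = −d/dt ∬S B·dS (integral form)
∇×E = −∂B/∂t (differential form)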

The fourth of Maxwell’s Equations is also known as Ampère’s Law. Ampère’s Law describes how a magnetic field can be generated by (a) an electric current and (b) a changing electric field. In the integral form the way this is expressed is that the line integral of magnetic field B is equal to the permeability of free space times the surface integral of current density J, plus the permeability times the permittivity of free space times the derivative with respect to time of the surface integral of the electric field E. In the differential form the way this is expressed is that the curl of the magnetic field B is equal to the permeability of free space times the current density J, plus the permeability times the permittivity of free space times the partial derivative of the electric field E with respect to time. Kelvin-Stokes’ Theorem tells us that the surface integral of the curl of the magnetic field B over some surface is equal to the line integral of the magnetic field B around the boundary of that surface. If we take the surface integral of the curl of the magnetic field B this is equal to the permeability of free space times the surface integral of current density J, plus the permeability times the permittivity of free space times the derivative with respect to time of the surface integral of the electric field E. And by the integral form of Ampère’s Law this is also equal to the line integral of magnetic field B around the surface boundary. So these two forms are also equivalent.
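In symbols, with J the current density, μ₀ the permeability of free space, and ε₀ the permittivity of free space, the two forms of Ampère’s Law are:

∮C B·dr = μ₀ ∬S J·dS + μ₀ε₀ d/dt ∬S E·dS (integral form)
∇×B = μ₀J + μ₀ε₀ ∂E/∂t (differential form)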

Maxwell’s Equations and Differential Forms

All of Maxwell’s equations actually simplify considerably in the language of differential forms. I’m just going to brush over this quickly without going into detail. We can describe both the electric and magnetic fields jointly by a 2-form, F,  in a 4-dimensional spacetime manifold. And we can describe electric current by a 3-form, J. Then we’ll need the exterior derivative operator, d, and the Hodge star operator, *. And Maxwell’s Equations are just:

dF = 0
d*F = J

That’s it. And one benefit of this is that thinking of the equations in terms of differential forms lets them generalize more easily to manifolds and relativistic settings.

To summarize, the general pattern with all these forms of Stokes’ Theorem is that the integral of a function over a region is equal to the integral of a related function over the boundary of the region. We can get information about an entire region from its boundary. And this is something that applies in interesting ways at different dimensions. Mathematically it’s aesthetically satisfying and elegant.

State Spaces, Representations, and Transformations

Sunny Y. Auyang gives a useful model for thinking about the way states and state spaces of objective reality are represented in ways accessible to us and how transformations between a plurality of representations imply not relativism but common states and state spaces that they represent.

I’ve been reading a fascinating book that’s been giving me lots of ideas that I’ve been wanting to talk about. I was thinking to wait until I had finished it but I changed my mind because there are some ideas I want to capture now. It’s one of the books I call my “eye-reading” books because I’m usually listening to a number of audiobooks simultaneously. And I don’t have much time to sit down and actually read a book in the traditional way. But I sometimes save space for one if it looks really interesting and it’s not available in audio. And that applies to this one. The book is How is Quantum Field Theory Possible?, written by Sunny Y. Auyang. I heard about it while listening to another podcast, The Partially Examined Life, which is a philosophy podcast. One of the guys on there, Dylan Casey, mentioned it in their episode on Schopenhauer. It piqued my interest and I knew I had to get it.

The part of the book I want to talk about today is a model she puts together to think about the different ways an objective state can be represented in our scientific theories. To the extent that our scientific models and measurements are conventional, what should we think if they represent things differently? Are we condemned to relativism and the arbitrariness of convention? She argues that we are not and gives a model that takes things up a level to see different representations from the outside, how they relate to each other through transformations and how they relate to the objective states that they represent. This is necessarily a philosophical project, particularly a question in the philosophy of science. It is to get behind the work of science itself to think about what it is we’re doing when we do science and what it means when we say that things are a certain way and work in a certain way, as described by some theory.

I’d like to give a brief overview of some of those concepts and the vocabulary Auyang uses. And this will just be to get the concepts in our head. John von Neumann had a very funny quip that “in mathematics you don’t understand things. You just get used to them.” Now, I think that’s an overstatement. But in a way I think it’s kind of helpful whenever we’re getting into a discipline that has lots of unfamiliar terms and concepts that can seem really overwhelming. I think it’s helpful to just relax and not worry about fully understanding everything right away. But to take time to just get used to stuff, which takes time. Eventually things will start to come together and make more sense.

So the first idea I want to talk about is a phase space or state space. A phase space is the set of all possible states of a system. That’s very abstract so I’ll start with a concrete example. Say we have a single particle. At any given time this particle has a position in three-dimensional space that we can specify with three numbers along three spatial axes. For example, you could have a north-south axis, an east-west axis, and an elevation axis. You can also add momentum to this. So a particle’s momentum would be its mass multiplied by its velocity. Mass is a scalar quantity – it doesn’t have direction – but velocity is a vector, so it does have direction. And in three dimensions the velocity has three components along the same spatial axes as position. So you can specify the particle’s position and momentum with six numbers: three numbers to give its position and three numbers to give its momentum.

The really cool move from here is that you can then make use of what’s called a phase space. So for a single particle with these six axes we’ve selected this is a six-dimensional space. This is also called a manifold. Don’t worry about trying to visualize a six-dimensional space. It’s not necessary. Just go along with the idea that we’re using such a thing. This is an abstract space. It’s not supposed to represent the kind of space we actually live in, with length, width, and height. Any point in this six-dimensional space represents a possible state of the particle. You can represent any combination of position and momentum as a point in this phase space. So for example, the 6-tuple in parentheses with the six numbers (0,0,0,0,0,0) represents a state where a particle is at rest and it is sitting at the origin of whatever spatial reference frame we’ve set up. And you can put in any set of numbers to get any possible state of that particle. If we’re looking at multiple states of this particle through time we can think of it tracing out a trajectory in this state space.

Now, here’s where things get crazy. You can add more than one particle to this system. Say we add a second particle. How many dimensions does our phase space have now? It has twelve dimensions because we have axes for the positions and momentum components for both particles in three-dimensional space. And then we’ll have a 12-tuple, twelve numbers in parentheses, to call out the state of the system. And you can add as many particles as you like. For whatever N number of particles we have in our system the phase space will have 6N dimensions. So you can imagine that dimensions will start to pile up very quickly. Let’s say we take a liter of air. That has something on the order of 10^22 molecules in it; over a billion billion. The number of dimensions in our phase space for that system will be six times that. Now, in practice we’d never actually specify a state in this kind of system. With gases for instance we don’t worry about what’s going on with every single particle in the system. We use properties like temperature and pressure to generalize the average behavior of all the particles and that’s much, much more practical. But as a conceptual device we can think of this phase space underlying all of that.
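As a tiny sketch of that bookkeeping (my own illustration, nothing from the book): the state of an N-particle system is nothing more than a flat list of 6N numbers, a single point in the 6N-dimensional phase space.

# A classical state is a point in a 6N-dimensional phase space:
# for each particle, three position components and three momentum components.
def make_state(particles):
    # particles: list of (position, momentum) pairs, each a 3-tuple
    state = []
    for position, momentum in particles:
        state.extend(position)
        state.extend(momentum)
    return tuple(state)

# Two particles -> a single point in a 12-dimensional phase space.
state = make_state([((0, 0, 0), (0, 0, 0)),
                    ((1.0, 2.0, 0.5), (0.1, 0.0, -0.2))])
print(len(state), state)  # 12 numbers specify the whole system's state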

In quantum mechanics the state space of a system is called a Hilbert space. So this is the space of all possible states of a quantum system. Then any particular state of the quantum system is represented by a state vector, usually written with the Greek letter phi: |φ⟩. When we run an experiment to get information about the quantum system we look at a particular property that is called an observable. And you can think of an observable as pretty much what it sounds like, i.e. something that can be observed. And this is associated mathematically with an operator. An operator, as the name implies, operates on a function. And there are all kinds of operators. There are operators for position, momentum, total energy, kinetic energy, potential energy, angular momentum, and spin angular momentum. One way to think of this is that with an operator you’re conducting an experiment to measure the value of some property type. Then the result of that experiment is some number. The name for the resulting value is an eigenvalue. So for all those different operators I just listed off they will spit out corresponding eigenvalues. But an eigenvalue is an actual value. So with a kinetic energy operator, for example, your eigenvalue will actually be a number for the value of kinetic energy in some unit for energy, like Joules or whatever your choice of units.

Recall that in our phase space for particles each dimension, and there were many, many dimensions, had an axis in that phase space. In quantum mechanics the state space, the Hilbert space, has a collection of axes that are called a basis. And the basis of a Hilbert space is composed of eigenstates. And we can think of this as the coordinate system, the axes, of the state space of the system. The eigenvalue is what we get when we run an experiment but one of the interesting things about quantum systems is that we don’t always get the same value when we run an experiment, even if we’re applying the same operator to the same system. That’s because a quantum system is a combination (more specifically a linear combination or superposition) of many eigenstates. And each eigenstate has a certain amplitude. As we repeat several measurements of an observable we’ll observe eigenstates with higher amplitudes more often than eigenstates with lower amplitudes. We can actually quantify this. For any given eigenstate the probability that it will be observed with a measurement of an operator is the square of the magnitude of its amplitude. So amplitude is a very important property in a system.
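Here is a minimal sketch of that amplitude rule (my own toy example with made-up amplitudes): write a state as a superposition of eigenstates, and the probability of observing each eigenstate is the squared magnitude of its amplitude.

# A state written as a superposition of three eigenstates with complex amplitudes.
# The probability of measuring each eigenstate is the squared magnitude of its amplitude.
amplitudes = [0.5 + 0j, 0 + 0.5j, 0.5 + 0.5j]

norm = sum(abs(a) ** 2 for a in amplitudes)   # should be 1 for a normalized state
probabilities = [abs(a) ** 2 / norm for a in amplitudes]

print(probabilities)  # [0.25, 0.25, 0.5]: eigenstates with larger amplitude show up more often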

So there are many similarities there between the phase space of the system of classical particles and the Hilbert space of a quantum mechanical system. I just wanted to give an overview of those to introduce and talk about the vocabulary in the interest of starting to “get used to it” as von Neumann said, even if that’s a long way from having a comprehensive understanding of it.

Having laid that groundwork down I want to summarize this section of the book where Auyang introduces a model to analyze the relationship between the objective state space of system and its representations in different scientific theories. The objective state space is what is “out there” independent of our observations or awareness of it. The representations are what we interact with. We could definitely invoke Immanuel Kant here with his concepts of the “thing in itself”, that he calls the “noumena”, and the “phenomena” that we experience of it. And Auyang definitely draws on Kant repeatedly in her book.

There’s a figure she refers to over several pages and I’ve posted this on the website. But for those listening on the podcast I’ll try to describe it in a way that hopefully isn’t too difficult to follow. In her diagram she has three boxes. The top box is the state space, “M”. So that’s the set of all possible states of a system. Then in this state space there’s one state, “x”. x is what is objectively out there, independent of our observations and theories of it. But we don’t observe or interact with x directly. What we observe are the representations of x. And those are the lower two boxes.

These lower two boxes are fα(M) and fβ(M). These are the representations of certain properties of state space M. fα and fβ are property types that we could be looking for and then fα(M) and fβ(M) are the possible representations we can find when we run experiments to measure for those properties. Inside each of these lower boxes is a smaller box for the representation of the single objective state x. So these would be fα(x) and fβ(x). These are the definite predicates or values we get from our experiments. These are the things we come into contact with.

To tie this back to quantum mechanics real quick, in the quantum mechanical case the way this general picture would play out is that M would be a Hilbert space, x would be a state vector (x being one state in that state space), fα would be an observable, fα(M) would be a representation, and fα(x) would be an amplitude of x in the basis of the observable fα.

What’s important to understand here is that the values measured by the two representations are not equivalent. Someone trying to get at the objective state x in the objective state space M from fα will see something different than someone trying to get at it from fβ. One will get fα(x) and one will get fβ(x). Which one is right? Well they’re both right. But what does that mean? It depends on how much of this picture we see.

So we’ll look at parts of this model in pieces before we get back to the whole, comprehensive picture. But first I want to make another comparison because this is all quite abstract. I think it’s helpful to compare this to sentences in different languages. Say we have a sentence in English and Spanish. In English we say “The dog eats his food” and in Spanish we say “El perro come su comida”. These are different utterances. They sound very different. But we want to say that they mean roughly the same thing. We can translate between the two languages. And people can respond to the utterances in similar ways that indicate that there is something in common to both. But whatever it is that is common to both is not expressible. We only express the sentences in particular languages. But because they are translatable into each other it makes sense to think that there is some third thing that is the meaning they both share.

OK, so keep that example in mind as we get back to physical theories and the objective states they represent. Looking at our model again say we look only at one of the lower boxes, fα(M). In this picture as far as we’re concerned this is all there is. So one thing to say about this is that the meaning of fα(x) is what Auyang calls “unanalyzable”. And why is that? Well, it’s because fα(x) is “absolute, self-evident, and theory-free”. It’s just given. There is no objective state space M that fα(M) is representing. Rather fα(M) is the immediate bottom level. So there’s nothing here to analyze. We don’t have to think about the process of representation.

OK, well let’s add the second lower box, fβ(M). So now we have just the two lower boxes but still no objective state space M. What do we have now? Well we have plurality. There are multiple representations of the same thing and we don’t have a way of knowing which one is true. And neither can we say that they point to one common thing. So this gets to be a very confusing situation because we have both plurality and unanalyzability. Plurality in that we have two different values representing a state, fα(x) and fβ(x). Unanalyzability because, as with the previous view with only the one box, there’s no objective state space that either of these corresponds to. No process of representation to analyze here. What we have are conventions. This is the kind of picture Thomas Kuhn gives in his book The Structure of Scientific Revolutions. And this is a picture of relativism. The conventions are incommensurable and the choice among them is arbitrary. I think it’s fair to say that there’s much that’s unsatisfying with this picture.

Well, now let’s add the top box back in so we have everything. This brings what I’d consider an explanatorily robust conceptual device. As Auyang says, “the introduction of the physical object whose state is x not only adds an element in our conceptual structure; it enriches the elements discussed earlier,” fα(M) and fβ(M). In other words, fα(M) and fβ(M) look a lot different with M than without it. And I’d say they also make a lot more sense.

For one thing, the picture is no longer unanalyzable but analyzable. We understand that there is a process of representation occurring when we collect numerical data from experiments. When we look at property types fα and fβ we understand that these both represent M in different ways. As Auyang says, “Various representations can be drastically different, but they represent the same object.” She gives a concrete example: “The same electromagnetic configuration that is a mess in the Cartesian coordinates can become simplicity itself when represented in the spherical coordinates. However, the two representations are equivalent.” What’s key to understand here, and what makes this third, fuller picture, more powerful and coherent is that there is one objective state space, one object that the various representations point back to. So we circumvent relativism. The picture only looks relativistic when we only have the partial view. But when we see state space M and that fα(M) and fβ(M) map onto it we can appreciate that even though fα(x) and fβ(x) are different, they both correspond to one objective state x.

Another important thing to consider is that there is a transformation between fα(M) and fβ(M) that Auyang calls fβ•fα⁻¹. The transformation is the rule for transforming from representation fα(M) to fβ(M). That there is such a transformation and that it is possible to transform between representations arguably evinces the existence of the objective state space that they represent. As Auyang states: “Since fα to fβ are imbedded in the meaning of [fα(x) to fβ(x)], the transformation fβ•fα⁻¹ connects the two representations in a necessary way dictated by the object x. fβ•fα⁻¹ is a composite map. It not only pairs the two predicates [fα(x) to fβ(x)], it identifies them as representations of the same object x, to which it refers via the individual maps fα⁻¹ and fβ. Since fβ•fα⁻¹ always points to an object x, the representations they connect not only enjoy intersubjective agreement; they are also objectively valid. To use Kant’s words, the representations are no longer connected merely by habit; they are united in the object.” This is related to the example I gave earlier about two sentences in different languages as representations of a common referent.
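Here is a concrete toy version of that diagram (my own illustration, using coordinates rather than quantum states, in the spirit of Auyang's Cartesian-versus-spherical example): the same point can be represented in Cartesian coordinates (playing the role of fα) or spherical coordinates (fβ), and the map from one representation to the other is exactly the composite fβ•fα⁻¹.

import math

# Two representations of the same point: Cartesian (f_alpha) and spherical (f_beta).
def spherical_to_cartesian(rep):
    r, theta, phi = rep
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def cartesian_to_spherical(rep):
    # This plays the role of the composite map f_beta . f_alpha^-1: it takes the Cartesian
    # representation back to the underlying point and re-represents it spherically.
    x, y, z = rep
    r = math.sqrt(x * x + y * y + z * z)
    return (r, math.acos(z / r), math.atan2(y, x))

cart = (1.0, 1.0, 1.0)
sph = cartesian_to_spherical(cart)
print(sph)                          # a very different-looking triple of numbers...
print(spherical_to_cartesian(sph))  # ...that transforms back to (1.0, 1.0, 1.0)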

And I just think that’s a lovely picture. One of my favorite thinking tools is to take things up a level to try and see things that weren’t observable before. Like how you can see more from an airplane than when you’re on the ground. It’s not that the way things are has changed. But we see more of it. And with a more complete picture, the parts that seemed random or even contradictory make more sense and work together as a rational system.

So that’s one bit of the book I wanted to talk about. There are a few other things I’ve read that I want to talk about later too. And I’m only about halfway through. And if it continues to be as jam-packed with insights as it has been up to now I’m sure there will be more I’ll want to talk about.