A Dream

June 2018

Disheartened, you sit alone in a darkened lab, staring at a glowing laptop. Signals pulse along a simulated network on the screen. You have spent the last three months painstakingly building a simulation of a complicated cellular process, the pathway leading from insulin detected at the surface of a cell to changes in its metabolic functions. Now you have hit another bug: after correcting one parameter value, the simulation is again doing something miserably wrong.

You let your mind wander, looking down at your hands, thinking about the cells in your own body. Somehow, right there in front of you, the jumble of biochemical reactions you have been trying to understand is mindlessly churning away, transforming signals received by each cell into appropriate responses. But the workings remain hazy and mysterious, involving so many interacting parts that even your simplified simulation has a baroque character reminiscent of Rube Goldberg. It seems amazing that such precisely adapted, robust behavior can be the result of dynamics distributed over this mess of a network.

You have lightning-fast access to a database chock-full of data. You have the latest machine learning software and access to a giant computing cluster in the cloud. Yet in your late-night funk the problem seems insurmountable: Even once you have fixed this bug to get the simulation behaving in the right way, how do you understand how the collective actually functions as a whole? You have been working on this for months, and still you can barely keep the entire system in your head at once.

Sighing, you let your head fall to the keyboard. Maybe a quick rest... You nod off, imagining the biochemical soup as a computer in disguise, transforming inputs like insulin levels into the expression of particular genes. Like neurons transforming sensory input into coordinated motor output; like a flock of birds amplifying the detection of a predator. How do we understand the logic of these distributed computers?

You jolt awake as someone taps you on the head. You look up into a spooky mirror image of your own face, with a few more wrinkles and grey hairs—a faithful copy of yourself, aged ten or fifteen years.

The face smiles. "Don't panic. I come from a future time, when technological and conceptual advances are moving us toward answering your questions. I thought you might need some inspiration."

You are handed a thick sheet of paper impossibly active with appearing and disappearing inky designs. A slider interface at the edge of the page is the only static element.

"Here's an old USB adapter—give it some data." Dumbfounded, you plug the USB stick into your laptop and upload a few thousand measurements of the biochemical state of cells, each measuring hundreds of types of proteins and genes. A white LED blinks. The paper refreshes to display "Auto-inferrer," and you tap to confirm: "System input: insulin concentration. System output: glucose concentration. System type: protein interaction network." A progress bar appears and marches quickly.

"This should be quick with such a small amount of data." Your doppelgänger winks at you.

A detailed network representation appears—you guess there is one node for each of the hundreds of measured proteins. You see the resemblance, but it is more refined than the handmade version on your primitive-looking laptop screen. The nodes pulse with colored ink.

"It's demonstrating now that the model can reconstruct the data you fed it. You haven't measured every protein species in the cell, so of course the connections you see are effective, phenomenological. But you can predict the effect of perturbations." Your twin reaches down and presses on one of the nodes, and it gradually darkens, affecting in subtle ways the broader network patterns. "But that's just for show. Here's the real magic." Your future self holds up the sheet and points at the slide bar running up the page. "Try it."

You nudge the slider up a bit. The network bubbles and distorts, forming a column of three large circles connected by thick arrows. The original nodes are faintly visible inside each circle. "Aha, a simpler representation!" Of course! you think. You had been on the verge of hypothesizing this yourself: basically, the genes separate into three modules that form a cascade to transmit the signal. Your double's finger points at a statistic written to the side: "You can see that this simplified version still captures 90 percent of the variance in the input-output mapping."

Automated dimensionality reduction! Impressed, you excitedly push the slider all the way to the top. The network disappears, replaced by a plain-looking plot. The plot shows a single curve that branches into two. As the simulation progresses, a dot marches from the single-curve side to fall onto one of the two curves on the right. "Looks like you've got a symmetry-breaking circuit." Pressing on the upper curve produces a representation of the three-state system with the final group active, and the lower curve shows it inactive. "Oh, it's beautiful!" you exclaim. "The two stable output states arise from a bifurcation—and we can immediately see how they correspond to aggregate states of the individual genes!"

Your new friend smiles and continues to show off what the gadget can do on more complicated datasets. Using behavioral data from a fish school, the abstract setting shows a critical instability and demonstrates how information about predators is maximized when the school is closer to the instability. Using measurements from millions of simultaneously measured neurons, you see the neurons grouped into functional clusters that create an aggregate-level gated sensory classifier. Given insulin expression and other cell-phone-measured health data, consisting of a few noisy data points for each of millions of patients, the gadget produces patient-specific predictions and proposes the existence of important unmeasured variables at the most detailed setting, blames oscillatory dynamics on a Hopf bifurcation at the abstract setting, and suggests perturbations that promote or inhibit the oscillations.

Your friend is on a roll: "And you see how this combination of statistical inference and model selection can start to answer questions about what these distributed systems are doing and how they work? How we are able to look both at the individual-scale dynamics and the aggregate-scale logic? We quickly extract in each case how individual components control the behavior of the whole. We begin to understand what is carefully tuned and what is compensated for through active adjustment—the feedback that allows collective systems to remain adaptive and informative!"

You inspect the front and back of the paper-thin sheet and get caught up in the excitement. You stutter, "But—how does this work?"