The retina's architecture has been thoroughly mapped. As summarized vividly in Richard Masland's 2010 Proctor Lecture,[1] we know most of its roughly 60 neuron types and their arrays on a scale of millimeters.[2] We also know many of their synaptic circuits down to a scale of micrometers,[3–6] and their ion channels,[7] receptors, and synaptic vesicles down to a scale of nanometers.[8,9] And we know many of the functional responses.[6,10,11] Thus, across a millionfold range of scale (the US mapped down to your house lot), we know the retina's basic design (Fig. 1).
Some will object to this term, because it implies a designer. But to Webster, “design” is simply “outline showing the main features of something to be executed.” So now we can ask, just what is this “something to be executed”? What is the retina “for”?
Obviously, it is for processing photoreceptor signals, but consider a broader context (Fig. 2). Olfactory receptors couple directly to a spiking axon, and so do touch receptors; auditory receptors use one synapse to drive a spiking axon. But photoreceptors require two layers of processing by a substantial chunk of brain before finally sending a spike. This neural investment implies some big problem to be solved. So if we could identify the problem and grasp how the retina solves it, we might find some core principles that govern retinal design.
The problem is that a cone in daylight captures information at tremendous rates, approximately 10,000 quanta per second. This creates a finely graded voltage that travels passively to the synaptic terminal, ∼10 μm away. Conceivably, the cone might extend an axon over centimeters into the brain, but the passive signal would decay with a space constant of approximately 1 millimeter.[12] To solve this problem, the cone axon could express voltage-gated Na channels and send its own spikes. However, requantizing the input would require the same number of events at the output, 10,000 spikes per second, which is 100-fold greater than the brain's highest mean spike rate. This suggests the retina's key purpose: to edit and recode the cone signal so that essential information can be transmitted at lower spike rates.
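The two obstacles in this paragraph can be put in numbers. The sketch below makes two simplifications, not measurements:

- Passive spread follows a steady-state, infinite-cable approximation, V(x)/V(0) = exp(−x/λ), with the ∼1 mm space constant quoted above.
- A hypothetical figure of 100 Hz stands in for the brain's highest mean spike rate.

```python
import math

# Values from the text: passive space constant ~1 mm, cone input ~10,000 quanta/s.
SPACE_CONSTANT_MM = 1.0
CONE_QUANTA_PER_S = 10_000
# Hypothetical stand-in for "the brain's highest mean spike rate" (an assumption).
MAX_MEAN_SPIKE_RATE_HZ = 100


def passive_attenuation(distance_mm: float, space_constant_mm: float = SPACE_CONSTANT_MM) -> float:
    """Fraction of a steady voltage left after passive spread along an
    infinite cable: V(x) / V(0) = exp(-x / space_constant)."""
    return math.exp(-distance_mm / space_constant_mm)


# 10 um (cone terminal), 1 mm, 1 cm, 3 cm
for distance_mm in (0.01, 1.0, 10.0, 30.0):
    remaining = passive_attenuation(distance_mm)
    print(f"{distance_mm:6.2f} mm: {remaining:.2e} of the signal remains")

# One output spike per input quantum would need a spike rate equal to the quantal rate.
shortfall = CONE_QUANTA_PER_S / MAX_MEAN_SPIKE_RATE_HZ
print(f"required rate is {shortfall:.0f}-fold above the assumed maximum")
```

Over the 10 μm to the cone terminal almost nothing is lost. Over a centimeter only about e⁻¹⁰ of the signal would survive.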
But how low? And what principles govern the design?
Consider this example (Fig. 3). A brief flash delivers ∼10⁹ photons to a patch of cones. These isomerize only 10⁷ cone opsin molecules, reducing quanta by 100-fold. The cone synapses release only 10⁵ vesicles, reducing quanta by another 100-fold. This we know from horizontal cell recordings.[13] Next, the bipolar cell synapses achieve a radical transformation: they collapse the tonic vesicle rate to nearly zero! Now they release quanta in small bursts, well timed to the pattern's onset. Approximately 100 quanta delivered to a ganglion cell suffice to reliably trigger one spike.[14] In short, a pattern reaching the photoreceptors as 10⁹ events is compressed by retinal circuits to a single event: one spike. And this single spike is sufficient to be detected behaviorally (reviewed in Borghuis et al.[13]).
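The stage-by-stage compression of the flash can be tabulated directly from the counts above. This sketch simply treats the counts as one chain. Note that the flash covers a patch of cones while the ∼100 quanta are those reaching a single ganglion cell, so the last steps are approximate.

```python
import math

# Event counts for one brief flash, as given in the text (Fig. 3).
stages = [
    ("photons at cones", 1e9),
    ("cone opsins isomerized", 1e7),
    ("cone vesicles released", 1e5),
    ("bipolar quanta to a ganglion cell", 1e2),
    ("ganglion cell spikes", 1e0),
]


def reduction_factors(chain):
    """Yield (stage name, fold-reduction relative to the previous stage)."""
    for (_, prev_count), (name, count) in zip(chain, chain[1:]):
        yield name, prev_count / count


for name, fold in reduction_factors(stages):
    print(f"{name:<36} {fold:>8,.0f}-fold fewer events")

overall = stages[0][1] / stages[-1][1]
print(f"overall compression: 10^{math.log10(overall):.0f}")
```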
Considerable information is discarded, and this reduces sensitivity. As the photon rate steps down by 100-fold at the cones, sensitivity falls by 10-fold (Fig. 3). This is the "square-root law," which determines the signal-to-noise (S/N) ratio when the signal is carried by random (Poisson) events, such as photon arrivals: S/N grows only as the square root of the event count.[15] Vesicle release at the cone synapse is also random, yet its 100-fold decrease in rate reduces sensitivity by only ∼4-fold, not the 10-fold that the square-root law predicts. Likewise, the more than 100-fold decrease in rate at the bipolar synapse reduces sensitivity by only ∼2.5-fold. Because the losses at successive stages multiply, the overall neural loss from the cone synapse to the retinal output is ∼10-fold (4 × 2.5). These results are from guinea pigs, but a similar result has been reported for primates.[16] This matches behavioral sensitivity, indicating that once retinal signals reach the brain, no additional information is discarded (see Borghuis et al.[13]).
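A short calculation shows how far the synaptic stages beat the square-root law. The sketch uses Poisson statistics (S/N = √N) as the benchmark and takes the fold-changes quoted above. The bipolar stage is only bounded ("more than 100-fold"), so 100 serves as a lower bound for its predicted loss.

```python
import math


def snr_loss_poisson(rate_reduction: float) -> float:
    """Fold-loss in S/N when a Poisson event count falls by `rate_reduction`.
    For Poisson counts S/N = N / sqrt(N) = sqrt(N), so the loss is sqrt(reduction)."""
    return math.sqrt(rate_reduction)


# Fold-changes quoted in the text (guinea pig). The bipolar rate reduction is
# "more than 100-fold", so 100 is only a lower bound there.
neural_stages = {
    "cone synapse": {"rate_reduction": 100, "observed_loss": 4.0},
    "bipolar synapse": {"rate_reduction": 100, "observed_loss": 2.5},
}

print(f"photon capture: 100-fold fewer events -> {snr_loss_poisson(100):.0f}-fold S/N loss")

total_neural_loss = 1.0
for name, stage in neural_stages.items():
    predicted = snr_loss_poisson(stage["rate_reduction"])
    total_neural_loss *= stage["observed_loss"]  # successive losses multiply
    print(f"{name}: square-root law predicts ~{predicted:.0f}-fold, observed ~{stage['observed_loss']}-fold")

print(f"overall neural loss: ~{total_neural_loss:.0f}-fold")
```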
The reason that these stages outperform the square-root law is that neural circuits filter the signal to discard what is least informative, thus preserving signaling capacity for what is most informative. The cone terminal removes high and low frequencies, and the bipolar terminal initiates “sparse coding.”
The sources of the two filtering operations can be seen in a slice through a cone terminal (Fig. 4). Although the terminal is isolated from its neighbors by glia, wherever the glia part, it couples to neighboring terminals via gap junctions.[17,18] These junctions attenuate high frequencies, which are mostly noise, allowing the terminal to use its lower quantal rate to transmit more signal.[19] In addition, a large component of each cone's signal is shared with its neighbors because of correlations across the visual scene. Horizontal cells measure this redundant component by sensing every synaptic vesicle (Fig. 4). The horizontal cell sums these signals across a patch of 1000 cones to compute the mean and then subtracts it by feeding back negatively to the cone (reviewed by Sterling[3]).
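The two filtering operations can be mimicked on a toy one-dimensional row of cones. In this sketch:

- Averaging each cone with its immediate neighbors stands in for gap-junction coupling.
- Subtracting the patch mean stands in for horizontal-cell feedback.
- The scene, noise level, and coupling width are hypothetical.

Real horizontal cells compute a spatially weighted surround through feedback, not a plain arithmetic mean.

```python
import random
import statistics

random.seed(0)

N_CONES = 1000      # patch summed by one horizontal cell (from the text)
SHARED_MEAN = 50.0  # hypothetical: signal component common to every cone
NOISE_SD = 5.0      # hypothetical: independent noise in each cone

# Hypothetical scene: a small local pattern riding on a large shared mean.
pattern = [2.0 if 400 <= i < 600 else 0.0 for i in range(N_CONES)]
cones = [SHARED_MEAN + p + random.gauss(0.0, NOISE_SD) for p in pattern]


def couple(signals, width=1):
    """Crude stand-in for gap-junction coupling: average each cone with its
    neighbours within `width` positions (edge cones use the neighbours they have)."""
    out = []
    for i in range(len(signals)):
        window = signals[max(0, i - width): i + width + 1]
        out.append(sum(window) / len(window))
    return out


def subtract_surround(signals):
    """Crude stand-in for horizontal-cell feedback: remove the patch mean."""
    mean = statistics.fmean(signals)
    return [s - mean for s in signals]


coupled = couple(cones)
contrast = subtract_surround(coupled)

# Noise measured against the known input in a region with no pattern.
flat = range(50, 350)
raw_noise = statistics.pstdev([cones[i] - SHARED_MEAN for i in flat])
coupled_noise = statistics.pstdev([coupled[i] - SHARED_MEAN for i in flat])
inside = statistics.fmean(contrast[400:600])
outside = statistics.fmean(contrast[:400] + contrast[600:])

print(f"noise sd: {raw_noise:.2f} before coupling, {coupled_noise:.2f} after")
print(f"mean after subtraction: {statistics.fmean(contrast):.1e}")
print(f"pattern region {inside:+.2f} vs background {outside:+.2f}")
```

After both steps the large shared component is gone and the noise is reduced. What remains is mostly the local pattern, that is, the contrast.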
Thus, the cone terminal transmits only differences from the mean, that is, contrast, which is nearly as informative as the full signal and can be quantized at far lower vesicle rates. Each vesicle releases a nano-puff of glutamate, approximately 2000 molecules, which diffuse in the cleft to bipolar dendrites, where they are detected by glutamate receptors. This specifies a purpose for the outer synaptic layer: to discard noise and redundant signals, reducing the quantal rate by 100-fold with only a 4-fold loss of sensitivity. The next stage reduces the quantal rate still further while losing even less.
A pattern comprises negative and positive contrasts (Fig. 5). Both components are encoded by the cone voltage and also by the voltages of both bipolar cell classes. However, the continual glutamate release from cone synapses holds the bipolar cells near −45 mV, where the voltage-gated Ca channels in their terminals are mostly closed.[20] This largely suppresses tonic vesicle release, so the terminals are silent. Only when a negative contrast depolarizes one class, or a positive contrast depolarizes the other, does that class's terminal produce a calcium current. Then vesicles are released in a burst that evokes a ganglion cell spike.[14] This begins "sparse coding": a low tonic rate plus brief bursts.[21]
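The effect of holding bipolar terminals below their release threshold can be sketched with a toy threshold-and-rectify rule. The contrast trace, threshold, and gain are hypothetical. The sketch shows only three things: each class stays silent at rest, releases in bursts for its own sign of contrast, and never releases at the same moment as the other class.

```python
import math

# Hypothetical contrast time course (arbitrary units), 1 sample per ms.
T_MS = 1000
contrast = [0.6 * math.sin(2 * math.pi * t / 250) + 0.4 * math.sin(2 * math.pi * t / 60)
            for t in range(T_MS)]

THRESHOLD = 0.5  # hypothetical: contrast needed to open the terminal's Ca channels


def release_rate(c, sign, threshold=THRESHOLD, gain=10.0):
    """Thresholded, rectified release for one bipolar class:
    sign=+1 for the ON class (positive contrast), -1 for the OFF class."""
    drive = sign * c - threshold
    return gain * drive if drive > 0 else 0.0


def active_fraction(rates):
    """Fraction of time steps with any release at all."""
    return sum(r > 0 for r in rates) / len(rates)


on = [release_rate(c, +1) for c in contrast]
off = [release_rate(c, -1) for c in contrast]
overlap = any(a > 0 and b > 0 for a, b in zip(on, off))

print(f"ON terminal releasing {active_fraction(on):.0%} of the time")
print(f"OFF terminal releasing {active_fraction(off):.0%} of the time")
print(f"both releasing at once: {overlap}")
```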
This step, sending only half of the total pattern through each bipolar class, reduces each class's quantal rate by more than half. Sending information at lower rates over parallel channels conserves neural resources (see Fig. 10). Moreover, because negative contrasts are more frequent than positive contrasts in natural scenes, this scheme allows a better match of neural resources to the available information. This explains why the retina employs more OFF than ON bipolar cells[22] and ganglion cells.[23] The rectification (separate pathways for negative and positive contrasts) is incomplete because the ON pathways retain, via clever circuits, some capacity to encode negative contrasts.[24,25] These circuits also allow the ON pathway to increase coding efficiency in the OFF pathway (Liang Z and MA Freed, unpublished observations, 2012). These circuit features serve a general principle: apportion more neural resources to encode the richer sources of information.[26,27]
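Why might negative contrasts outnumber positive ones? One simple illustration assumes, as a simplification, that scene luminance is lognormally distributed. The mean of such a skewed distribution lies above its median, so more than half of all samples fall below the mean. For a lognormal with log-standard-deviation σ, the fraction below the mean is Φ(σ/2), where Φ is the standard normal cumulative distribution. The sketch computes this and checks it by sampling. It is an illustration, not a measurement of natural scenes.

```python
import math
import random


def fraction_negative_contrast(sigma: float) -> float:
    """Fraction of samples with negative Weber contrast C = (I - mean) / mean when
    luminance I is lognormal with log-standard-deviation `sigma`.
    mean = exp(mu + sigma**2 / 2), so P(I < mean) = Phi(sigma / 2)."""
    z = sigma / 2
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF at z


for sigma in (0.25, 0.5, 1.0, 1.5):
    print(f"log-sd {sigma:4.2f}: {fraction_negative_contrast(sigma):.1%} of contrasts negative")

# Monte Carlo check with sampled luminances (sigma = 1).
random.seed(1)
samples = [random.lognormvariate(0.0, 1.0) for _ in range(100_000)]
mean_luminance = sum(samples) / len(samples)
empirical = sum(s < mean_luminance for s in samples) / len(samples)
print(f"sampled: {empirical:.1%}, formula: {fraction_negative_contrast(1.0):.1%}")
```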
The OFF and ON bipolar cells receive identical glutamate puffs but respond with opposite polarity. The difference is that the OFF glutamate receptors open a cation channel, whereas the ON glutamate receptors close a cation channel.[28] This paring down of information per neuron continues as the OFF and ON classes divide into subtypes (Fig. 6).
This figure shows nine bipolar cell subtypes whose dendrites collect information from the same cone. All of them share the contents of every vesicle (Fig. 6, upper[29]). These types divide up the range of temporal frequencies, further reducing the information per neuron. To get its share of the information, each type places its dendrites at a particular distance from the release sites, as illustrated for the ON types in Figure 6 (lower left). Then, as a nano-puff of glutamate spreads out by diffusion, each ON type sees a different pulse (Fig. 6, lower right[30]). The high, fast pulse carries more information, and the low, slow pulse carries less. The glutamate receptors on each type are matched to these pulses in their binding constants, recovery times, and numbers, and thus encode information at different rates.[8]
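How one nano-puff of glutamate can look different to dendrites at different distances can be sketched with the textbook solution for instantaneous point release into free three-dimensional space: c(r, t) = N (4πDt)^(-3/2) exp(-r²/(4Dt)). Its peak arrives at t* = r²/(6D).

The 2000 molecules come from the text. The diffusion coefficient and the distances are assumed values. The real cleft is a thin, bounded slab, so the absolute numbers should not be trusted. The trend is the point: the peak falls as 1/r³ and arrives later with distance.

```python
import math

N_MOLECULES = 2000                 # glutamate molecules per vesicle (from the text)
D_UM2_PER_MS = 0.3                 # assumed effective diffusion coefficient
MOLECULES_PER_UM3_PER_MM = 602.2   # 1 mM = 6.022e20 molecules/L = 602.2 molecules/um^3


def concentration_mM(r_um: float, t_ms: float) -> float:
    """Concentration at distance r and time t after instantaneous point release
    into free 3-D space (a simplification of the real, slab-shaped cleft)."""
    spread = (4 * math.pi * D_UM2_PER_MS * t_ms) ** 1.5
    per_um3 = N_MOLECULES / spread * math.exp(-r_um**2 / (4 * D_UM2_PER_MS * t_ms))
    return per_um3 / MOLECULES_PER_UM3_PER_MM


def peak(r_um: float):
    """Analytic peak time t* = r^2 / (6D) and the concentration at that time."""
    t_peak = r_um**2 / (6 * D_UM2_PER_MS)
    return t_peak, concentration_mM(r_um, t_peak)


# Hypothetical dendrite distances from the release site, in micrometres.
for r in (0.1, 0.3, 0.6, 1.0):
    t_peak, c_peak = peak(r)
    print(f"r = {r:.1f} um: peak {c_peak:9.3f} mM at {t_peak:6.3f} ms")
```

A dendrite close to the release site sees a high, brief pulse. One farther away sees a lower pulse that arrives later and lasts longer, since the pulse width also grows as r².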
In short, each synaptic vesicle transfers information to all nine bipolar types, but unequally, giving each type a particular rate. This impressively efficient mechanism, in which a nano-puff of glutamate filters information for nearly a dozen neuron types (bipolar plus horizontal cells), shapes all subsequent circuits in the retina and beyond. Bipolar types with lower information rates use fewer outputs and supply the upper ON strata (Fig. 7, left). Types with higher information rates use more outputs and supply deeper strata.[31] Now we can interpret Cajal's memorable figure: ganglion cells stratify in order to select different information rates (Fig. 7, right). Each type, carrying only part of the total bandwidth, can reduce its spike rate.
Consider a large ganglion cell, the brisk-transient type, with 5000 contacts from high-rate bipolar cells, versus a small ganglion cell, the local-edge type, with 500 contacts from low-rate bipolar cells[32] (Fig. 8, upper). In the next panels, an intact retina watched a nature video while bipolar quanta were recorded as excitatory postsynaptic currents in the ganglion cells. A fast feature in the video evoked a burst of quanta from high-rate bipolar cells, causing spikes in the brisk-transient ganglion cell. This feature failed to drive the low-rate bipolar cells, so the local-edge ganglion cell was silent. However, a low-frequency feature, an "edge" going dim then bright, triggered a burst of quanta from the low-rate bipolar cells that evoked spikes in the local-edge cell.
When various ganglion cell types watched the same video, each responded with a characteristic firing pattern, for which it is named. In Figure 9, the camera jumped across a natural scene to mimic saccades. The brisk-transient cells fired brief bursts and the brisk-sustained cells gave prolonged responses, both at high mean rates. The direction-selective and local-edge cells also fired in characteristic patterns but at low mean rates. When the camera moved smoothly to mimic optic flow, the result was similar. So, despite different motions and different scenes, response patterns are similar within a type. At first this seems surprising, but on reflection it could be no other way: a filter built to extract a certain feature from natural scenes must always "see" the same thing!
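The claim that a feature-extracting filter always "sees" the same thing can be illustrated with two generic linear filters applied to a luminance step, such as the one a saccade produces.

- A first-order low-pass filter stands in for a sustained cell.
- A fast-minus-slow difference of low-pass filters stands in for a transient cell.

The time constants and the threshold are hypothetical, and neither filter is a model of the actual ganglion cell types.

```python
DT_MS = 1.0  # one sample per millisecond


def low_pass(signal, tau_ms):
    """First-order low-pass filter (exponential smoothing) with time constant tau_ms."""
    alpha = DT_MS / (tau_ms + DT_MS)
    y, out = 0.0, []
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out


def sustained(signal, tau_ms=20.0):
    """Stand-in for a sustained cell: follows the smoothed stimulus."""
    return low_pass(signal, tau_ms)


def transient(signal, tau_fast_ms=20.0, tau_slow_ms=100.0):
    """Stand-in for a transient cell: fast minus slow low-pass (a band-pass),
    which responds to change and then decays back toward zero."""
    fast, slow = low_pass(signal, tau_fast_ms), low_pass(signal, tau_slow_ms)
    return [f - s for f, s in zip(fast, slow)]


def time_above(drive, threshold=0.3):
    """Crude spiking proxy: number of ms the drive exceeds threshold."""
    return sum(d > threshold for d in drive)


step = [0.0] * 100 + [1.0] * 900  # luminance steps up at t = 100 ms
sustained_response = sustained(step)
transient_response = transient(step)
print("sustained cell active for", time_above(sustained_response), "ms")
print("transient cell active for", time_above(transient_response), "ms")
```

Because the filters are linear, any step produces the same response shape from each filter, whatever the scene. Only the amplitude scales with step size.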
The value of these filters is that each downstream user needs to know something particular, for example, slow motion in a particular direction. If its ganglion cell supplier can discard all information that is irrelevant to that specific need, such as higher frequencies and other directions of motion, it can send far fewer spikes. This is one key task for amacrine circuits: to carve away all that is unneeded, in the spirit of Michelangelo ("I saw the angel in the marble and carved until I set him free"). Performing this carving for each of the 20 ganglion cell types probably explains much of the amacrine cells' great diversity.[1] This carving reduces the local-edge firing rate to half of the brisk-transient rate (Fig. 9). Note that each spike carries approximately 2 bits, the physicists' measure of information. This connects spike rates to the physical laws of information transmission (reviewed by Balasubramanian and Sterling[27]). Because the local-edge array is denser, it sends nearly twice as much information as the brisk-transient array. And, in general, the low-rate types send nearly two-thirds of the total information traveling down the optic nerve.[33]
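The arithmetic linking spike rate, bits per spike, and array density is simple multiplication. This sketch assumes the ∼2 bits per spike quoted above applies equally to both types. That is a simplification; if bits per spike differ between types, the ratios shift accordingly. The density ratios are hypothetical. Under these assumptions, an array firing at half the rate must be more than twice as dense to send more information.

```python
BITS_PER_SPIKE = 2.0  # approximate value given in the text


def info_rate_bits_per_s(spike_rate_hz: float, bits_per_spike: float = BITS_PER_SPIKE) -> float:
    """Information rate of one cell, assuming a fixed number of bits per spike."""
    return spike_rate_hz * bits_per_spike


def array_info_ratio(density_ratio: float, rate_ratio: float) -> float:
    """Information sent by array A relative to array B, when A has `density_ratio`
    times as many cells firing at `rate_ratio` times B's rate (equal bits/spike)."""
    return density_ratio * rate_ratio


print(f"one cell at 4 Hz: {info_rate_bits_per_s(4.0):.0f} bits/s")

RATE_RATIO = 0.5  # local-edge rate is about half the brisk-transient rate (from the text)
for density_ratio in (1, 2, 3, 4):  # hypothetical density ratios, for illustration only
    ratio = array_info_ratio(density_ratio, RATE_RATIO)
    print(f"{density_ratio}x denser at half the rate -> {ratio:.1f}x the information")
```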
Because low-rate ganglion cells are the most numerous, firing rates are distributed asymmetrically, peaking near 4 Hz and tailing off sharply (Fig. 10); as it turns out, axon diameters are distributed the same way.[34] Thus, low rates can travel over thin fibers, but high rates demand thick ones. This explains why most optic axons are thin, which is fortunate indeed, because an axon's cross-sectional area, and hence its volume, rises as the diameter squared. If most axons were thick, an optic nerve with 10⁶ fibers would be huge. Moreover, mitochondrial concentration in the axoplasm is constant across fiber diameters.[34] Therefore, as axon volume rises as the diameter squared, so does energy capacity.
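The d² scaling can be made concrete for a nerve of 10⁶ fibers. The sketch treats every axon as a circle of the same, hypothetical diameter and ignores myelin, glia, and packing. It shows only that doubling axon diameter quadruples the area. The same factor applies to volume per unit length and, at constant mitochondrial concentration, to energy capacity.

```python
import math

N_FIBERS = 1_000_000  # ~10^6 optic nerve fibres (from the text)


def cross_section_um2(diameter_um: float) -> float:
    """Cross-sectional area of one axon, treated as a circle: pi * d^2 / 4."""
    return math.pi * diameter_um**2 / 4


def nerve_area_mm2(diameter_um: float, n_fibers: int = N_FIBERS) -> float:
    """Summed axonal area in mm^2 (ignores myelin, glia, and packing)."""
    return n_fibers * cross_section_um2(diameter_um) / 1e6  # 1 mm^2 = 1e6 um^2


# Hypothetical uniform diameters, chosen only to show the d^2 scaling.
for d in (0.5, 1.0, 2.0, 4.0):
    print(f"all axons {d:.1f} um wide -> {nerve_area_mm2(d):6.2f} mm^2 of axon")
```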
From these distributions, one can construct a cost function (Fig. 10): information in bits per second versus axon volume and energy. The curve is steep for low information rates but then flattens, showing a law of diminishing returns; that is, higher information rates are disproportionately expensive in space and energy. The obvious design goal would be to stay on the steep part of this curve, and that is exactly what the retina achieves. Auditory fibers, which send spikes directly to the brain, use 10-fold higher mean rates. Correspondingly, they operate high on the cost function and require 100-fold more space and energy (Fig. 10[35]).
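The diminishing-returns shape of the cost function can be shown with an assumed power law. The sketch makes two assumptions:

- Cost grows as the square of the mean rate, chosen because it reproduces the text's example of a 10-fold rate costing ∼100-fold more.
- Information grows in proportion to rate.

Under these assumptions, information grows only as the square root of cost. The real curve in Fig. 10 comes from measured distributions, not from this formula.

```python
# Assumed scalings (not measured): cost grows as rate**2 and information grows
# in proportion to rate, so information grows only as sqrt(cost).
BITS_PER_SPIKE = 2.0  # approximate figure quoted earlier in the text


def cost(rate: float) -> float:
    """Relative space/energy cost of one channel at a given mean rate (assumed quadratic)."""
    return rate ** 2


def info(rate: float) -> float:
    """Information rate in bits/s, assuming a fixed number of bits per spike."""
    return BITS_PER_SPIKE * rate


for r in (1, 2, 4, 8, 16):
    print(f"rate {r:>2} Hz: cost {cost(r):>4.0f}, info {info(r):>3.0f} bits/s, "
          f"bits per unit cost {info(r) / cost(r):.3f}")

# The same total information sent by one fast channel or by k slower parallel channels.
R, k = 8.0, 2
single = cost(R)
parallel = k * cost(R / k)
print(f"one channel at {R:.0f} Hz costs {single:.0f}; {k} channels at {R / k:.0f} Hz cost {parallel:.0f}")
```

Under this assumption, sending the same information over two channels at half the rate halves the total cost. This mirrors the text's argument that parallel channels at lower rates conserve neural resources.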
In conclusion, we can identify the retina's purpose: to capture images at high event rates and recode them at lower rates. The outer retina discards what is generally unneeded (noise and the mean). It creates two bipolar classes that respond to the contrast signal with opposite polarity, and subtypes that collect different information rates. The inner retina reduces the tonic event rate to nearly zero and halves the information per neuron (negative or positive contrasts). It also sparsifies the code at the bipolar output, producing a sparse code in the ganglion cells that carries forward to the brain. Finally, the inner retina carves away what is specifically unneeded by each ganglion cell.
We now recognize a key constraint on design: higher information rates cost disproportionately more. So the retina tries to operate low on the rate-versus-cost function. It therefore obeys two principles (among others; Sterling P, Laughlin S, unpublished observations, 2013): send only the information that is needed, and send it at the lowest rate acceptable to each downstream user.