Abstract
Purpose:
Perception necessarily entails combining separate sensory estimates into a single coherent whole. The perception of three-dimensional (3D) motion, for instance, can rely on two binocular cues: one related to the change in binocular disparity over time (CD) and the other related to interocular velocity differences (IOVD). Previous work has shown that neither cue is strictly necessary for the perception of 3D motion: observers are able to judge 3D motion in displays in which one or the other cue has been eliminated. It remains unclear, however, whether or how the two cues are combined when both are present.
Methods:
We tested the visual performance of a sample of 81 individuals (mean age = 20.34, 49 females) in four main conditions that measured, respectively, static stereoacuity, CD, IOVD, and combined CD+IOVD sensitivity.
Results:
We show that the sensitivity to the two binocular cues to 3D motion varies substantially across observers (CD: mean d′ = 1.01, SD = 1.1; IOVD: mean d′ = 1.16, SD = 1.03). Furthermore, sensitivity to the two cues was independent across observers (r[48] = 0.12, P = 0.42). Importantly, however, observed CD+IOVD performance was well predicted by the assumption that each observer combines the two cues in a statistically optimal fashion (r[79] = 0.75, P < 0.001).
Conclusions:
Our findings provide an explanation for the previously puzzling variability found in 3D perception across observers and laboratories, with some results suggesting that motion-in-depth percepts are largely determined by changes in binocular disparity, whereas others indicate that interocular velocity differences are key. Our results underline the existence of two complementary binocular mechanisms underlying 3D motion perception, with observers relying on these two mechanisms to different extents depending on their individual sensitivity.
Our sensory systems provide us with a host of independent measurements about objects in the world. One of the key challenges of the perceptual system is thus to rationally combine these disparate estimates into a reasonable whole. This is true whether the estimates arise via different sensory modalities (e.g., combining auditory and visual estimates of an object's location¹) or come from a single sensory modality (e.g., combining monocular and binocular visual estimates of the slant of a surface²). In the case of three-dimensional (3D) motion, in addition to a number of monocular cues, there are two binocular cues that can contribute to perception: changing binocular disparities and interocular velocity differences.³ Under natural viewing conditions, changing disparity (CD) and interocular velocity differences (IOVD) tend to co-occur, with the primary functional difference arising due to a difference in the order of operations. In CD, binocular disparity for a feature is computed first, followed by computation of the change in disparity over time; in IOVD, change in monocular feature position is computed first, and the difference in velocity is computed subsequently³,⁴ (cf. fig. 1 of Nefs et al.⁵; also fig. 1 of Peng and Shi⁶).
The relative importance of these binocular cues to 3D motion perception has been debated in recent years, with some researchers claiming that 3D motion perception largely depends on disparity-based cues,⁴⁻⁷ whereas others have argued that there is a considerable role for velocity-based cues.⁸⁻¹³
Neurophysiological studies have not yet been able to adjudicate between these possibilities. Indeed, it is not obvious from the neurophysiology what course the visual system takes, as both binocular disparity and monocular direction are processed at multiple cortical sites.¹⁴⁻²¹ For example, both types of motion-in-depth information seem to be processed in cortical area hMT+,²⁰ with IOVD being the main driver of motion-in-depth selectivity.²¹ At least one study reports that CD signals may be processed in a cortical area directly anterior to hMT+ in the lateral occipital complex.¹⁷
Our goal here was thus to identify the contribution of both binocular cues to 3D motion perception and determine if and how these cues are combined. We therefore tested the visual performance of a large sample of individuals in four main conditions that measured, respectively, static stereoacuity, CD, IOVD, and combined CD+IOVD performance. The relationship between CD and IOVD sensitivity both within and between observers demonstrates the degree to which these cues are processed independently. Furthermore, the relationship among CD, IOVD, and combined CD+IOVD sensitivity gives insight into how these cue sensitivities are used when a stimulus contains both cues (as is typically the case).
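To make concrete what a statistically optimal combination of two independent cues would predict, the following minimal sketch assumes the standard quadratic-summation rule, in which the combined sensitivity is the square root of the sum of the squared single-cue sensitivities. The text here does not spell out the exact prediction formula, so this rule, the function name `predicted_combined_dprime`, and the example values are assumptions for illustration only.

```python
import math


def predicted_combined_dprime(d_cd: float, d_iovd: float) -> float:
    """Predicted CD+IOVD sensitivity under quadratic summation (assumed rule)."""
    return math.hypot(d_cd, d_iovd)


# Hypothetical observers for illustration; these are not data from the study.
observers = {
    "CD-dominant": (2.0, 0.2),
    "IOVD-dominant": (0.2, 2.0),
    "balanced": (1.0, 1.0),
}

for label, (d_cd, d_iovd) in observers.items():
    prediction = predicted_combined_dprime(d_cd, d_iovd)
    print(f"{label}: predicted CD+IOVD d' = {prediction:.2f}")
```

Under this assumed rule, the prediction for an observer with one weak cue is dominated by the stronger cue, which is one way individual differences in single-cue sensitivity could carry over to combined-cue performance.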
Static.
Dynamic.
We assessed sensitivity to 3D motion by using three versions of a dynamic 3D stimulus in which specific cues to 3D motion (changes in disparity and interocular velocity) could be isolated. In all stimuli, the configuration of the display was similar to that described above for the static condition (extent, distribution, and contrast of dots), with the exception that the dots in the two arrays moved, indicating opposite directions of motion-in-depth (toward and away from the observer). On the first frame of each trial, one of the arrays was randomly selected to appear behind the plane of fixation while the other array was presented in front of it (at 0.125 degrees of crossed/uncrossed binocular disparity). The arrays moved in opposite directions in depth at a speed of 0.25 degrees per second for 1 second, so that one array started 0.125 degrees in front of the plane of fixation and receded to 0.125 degrees behind the plane of fixation (and vice versa for the opposite array) on each trial. Thus, the array of dots that was presented behind fixation always approached, and the array presented in front of fixation always receded. Participants reported which dot array appeared to move toward them.
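The disparity arithmetic described above can be checked with a short sketch. It assumes a sign convention in which positive values denote crossed disparity (in front of fixation) and a hypothetical 60 Hz frame rate; neither the convention nor the frame rate is stated in this section, and the function name `disparity_trajectory` is illustrative.

```python
def disparity_trajectory(start_deg, speed_deg_per_s=0.25, duration_s=1.0,
                         frame_rate_hz=60.0):
    """Per-frame disparity (degrees) of one array moving through fixation.

    Positive = in front of fixation (assumed convention). An array that
    starts in front recedes; an array that starts behind approaches.
    """
    n_frames = int(round(duration_s * frame_rate_hz))
    direction = -1.0 if start_deg > 0 else 1.0
    return [start_deg + direction * speed_deg_per_s * (i / frame_rate_hz)
            for i in range(n_frames + 1)]


receding = disparity_trajectory(+0.125)     # starts in front, ends behind
approaching = disparity_trajectory(-0.125)  # starts behind, ends in front
print(receding[0], receding[-1])
print(approaching[0], approaching[-1])
```

At 0.25 degrees per second for 1 second, each array traverses 0.25 degrees of disparity, which matches the stated start and end points of ±0.125 degrees.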
Changing Disparity Cue Stimulus.
Interocular Velocity Difference Cue Stimulus.
Combined Cues Stimulus.
A Tumbling E task was used to measure participants' visual acuity at 5° and 15° of eccentricity (measured in separate blocks), which provides a measure of peripheral acuity. During the task, an “E” appeared either to the left or right of fixation, and the participant reported the direction the E was facing by using the arrow keys (four cardinal directions). After each trial, participants received audio feedback as to whether or not they answered correctly. The stimulus size was controlled via a 3:1 staircase (i.e., after three correct responses the stimulus was reduced in size; after one incorrect response the stimulus was increased in size). The stimulus size was changed by 50% during the first 20 trials, by 30% for the next 20 trials, and by 20% for the final 40 trials (80 trials in total). The task at each eccentricity (5° and 15°) took approximately 4 minutes. A short practice (approximately 30 seconds) was completed before the experimental Tumbling E task (at 5°).
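The staircase rule can be sketched as follows. Several details are assumptions, because the description above leaves them open: the three correct responses are taken to be consecutive, size changes are taken to be multiplicative (a 50% step halves the size on a decrease and multiplies it by 1.5 on an increase), and the simulated observer with a hard threshold is purely hypothetical.

```python
import random


def step_fraction(trial_index: int) -> float:
    """Step size schedule: 50% (trials 1-20), 30% (21-40), 20% (41-80)."""
    if trial_index < 20:
        return 0.5
    if trial_index < 40:
        return 0.3
    return 0.2


def run_staircase(respond, start_size=1.0, n_trials=80):
    """Run a 3:1 staircase; `respond(size)` returns True for a correct answer.

    Returns the stimulus size presented on each trial.
    """
    size = start_size
    correct_run = 0
    sizes = []
    for t in range(n_trials):
        sizes.append(size)
        step = step_fraction(t)
        if respond(size):
            correct_run += 1
            if correct_run == 3:      # three correct in a row: make it harder
                size *= 1.0 - step
                correct_run = 0
        else:                         # one error: make it easier
            size *= 1.0 + step
            correct_run = 0
    return sizes


# Hypothetical observer: always correct above threshold, guessing among
# four directions (25% correct) below it.
rng = random.Random(0)
threshold = 0.2


def simulated_observer(size):
    return size >= threshold or rng.random() < 0.25


sizes = run_staircase(simulated_observer)
print(f"final stimulus size: {sizes[-1]:.3f}")
```

Shrinking the step over the session lets the staircase approach the threshold region quickly at first and then sample it more finely.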
The measures of visual acuity, speed of processing, and SOA were not significantly correlated with any of the stereo measures (all P > 0.30).
Before the stereo experiment, participants completed 20 practice trials of the CD+IOVD cue condition, with audio feedback on whether or not they answered correctly. The practice trials used the combined CD+IOVD cue condition so that participants had equal prior experience with all cues. Participants always completed the CD+IOVD cue stimulus block next; the order in which participants completed the other three conditions (static, CD, and IOVD) was randomized among participants.
We estimated observer sensitivity by computing d′ as the z-score of the hit rate (upper array moved toward, observer reported “up”) minus the z-score of the false-alarm rate (lower array moved toward, observer reported “up”), divided by √2. We adjusted hit rates of 100% down to the next highest possible score (99%) in accordance with a 1/(2N) adjustment (see Ref. 30, under “general comments”). Likewise, we adjusted 0% false-alarm rates up to the next lowest possible score (1%).
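The d′ computation described above can be sketched as follows. The function names are illustrative, and the example assumes 50 trials of each type, which is the N implied by a 1/(2N) adjustment yielding 99% and 1%. Clipping is applied symmetrically to both rates for simplicity, although the text only mentions adjusting 100% hit rates and 0% false-alarm rates.

```python
from math import sqrt
from statistics import NormalDist


def adjusted_rate(count: int, n: int) -> float:
    """Proportion count/n, with 0 and 1 pulled in by 1/(2N)."""
    floor = 1.0 / (2 * n)
    return min(max(count / n, floor), 1.0 - floor)


def d_prime(hits: int, n_signal: int, false_alarms: int, n_noise: int) -> float:
    """d' = (z(hit rate) - z(false-alarm rate)) / sqrt(2)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = adjusted_rate(hits, n_signal)
    fa_rate = adjusted_rate(false_alarms, n_noise)
    return (z(hit_rate) - z(fa_rate)) / sqrt(2)


# Perfect performance with 50 trials per type: rates become 0.99 and 0.01.
print(f"ceiling d' = {d_prime(50, 50, 0, 50):.2f}")
# Chance performance: equal hit and false-alarm rates give d' = 0.
print(f"chance d'  = {d_prime(25, 50, 25, 50):.2f}")
```

With this adjustment, the maximum attainable d′ depends on the number of trials, so ceiling values should be interpreted relative to N.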
Most psychophysical investigations into stereo-blindness and stereo-anomaly report stereo-blindness in between 1% and 14% of participants.³¹,³⁴⁻³⁸ A recent, carefully controlled large-cohort investigation³⁸ reports stereo-blindness in 2.2% of its participants. These estimates are markedly lower than the 37% of participants we classify as “stereo-anomalous” for static disparity in the current study. This discrepancy can likely be explained by the wide variability in criteria, stimuli, subject training, and task properties across these studies. For instance, to ensure that the different stimulus conditions were equated, we used the same disparity ranges and dot densities across all conditions, and limited stimulus presentation time to 1 second. Conversely, typical clinical assessments of stereoacuity provide much longer or even unlimited viewing time. Large improvements in stereo test performance after “encouraging” participants to “tune in” to the stimulus have been reported.³⁸ On the other hand, psychophysical assessments of stereoacuity thresholds⁴⁷⁻⁴⁹ typically produce thresholds less than 1 minute of arc, and one might thus have expected performance to be at ceiling for all our non–stereo-blind observers. However, such psychophysical studies typically rely on a small number of highly experienced observers, and it is known, although unfortunately not frequently reported, that performance for truly naïve observers can initially fall well short of such performance levels. Indeed, in initial piloting, using disparity values and stimulus presentation times informed by expert observer performance, we found near-floor performance for the vast majority of naïve observers.
Although we did expose participants to 20 training trials (in which they received feedback) with the combined CD+IOVD cue stimulus before the experiment, we cannot be sure that this feedback helped “tune in” their stereovision. We did not observe a significant increase in performance when comparing the first and last 50 trials of the CD+IOVD blocks across observers, suggesting that significant perceptual learning did not occur during this task. Accordingly, because performance on stereo (3D) tasks strongly depends on threshold criteria as well as task and stimulus properties, we opted to use the term “stereo-anomalous” rather than “stereo-blind” in our current study and focused on the variation of performance across the population.
We did not explicitly assess monocular motion perception. No observer reported an inability to see motion, and there are no reports in the literature of observers being unable to see the direction of monocular motion in fully coherent displays such as those used here. This implies that any inability to perceive the direction of motion in depth is specific to an impairment in the combination of the monocular motion signals. The neural origin of this impairment remains poorly understood.
Supported by a Hilldale Undergraduate/Faculty Research Fellowship (TH) and a grant from the Wisconsin Alumni Research Fund (WARF; BR).
Disclosure: B. Allen, None; A.M. Haun, None; T. Hanley, None; C.S. Green, None; B. Rokers, None