March 2021
Volume 62, Issue 3
Open Access
Visual Psychophysics and Physiological Optics  |   March 2021
Binocular Enhancement of Multisensory Temporal Perception
Author Affiliations & Notes
  • Collins Opoku-Baah
    Neuroscience Graduate Program, Vanderbilt University, Nashville, Tennessee, United States
    Vanderbilt Brain Institute, Vanderbilt University, Nashville, Tennessee, United States
  • Mark T. Wallace
    Vanderbilt Brain Institute, Vanderbilt University, Nashville, Tennessee, United States
    Department of Psychology, Vanderbilt University, Nashville, Tennessee, United States
    Department of Hearing and Speech, Vanderbilt University Medical Center, Nashville, Tennessee, United States
    Vanderbilt Vision Research Center, Nashville, Tennessee, United States
    Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States
    Department of Pharmacology, Vanderbilt University, Nashville, Tennessee, United States
Investigative Ophthalmology & Visual Science March 2021, Vol.62, 7. doi:https://doi.org/10.1167/iovs.62.3.7
      Collins Opoku-Baah, Mark T. Wallace; Binocular Enhancement of Multisensory Temporal Perception. Invest. Ophthalmol. Vis. Sci. 2021;62(3):7. doi: https://doi.org/10.1167/iovs.62.3.7.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Purpose: The goal of this study was to examine the behavioral effects of binocularity on audiovisual temporal perception in normally sighted individuals and to suggest possible underlying mechanisms.

Methods: Participants performed two audiovisual simultaneity judgment tasks (one using simple flashes and beeps and the other using audiovisual speech stimuli) with the left eye, right eye, and both eyes. Two measures, the point of subjective simultaneity (PSS) and the temporal binding window (TBW), an index for audiovisual temporal acuity, were derived for each viewing condition, stimulus type, and participant. The data were then modeled using causal inference, allowing us to determine whether binocularity affected low-level unisensory mechanisms (i.e., sensory noise level) or high-level multisensory mechanisms (i.e., prior probability of inferring a common cause, pC=1).

Results: Whereas for the PSS there was no significant effect of viewing condition, for the TBW, a significant interaction between stimulus type and viewing condition was found. Post hoc analyses revealed a significantly narrower TBW during binocular than monocular viewing (average of left and right eyes) for the flash-beep condition but no difference between the viewing conditions for the speech stimuli. Modeling results showed no significant difference in pC=1 but a significant reduction in sensory noise during binocular performance on flash-beep trials.

Conclusions: Binocular viewing was found to enhance audiovisual temporal acuity as indexed by the TBW for simple low-level audiovisual stimuli. Furthermore, modeling results suggest that this effect may stem from enhanced sensory representations evidenced as a reduction in sensory noise affecting the measurement of physical asynchrony during audiovisual temporal perception.

A fundamental component of human vision is the combination of the signals received separately from the two eyes into a single image.1,2 Besides stereopsis and a widened field of view, using two eyes rather than one often yields improved performance on a number of measures, a phenomenon termed binocular summation; see detailed reviews by Blake and Fox.1,2 These summation effects are seen on tasks using both threshold stimuli (e.g., contrast detection)3–6 and suprathreshold stimuli (e.g., contrast discrimination,5,7 Vernier acuity,8 visual acuity,9,10 and reaction times11–13). Collectively, these psychophysical studies have revealed that using two eyes compared with one eye can result in performance improvements ranging from 30% to 70%. In addition, evidence from electrophysiological studies in humans has shown that binocular viewing elicits evoked potentials of approximately 25% greater amplitude when compared with monocular viewing.14,15
Although most work is consistent with the general finding of binocular summation, the magnitude of summation differs across studies and can even include instances where binocular viewing results in poorer performance or lower evoked potential amplitudes compared with that of one eye.14,16,17 Factors such as task and stimulus characteristics, individual differences, as well as differences in monocular performance, can influence the magnitude of binocular summation.3,18–20 For example, Frisén and Lindblom18 discovered that binocular summation was relatively high (resulting in performance gains of about 40%) for tasks with low stimulus complexity (i.e., differential light sensitivity of target luminance) and nonexistent for tasks with high stimulus complexity (i.e., pattern recognition of digits against a random checkerboard background). Among clinical populations, such as patients with amblyopia, a neurodevelopmental disorder of the visual system associated with disrupted binocular vision,21–23 studies have reported reduced magnitude of binocular summation compared to age-matched controls.24–26
Although binocular summation has been well studied for a variety of visual tasks, the effects of binocularity on tasks that involve the interaction of visual and non-visual stimuli (i.e., multisensory tasks) have received much less attention. Although humans are highly visual, a large number of real-world events are multisensory, giving rise to information that concurrently stimulates multiple senses. In fact, mounting evidence supports the view that multisensory processing (i.e., the interaction and integration of information from multiple senses) may be a ubiquitous operation in the brain, occurring at various levels of sensory processing hierarchies, including areas once considered classical unisensory processing hubs.27,28 The integration of multisensory information has both neural and perceptual consequences.29,30 At the neural level, studies have reported increased spiking activity of neurons in response to stimulus combinations (with responses that can exceed the simple summation of unisensory spiking responses), whereas at the perceptual level,29,30 multisensory integration has been shown to increase performance in detection, discrimination, localization, and reaction time tasks.31–35
One of the key facets of this multisensory integration is the determination regarding which signals arose from the same source. Important information about which stimuli should be integrated or bound is found in some of the low-level features of the multisensory pairing, such as their spatial and temporal coincidence.30,36 For example, in the temporal realm, sensory signals generated by the same event are likely to arrive at the sensory organs in close temporal proximity, and, hence, this proximity represents a powerful statistical cue with regard to the likelihood that the signals originated from the same event. 
Psychophysically, a number of studies have focused on understanding how the brain deals with multisensory temporal factors using simultaneity judgment (SJ) tasks.37 In a typical SJ task, participants are presented with paired multisensory stimuli (such as a visual flash and an auditory beep) with varying stimulus-onset asynchronies (SOAs) and are asked to determine whether the stimulus pair was “synchronous” or “asynchronous.” In other multisensory temporal tasks, subjects are asked to make temporal order judgments regarding which stimulus of the multisensory pairing appeared first.38 Participants’ reports of synchrony across the various SOAs can be used to create response distributions and allow the derivation of two important measures of multisensory temporal function, namely the point of subjective simultaneity (PSS) and the temporal binding window (TBW). The PSS is defined as the SOA at which perceived simultaneity is maximal. Interestingly, the PSS is not always at objective simultaneity (i.e., zero) but is usually found on the visual-leading side of the response distributions; for further discussion, see Murray and Wallace.36 In addition, as opposed to being a fixed construct, the PSS tends to vary depending on a variety of factors. These factors can be stimulus related (such as stimulus duration and intensity),39–41 task related (such as judging the onset vs the offset in an SJ task),42 or attention related (such as being asked to attend to one modality).43–45 On the other hand, the TBW is the range of stimulus onset asynchronies within which two stimuli are likely to be perceptually bound or integrated,46 thus serving as a proxy measure for multisensory temporal acuity. Like the PSS, the TBW is modulated by stimulus-related factors such as effectiveness or reliability47 and stimulus complexity (e.g., flash-beep versus speech).48
Although the PSS and the TBW have served as key constructs for understanding audiovisual temporal perception, the fact that they are descriptive measures derived by fitting Gaussian models limits the ability to make direct connections to neural mechanisms underlying audiovisual temporal perception.49 Consequently, Magnotti et al.49 developed a variant of the causal inference model (see reference 50) in an effort to provide greater mechanistic insights into how an observer makes synchrony judgments using the temporal relationship between the multisensory cues. This model breaks the processes involved in audiovisual simultaneity perception into low-level unisensory processes involving the encoding and processing of the individual cues and higher-level multisensory processes involving the binding or integration of these multiple sensory stimuli.49 In the implementation of the model, the reliability of unisensory encoding is indexed by σ, which represents the level of sensory noise in the measurement of the physical asynchrony (i.e., the relative onsets of the visual and auditory signals). In this framework, when the reliability of the visual (or auditory) information decreases, the value of the sensory noise parameter increases, leading to measurements that are less precise. For example, Magnotti et al.49 demonstrated that blurring the visual speech in an audiovisual simultaneity judgment task decreased reliability of the visual information and thus increased the level of sensory noise associated with the estimation of physical asynchrony. On the other hand, the high-level multisensory mechanisms are indexed by pC=1, the observer's prior probability of inferring a common cause. Thus, as pC=1 increases, there is an increase in the tendency to bind the audiovisual signals. 
Clinically, patients with conditions such as autism, schizophrenia, and amblyopia exhibit a widened TBW compared to age-matched controls, suggesting that impaired multisensory temporal function may have cascading effects into domains of clinical interest.46,51,52 Although these patients show a similar phenotype (i.e., widened TBW), using the causal inference model, Noel et al.51 demonstrated that the widened TBW in patients with autism may result from atypical priors (i.e., increased pC=1), whereas that of patients with schizophrenia may stem from a combination of atypical priors and weakened sensory representations (i.e., increased σ). In the case of amblyopia, questions remain about whether the widened TBW is due to impaired binocular vision (i.e., deficits in formation of sensory representations) or impaired multisensory interactions (i.e., deficits in priors), which could occur as a result of abnormal visual experience during development.53,54
The purpose of this study was to understand the effect of binocularity on audiovisual temporal perception in normally sighted individuals. Specifically, our objective was to determine whether binocular viewing could affect audiovisual temporal perception as indexed via the PSS and TBW. Moreover, we were interested in determining whether differences in monocular versus binocular viewing were dependent on the nature of the stimuli used in the task, and thus used both simple low-level stimuli (i.e., flashes and beeps) and complex higher-level stimuli (i.e., speech). Last, we used the causal inference model to determine whether binocular viewing affected low-level unisensory mechanisms (i.e., level of sensory noise) or high-level multisensory mechanisms (i.e., prior probability of inferring a common cause, pC=1) during audiovisual temporal perception. On the basis of evidence from prior studies, we established several hypotheses. First, we hypothesized that binocular viewing would shift the PSS toward the auditory leading side (signifying more visual-biased responses) and reduce the size of the TBW (signifying improved audiovisual temporal acuity). This hypothesis was based on the well-established fact that binocular viewing enhances perceived stimulus intensity7,16,55 and the fact that increasing intensity of the visual stimulus in an SJ task shifts the PSS toward the visual leading side and reduces the TBW. Second, we hypothesized that the effects of binocular viewing on these measures would be greater for the simple flash-beep stimuli when compared with the speech stimuli, based on prior evidence that binocular summation tends to decrease with increasing stimulus complexity. Last, given the fact that binocular viewing enhances stimulus reliability, we hypothesized that binocular viewing would reduce sensory noise but not participants' prior probability of inferring a common cause.
Importantly, the findings of this study would contribute to the understanding of the effects of binocular vision and, to some degree, visual processes on multisensory perception. 
Methods
Participants
Nineteen participants (5 male; age [mean ± SD] 19.8 ± 1.7 years) performed audiovisual SJ tasks with the flash-beep stimuli and with the speech stimuli and were compensated with either gift cards or course credits. All participants had normal or corrected-to-normal vision, normal binocular vision, and normal hearing. Normal vision was defined as both eyes having a visual acuity better than 20/30, whereas normal binocular vision was defined as stereoacuity better than 60 arc-seconds. Visual acuity and stereoacuity measurements were made using a Snellen chart at 6 m and a Randot stereo chart, respectively. Each participant gave informed consent before being allowed to participate. All recruitment and experimental procedures were approved by the Vanderbilt University Institutional Review Board and were carried out in accordance with the Declaration of Helsinki. Four participants were excluded from further analysis because of high proportions of synchrony reports for high SOA values in one or more experiments, leaving 15 participants in the analyses.
Stimulus and Apparatus
All experimental procedures for both the flash-beep and speech SJ tasks took place inside a dimly lit WhisperRoom (SE 2000 Series). The visual stimuli for both stimulus types were displayed on a gamma-corrected monitor (21-inch Asus LCD) with a 120-Hz refresh rate, while the auditory stimuli were presented binaurally through headphones (Sennheiser HD559). For the flash-beep task, the visual stimulus was a white annular ring with an outer and inner diameter of 6° and 3°, respectively. The ring was displayed at the center of fixation and at 50 cd/m² luminance on a screen with luminance of 10 cd/m². The auditory stimulus was a brief 1800-Hz tone presented at ∼70 dB. The visual stimulus was presented for 17 ms, whereas the auditory stimulus was presented for 10 ms and was linearly ramped up and down for 2 ms each. Both the visual and auditory stimuli for the flash-beep task were generated and presented using MATLAB (MathWorks Inc., Natick, MA, USA) software with the Psychophysics Toolbox Version.56,57 On the other hand, the stimuli for the audiovisual speech task consisted of a video of a female talker uttering the phoneme /ba/, including all prearticulatory movements, with a pixel resolution of 1920 × 1080 and a duration of ∼2300 ms.58 The auditory component of the video was presented at ∼70 dB. All speech stimuli for SJ tasks were presented using E-Prime version 2.0.8. A Minolta Chroma Meter CS-100 and a sound level meter were used to verify the luminance and sound intensity levels, respectively. The durations of all visual and auditory stimuli, as well as the SOAs, were confirmed using a Hameg 507 oscilloscope (Hameg Instruments, Mainhausen, Germany) with a photovoltaic cell and microphone.
Procedure
Each participant completed two sessions of the flash-beep (FB) task and two sessions of the speech (SP) task, arranged in an FB-SP-FB-SP or SP-FB-SP-FB order. This order was randomized and counterbalanced across participants. In each session, participants performed the task with the left eye, the right eye, and both eyes in separate, randomized blocks. During monocular viewing, the untested eye was covered with an opaque patch, and after each monocular viewing block, participants took a five-minute break to reduce the effects of deprivation on subsequent sessions. For both tasks, participants judged whether the visual stimulus (a flashed ring for the FB task and lip movements for the SP task) and the auditory stimulus (a brief tone for the FB task and the “/ba/” sound for the SP task) occurred at the same time or at different times. From trial to trial, the onsets of the visual and auditory stimuli were separated by one of a set of predefined SOAs (FB task: ±400, ±300, ±200, ±150, ±100, ±50, and 0 ms; SP task: ±500, ±400, ±300, ±250, ±200, ±150, ±100, and 0 ms), where negative and positive SOA values corresponded to auditory-preceding-vision and vision-preceding-auditory SOAs, respectively. For each block, each SOA was presented 10 times in randomized order, totaling 260 trials for each viewing condition for the FB task (13 SOAs × 10 repetitions × 2 sessions) and 300 trials for each viewing condition for the SP task (15 SOAs × 10 repetitions × 2 sessions). Each trial began with a brief fixation period that lasted between 700 and 1000 ms (Fig. 1). During this period, participants viewed a centrally displayed plus sign on the screen. After the fixation period, the audiovisual stimulus was presented, and participants were then asked to respond by pressing “1” on the keyboard if the pair of audiovisual stimuli was synchronous or “2” if the pair was asynchronous.
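The trial totals above can be checked with a short calculation. The following is a minimal Python sketch, assuming one block per viewing condition in each of the two sessions of a task; the constant and function names are illustrative and do not come from the study's code.

```python
# SOA magnitudes in ms, taken from the text. Each magnitude is used with both
# signs (negative = auditory leads, positive = vision leads), plus an SOA of 0.
FB_MAGNITUDES = [400, 300, 200, 150, 100, 50]
SP_MAGNITUDES = [500, 400, 300, 250, 200, 150, 100]
REPEATS_PER_BLOCK = 10   # each SOA is presented 10 times per block
SESSIONS_PER_TASK = 2    # assumption: one block per viewing condition per session


def soa_set(magnitudes):
    """Return the sorted list of SOAs: every magnitude with both signs, plus 0."""
    return sorted({0} | {sign * m for m in magnitudes for sign in (-1, 1)})


def trials_per_viewing_condition(magnitudes):
    """Trials per viewing condition, pooled over the two sessions of a task."""
    return len(soa_set(magnitudes)) * REPEATS_PER_BLOCK * SESSIONS_PER_TASK


fb_trials = trials_per_viewing_condition(FB_MAGNITUDES)
sp_trials = trials_per_viewing_condition(SP_MAGNITUDES)
print(fb_trials, sp_trials)  # 260 300
```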
Before participants began the main experiment, each was given brief initial practice sessions using the highest SOAs for each task to ensure task familiarization and comprehension. Participants were not provided with feedback on the correctness of their responses during the main experiment. 
Figure 1.
 
Schematic of the procedure for the (A) flash-beep SJ task and (B) speech SJ task. Participants judged the simultaneity of a visual stimulus (flash of light [A] and lip movements [B]) and an auditory stimulus (auditory beep [A] and phoneme /ba/ [B]) presented with varying stimulus onset asynchronies. On each trial, there was a brief fixation period (700–1000 ms), followed by the stimulus presentation. Participants were then asked to respond by pressing the keyboard after which the next trial began automatically.
Derivation of Behavioral Measures
For each participant, we pooled responses from blocks for each viewing condition and stimulus type and then computed proportions of synchrony reports as a function of SOA using the pooled data. To determine the PSS and TBW values for each viewing condition and stimulus type, we fitted a single-term Gaussian distribution model with the amplitude, mean, and standard deviation as free parameters. Although the mean and standard deviation parameters ranged from negative infinity to positive infinity, the amplitude parameter was bounded between 0 and 1. The mean r² values for the flash-beep task (0.92 ± 0.05) and the speech task (0.91 ± 0.06) indicated reasonable fits to the data. We derived the PSS and the TBW as the mean and standard deviation of the best-fitting Gaussian model, respectively.
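The derivation of the PSS and TBW can be illustrated with a small least-squares fit. The following is a minimal Python sketch, not the authors' fitting routine (the text does not name one): it swaps in a plain grid search over the mean and standard deviation, restricts both to a finite grid for practicality, and uses synthetic proportions rather than study data.

```python
import math


def gaussian(soa, amp, mean, sd):
    """Single-term Gaussian: amp * exp(-(soa - mean)^2 / (2 * sd^2))."""
    return amp * math.exp(-((soa - mean) ** 2) / (2.0 * sd ** 2))


def fit_gaussian(soas, p_sync, means=range(-200, 201, 2), sds=range(20, 501, 2)):
    """Least-squares fit by grid search over mean and sd (grid is an assumption).

    For each (mean, sd) pair the best amplitude has a closed form; it is then
    clipped to [0, 1] to respect the amplitude bounds described in the text.
    Returns (amplitude, pss, tbw, r_squared).
    """
    best = None
    for mean in means:
        for sd in sds:
            g = [gaussian(s, 1.0, mean, sd) for s in soas]
            amp = sum(y * gi for y, gi in zip(p_sync, g)) / sum(gi * gi for gi in g)
            amp = min(1.0, max(0.0, amp))
            sse = sum((y - amp * gi) ** 2 for y, gi in zip(p_sync, g))
            if best is None or sse < best[0]:
                best = (sse, amp, mean, sd)
    sse, amp, mean, sd = best
    y_bar = sum(p_sync) / len(p_sync)
    sst = sum((y - y_bar) ** 2 for y in p_sync)
    return amp, mean, sd, 1.0 - sse / sst


# Synthetic proportions of synchrony reports at the flash-beep SOAs (not study data).
soas = [-400, -300, -200, -150, -100, -50, 0, 50, 100, 150, 200, 300, 400]
p_sync = [gaussian(s, 0.9, 20, 120) for s in soas]
amp, pss, tbw, r2 = fit_gaussian(soas, p_sync)
print(amp, pss, tbw, r2)
```

Here the fitted mean plays the role of the PSS and the fitted standard deviation plays the role of the TBW, as defined in the text. A gradient-based optimizer would normally replace the grid search in practice.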
Fitting the Causal Inference Model
The causal inference model provides a mechanistic understanding of how an observer makes synchrony judgments between two stimuli from different sensory modalities during the performance of an SJ task.49 We refer the reader to Magnotti et al.49 for a more detailed derivation of this model. Moreover, although the model was originally derived using speech stimuli, in principle it should also work for other stimuli used in SJ tasks, such as flashes and beeps.
According to the causal inference model, the brain first infers the underlying causal structure of cues from multiple sensory modalities before combining them (Fig. 2). This underlying causal structure can be one of two possibilities: (1) the two events have a common cause (C = 1) or (2) the two events have different causes (C = 2). Naturally, events emanating from a common source, such as auditory and visual speech, result in a narrow distribution of physical asynchronies with a mean that is characteristic of the relationship between the two cues. For instance, the asynchrony distribution of audiovisual speech has a positive mean owing to the small delay between the visual and the auditory onsets. This delay stems from the fact that pre-articulatory facial movements occur before the engagement of the vocal cords during speech. In the case of nonspeech stimuli, the auditory and visual stimuli most likely have similar onsets and thus may result in an asynchrony distribution with a mean of zero. When the two events have different causes, the distribution of physical asynchronies is broad and has a mean of zero because of the lack of a relationship between the cues. Furthermore, the model posits that the observer's measured asynchrony is subject to sensory noise and, hence, follows a broader distribution than the physical asynchrony. When these component distributions are overlaid, a window of measured asynchronies emerges for which the probability of inferring a common cause outweighs the probability of inferring different causes. This window, termed the Bayes-optimal synchrony window, is independent of the physical asynchrony between the cues observed and, hence, represents a decisional structure used by the observer in making synchrony judgments. In its implementation, the causal inference model uses six parameters, which can be grouped into two subject parameters and four stimulus parameters.
The first subject parameter is σ, which represents sensory noise that corrupts the measurement of the physical asynchrony, and thus, as σ increases, there is a decrease in the precision of measuring physical asynchrony. The second subject parameter is pC=1, which represents the prior probability of a common cause. When pC=1 is high, there is an increased tendency to report synchrony. The stimulus parameters include the mean and standard deviation of the C = 1 (µC=1, σC=1) and C=2 (µC=2, σC=2) distributions. 
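The decision rule described above can be made concrete with a short calculation. The following is a minimal Python sketch that follows the verbal description only; it is not the code released by Magnotti et al., and the function names and example parameter values are hypothetical. It assumes that the measured asynchrony under each cause is Gaussian with variance equal to the cause's variance plus σ², and that the observer reports synchrony whenever the measured asynchrony falls inside the window where C = 1 is more probable than C = 2.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def synchrony_window(sigma, p_c1, mu1, sd1, mu2, sd2):
    """Bounds (lo, hi) of measured asynchronies for which C = 1 is more probable.

    Assumes the C = 1 distribution is narrower than the C = 2 distribution.
    Returns None if no measured asynchrony favors a common cause.
    """
    v1 = sd1 ** 2 + sigma ** 2  # variance of measured asynchrony given C = 1
    v2 = sd2 ** 2 + sigma ** 2  # variance of measured asynchrony given C = 2
    # p_c1 * N(x; mu1, v1) > (1 - p_c1) * N(x; mu2, v2)  <=>  a*x^2 + b*x + c < 0
    k = math.log(p_c1 / (1.0 - p_c1)) + 0.5 * math.log(v2 / v1)
    a = 1.0 / (2.0 * v1) - 1.0 / (2.0 * v2)
    b = mu2 / v2 - mu1 / v1
    c = mu1 ** 2 / (2.0 * v1) - mu2 ** 2 / (2.0 * v2) - k
    disc = b * b - 4.0 * a * c
    if disc <= 0.0:
        return None
    root = math.sqrt(disc)
    return (-b - root) / (2.0 * a), (-b + root) / (2.0 * a)


def p_report_synchronous(soa, sigma, p_c1, mu1, sd1, mu2, sd2):
    """Probability that the noisy measurement of `soa` lands inside the window."""
    window = synchrony_window(sigma, p_c1, mu1, sd1, mu2, sd2)
    if window is None:
        return 0.0
    lo, hi = window
    return normal_cdf((hi - soa) / sigma) - normal_cdf((lo - soa) / sigma)


# Hypothetical parameter values in ms, chosen only for illustration.
params = dict(p_c1=0.5, mu1=0.0, sd1=60.0, mu2=0.0, sd2=250.0)
lo, hi = synchrony_window(sigma=80.0, **params)
p_at_0 = p_report_synchronous(0.0, sigma=80.0, **params)
p_at_400_low_noise = p_report_synchronous(400.0, sigma=80.0, **params)
p_at_400_high_noise = p_report_synchronous(400.0, sigma=160.0, **params)
print(lo, hi, p_at_0, p_at_400_low_noise, p_at_400_high_noise)
```

In this sketch, raising σ makes synchrony reports at large SOAs more likely, which broadens the response curve, whereas raising pC=1 widens the window itself.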
Figure 2.
 
Causal inference model for audiovisual SJ tasks. Before multiple cues are combined, the brain determines whether they originate from a common source (C = 1) or different sources (C = 2). Auditory and visual stimuli that share a common source have a narrow distribution of physical asynchronies (middle, blue) and a mean that suggests a relationship between the cues (e.g., positive mean for speech or zero mean for flash-beep). When the paired stimuli have different sources, the distribution is broad, and the mean is zero because of the lack of a relationship between the cues (middle, red). According to the model, each participant possesses a prior tendency to bind multisensory information across time (pC=1, top) and samples information from the sensory world with a certain level of noisiness (sensory noise, bottom). Combining these components creates a window of measured asynchronies where a common cause is more probable than separate causes (middle right). This window, termed the Bayes-optimal synchrony window, serves as a decision structure for judging the simultaneity of these events. Figure modified from Noel, Stevenson, and Wallace.51
To fit the model to our data, we used routines from the source code freely available at http://openwetware.org/wiki/Beauchamp:CIMS. Following procedures in Magnotti et al.,49 we fitted the model to the data for each viewing condition (left, right, and both eyes), stimulus condition (i.e., flash-beep and speech), and subject. Each model had five free parameters: σ, pC=1, and three stimulus-based parameters (σC=1, µC=2, σC=2); µC=1 was set to zero. The ranges for the possible parameter values were set as follows: pC=1 [0.01, 0.99], σC=1 [0, 150], µC=2 [-200, 200], and σC=2 [100, 300]. For each subject, viewing condition, and stimulus condition, we determined the parameter values for 200 models using different initial positions of the starting parameter values and maximizing the binomial log-likelihood function on the observed data. Of the 200 models, the best-fitting model was determined by first excluding models that had parameter values within 5% of the predefined parameter limits and then choosing the model with the highest r² value. In a scenario where two or more models had the same r² value, the final model was determined by averaging across these models.
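The objective and the selection rule described above can be sketched as follows. This is a minimal Python illustration and not the released CIMS routines. It assumes that "within 5% of the predefined parameter limits" means within 5% of each parameter's range from either bound, and it checks only the four parameters whose limits are stated in the text. All names and candidate values are hypothetical.

```python
import math

# Parameter limits stated in the text; no limits are given for sigma.
BOUNDS = {
    "p_c1": (0.01, 0.99),
    "sd_c1": (0.0, 150.0),
    "mu_c2": (-200.0, 200.0),
    "sd_c2": (100.0, 300.0),
}


def binomial_log_likelihood(n_sync, n_trials, p_model):
    """Log-likelihood of synchrony counts per SOA given model probabilities.

    The binomial coefficient is omitted because it does not depend on the model.
    """
    eps = 1e-9  # keeps log() finite when a model probability is 0 or 1
    total = 0.0
    for k, n, p in zip(n_sync, n_trials, p_model):
        p = min(1.0 - eps, max(eps, p))
        total += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return total


def near_bound(params, margin=0.05):
    """True if any bounded parameter lies within `margin` of its range from a limit."""
    for name, (lo, hi) in BOUNDS.items():
        pad = margin * (hi - lo)
        if params[name] < lo + pad or params[name] > hi - pad:
            return True
    return False


def select_best(fits):
    """Drop fits near a limit, keep the highest r2, and average any exact ties."""
    kept = [f for f in fits if not near_bound(f["params"])]
    if not kept:
        return None
    top = max(f["r2"] for f in kept)
    tied = [f["params"] for f in kept if f["r2"] == top]
    return {name: sum(p[name] for p in tied) / len(tied) for name in tied[0]}


# Hypothetical candidate fits from different starting points.
fits = [
    {"params": {"sigma": 90.0, "p_c1": 0.985, "sd_c1": 60.0, "mu_c2": 0.0, "sd_c2": 200.0}, "r2": 0.99},
    {"params": {"sigma": 80.0, "p_c1": 0.4, "sd_c1": 60.0, "mu_c2": 10.0, "sd_c2": 200.0}, "r2": 0.93},
    {"params": {"sigma": 100.0, "p_c1": 0.6, "sd_c1": 80.0, "mu_c2": -10.0, "sd_c2": 220.0}, "r2": 0.93},
    {"params": {"sigma": 70.0, "p_c1": 0.5, "sd_c1": 50.0, "mu_c2": 5.0, "sd_c2": 180.0}, "r2": 0.90},
]
best = select_best(fits)
print(best)
```

In this example the first candidate is discarded despite its high r² because pC=1 sits near its upper limit, and the two tied candidates are averaged.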
Results
Behavioral Results
We recorded synchrony judgments on two SJ tasks, one using flash-beep stimuli and the other using speech stimuli, from 19 subjects, of whom four were excluded from further analysis (see Methods section). Figure 3 shows the mean proportions of synchrony reports plotted as a function of SOA for the binocular condition (blue) and the averaged monocular conditions (orange) for (A) the flash-beep stimulus and (B) the speech stimulus. Audiovisual temporal perception was indexed for the two stimulus types via two perceptual measures: the PSS and the TBW.
Figure 3.
 
Mean proportions of synchrony reports. Proportion of synchrony reports averaged across participants is plotted as a function of SOA (in ms) for (A) the flash-beep stimulus condition and (B) for the speech stimulus condition. Results for the binocular and the monocular conditions are represented in blue and orange colors, respectively. Filled circles represent mean values across participants; error bars represent standard error of the mean; and solid lines represent best fitting Gaussian distribution to the averaged data across participants.
To determine the effect of binocularity on audiovisual temporal perception, we conducted a 2 × 2 repeated-measures ANOVA with Greenhouse-Geisser correction on each of the performance measures (i.e., PSS and TBW) with viewing condition (i.e., binocular vs. monocular) and stimulus type (i.e., flash-beep and speech) as the within-subject factors using JASP software59 (version 0.11.1). Here, monocular performance was defined as the average of the left-eye and right-eye performances. We were able to pool the results for the right and left eyes because there was no statistically significant difference between them for either the PSS or the TBW for either stimulus type (i.e., flash-beep and speech); all P > 0.3. Results are reported as mean ± SE, and all statistical analyses were two-tailed with an alpha (α) of 0.05.
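The structure of this analysis can be sketched by hand. The following is a minimal Python illustration, not the JASP analysis itself. It relies on the fact that in a fully within-subject 2 × 2 design each effect has one degree of freedom, so its F(1, n − 1) equals the squared one-sample t statistic of the corresponding per-participant contrast. The TBW values below are hypothetical and are not study data.

```python
import math
import statistics


def average_eyes(left, right):
    """Monocular performance: per-participant mean of left-eye and right-eye values."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]


def one_sample_t(values):
    """t statistic testing whether the mean of `values` differs from zero."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))


def rm_anova_2x2(bino_fb, mono_fb, bino_sp, mono_sp):
    """F(1, n - 1) for both main effects and the interaction via contrasts."""
    rows = list(zip(bino_fb, mono_fb, bino_sp, mono_sp))
    contrasts = {
        "viewing": [(bf + bs) / 2 - (mf + ms) / 2 for bf, mf, bs, ms in rows],
        "stimulus": [(bf + mf) / 2 - (bs + ms) / 2 for bf, mf, bs, ms in rows],
        "interaction": [(bf - mf) - (bs - ms) for bf, mf, bs, ms in rows],
    }
    return {name: one_sample_t(c) ** 2 for name, c in contrasts.items()}


# Hypothetical TBW values (ms) for five participants.
mono_fb = average_eyes([250, 240, 260, 230, 245], [246, 236, 250, 234, 239])
mono_sp = average_eyes([272, 263, 282, 254, 264], [268, 267, 278, 256, 260])
bino_fb = [220, 215, 235, 210, 214]
bino_sp = [268, 266, 275, 257, 260]
f_values = rm_anova_2x2(bino_fb, mono_fb, bino_sp, mono_sp)
print(f_values)
```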
Binocular Viewing has no Effect on PSS for SJ Tasks Using Either Flash-Beep or Speech Stimuli
For the PSS, a two-way repeated-measures ANOVA revealed a significant main effect of stimulus type (F(1,14) = 58.9, P = 2.2 × 10⁻⁶, ηp² = 0.81; Fig. 4). The PSS averaged across all viewing conditions was significantly shifted toward more positive values for the speech stimulus (85.87 ± 12 ms) compared to the flash-beep stimulus (−0.51 ± 10 ms). Surprisingly, our analysis showed no significant effect of viewing condition (F(1,14) = 0.17, P = 0.689, ηp² = 0.012) and no significant interaction between viewing condition and stimulus type (F(1,14) = 0.101, P = 0.755, ηp² = 0.007), indicating no effect of binocular viewing on the PSS for either stimulus type.
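The reported effect sizes can be related to the F statistics with a short calculation. The following is a minimal Python sketch using the standard identity between partial eta squared and F, with the F values and degrees of freedom taken from the paragraph above; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F(1, 14) values reported for the PSS.
reported_f = {"stimulus type": 58.9, "viewing condition": 0.17, "interaction": 0.101}
pss_effect_sizes = {
    name: partial_eta_squared(f, df_effect=1, df_error=14)
    for name, f in reported_f.items()
}
for name, value in pss_effect_sizes.items():
    print(name, round(value, 3))
```

The recomputed values round to 0.81, 0.012, and 0.007, which agrees with the effect sizes reported above.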
Figure 4.
 
Effects of viewing condition and stimulus type on point of subjective simultaneity (PSS). (A) Scatterplot showing binocular (y-axis) and monocular (x-axis) PSS values of participants for the flash-beep condition (green) and the speech condition (magenta). Dashed line represents line of equality between binocular and monocular PSS values. (B) Mean PSS results plotted for stimulus and viewing conditions (binocular [blue] and mean monocular [orange]). The error bars represent ± SEM.
Binocular Viewing Enhances Audiovisual Temporal Acuity for SJ Tasks Using Flash-Beep Stimuli But Not Speech Stimuli
Consistent with our hypothesis, a two-way repeated-measures ANOVA conducted on the TBW revealed a significant effect of stimulus type (F(1,14) = 6.34, P = 0.025, ηp² = 0.312), a significant effect of viewing condition (F(1,14) = 7.35, P = 0.017, ηp² = 0.344), and a significant interaction between stimulus type and viewing condition (F(1,14) = 4.73, P = 0.047, ηp² = 0.253; Fig. 5). Furthermore, to investigate the dependence of this TBW difference on stimulus type, we conducted a post hoc simple effects analysis with Bonferroni correction on the ANOVA results. Our analysis revealed that for the flash-beep stimulus, the TBW for binocular viewing (218.6 ± 17 ms) was significantly narrower than that for monocular viewing (243.1 ± 17 ms; t(14) = −3.91, P = 0.002, d = −1.01, adjusted α = 0.025). In contrast, for the speech stimulus, there was no significant difference between the TBW for binocular viewing (266.3 ± 19 ms) and that for monocular viewing (269.4 ± 22 ms; t(14) = −0.396, P = 0.698, d = −0.102, adjusted α = 0.025).
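The post hoc statistics above can be cross-checked with a short calculation. The following is a minimal Python sketch, assuming that the reported d is the standardized mean of the paired differences (t divided by the square root of the sample size), an assumption that reproduces the reported values, and that the Bonferroni adjustment divides α by the two simple-effects comparisons. The function names are illustrative.

```python
import math


def cohens_d_paired(t_value, n):
    """Effect size for paired data: mean difference over SD of differences."""
    return t_value / math.sqrt(n)


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison alpha after Bonferroni correction."""
    return alpha / n_comparisons


n_participants = 15  # 19 recruited minus 4 excluded, consistent with t(14)
d_flash_beep = cohens_d_paired(-3.91, n_participants)
d_speech = cohens_d_paired(-0.396, n_participants)
adjusted_alpha = bonferroni_alpha(0.05, 2)
print(round(d_flash_beep, 2), round(d_speech, 3), adjusted_alpha)
```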
Figure 5.
 
Effects of viewing condition and stimulus type on the size of the temporal binding window (TBW). (A) Scatterplot showing binocular (y-axis) and monocular (x-axis) TBW size values of participants for the flash-beep condition (green) and the speech condition (magenta). Dashed line represents the line of equality between binocular and monocular TBW size values. (B) Mean TBW results plotted for stimulus and viewing conditions (binocular [blue] and mean monocular [orange]). The error bars represent ±SEM.
Causal Inference Model Results
To provide more mechanistic insights into our findings, we used the causal inference model developed by Magnotti et al.49 As described earlier, this model provides a first-principles analysis of how the temporal relationship between cues can be leveraged to determine whether these cues originate from a common source (C = 1) or different sources (C = 2) (Fig. 2). The model uses six parameters, which include four stimulus-based parameters (µC=1, σC=1, µC=2, σC=2) and two subject-based parameters. The subject-based parameters consist of a sensory noise parameter, σ, which is a proxy for reliability of unisensory encoding or the level of noisiness in the formation of sensory representations, and pC=1, which represents the prior probability of inferring a common cause or the tendency to bind the multisensory cues. 
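The structure of the model can be illustrated with a minimal sketch. It assumes, as one reading of the description above, that the observer's measured asynchrony equals the physical asynchrony plus Gaussian sensory noise, that a common cause is inferred whenever its posterior probability exceeds 0.5, and that a synchronous response is given whenever the noisy measurement falls inside the resulting window. The function names and all parameter values are hypothetical and chosen only for illustration; they are not the fitted values from this study.

```python
import math

def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sd):
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))

def posterior_common_cause(x, p_c1, mu_c1, sd_c1, mu_c2, sd_c2, noise_sd):
    """p(C=1 | measured asynchrony x), via Bayes' rule.

    Assumes measured asynchrony = physical asynchrony + Gaussian sensory noise,
    so the sensory-noise variance adds to each source distribution's variance.
    """
    like_c1 = normal_pdf(x, mu_c1, math.hypot(sd_c1, noise_sd))
    like_c2 = normal_pdf(x, mu_c2, math.hypot(sd_c2, noise_sd))
    evidence_c1 = p_c1 * like_c1
    return evidence_c1 / (evidence_c1 + (1.0 - p_c1) * like_c2)

def common_cause_window(params, search_ms=1500):
    """Range of measured asynchronies (1-ms grid) where C=1 is the more likely cause.

    Assumes the window is non-empty and contiguous within the search range.
    """
    inside = [x for x in range(-search_ms, search_ms + 1)
              if posterior_common_cause(x, **params) > 0.5]
    return min(inside), max(inside)

def p_report_synchronous(soa_ms, params):
    """Probability that a noisy measurement of this SOA lands inside the window."""
    low, high = common_cause_window(params)
    sd = params["noise_sd"]
    return normal_cdf(high, soa_ms, sd) - normal_cdf(low, soa_ms, sd)

# Illustrative parameter values only (ms); NOT the fitted values from this study.
base = dict(p_c1=0.5, mu_c1=0.0, sd_c1=60.0, mu_c2=0.0, sd_c2=250.0)
low_noise = dict(base, noise_sd=80.0)
high_noise = dict(base, noise_sd=120.0)

for soa in (0, 150, 300):
    print(soa,
          round(p_report_synchronous(soa, low_noise), 3),
          round(p_report_synchronous(soa, high_noise), 3))
```

With these toy values, lowering the sensory noise narrows both the window and the predicted proportion of synchronous reports at large asynchronies, which is the qualitative pattern that a narrower TBW would reflect.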
We determined parameter values for the best-fitting models for each viewing condition, stimulus condition, and subject. To determine the effect of binocularity on these parameters, we conducted a 2 × 2 repeated-measures ANOVA with Greenhouse-Geisser correction on each of the parameter values, with viewing condition (i.e., binocular vs. monocular) and stimulus type (i.e., flash-beep vs. speech) as the within-subject factors. Again, monocular performance was defined as the average of the left-eye and right-eye conditions, results are reported as mean ± SE, and all statistical analyses were two-tailed with an α of 0.05. 
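For a 2 × 2 within-subject design, each ANOVA effect has one degree of freedom, so it reduces to a per-subject contrast whose F(1, n − 1) equals the squared one-sample t statistic on that contrast. With only two levels per factor, a sphericity correction leaves the degrees of freedom unchanged. The sketch below illustrates this equivalence. The data layout, function names, and numbers are hypothetical toy values used only to exercise the code; they are not data from this study, and the analysis software actually used may differ in implementation.

```python
import math
from statistics import mean, stdev

def one_df_effect(contrasts):
    """F statistic for a 1-df within-subject effect.

    F(1, n-1) equals the squared one-sample t statistic on the per-subject contrast.
    """
    n = len(contrasts)
    t_value = mean(contrasts) / (stdev(contrasts) / math.sqrt(n))
    return t_value ** 2, (1, n - 1)

def rm_anova_2x2(subjects):
    """subjects: list of dicts with keys 'left', 'right', 'bino'; each maps to a
    dict holding one value for 'flash' and one for 'speech' (hypothetical layout)."""
    viewing, stimulus, interaction = [], [], []
    for s in subjects:
        # Monocular score = average of the left-eye and right-eye conditions.
        mono = {k: (s["left"][k] + s["right"][k]) / 2.0 for k in ("flash", "speech")}
        bino = s["bino"]
        viewing.append((bino["flash"] + bino["speech"]) - (mono["flash"] + mono["speech"]))
        stimulus.append((bino["flash"] + mono["flash"]) - (bino["speech"] + mono["speech"]))
        interaction.append((bino["flash"] - mono["flash"]) - (bino["speech"] - mono["speech"]))
    return {
        "viewing": one_df_effect(viewing),
        "stimulus": one_df_effect(stimulus),
        "interaction": one_df_effect(interaction),
    }

# Made-up values purely to exercise the function (arbitrary units).
toy_subjects = [
    {"left": {"flash": 10.0, "speech": 20.0}, "right": {"flash": 12.0, "speech": 22.0},
     "bino": {"flash": 10.0, "speech": 20.0}},
    {"left": {"flash": 13.0, "speech": 23.0}, "right": {"flash": 15.0, "speech": 25.0},
     "bino": {"flash": 12.0, "speech": 22.0}},
    {"left": {"flash": 14.0, "speech": 20.0}, "right": {"flash": 16.0, "speech": 22.0},
     "bino": {"flash": 11.0, "speech": 19.0}},
]

results = rm_anova_2x2(toy_subjects)
for effect, (f_value, dfs) in results.items():
    print(effect, round(f_value, 3), dfs)
```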
Binocular Viewing Does Not Affect Stimulus-Based Parameters Derived From the Causal Inference Model
Because the same set of stimuli was presented across the viewing conditions, we expected no difference in the stimulus-based parameters across the viewing conditions but differences between the stimulus conditions. Indeed, for each of the three stimulus-based parameters (µC=2, σC=1, σC=2), there was a significant main effect of stimulus (µC=2: F(1,14) = 4.72, P = 0.048, η2p = 0.252; σC=1: F(1,14) = 50.03, P = 5.56 × 10⁻⁶, η2p = 0.781; σC=2: F(1,14) = 5.06, P = 0.041, η2p = 0.266). However, neither viewing condition (µC=2: P = 0.653; σC=1: P = 0.325; σC=2: P = 0.607) nor the interaction between stimulus and viewing condition (µC=2: P = 0.077; σC=1: P = 0.963; σC=2: P = 0.958) had a significant effect on any of the stimulus-based parameters (Fig. 6). 
Figure 6.
 
The effects of viewing condition and stimulus type on stimulus-based parameters of the causal inference model. (A, C, E) Scatterplots showing binocular (y-axis) and monocular (x-axis) values for the µC=2 (A), σC=1 (C), and σC=2 (E) parameters for each participant. Data points for the flash-beep condition and the speech condition are shown in green and magenta, respectively. Dashed line represents line of equality between binocular and monocular parameter values. (B, D, F) Mean values of the µC=2 (B), σC=1 (D), and σC=2 (F) parameters are plotted for stimulus and viewing conditions (binocular [blue] and mean monocular [orange]). The error bars represent ± SEM.
Figure 7.
 
The effects of viewing condition and stimulus type on the prior (pC=1) and the sensory noise (σ) parameters of the causal inference model. (A, C) Scatterplots showing binocular (y-axis) and monocular (x-axis) values for the pC=1 (A) and σ (C) parameters for each participant. Data points for the flash-beep condition and the speech condition are shown in green and magenta, respectively. Dashed line represents line of equality between binocular and monocular parameter values. (B, D) Mean values of the pC=1 (B) and σ (D) parameters are plotted for the stimulus and the viewing conditions (binocular [blue] and mean monocular [orange]). The error bars represent ± SEM.
Binocular Enhancement in Audiovisual Temporal Acuity Could be Explained by a Reduction in Sensory Noise Affecting the Measurement of Physical Asynchrony
On the basis of prior literature, binocular integration is predominantly a low-level visual phenomenon and thus is more likely to affect the encoding of visual sensory information. Consequently, we hypothesized that binocular viewing would most likely affect the sensory noise parameter and not the participant's prior probability of inferring a common cause. In line with our hypothesis, we first observed no effect of stimulus condition (F(1,14) = 1.69, P = 0.214, η2p = 0.108), viewing condition (F(1,14) = 0.55, P = 0.470, η2p = 0.038), or stimulus-viewing interaction (F(1,14) = 1.33, P = 0.268, η2p = 0.087) on the participant's prior probability of inferring a common cause, pC=1 (Fig. 7). Conversely, sensory noise (σ) was affected by both stimulus condition (F(1,14) = 5.68, P = 0.032, η2p = 0.288) and viewing condition (F(1,14) = 17.39, P = 9.43 × 10⁻⁴, η2p = 0.554) but not their interaction (F(1,14) = 0.65, P = 0.433, η2p = 0.045). Moreover, in accordance with our behavioral findings, we expected the difference in sensory noise across the viewing conditions to occur for the flash-beep stimulus condition but not for the speech stimulus condition. Therefore, we conducted paired t-tests with Bonferroni correction between the binocular and monocular sensory noise values for the flash-beep condition and speech condition separately. Although there was a statistically significant reduction in sensory noise during binocular viewing for the flash-beep condition (t(14) = −4.0, P = 0.001, d = −1.032, Bonferroni-adjusted α = 0.025), the binocular difference in sensory noise for the speech condition was not statistically significant (t(14) = −1.78, P = 0.097, d = −0.46, Bonferroni-adjusted α = 0.025). 
These findings indicate that the effect of binocular viewing on audiovisual temporal perception observed for the flash-beep stimuli may stem from enhanced low-level unisensory mechanisms in the form of a reduction in sensory noise affecting the measurement of physical asynchrony during audiovisual temporal perception. 
Discussion
This study provides the first clear evidence of binocular summation in audiovisual temporal perception in normally sighted individuals. The key finding was that audiovisual temporal acuity, as indexed by the TBW, was improved under binocular viewing conditions. Consistent with prior studies, this benefit was only seen when low-level audiovisual stimuli were used and was absent with the use of audiovisual speech stimuli. Causal inference modeling suggests that the binocular benefit was a result of a reduction in sensory noise affecting the measurement of physical asynchrony during audiovisual temporal perception. 
Although our study investigated binocular summation using a multisensory (i.e., audiovisual) task, our finding that binocular viewing enhances audiovisual temporal acuity is in line with studies that have reported binocular summation in several suprathreshold visual tasks such as contrast and orientation discrimination tasks, visual and Vernier acuity tasks, and reaction time tasks.3–13 Previous physiologically plausible models explaining these findings of binocular summation in visual tasks (especially using contrast and luminance detection and discrimination tasks) posited that the inputs from the corresponding retinal points in the two eyes are linearly transduced before they undergo binocular summation and, finally, suppressive ocular interactions, mostly in the primary visual cortex.6 However, recent work challenges this framework and demonstrates that models that include suppressive ocular interactions before summation may provide better fits to, and explanations of, these findings of binocular summation.4,7 
In the case of audiovisual temporal perception, studies have benefitted from Bayesian modelling approaches including the causal inference model applied in this study.32,49,50 Generally, these models comprise parameters that index processes occurring at the unisensory level and those that involve the binding and/or integration of multisensory cues. Given that binocular integration is a low-level visual phenomenon occurring predominantly in the primary visual cortex,1 we believe that the role binocular integration plays in audiovisual simultaneity perception can be explained by considering the summation of the luminance energies of the suprathreshold visual stimuli received from the two eyes prior to multisensory integration. Following the evidence that binocular viewing enhances perceived stimulus intensity, our finding of binocular summation of audiovisual temporal acuity (i.e., reduction in the TBW) for the flash-beep task is consistent with studies that have demonstrated that increasing the effectiveness of the stimuli in an SJ task improves audiovisual temporal acuity.47,49 For instance, Fister et al.47 investigated the effect of increasing stimulus intensity on the probability of making synchrony judgments for visual-leading SOAs in an SJ task. They discovered that as SOA increased, the probability of making synchrony judgments fell more rapidly for the highly effective stimuli than for the less effective stimuli. This finding implied that increasing the effectiveness of the stimuli decreased the tolerance for audiovisual asynchrony, which manifests as a narrowing of the TBW. 
Using the causal inference model, our study showed that the binocular enhancement in the TBW observed for the flash-beep stimuli could be explained by a reduction in the level of sensory noise affecting the observer's judgment of asynchrony. Indeed, this finding agrees with the study by Magnotti et al.49 that demonstrated that manipulating stimulus reliability affects the noisiness in the formation of sensory representations, parameterized in the causal inference model as sensory noise, σ. Specifically, Magnotti et al.49 showed that when the reliability of the visual stimulus during the performance of an SJ task was decreased through blurring, there was an increase in the level of sensory noise (σ) affecting the judgment. Fitting Gaussian models to the data showed that the non-blurry stimulus condition (i.e., more reliable) had a narrower TBW, in concordance with our results where the binocular viewing condition decreased the TBW for the flash-beep stimuli. Besides SJ tasks, Beierholm et al.60 applied the causal inference model to an audiovisual spatial localization task and showed that high-contrast stimuli decreased the standard deviation of the visual likelihood parameter, signifying decreased noisiness in visual sensory representations. Although the two models (i.e., Magnotti et al.49 and Beierholm et al.60) were developed for different problems (i.e., audiovisual speech perception and audiovisual spatial localization, respectively), Magnotti et al.49 highlighted that both problems are mathematically similar and that the models share the same theoretical framework. Hence, it is plausible to conclude that the sensory noise parameter in the model of Magnotti et al.49 and the standard deviation of the visual likelihood in the model of Beierholm et al.60 serve a similar function because both relate to the noisiness in sensory representations. 
Although the causal inference model is able to differentiate between the contributions of low-level unisensory mechanisms (i.e., level of sensory noise) and high-level multisensory mechanisms (i.e., prior probability of inferring a common cause) to changes in audiovisual temporal perception, it does not provide any insight into the type of sensory noise (i.e., whether internal or external) driving changes in the unisensory mechanisms. Moreover, the causal inference model does not make explicit assumptions about the sources of the sensory noise. However, considering the nature of the audiovisual temporal paradigm, it is plausible to hypothesize that the estimation of the physical synchrony using the visual and auditory cues may be based on the reliability of the binocular and the binaural outputs. Consequently, this may suggest that the source of the sensory noise in the model is found after binocular and binaural integration. Blake and colleagues1,2 discussed the plausibility of a model with late-stage noise. Nevertheless, we believe that these details about the types and sources of noise should be incorporated into future developments of this model to facilitate the understanding of the different sensory noise mechanisms affecting audiovisual temporal perception. 
Although binocular viewing reduced the TBW for the flash-beep stimuli, it did not affect the TBW for the speech task. Based on prior studies, there are several possible explanations. First, prior work has shown that binocular summation is more likely to occur for tasks or stimuli with low complexity (e.g., differential light sensitivity of target luminance) as opposed to those with high complexity (e.g., pattern recognition of digits against a random checkerboard background). Indeed, Frisén and Lindblom18 posited that the more complex the stimuli, the higher the level of cortical processing required and the smaller the magnitude of binocular summation. Second, the lack of binocular summation for the speech stimuli could be explained by studies that have shown that stimuli with higher energy (i.e., luminance or contrast) yield less binocular summation. For example, Home9 showed that for a pattern recognition task, binocular summation was high for low target contrasts and absent at higher contrasts. Additionally, the dependence of binocular summation on stimulus contrast has been demonstrated for discrimination tasks of contrast,5,7 orientation,61 and Vernier acuity.8 Thus, the lack of binocular summation for the speech stimulus may stem from high stimulus complexity and/or high stimulus reliability. If the latter is true, then it can be hypothesized that the TBW of the speech stimulus may benefit from binocular enhancement if the reliability of the stimulus is reduced through blurring or addition of noise. 
Considering the evidence that binocular viewing can increase the neural response and perceived intensity of viewed visual targets, our finding of no effect of viewing condition on the PSS contradicts studies that have shown that increasing stimulus effectiveness affects the PSS.41,62 For example, Boenke, Deliano and Ohl41 revealed that increasing the intensity of the visual stimulus in a temporal order judgment task (a variant of the SJ task) significantly shifted the PSS toward the auditory-leading side. In other words, maximum perceived simultaneity was achieved with a stimulus pair of larger auditory lead under increased visual intensity. However, to explain the seeming discrepancy here, it is essential that we consider how amenable the PSS is to changes in stimulus intensity, assuming all other factors remain constant. For instance, in the study by Boenke et al.,41 increasing the intensity of the visual stimulus by approximately fivefold shifted the PSS by 27 ms to the left (i.e., toward the auditory-leading side). On the basis of this analysis, one would expect that for binocular viewing, which enhances perceived brightness by approximately 40%, there would be a shift in the PSS of only 2 ms, assuming a linear relationship between PSS and stimulus intensity. In fact, the lack of PSS shift under binocular viewing is consistent with studies that have assessed the impact on audiovisual temporal perception of visual phenomena that modulate perceived stimulus effectiveness. For example, Opoku-Baah and Wallace63 showed that a brief period of monocular deprivation, a phenomenon known to boost perceived contrast in the deprived eye, did not significantly affect the PSS, although changes in the TBW were observed. 
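The linear extrapolation behind this estimate can be written out explicitly. The sketch below is only a back-of-the-envelope check. How the fivefold increase maps onto a linear intensity scale is not specified in the text, so two readings are shown, and both are assumptions; the function and variable names are illustrative.

```python
def linear_pss_shift(ref_shift_ms, ref_change, new_change):
    """Scale a reference PSS shift linearly by the ratio of intensity changes."""
    return ref_shift_ms * new_change / ref_change

REF_SHIFT_MS = 27.0     # PSS shift reported for the approximately fivefold increase
BINOCULAR_GAIN = 0.40   # approximately 40% increase in perceived brightness

# Reading 1 (assumption): the 27 ms shift is spread over a factor of 5.
shift_per_factor = linear_pss_shift(REF_SHIFT_MS, 5.0, BINOCULAR_GAIN)

# Reading 2 (assumption): the 27 ms shift is spread over an increment of
# 4 baseline units (from 1x to 5x).
shift_per_increment = linear_pss_shift(REF_SHIFT_MS, 4.0, BINOCULAR_GAIN)

print(round(shift_per_factor, 2), round(shift_per_increment, 2))
```

Either reading gives a shift of roughly 2 to 3 ms, the same order as the estimate quoted above and small relative to the TBW widths reported in this study.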
Importantly, we believe our findings have clinical implications for understanding the underlying mechanisms of the multisensory perceptual deficits observed in patients with impaired binocular vision such as in amblyopia. Several studies have shown that patients with amblyopia suffer several visual deficits including reduced visual acuity, reduced stereopsis21,64,65 and even deficits in higher-level perceptual functions such as global shape detection,66 motion processing,67 and real-world scene perception.68 Recently, amblyopia has been linked with deficits in audiovisual integration.52,69,70 For instance, Narinesingh et al.69 showed that adult patients with amblyopia exhibited reduced susceptibility to the McGurk effect compared to age-matched controls. With regard to audiovisual temporal perception, Richards et al.52 demonstrated that amblyopes compared to age-matched controls exhibited significantly widened TBW but no difference in the PSS when tested on an SJ task with the flash-beep stimuli. Using a subset of six amblyopes, they also showed that the size of the TBW was not different across viewing conditions, which were binocular, better eye and amblyopic eye conditions.52 Interestingly, although the widened TBW observed in amblyopes indicates impaired multisensory temporal integration, the absence of an effect of viewing condition on the TBW measured in amblyopes and the finding of binocular enhancement in audiovisual simultaneity perception in normally-sighted individuals provided by this study suggest a possible role of impaired binocular vision in the observed multisensory deficits in amblyopia. These suggestions warrant further studies geared at understanding the relative contributions of impaired binocular vision and impaired multisensory integration to the observed deficits in multisensory temporal function. 
It will also be interesting to investigate how the relative contributions of these mechanisms differ based on factors such as amblyopia severity and etiology. Furthermore, we believe that the causal inference model as applied in our study will be a useful tool for determining whether the deficits in audiovisual temporal perception observed in amblyopia stem from impaired binocular vision (formation of sensory representations) and/or impaired multisensory processing (prior probability of inferring a common cause, also known as the binding tendency). Such a finding will help inform whether multisensory perceptual training paradigms should be developed to target these mechanisms separately in the management of amblyopia. Together, these studies will enrich our understanding of the overall sensory and perceptual deficits in amblyopia and their underlying mechanisms and enable the development of behavioral therapies that address these mechanisms. 
Acknowledgments
The authors thank the lab undergraduates, Brian Hou and Julia Olsen, for helping with data collection. 
Disclosure: C. Opoku-Baah, None; M.T. Wallace, None 
References
1. Blake R, Fox R. The psychophysical inquiry into binocular summation. Perception Psychophysics. 1973; 14: 161–185.
2. Blake R, Sloane M, Fox R. Further developments in binocular summation. Perception Psychophysics. 1981; 30: 266–276.
3. Baker DH, Lygo FA, Meese TS, Georgeson MA. Binocular summation revisited: Beyond √2. Psychol Bull. 2018; 144: 1186.
4. Meese TS, Georgeson MA, Baker DH. Binocular contrast vision at and above threshold. J Vis. 2006; 6: 7–7.
5. Legge GE. Binocular contrast summation—I. Detection and discrimination. Vis Res. 1984; 24: 373–383.
6. Legge GE. Binocular contrast summation—II. Quadratic summation. Vis Res. 1984; 24: 385–394.
7. Georgeson M, Meese T, Baker D. Binocular interaction: Contrast matching and contrast discrimination are predicted by the same model. Spatial Vis. 2007; 20: 397–413.
8. Banton T, Levi DM. Binocular summation in Vernier acuity. JOSA A. 1991; 8: 673–680.
9. Home R. Binocular summation: a study of contrast sensitivity, visual acuity and recognition. Vis Res. 1978; 18: 579–585.
10. Cagenello R, Arditi A, Halpern DL. Binocular enhancement of visual acuity. JOSA A. 1993; 10: 1841–1848.
11. Blake R, Martens W, Di Gianfilippo A. Reaction time as a measure of binocular interaction in human vision. Invest Ophthalmol Vis Sci. 1980; 19: 930–941.
12. Westendorf D, Blake R. Binocular reaction times to contrast increments. Vis Res. 1988; 28: 355–359.
13. Yehezkel O, Sterkin A, Sagi D, Polat U. Binocular summation of chance decisions. Sci Rep. 2015; 5: 16799.
14. Pardhan S, Gilchrist J, Douthwaite W, Yap M. Binocular inhibition: psychophysical and electrophysiological evidence. Optom Vis Sci. 1990; 67: 688–691.
15. Harter MR, Seiple WH, Salmon L. Binocular summation of visually evoked responses to pattern stimuli in humans. Vis Res. 1973; 13: 1433–1446.
16. Levelt WJ. Binocular brightness averaging and contour information. Br J Psychol. 1965; 56: 1–13.
17. Curtis DW, Rule SJ. Binocular processing of brightness information: A vector-sum model. J Exp Psychol. 1978; 4: 132.
18. Frisén L, Lindblom B. Binocular summation in humans: evidence for a hierarchic model. J Physiol. 1988; 402: 773–782.
19. Pardhan S. A comparison of binocular summation in young and older patients. Curr Eye Res. 1996; 15: 315–319.
20. Pardhan S, Gilchrist J. The effect of monocular defocus on binocular contrast sensitivity. Ophthalmic Physiol Opt. 1990; 10: 33–36.
21. Levi DM, Knill DC, Bavelier D. Stereopsis and amblyopia: a mini-review. Vis Res. 2015; 114: 17–30.
22. Birch EE. Amblyopia and binocular vision. Progr Retinal Eye Res. 2013; 33: 67–84.
23. Hamm LM, Black J, Dai S, Thompson B. Global processing in amblyopia: a review. Front Psychol. 2014; 5: 583.
24. Thompson B, Richard A, Churan J, Hess RF, Aaen-Stockdale C, Pack CC. Impaired spatial and binocular summation for motion direction discrimination in strabismic amblyopia. Vis Res. 2011; 51: 577–584.
25. Dorr M, Kwon M, Lesmes LA, et al. Binocular summation and suppression of contrast sensitivity in strabismus, fusion and amblyopia. Front Hum Neurosci. 2019; 13: 234.
26. Pardhan S, Gilchrist J. Binocular contrast summation and inhibition in amblyopia. Doc Ophthalmol. 1992; 82: 239–248.
27. Ghazanfar AA, Schroeder CE. Is neocortex essentially multisensory? Trends Cogn Sci. 2006; 10: 278–285.
28. Driver J, Noesselt T. Multisensory interplay reveals crossmodal influences on “sensory-specific” brain regions, neural responses, and judgments. Neuron. 2008; 57: 11–23.
29. Stein BE, Stanford TR. Multisensory integration: current issues from the perspective of the single neuron. Nat Rev Neurosci. 2008; 9: 255–266.
30. Stein BE, Meredith MA. The merging of the senses. Cambridge, MA: The MIT Press; 1993.
31. Frassinetti F, Bolognini N, Làdavas E. Enhancement of visual perception by crossmodal visuo-auditory interaction. Exp Brain Res. 2002; 147: 332–343.
32. Ernst MO, Banks MS. Humans integrate visual and haptic information in a statistically optimal fashion. Nature. 2002; 415: 429–433.
33. Zou H, Müller HJ, Shi Z. Non-spatial sounds regulate eye movements and enhance visual search. J Vis. 2012; 12: 2–2.
34. Lovelace CT, Stein BE, Wallace MT. An irrelevant light enhances auditory detection in humans: a psychophysical analysis of multisensory integration in stimulus detection. Cogn Brain Res. 2003; 17: 447–453.
35. Diederich A, Colonius H. Bimodal and trimodal multisensory enhancement: effects of stimulus onset and intensity on reaction time. Perception Psychophysics. 2004; 66: 1388–1404.
36. Murray MM, Wallace MT. The neural bases of multisensory processes. Boca Raton, FL: CRC Press; 2011.
37. Zampini M, Guest S, Shore DI, Spence C. Audio-visual simultaneity judgments. Perception Psychophysics. 2005; 67: 531–544.
38. Zampini M, Shore DI, Spence C. Audiovisual temporal order judgments. Exp Brain Res. 2003; 152: 198–210.
39. Jaśkowski P. Reaction time and temporal-order judgment as measures of perceptual latency: The problem of dissociations. Adv Psychol. 1999; 129: 265–282.
40. Sanford A. Effects of changes in the intensity of white noise on simultaneity judgements and simple reaction time. Q J Exp Psychol. 1971; 23: 296–303.
41. Boenke LT, Deliano M, Ohl FW. Stimulus duration influences perceived simultaneity in audiovisual temporal-order judgment. Exp Brain Res. 2009; 198: 233–244.
42. Wen P, Opoku-Baah C, Park M, Blake R. Judging relative onsets and offsets of audiovisual events. Vision. 2020; 4: 17.
43. Schneider KA, Bavelier D. Components of visual prior entry. Cogn Psychol. 2003; 47: 333–366.
44. Stelmach LB, Herdman CM. Directed attention and perception of temporal order. J Exp Psychol. 1991; 17: 539.
45. Zampini M, Shore DI, Spence C. Audiovisual prior entry. Neurosci Lett. 2005; 381: 217–222.
46. Wallace MT, Stevenson RA. The construct of the multisensory temporal binding window and its dysregulation in developmental disabilities. Neuropsychologia. 2014; 64: 105–123.
47. Fister JK, Stevenson RA, Nidiffer AR, Barnett ZP, Wallace MT. Stimulus intensity modulates multisensory temporal processing. Neuropsychologia. 2016; 88: 92–100.
48. Stevenson RA, Wallace MT. Multisensory temporal integration: task and stimulus dependencies. Exp Brain Res. 2013; 227: 249–261.
49. Magnotti JF, Ma WJ, Beauchamp MS. Causal inference of asynchronous audiovisual speech. Front Psychol. 2013; 4: 798.
50. Körding KP, Beierholm U, Ma WJ, Quartz S, Tenenbaum JB, Shams L. Causal inference in multisensory perception. PLoS One. 2007; 2: e943.
51. Noel JP, Stevenson RA, Wallace MT. Atypical audiovisual temporal function in autism and schizophrenia: similar phenotype, different cause. Eur J Neurosci. 2018; 47: 1230–1241.
52. Richards MD, Goltz HC, Wong AM. Alterations in audiovisual simultaneity perception in amblyopia. PLoS One. 2017; 12: e0179516.
53. Carriere BN, Royal DW, Perrault TJ, et al. Visual deprivation alters the development of cortical multisensory integration. J Neurophysiol. 2007; 98: 2858–2867.
54. Wallace MT, Perrault TJ, Hairston WD, Stein BE. Visual experience is necessary for the development of multisensory integration. J Neurosci. 2004; 24: 9580–9584.
55. Legge GE, Rubin GS. Binocular interactions in suprathreshold contrast perception. Perception Psychophysics. 1981; 30: 49–61.
56. Brainard DH. The psychophysics toolbox. Spatial Vis. 1997; 10: 433–436.
57. Pelli DG. The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vis. 1997; 10: 437–442.
58. Simon DM, Wallace MT. Integration and temporal processing of asynchronous audiovisual speech. J Cogn Neurosci. 2018; 30: 319–337.
59. JASP Team. JASP (Version 0.11.1) [Computer software]. JASP Team: Amsterdam, Netherlands. 2019.
60. Beierholm UR, Quartz SR, Shams L. Bayesian priors are encoded independently from likelihoods in human multisensory perception. J Vis. 2009; 9: 23–23.
61. Bearse MA, Jr, Freeman RD. Binocular summation in orientation discrimination depends on stimulus contrast and duration. Vis Res. 1994; 34: 19–29.
62. Smith WF. The relative quickness of visual and auditory perception. J Exp Psychol. 1933; 16: 239.
63. Opoku-Baah C, Wallace MT. Brief period of monocular deprivation drives changes in audiovisual temporal perception. J Vis. 2020; 20: 8–8.
64. Levi D, Harwerth RS. Spatio-temporal interactions in anisometropic and strabismic amblyopia. Invest Ophthalmol Vis Sci. 1977; 16: 90–95.
65. Levi DM, Waugh SJ, Beard BL. Spatial scale shifts in amblyopia. Vis Res. 1994; 34: 3315–3333.
66. Hess RF, Wang Y-Z, Demanins R, Wilkinson F, Wilson HR. A deficit in strabismic amblyopia for global shape detection. Vis Res. 1999; 39: 901–914.
67. Simmers AJ, Ledgeway T, Hess RF, McGraw PV. Deficits to global motion processing in human amblyopia. Vis Res. 2003; 43: 729–738.
68. Mirabella G, Hay S, Wong AM. Deficits in perception of images of real-world scenes in patients with a history of amblyopia. Arch Ophthalmol. 2011; 129: 176–183.
69. Narinesingh C, Goltz HC, Wong AM. Temporal binding window of the sound-induced flash illusion in amblyopia. Invest Ophthalmol Vis Sci. 2017; 58: 1442–1448.
70. Chen Y-C, Lewis TL, Shore DI, Maurer D. Early binocular input is critical for development of audiovisual but not visuotactile simultaneity perception. Curr Biol. 2017; 27: 583–589.
Figure 1.
 
Schematic of the procedure for the (A) flash-beep SJ task and (B) speech SJ task. Participants judged the simultaneity of a visual stimulus (flash of light [A] and lip movements [B]) and an auditory stimulus (auditory beep [A] and phoneme /ba/ [B]) presented with varying stimulus onset asynchronies. On each trial, there was a brief fixation period (700–1000 ms), followed by the stimulus presentation. Participants were then asked to respond by pressing the keyboard after which the next trial began automatically.
Figure 2.
 
Causal inference model for audiovisual SJ tasks. Before multiple cues are combined, the brain determines whether they originate from a common source (C = 1) or different sources (C = 2). Auditory and visual stimuli that share a common source have a narrow distribution of physical asynchronies (middle, blue) and a mean that suggests a relationship between the cues (e.g., positive mean for speech or zero mean for flash-beep). When the paired stimuli have different sources, the distribution is broad and the mean is zero (middle, red). According to the model, each participant possesses a prior tendency to bind sensory information across time (pC=1, top) and samples information from the sensory world with a certain level of noisiness (sensory noise, bottom). Combining these components creates a window of measured asynchronies within which a common cause is more probable than separate causes (middle right). This window of asynchronies, termed the Bayes’ optimal window, serves as a decision structure for judging the simultaneity of these events. Figure modified from Noel, Stevenson, and Wallace.51
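The inference described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' fitting code: it assumes Gaussian asynchrony distributions under each cause, adds independent Gaussian sensory noise to both, and reports the range of measured asynchronies where the posterior probability of a common cause exceeds 0.5. All function names and parameter values below are hypothetical and chosen only for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_common_cause(x, p_common, sigma_noise,
                           mu_common=0.0, sigma_common=50.0,
                           mu_separate=0.0, sigma_separate=300.0):
    """P(C = 1 | measured asynchrony x), assuming Gaussian components.

    Sensory noise is assumed to add in quadrature to the spread of the
    physical asynchronies under each cause.
    """
    sd_common = math.hypot(sigma_common, sigma_noise)
    sd_separate = math.hypot(sigma_separate, sigma_noise)
    weighted_common = p_common * normal_pdf(x, mu_common, sd_common)
    weighted_separate = (1.0 - p_common) * normal_pdf(x, mu_separate, sd_separate)
    return weighted_common / (weighted_common + weighted_separate)


def bayes_optimal_window(p_common, sigma_noise, lo=-1000, hi=1000, **kwargs):
    """Bounds (in ms, 1-ms grid) of asynchronies where C = 1 is more probable."""
    inside = [x for x in range(lo, hi + 1)
              if posterior_common_cause(x, p_common, sigma_noise, **kwargs) > 0.5]
    return (min(inside), max(inside)) if inside else None


# Illustrative values only (ms); not estimates from the study.
baseline = bayes_optimal_window(p_common=0.5, sigma_noise=50.0)
noisier = bayes_optimal_window(p_common=0.5, sigma_noise=100.0)
weaker_prior = bayes_optimal_window(p_common=0.3, sigma_noise=50.0)
print(baseline, noisier, weaker_prior)
```

With these illustrative settings, raising the sensory noise widens the window and lowering the prior narrows it, which is why the two parameters can in principle be told apart when the model is fitted to behavior.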
Figure 3.
 
Mean proportions of synchrony reports. The proportion of synchrony reports, averaged across participants, is plotted as a function of SOA (in ms) for (A) the flash-beep stimulus condition and (B) the speech stimulus condition. Results for the binocular and monocular conditions are shown in blue and orange, respectively. Filled circles represent mean values across participants; error bars represent the standard error of the mean; and solid lines represent the best-fitting Gaussian function for the data averaged across participants.
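As a rough illustration of how a Gaussian can be fitted to synchrony reports like those plotted here, the sketch below runs a least-squares grid search over the mean and standard deviation and solves for the amplitude in closed form. It is not the authors' procedure. The data are synthetic, and two conventions are assumptions made for the example: the PSS is read off as the Gaussian mean, and the window width is summarized as the full width at half maximum, since this section does not state the criterion used to define the TBW.

```python
import math


def gaussian(soa, amplitude, mean, sd):
    """Scaled Gaussian describing the proportion of 'synchronous' reports."""
    return amplitude * math.exp(-0.5 * ((soa - mean) / sd) ** 2)


def fit_gaussian(soas, proportions,
                 means=range(-200, 201, 5), sds=range(20, 401, 5)):
    """Least-squares fit by grid search over mean and SD (both in ms).

    For each (mean, sd) pair the best amplitude has a closed form because
    the model is linear in the amplitude.
    """
    best = None
    for mean in means:
        for sd in sds:
            basis = [gaussian(s, 1.0, mean, sd) for s in soas]
            amplitude = (sum(b * p for b, p in zip(basis, proportions))
                         / sum(b * b for b in basis))
            sse = sum((p - amplitude * b) ** 2
                      for b, p in zip(basis, proportions))
            if best is None or sse < best[0]:
                best = (sse, amplitude, mean, sd)
    _, amplitude, mean, sd = best
    return amplitude, mean, sd


def full_width_half_max(sd):
    """Width of a Gaussian at half its peak height."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sd


# Synthetic, noise-free data generated from a known Gaussian (not study data).
soas = [-400, -300, -200, -150, -100, -50, 0, 50, 100, 150, 200, 300, 400]
proportions = [gaussian(s, 0.9, 40, 120) for s in soas]

amplitude, pss, sd = fit_gaussian(soas, proportions)
width = full_width_half_max(sd)
print(pss, sd, round(width, 1))
```

A grid search keeps the example dependency-free; with real, noisy data a continuous optimizer would normally replace it.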
Figure 4.
 
Effects of viewing condition and stimulus type on point of subjective simultaneity (PSS). (A) Scatterplot showing binocular (y-axis) and monocular (x-axis) PSS values of participants for the flash-beep condition (green) and the speech condition (magenta). Dashed line represents line of equality between binocular and monocular PSS values. (B) Mean PSS results plotted for stimulus and viewing conditions (binocular [blue] and mean monocular [orange]). The error bars represent ± SEM.
Figure 5.
 
Effects of viewing condition and stimulus type on the size of the temporal binding window (TBW). (A) Scatterplot showing binocular (y-axis) and monocular (x-axis) TBW size values of participants for the flash-beep condition (green) and the speech condition (magenta). Dashed line represents the line of equality between binocular and monocular TBW size values. (B) Mean TBW results plotted for stimulus and viewing conditions (binocular [blue] and mean monocular [orange]). The error bars represent ±SEM.
Figure 6.
 
The effects of viewing condition and stimulus type on stimulus-based parameters of the causal inference model. (A, C, E) Scatterplots showing binocular (y-axis) and monocular (x-axis) values for the µC=2 (A), σC=1 (C), and σC=2 (E) parameters for each participant. Data points for the flash-beep condition and the speech condition are shown in green and magenta, respectively. Dashed line represents line of equality between binocular and monocular parameter values. (B, D, F) Mean values of the µC=2 (B), σC=1 (D), and σC=2 (F) parameters are plotted for stimulus and viewing conditions (binocular [blue] and mean monocular [orange]). The error bars represent ± SEM.
Figure 7.
 
The effects of viewing condition and stimulus type on the prior (pC=1) and the sensory noise (σ) parameters of the causal inference model. (A, C) Scatterplots showing binocular (y-axis) and monocular (x-axis) values for the pC=1 (A) and σ (C) parameters for each participant. Data points for the flash-beep condition and the speech condition are shown in green and magenta, respectively. Dashed line represents line of equality between binocular and monocular parameter values. (B, D) Mean values of the pC=1 (B) and σ (D) parameters are plotted for the stimulus and the viewing conditions (binocular [blue] and mean monocular [orange]). The error bars represent ± SEM.