Attentional Orienting by Non-informative Cue Is Shaped via Reinforcement Learning

Cho, Sang A.; Cho, Yang Seok

doi:10.3389/fpsyg.2019.02884

ORIGINAL RESEARCH article

Front. Psychol., 15 January 2020

Sec. Cognition

Volume 10 - 2019 | https://doi.org/10.3389/fpsyg.2019.02884

Sang A. Cho

Yang Seok Cho^*

Department of Psychology, Korea University, Seoul, South Korea

It has been demonstrated that a reward-associated stimulus feature captures attention involuntarily. The present study tested whether spatial attentional orienting is biased via reinforcement learning. Participants were to identify a target stimulus presented in one of two placeholders, preceded by a non-informative arrow cue at the center of the display. Importantly, reward was available when the target occurred at a location cued by a reward cue, defined as a specific color (experiments 1 and 3) or a color–direction combination (experiment 2). The attentional bias of the reward cue was significantly increased as trials progressed, resulting in a greater cue-validity effect for the reward cue than the no-reward cue. This attentional bias was still evident even when controlling for the possibility that the incentive salience of the reward cue color modulates the cue-validity effect (experiment 2) or when the reward was withdrawn after reinforcement learning (experiment 3). However, it disappeared when the reward was provided regardless of cue validity (experiment 4), implying that the reinforcement contingency between reward and attentional orienting is a critical determinant of reinforcement learning-based spatial attentional modulation. Our findings highlight that a spatial attentional bias is shaped by value via reinforcement learning.

Introduction

Our cognition selects only a small amount of information from various sensory inputs in the environment at any given moment because of limits in our cognitive capability. Thus, our attention is allocated to a specific object, location, or feature to choose information for further processing. Visual attentional allocation is accomplished in two modes of attentional orienting. One depends on top–down factors, such as the task goal or performer’s intention, which is called endogenous attentional orienting. The other mode of attentional orienting is based on bottom–up factors, such as the physical salience of stimuli (spatial or temporal discontinuity), which is referred to as exogenous attentional orienting (Posner, 1980). Particularly, the exogenous mode of orienting is known to induce an involuntary capture based on stimulus features. Theories of an exogenous attentional-orienting mechanism have established what determinant is critical for the priority of involuntary attentional processing. For example, the contingent attentional capture account suggests that attention is allocated to a stimulus containing the target-defining feature for the task at hand (e.g., Folk et al., 1992). In contrast, the salience-driven attention theory insists that attentional deployment depends on the physical salience of stimuli (e.g., Theeuwes, 1992).

Recently, reward history has been suggested as another critical factor determining involuntary attentional orienting (see Wolfe and Horowitz, 2017; Failing and Theeuwes, 2018). In the experiment of Anderson et al. (2011), participants were instructed to respond to the target inked in one of two colors in a search array that consisted of heterogeneously colored stimuli. Importantly, different amounts of monetary reward were given depending on the color of the target. In a subsequent visual-search task, in which the target was defined as a shape singleton, a stimulus inked in one of the two rewarded colors was presented as a distractor. They found that the distractor elicited attentional interference even though it was irrelevant and physically non-salient. Moreover, the amount of interference increased with the size of the associated value, independent of participants’ explicit knowledge about stimulus–reward relationships. This value-driven attentional capture (VDAC) has been consistently reported under other experimental manipulations using stimulus features other than color, such as auditory stimuli (Anderson, 2015a), Gabor orientations (Laurent et al., 2015), neutral exogenous cues (Failing and Theeuwes, 2014), singleton distractors (Le Pelley et al., 2015), onset distractors (Munneke et al., 2016), or scenic pictures in a visual stream (Le Pelley et al., 2017). Importantly, this considerable body of evidence for VDAC by different stimulus features is commonly based on the relationships between features and their associated reward (Anderson, 2013, 2016; Bucker and Theeuwes, 2017). 
Specifically, attention is oriented toward the stimulus feature signaling reward because of its incentive salience, that is, the emotional or motivational value of the stimulus feature. When one feature of a stimulus predicts a significant outcome, such as reward, the feature becomes conspicuous as a conditioned stimulus (CS), resulting in attentional orienting toward the incentive-salient feature (MacLeod et al., 1986; Berridge and Robinson, 1998; Hickey et al., 2010; Hickey and Peelen, 2015). In particular, Berridge and Robinson suggested that incentive salience refers to the perceptual and motivational features of a stimulus that can capture attention. Thus, the findings of those studies on the relationship between attention and reward imply that the Pavlovian association between a stimulus feature and reward induces feature-based attentional allocation.

Critically, it has been suggested that VDAC can be understood as a result of reinforcement learning. Importantly, while the attentional modulation by Pavlovian learning is dependent on the association between a stimulus with a reward-signaling feature and reward, the attentional modulation by reinforcement learning depends on habitual associations between spatial attentional orienting behaviors with a given specific stimulus feature and reward (see Balleine and O’Doherty, 2010; Anderson, 2016; Bourgeois et al., 2016; Failing and Theeuwes, 2018). As attentional orienting toward a high-reward target stimulus is reinforced repeatedly in a learning phase, this reinforced spatial attentional bias is persistently maintained even when the orienting response is unfavorable for task performance in a subsequent test phase (e.g., Anderson et al., 2011). Recently, Le Pelley et al. (2015) tested this possibility by adopting a visual-search experiment in which the target was defined as a shape singleton, while a color singleton distractor signaled an obtainable reward. Participants were instructed to ignore the distractor, not only for fast and accurate performance but also for reward acquisition. They hypothesized that if reward reinforces the spatial attentional orienting toward the target stimulus in an instrumental manner, the target would be selected primarily, and a distractor signaling high reward would be more easily ignored than a distractor signaling low reward, resulting in a smaller interference effect for the distractor associated with high reward than for one associated with low or no reward. Interestingly, however, they found significantly greater interference with the high-reward distractor than with the low-reward distractor, consistent with the findings of other studies with a similar method (Bucker et al., 2015; Pearson et al., 2015; Munneke et al., 2016). 
These results suggest that the value-driven attentional bias arose from the Pavlovian association between reward and stimulus feature rather than from the reinforcement of spatial attentional orienting toward the target (see Bucker and Theeuwes, 2017). However, there is a possibility that these inconsistent findings regarding the influence of reinforcement learning on spatial attentional orienting were due to interference from the competing influence of feature-based attention based on the stimulus–reward association. For instance, in the experiment of Le Pelley et al. (2015), the reinforced orienting was measured as the attentional orienting toward the target when the reward-signaling distractor was presented in the target display. Thus, attentional orienting toward the target based on reinforcement learning could have competed with the attentional capture by the distractor based on the stimulus feature–reward association.

Although many previous studies have investigated whether value reinforces spatial attentional orienting (Harsay et al., 2011; Padmala and Pessoa, 2011; Chelazzi et al., 2013; Sawaki et al., 2015), the results were inconsistent across studies. Some studies reported no evidence for the reinforcement of attentional orienting when the orienting deteriorates search performance (Jiang et al., 2015; Won and Leber, 2016). For example, in the visual search experiments of Jiang et al. (2015) and Won and Leber (2016), although the target was randomly presented in each quadrant of the search display, a reward was provided only when it was presented at one of the four quadrants. If reward reinforced orienting to a particular quadrant, target search should have been more efficient in the rewarded quadrant than the other non-rewarded quadrants. They found that the reward effect on the spatial attentional bias to the rewarded quadrant was negligible or null. In contrast, some other studies found evidence for spatial attentional modulation by value reinforcement (Shomstein and Johnson, 2013; Chelazzi et al., 2014; Della Libera et al., 2017; Anderson and Kim, 2018a, b; McCoy and Theeuwes, 2018). For example, in the study of Chelazzi et al. (2014), when two targets were presented simultaneously in a visual search task, the target appearing at the locations previously associated with high reward was recognized more accurately than the other target appearing at the locations previously associated with low reward, which was maintained for several days after the end of learning. This suggests that participants allocated their attention to particular locations that had been imbued with high reward more frequently even though the targets were presented at random locations.

Of importance, these findings of the value modulation of spatial attention were compatible with the idea that reward reinforces a spatial attentional bias. That is, when reward was repeatedly delivered for attentional orienting toward a particular location, attention was deployed at the rewarded location. Importantly, these findings show that the spatial attentional bias was restricted to a specific location. For example, in the study by Chelazzi et al. (2014), highly rewarded locations were confined to only two among eight locations during reinforcement learning, while the other six locations were associated with intermediate or low reward. In these cases, the spatial attentional bias can also be understood as feature-based attentional modulation, that is, as a context-dependent phenomenon resulting from Pavlovian associative learning (or the context specificity of Pavlovian learning). On this view, the CS becomes conspicuous when it appears within the particular context in which Pavlovian conditioning has occurred (Lovibond et al., 1984; Bouton, 1993). Indeed, previous studies demonstrated that VDAC by a particular stimulus feature was modulated by the context information specified within a spatial context (Anderson, 2015b) or irrelevant background images in the target search display (Anderson, 2015c). Therefore, Pavlovian learning can serve as a possible explanation for these findings in that attentional allocation to the stimulus feature at a particular location was prioritized based on the association between reward and location.

The main purpose of the present study was to investigate whether spatial attentional orienting behaviors are implicitly strengthened by reinforcement learning. As reviewed above, previous literature demonstrated that attention is captured by a reward-signaling stimulus specified by a certain feature (e.g., Anderson et al., 2011; Le Pelley et al., 2015) or by a specific location (e.g., Chelazzi et al., 2014). Importantly, in these findings, two types of attentional modulation, feature-based attentional modulation based on Pavlovian learning and spatial attentional modulation based on reinforcement learning, competed with each other (e.g., Le Pelley et al., 2015) or were at least confounded (e.g., Anderson et al., 2011; Chelazzi et al., 2014). Note that the attentional bias toward the reward-associated feature is a type of sign-tracking response, which refers to an approach response toward and engagement with the reward-predictive CS itself (Day and Carelli, 2007; Robinson and Flagel, 2009). Thus, the direction of the feature-based attentional modulation by reward is toward the location of the CS. In contrast, the reinforcement learning of spatial attentional orienting refers to the increased occurrence of attentional orienting as reward reinforces the orienting response. Critically, the direction of the reinforced attentional response is not determined by the location of the reward-signaling stimulus but by which attentional response is sufficiently reinforced. In other words, one of the important differences between the two types of attentional modulation is the direction of attentional orienting: orienting based on the Pavlovian association is biased toward the location of the reward-signaling cue, whereas orienting based on reinforcement learning can be biased toward any location, provided that the orienting response to that location has been strengthened.

Therefore, the location of a reward-signaling feature and the direction of spatial attentional orienting should be separated to dissociate the spatial-based attentional modulation from the feature-based attentional modulation. For this, we adopted a non-informative central arrow in a modified version of the Posner cueing paradigm. Note that a non-informative symbolic cue (e.g., arrow) has been demonstrated to induce reflexive attentional orienting toward the arrow-pointed location, independent of exogenous and/or endogenous attentional control (Hommel et al., 2001; Tipples, 2002; Ristic and Kingstone, 2006, 2012). Thus, reward is expected to reinforce a particular spatial attentional response induced by a symbolic stimulus, which is understood as the reinforcement of a stimulus–response habit via reinforcement learning (Balleine and O’Doherty, 2010). To reinforce the spatial attentional orienting selectively, reward delivery was contingent on the spatial attentional orienting toward the cued location, which was specified by the central arrow cue pointing to the left or right side of the display. Specifically, participants were instructed to identify a target letter (L or T) presented at the left or right side of the display. Before the target presentation, the irrelevant non-informative arrow cue appeared at the center of the display. There were two types of cues, reward and no reward, which were specified by the color of the arrow cue, such as red or green (experiments 1 and 3), or the combination of color and direction of the arrow cue, such as right-pointing red cue, left-pointing green cue, or vice versa (experiment 2). Importantly, reward was given only when the reward cue pointed validly to the location of the upcoming target, but no reward was provided when it was invalid or when the no-reward cue was presented regardless of its validity.

To observe a spatial attentional bias, the cue-validity effect was measured with response time (RT) and percent error (PE) as indices of attentional orienting (Posner, 1980). Critically, if spatial attentional orienting is reinforced by reward, the reward cue would elicit a significantly greater cue-validity effect than the no-reward cue, even though these cues were irrelevant to the task. In addition, the reward modulation of the cue-validity effect would be gradually manifested as learning progresses, like the modulation of feature-based attentional allocation by Pavlovian learning (Anderson et al., 2014; Failing and Theeuwes, 2014; Le Pelley et al., 2015). Thus, to observe the development of attentional modulation based on reinforcement learning, the cue-validity effect for the reward cue was compared to that for the no-reward cue in each block separately (experiments 1–4). If value reinforces an attentional orienting bias to the side corresponding to the direction of the reward cue, the cue-validity effect with the reward cue would increase across blocks, whereas the effect for the no-reward cue would not change across blocks. In addition, to examine whether the reinforced orienting persists after the reward was omitted, the cue-validity effect for each cue type was measured in a subsequent test phase after learning (experiments 3 and 4). Furthermore, to investigate whether the value-based spatial attentional bias was induced even without the reinforcement contingency between value and attentional orienting, reward was delivered regardless of cue validity in experiment 4. If the attentional bias to the cued location depends simply on the incentive salience of the cue based on the association between reward and cue feature, rather than the reinforcement contingency between reward and attentional orienting, the attentional bias would be greater for the reward cue than for the no-reward cue.

Experiment 1

Experiment 1 examined whether the reward reinforces spatial attentional orienting with a non-informative central cue. A leftward or rightward arrow, colored red or green, was used as a cue. One of the two cue types was associated with reward to reinforce attentional orienting to its directing location. Specifically, participants were instructed to search for the target letter at one of two placeholders. Before the presentation of the target letter, a red or green arrow pointing to the left or right placeholder was presented at the center of the display. To reinforce attentional orienting, reward points were given when the target was presented at the location pointed to by one color cue (reward cue), but no reward was given when the other color cue (no-reward cue) was presented, regardless of its validity.

If attentional orienting by a cue is reinforced by reward, the cue-validity effect would be larger for the reward cue than the no-reward cue. In addition, to monitor the progression of reinforcement learning on attentional orienting, the cue-validity effect for each cue type was monitored in each block. According to previous studies (Anderson et al., 2014; Failing and Theeuwes, 2014; Le Pelley et al., 2015), the influence of the reward association on performance would be relatively manifest in the latter blocks of trials.

Materials and Methods

Participants

To determine proper sample sizes, we used G*Power 3.1 (Faul et al., 2007). To estimate the difference of the cue-validity effect between the reward cue and the no-reward cue, we assessed the effect size from previous studies that tested the attentional orienting depending on the stimulus features (Lien et al., 2010) and the reward effect on attentional orienting (Failing and Theeuwes, 2014). The effect sizes in these studies (based on reported η_p² of the effects) ranged between 0.413 and 0.529. Accordingly, power analyses for within-sample analysis of variance (ANOVA) using a power of 0.95 and an alpha level of 0.05 yielded an appropriate sample size (n) between 14 and 21.
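As an illustration of the first step of such a power analysis, the reported η_p² range can be converted to Cohen's f, the effect-size metric that G*Power uses for ANOVA designs, with the standard relation f = sqrt(η_p² / (1 − η_p²)). The sketch below is only a hedged illustration: the function name is hypothetical, and because the remaining G*Power settings (e.g., the assumed correlation among repeated measures) are not reported here, it does not attempt to reproduce the sample sizes of 14 to 21.

```python
import math


def cohens_f_from_partial_eta_squared(eta_p2: float) -> float:
    """Convert partial eta squared to Cohen's f via f = sqrt(eta / (1 - eta)).

    Hypothetical helper for illustration; not part of the original analysis.
    """
    if not 0.0 <= eta_p2 < 1.0:
        raise ValueError("partial eta squared must lie in [0, 1)")
    return math.sqrt(eta_p2 / (1.0 - eta_p2))


# Effect-size range reported in the text (0.413 to 0.529).
for eta_p2 in (0.413, 0.529):
    f = cohens_f_from_partial_eta_squared(eta_p2)
    print(f"eta_p2 = {eta_p2:.3f} -> Cohen's f = {f:.3f}")
```

Both values correspond to large effects by conventional benchmarks, which is consistent with the modest sample sizes suggested by the power analysis.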

Sixteen undergraduate students (mean age = 23.4 years; 6 female and 10 male) were recruited from Korea University and provided with a monetary reward of KRW 6,000 (approximately 5 USD). To encourage their performance and earning of reward points, an incentive reward of KRW 1,000 (approximately 0.8 USD) was given when their response accuracy was above 80%. They were told that the sum of reward points should exceed a particular point for the additional incentive, but the amount was not explicitly stated to maintain their motivation to earn as many points as possible. All participants had normal or corrected-to-normal visual acuity and normal color vision by self-report. The current and following experiments were approved by the Institutional Review Board at Korea University (KU-IRB-16-138-A-1).

Apparatus

All experiments were programmed and presented using E-Prime software (Version 2.0, Psychology Software Tools, Inc.). Stimuli were presented on a 17-in. cathode ray tube monitor of a personal computer. Participants viewed the monitor from a distance of ∼60 cm in a dimly lit room. Responses were collected using a standard computer keyboard.

Stimuli

All stimuli were presented on a black background. Each trial consisted of a fixation display, a cue display, a target display, a feedback display, and a reward information display. In the fixation display, a white fixation cross (0.7° × 0.7° in visual angle) was presented at the center of the display, and two placeholders (2.1° × 2.1°) drawn with white lines (0.6°) were located on each side of the center of the display (1.2°). In the cue display, an arrow stimulus (1.4° × 1.2°) colored red (R = 255, G = 0, B = 0; CIE color coordinates, x = 0.581, y = 0.346) or green (R = 0, G = 255, B = 0; CIE color coordinates, x = 0.285, y = 0.599) was presented at the center of the display, between the two placeholders, pointing to the left or right placeholder randomly (i.e., 50%). After another fixation display, the target display was presented; it consisted of a fixation cross at the center and two letters, each of which appeared in a placeholder. The target was defined as the letter L or T, and the non-target was a letter randomly selected from U, H, N, or E. Each character, in white Arial font, was 0.75° in width and 0.9° in height. The feedback display notified participants that their response was correct by presenting written feedback (“Correct” in Korean), but for an incorrect response, a 1,000 Hz tone was sounded for 500 ms. The reward display informed participants of how many reward points they had earned on a given trial and the total amount of accumulated reward.
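For readers who wish to relate the stimulus sizes to physical dimensions, the visual angles above can be converted to on-screen sizes with the standard relation size = 2 · d · tan(θ / 2). The following is a minimal sketch that assumes the approximate 60 cm viewing distance reported in the Apparatus section; the function name is hypothetical, and the results are approximate because the viewing distance itself was approximate.

```python
import math


def visual_angle_to_cm(angle_deg: float, distance_cm: float = 60.0) -> float:
    """Physical extent (cm) subtended by a visual angle at a viewing distance.

    Uses size = 2 * d * tan(theta / 2). The 60 cm default is the approximate
    viewing distance stated in the text; the helper itself is illustrative.
    """
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


# Stimulus extents mentioned in the text, in degrees of visual angle.
for label, angle in [("fixation cross", 0.7), ("placeholder", 2.1),
                     ("letter width", 0.75), ("letter height", 0.9)]:
    print(f"{label}: {angle} deg -> {visual_angle_to_cm(angle):.2f} cm")
```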

Procedure

Participants performed 16 practice trials followed by 512 experimental trials in 4 blocks of 128 trials. Each trial began with the fixation display for a random interval of 600, 800, or 1,000 ms. After the fixation display, the cue display was presented for 350 ms. The fixation display reappeared for 150 ms followed by the target display for 350 ms, and a blank display was presented until a response was made or for 1,000 ms. The feedback display appeared for 550 ms, and then, the reward information display was presented for 600 ms. After the delay of 200 ms, the next trial started.
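The timing above implies a fixed cue-to-target stimulus onset asynchrony (cue duration plus the intervening fixation display). The sketch below simply adds up the stated durations; it assumes that the target display always ran for its full 350 ms and that the blank display lasted between 0 and 1,000 ms depending on when the response was made. The constant and function names are hypothetical.

```python
# Durations in ms, taken from the procedure described in the text.
FIXATION_OPTIONS_MS = (600, 800, 1000)
CUE_MS = 350
CUE_TARGET_GAP_MS = 150      # fixation display between cue and target
TARGET_MS = 350
MAX_BLANK_MS = 1000          # blank display ends at response or after 1,000 ms
FEEDBACK_MS = 550
REWARD_INFO_MS = 600
INTER_TRIAL_DELAY_MS = 200


def cue_target_soa_ms() -> int:
    """Stimulus onset asynchrony between cue onset and target onset."""
    return CUE_MS + CUE_TARGET_GAP_MS


def trial_duration_ms(fixation_ms: int, blank_ms: int) -> int:
    """Total trial duration, assuming the target display runs its full 350 ms."""
    if fixation_ms not in FIXATION_OPTIONS_MS:
        raise ValueError("fixation must be 600, 800, or 1000 ms")
    if not 0 <= blank_ms <= MAX_BLANK_MS:
        raise ValueError("blank display lasts between 0 and 1000 ms")
    return (fixation_ms + CUE_MS + CUE_TARGET_GAP_MS + TARGET_MS + blank_ms
            + FEEDBACK_MS + REWARD_INFO_MS + INTER_TRIAL_DELAY_MS)


print("cue-target SOA:", cue_target_soa_ms(), "ms")
print("longest possible trial:", trial_duration_ms(1000, MAX_BLANK_MS), "ms")
```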

Participants were instructed to identify the target letter, L or T, appearing inside one of the two placeholders and to ignore any other letter. Half of them were instructed to press the “f” key of a standard computer keyboard to the letter “L” and the “j” key to the “T,” and the other half were given the opposite target-key mappings. Because the direction of an arrow cue predicted the location of the target only 50% of the time – that is, randomly – it was a non-informative cue. Points were given (e.g., 0 or 50 points) as a reward, and participants were informed that the accumulated reward points should be maximized to obtain a monetary incentive. The participants were naive to when the reward was given, but they were instructed that the response had to be correct, at least, to earn reward scores. However, a reward score was obtainable for the correct response only when one of the two color arrow cues pointed to the target location validly (Figure 1). For example, 50 points were given when the participants made a correct response to a target that appeared in the placeholder to which a red arrow (reward cue) pointed, whereas no reward was given on the trials with the green arrow (no-reward cue). However, whenever a response was incorrect, 5 points were deducted from the total amount of accumulated reward, regardless of cue type and cue validity.
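The reward contingency described above can be summarized as a simple rule. The sketch below is an illustration of that rule for experiment 1 only, using the point values given as examples in the text (50 points for a rewarded trial, a 5-point deduction for an error); the function and parameter names are hypothetical.

```python
def reward_points(cue_is_reward: bool, cue_is_valid: bool,
                  response_correct: bool) -> int:
    """Points earned on one trial under the experiment 1 contingency.

    Illustrative only: errors cost 5 points regardless of cue type and
    validity; a correct response earns 50 points only after a valid
    reward cue; every other correct response earns nothing.
    """
    if not response_correct:
        return -5
    if cue_is_reward and cue_is_valid:
        return 50
    return 0


# Enumerate all eight combinations of cue type, validity, and accuracy.
for cue_is_reward in (True, False):
    for cue_is_valid in (True, False):
        for response_correct in (True, False):
            points = reward_points(cue_is_reward, cue_is_valid, response_correct)
            print(cue_is_reward, cue_is_valid, response_correct, points)
```

Because the cue was valid on only half of the trials, reward under this rule depends on orienting to the cued location when the reward cue appears, not merely on the presence of the reward cue color.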

Figure 1. An example of a trial sequence in experiment 1.

At the end of the experiment, participants were asked to complete an exit questionnaire to examine their explicit awareness about reward acquisition (e.g., Was there any specific type of trials for earning reward?) and how they perceived the predictability of the cue about the target location (e.g., How accurately did the cues predict the target location?) by open-ended questions. They did not see the questions before their participation.

Design

Cue type, cue color, cue direction, target location, and target letter were fully crossed and counterbalanced across participants. These types of trials were randomly presented. Mappings between target letter and response were also counterbalanced across participants.

Results

Trials were excluded from analyses if RTs were shorter than 150 ms or longer than three standard deviations above the participants’ mean (2.06%), and only correct trials were included in RT analyses. Mean correct RTs and PEs were calculated for each participant as a function of block (first, second, third, or fourth block), cue type (reward cue or no-reward cue), and cue validity (valid or invalid). Repeated-measures ANOVAs were conducted on the mean RT and PE data, with those as within-subject factors.
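The trimming rule and the dependent measure can be sketched as follows. This is a hedged illustration rather than the analysis code used in the study: it assumes that the mean and the (sample) standard deviation were computed over all of a participant's RTs, which the text does not specify, and it defines the cue-validity effect as mean invalid RT minus mean valid RT. All names are hypothetical.

```python
from statistics import mean, stdev


def trim_rts(rts, floor_ms=150.0, sd_cutoff=3.0):
    """Drop RTs below the floor or more than sd_cutoff SDs above the mean.

    Assumption: the mean and sample SD are taken over all RTs passed in.
    """
    upper = mean(rts) + sd_cutoff * stdev(rts)
    return [rt for rt in rts if floor_ms <= rt <= upper]


def cue_validity_effect(valid_rts, invalid_rts):
    """Cue-validity effect in ms: mean invalid RT minus mean valid RT."""
    return mean(invalid_rts) - mean(valid_rts)


# Toy data (not from the study): 19 typical RTs plus one very slow response.
toy_rts = [500.0] * 19 + [2000.0]
print(len(trim_rts(toy_rts)), "of", len(toy_rts), "trials retained")
```

A positive cue-validity effect indicates faster responses when the target appeared at the cued location, which is the index of attentional orienting used throughout the Results.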

RT

The overall mean RT was 508 ms. The main effect of cue type was significant, F(1, 15) = 6.66, p = 0.021, MSE = 228, η_p² = 0.308, showing that the mean RT was greater for reward cue (M = 511 ms) than for no-reward cue (M = 506 ms). The main effect of cue validity was also significant, F(1, 15) = 6.99, p = 0.018, MSE = 4,555, η_p² = 0.318, indicating that the mean RT was shorter on valid trials (M = 497 ms) than that on invalid trials (M = 519 ms). The interaction of block and cue type was significant, F(3, 45) = 3.38, p = 0.026, MSE = 259, η_p² = 0.184, suggesting that the mean RT for reward cue (Ms = 516, 514, 508, and 505 ms in the first to fourth blocks, respectively) was greater than that for no-reward cue (Ms = 522, 506, 499, and 495 ms in the first to fourth blocks, respectively) in the second and fourth blocks, but not in first and third blocks. More importantly, the interaction of cue type and cue validity was significant, F(1, 15) = 5.54, p = 0.033, MSE = 711, η_p² = 0.270. Separate analyses showed that a significant cue-validity effect was obtained for reward cue (30 ms), F(1, 15) = 8.69, p = 0.01, MSE = 3,345, η_p² = 0.367, whereas there was no significant effect for the no-reward cue (14 ms), F(1, 15) = 3.49, p = 0.081, MSE = 1,920. The interaction of block, cue type, and cue validity was not significant, F(1, 15) = 1.37, p = 0.26, MSE = 14, η_p² = 0.084, indicating that the interaction of cue type and cue validity in each block did not significantly differ.
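The reported effect sizes can be related to the F statistics through the standard identity η_p² = (F · df_effect) / (F · df_effect + df_error). The sketch below applies this identity to the values reported above; the function name is hypothetical, and small discrepancies are expected because the F values are rounded to two decimals.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error, reported partial eta squared)
reported = [
    ("cue type", 6.66, 1, 15, 0.308),
    ("cue validity", 6.99, 1, 15, 0.318),
    ("block x cue type", 3.38, 3, 45, 0.184),
    ("cue type x cue validity", 5.54, 1, 15, 0.270),
    ("validity effect, reward cue", 8.69, 1, 15, 0.367),
    ("block x cue type x cue validity", 1.37, 1, 15, 0.084),
]

for label, f_value, df1, df2, eta_reported in reported:
    eta = partial_eta_squared(f_value, df1, df2)
    print(f"{label}: computed {eta:.3f}, reported {eta_reported:.3f}")
```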

PE

The overall PE was 1.65%. No main effect or interaction was significant, Fs < 1.

Questionnaire

The responses from the exit questionnaire revealed that most of the participants (14 of the 16) failed to notice the relationship between the reward availability and the cue type. All participants reported that the central arrow predicted the target location at chance (i.e., 50%).

Discussion

Experiment 1 demonstrated that reward reinforced spatial attentional orienting by a central cue, resulting in a significantly larger cue-validity effect for the reward cue (30 ms) than that for the no-reward cue (14 ms), shown in Figure 2A. Although these asymmetric cue-validity effects did not differ across blocks statistically, the amount of the cue-validity effect for the reward cue tended to increase from the first block (16 ms) to the fourth block (33 ms), whereas that for the no-reward cue did not (17 ms in the first block and 11 ms in the fourth block), shown in Figure 2B. It is possible that the attentional modulation by reward emerged quite early in learning so that the block effect failed to modulate the interaction between cue type and validity.

Figure 2. (A) Mean reaction time (in milliseconds) as a function of cue type and validity in experiment 1. (B) Mean reaction time (in milliseconds) as a function of block, cue type, and validity in experiment 1. Error bars ± within-subject standard error of the mean (Cousineau, 2005).

Experiment 2

Experiment 1 showed that spatial attentional orienting by the central non-predictive cue was strengthened on the basis of reinforcement learning. However, there is a possibility that this finding simply resulted from the association between cue color and reward, rather than the reinforcement learning of spatial attentional orienting. Specifically, because the reward cue color became relatively incentive salient, it might have led attention toward its pointed lateral location. Thus, experiment 2 was conducted to examine whether the reward modulation of spatial attentional orienting obtained in experiment 1 resulted from the increased salience of the cue color rather than from the reinforcement learning of attentional orienting.

For this, the reward delivery depended on the combination of cue color and direction, rather than solely on cue color, in experiment 2. For example, reward was delivered for the correct response on valid trials only when the red arrow pointed to the left side or the green arrow pointed to the right side, whereas no reward was given for the correct response on valid trials when the cue had the opposite combinations of the features. Thus, since both colors were associated with the same amount of reward, the cue color itself did not signal reward. If the modulation of attentional orienting by reward found in experiment 1 was based simply on the association between cue color and reward, a similar amount of attentional bias would occur regardless of cue type. If the attentional bias to a specific location cued by a specific color of the central cue is reinforced by reward, the reward cues would induce a cue-validity effect, but the no-reward cues would not.
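The logic of this manipulation can be illustrated with a short sketch. It uses the example mapping from the text (red pointing left and green pointing right as reward cues); the actual assignment for a given participant may have been the opposite combination, and all names below are hypothetical. The check at the end shows that each color is a reward cue in exactly one of its two directions, so color alone carries no information about reward.

```python
# Example mapping from the text; the opposite combination is equally possible.
REWARD_COMBINATIONS = {("red", "left"), ("green", "right")}


def is_reward_cue(color: str, direction: str) -> bool:
    """True if this color-direction combination is a reward cue."""
    return (color, direction) in REWARD_COMBINATIONS


def reward_delivered(color: str, direction: str, target_side: str,
                     response_correct: bool) -> bool:
    """Reward requires a reward cue, a valid cue, and a correct response."""
    cue_is_valid = direction == target_side
    return response_correct and cue_is_valid and is_reward_cue(color, direction)


# Each color is rewarded in exactly one of its two directions.
for color in ("red", "green"):
    n_rewarded = sum(is_reward_cue(color, d) for d in ("left", "right"))
    print(color, "is a reward cue in", n_rewarded, "of 2 directions")
```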