Performance Analysis With Different Types of Visual Stimuli in a BCI-Based Speller Under an RSVP Paradigm

Ron-Angevin, Ricardo; Medina-Juliá, M. Teresa; Fernández-Rodríguez, Álvaro; Velasco-Álvarez, Francisco; Andre, Jean-Marc; Lespinet-Najib, Veronique; Garcia, Liliana

doi:10.3389/fncom.2020.587702

ORIGINAL RESEARCH article

Front. Comput. Neurosci. , 05 January 2021

Volume 14 - 2020 | https://doi.org/10.3389/fncom.2020.587702

This article is part of the Research Topic Discrimination of Genuine and Posed Facial Expressions of Emotion View all 10 articles

Performance Analysis With Different Types of Visual Stimuli in a BCI-Based Speller Under an RSVP Paradigm

$\nRicardo Ron-Angevin$ Ricardo Ron-Angevin¹^*

M. Teresa Medina-Juliá¹

Álvaro Fernández-Rodríguez¹

Francisco Velasco-Álvarez¹

Jean-Marc Andre²

Veronique Lespinet-Najib²

Liliana Garcia²

¹UMA-BCI Group, Departamento de Tecnología Electrónica, Universidad de Málaga, Malaga, Spain
²Laboratoire IMS, CNRS UMR5218, Cognitique Team, Bordeaux INP-ENSC, Talence, France

Brain-Computer Interface (BCI) systems enable an alternative communication channel for severely-motor disabled patients to interact with their environment using no muscular movements. In recent years, the importance of research into non-gaze dependent brain-computer interface paradigms has been increasing, in contrast to the most frequently studied BCI-based speller paradigm (i.e., row-column presentation, RCP). Several visual modifications that have already been validated under the RCP paradigm for communication purposes have not been validated under the most extended non-gaze dependent rapid serial visual presentation (RSVP) paradigm. Thus, in the present study, three different sets of stimuli were assessed under RSVP, with the following communication features: white letters (WL), famous faces (FF), neutral pictures (NP). Eleven healthy subjects participated in this experiment, in which the subjects had to go through a calibration phase, an online phase and, finally, a subjective questionnaire completion phase. The results showed that the FF and NP stimuli promoted better performance in the calibration and online phases, being slightly better in the FF paradigm. Regarding the subjective questionnaires, again both FF and NP were preferred by the participants in contrast to the WL stimuli, but this time the NP stimuli scored slightly higher. These findings suggest that the use of FF and NP for RSVP-based spellers could be beneficial to increase information transfer rate in comparison to the most frequently used letter-based stimuli and could represent a promising communication system for individuals with altered ocular-motor function.

Introduction

Brain computer interfaces (BCI) was first described by Vidal (1973) as a man-computer dialogue using observable and controllable neuroelectric events. That is, BCIs are a type of system that allow users to interact with their environment, using no muscular movements but only their brain activity (Nicolas-Alonso and Gomez-Gil, 2012). Therefore, these systems serve as a last communication channel between severely motor-disabled patients, such as amyotrophic lateral sclerosis (ALS) patients or those with brainstem injuries in a locked-in state (LIS), and their environment.

The most frequently control signal BCI systems use is the brain bioelectricity recorded through electroencephalography (EEG) (Nicolas-Alonso and Gomez-Gil, 2012; Rezeika et al., 2018). The EEG data is processed, and different brain components could be studied depending on the stimulus type and system that is desired to be controlled. The most typical components used in BCI are steady-state visual evoked potentials (SSVEP), event-related potentials (ERP) and sensorimotor rhythms (SMR) (Nicolas-Alonso and Gomez-Gil, 2012). The present study will focus on ERP components, which are evoked after the appearance of an infrequent stimulus. The most studied component of this type is the P300 component, which was first discovered by Sutton et al. (1965) and described as a positive amplitude waveform alteration that reaches peak amplitude at about 300 ms after a sensory stimulus. This potential is mostly recorded in the parietal area (Polich, 2007).

This P300 component is usually employed as a control signal for a type of BCI system which is called a virtual speller (Rezeika et al., 2018). The first P300 based BCI speller was proposed by Farwell and Donchin (1988). This speller consisted of a 6 × 6 matrix table of letters and numbers, whose rows and columns were highlighted (i.e., the characters color turned from gray to white) pseudorandomly in order to evoke the P300 component each time the target character was highlighted. As a consequence, this BCI speller presentation paradigm is called row-column paradigm (RCP). On the other hand, to consistently elicit and classify the P300 component, users are often asked to focus their attention on their desired target letter and count the number of times it flashes, and a classification algorithm differentiates the target letter between many non-targets. Other temporal components generated earlier or later to P300 (P100, N170, N250, N400) are equally analyzed to detect stimuli features (Zheng et al., 2012; Jiang et al., 2017; Tian et al., 2018).

Different variations on the highlighting type and nature of the characters have been studied, such as the shape, color and size of the characters (Salvaris and Sepulveda, 2009; Ryan et al., 2017; Fernández-Rodríguez et al., 2019b) in order to improve system performance (classification accuracy, information transfer rate or ERP amplitude). Regarding the nature of stimuli, it has been demonstrated that the presentation of famous faces (FF) instead of letters leads to an improvement in performance (Kaufmann et al., 2011; Li et al., 2015). Other set of images, such as neutral images, might also help to increase performance as compared to letters (Fernández-Rodríguez et al., 2019a). Moreover, the study of Kellicut-Jones and Sellers (2018) suggests that the FF paradigm might not be significantly better than neutral images in RCP. On the other hand, in the single character presentation (SCP) paradigm –which consists of illuminating the matrix stimuli one by one– the use of faces (non-famous) seemed to increase performance as compared to neutral images (inanimate objects) (Zhao et al., 2011). Nevertheless, these study results should be carefully considered as they are derived from a small sample size. Even though this study is not completely adequate, these findings might suggest that a difference in performance might exist depending on the stimulus presentation paradigm used, in particular when applying FF and neutral images.

The stimulus presentation paradigms RCP and SCP present their stimuli in different locations of the monitor screen, but RCP presents them by row and column groups, and SCP, individually. However, this type of presentation paradigm might not be the most suitable for some patients with motor disabilities who also have no or residual ocular mobility, as the performance of these paradigms is greatly decreased under covert attention conditions (Brunner et al., 2010; Treder and Blankertz, 2010). Different type of visual gaze-independent BCIs have been researched by the literature in order to prevent this limitation. According to the BCI-Spellers review by Rezeika et al. (2018), two groups of main gaze-independent spellers have been proposed by previous literature: (i) those that display the stimuli to be selected in different close positions to control the speller under covert attention, such as Chroma Speller (Acqualagna et al., 2013), Geospell (Aloise et al., 2012), Gaze-Independent Block Speller (GIBS) (Pires et al., 2011) and Hex-O-Spell (Treder and Blankertz, 2010); or (ii) those based on rapid serial visual presentation (RSVP), which sequentially presents stimuli in the center of the screen (Acqualagna and Blankertz, 2013). The authors of this review stated that the RSVP-based BCIs show promising results and have been the most widely used to date.

Different visual configurations of the stimuli under RSVP had also been studied in the literature to increase the system performance for different applications like face recognition or RSVP spellers (Lees et al., 2018). In a recent study, Chen et al. (2016) tested if the characteristics of the stimuli can affect the performance of the system using colored balls, gray dummy faces and colored dummy faces. For each paradigm, six different stimuli were presented (six colors and six dummy face expressions). They found that the combination of colors and dummy face expressions could improve the bit rate. Regarding RSVP spellers, a previous study found a trend in which using colors and different capitalizations might improve the accuracy and bit rate compared to black letters (Acqualagna and Blankertz, 2013). Furthermore, the study of Won et al. (2018) proposed a RSVP speller whose colored stimuli were placed in six different near central positions. They found that using different locations for the letters increased the accuracy of the system in contrast to the classical RSVP paradigm.

Nevertheless, studies regarding the nature of the stimuli under RSVP have barely been carried out for communication purposes (i.e., RSVP spellers). In a preliminary study, neutral images and letters were compared in RSVP (Fernández-Rodríguez et al., 2019c). The results of this work showed that neutral images did not offer significant benefits as compared to letters under the RSVP paradigm. In the same way, to our knowledge, any studies regarding RSVP have compared FF to letters and it would be interesting to determine the efficacy of FF under a RSVP paradigm. However, the results of Fernández-Rodríguez et al. (2019c) should also be carefully considered as a small sample size was applied and no metrics regarding the user experience were considered (such as fatigue, preference and control). To better understand the effect of this sort of stimuli when applying an RSVP paradigm, an extended and complete study regarding neutral images and FF against letters should be carried out.

We hypothesized that using alternative stimuli under RSVP –i.e., famous faces and neutral pictures– instead of letters would increase system performance and user experience of the RSVP-based spellers, as previously demonstrated in the RCP and SCP presentation paradigms. Therefore, the aim of this study was to compare and evaluate the performance of three different types of stimuli (letters, famous faces and neutral pictures) as feasible communication stimuli for a gaze-independent BCI speller. The evaluation was carried out in terms of objective parameters (specifically, accuracy, information transfer rate and brain waveform analysis) and a subjective questionnaire regarding the perception of the participants. The main contribution of this study would be to experimentally (in)validate the usability of alternative stimuli under the RSVP paradigm for communication purposes.

Methods

Participants

Eleven French participants (aged 19.91 ± 0.83) took part in the present study. None of the participants had previous experience in the use of BCI systems. The study was approved by the Ethics Committee of the University of Malaga and met the ethical standards of the Helsinki Declaration. According to self-reports, none of the participants had any history of neurological or psychiatric illness. In addition, all of them provided written consent trough a protocol reviewed by the ENSC-IMS (Ecole Nationale Supérieur de Cognitique – Intégration du Matériau su Système) Cognitive and UMA-BCI teams.

Data Acquisition and Signal Processing

The EEG was recorded using the electrode positions: Fz, Cz, Pz, Oz, P3, P4, PO7, and PO8, according to the 10/20 international system. All channels were referenced to the right earlobe, using FPz as the ground.

The EEG was amplified through a 16 channel biosignal amplifier gUSBamp (Guger Technologies). The amplifier settings were from 0.5 to 100 Hz for the band-pass filter, the notch (50 Hz) was on, and the sensitivity was 500 μV. The signal was then digitized at a rate of 256 Hz. EEG data collection and processing were controlled by the UMA-BCI Speller software (Velasco-Álvarez et al., 2019), which serves as the front-end to BCI2000 (Schalk et al., 2004). Likewise, when the brain signal was recorded by the UMA-BCI Speller, a pass-band filter from 0.1 to 60 Hz was applied, and the notch filter was on at 60 Hz.

A stepwise linear discriminant analysis (SWLDA) of the data was performed to obtain the weights for the P300 classifier and calculate the accuracy. Alternative classification methods of the EEG signal have been proposed by the literature (Lotte et al., 2018; Xiao et al., 2020), however the SWLDA algorithm has been widely used and validated (Krusienski et al., 2008; Lees et al., 2018). Furthermore, this last is the algorithm that BCI2000 software, and thus the UMA-BCI Speller, has implemented.

According to the specifications described in the Wiki page of BCI2000¹, the EEG channels used and their respective weights in the classification matrix are dependent of specific parameters of the user. The different ERP components are commonly found in certain brain zones and certain latencies; but when analyzed particularly for each user, the specific channels and latencies may be different from one another (Luck, 2014). These weights are calculated in the calibration task. The time frame considered to train the classifier was from 0 to 800 ms after the onset of a stimulus (target or non-target). Note that the selection of the channels and calculation of the classification weights were automatically done by the classifier that the BCI2000 software has implemented.

Spelling Paradigms

Three different RSVP paradigms were evaluated in the present work. The only difference between paradigms was the type of stimulus used: (i) white letters (WL), (ii) famous faces (FF), and (iii) neutral pictures (NP) (Figure 1).

FIGURE 1

Figure 1. (A) RSVP paradigm over time with the Famous Faces (FF) interface as an example; (B) Example of a stimulus representation with its equivalent and corresponding white letter (WL, “S”), famous face (FF, “Shakira”) and neutral picture (NP, “Seau”). Note that due to copyright reasons, the images presented are pixelated in this figure.

Each paradigm presented nine different stimuli (Table 1). In the WL paradigm, the letters used were A, B, C, E, L, M, O, R, and S. On the other hand, each character in the FF stimuli was chosen so that the character's name or surname had to start with the same letter as the one used in the WL paradigm (e.g., W. Allen for the letter A, or Beyoncé for the letter B). Finally, for the NP stimuli, the criterion was the same: the picture had to start, in French, with the same letter as the one used in the WL paradigm (e.g., the picture of a tree –arbre, in French– for the letter A, or a boat –bateau, in French– for the letter B). The relationship between each stimulus and image (face or picture) was explicitly declared by the research staff to participants in order to avoid any mistake. See Table 1 for the letters and their corresponding image names (face and picture). The images used in the experiment are not shown in this paper due to copyright reasons.

TABLE 1

Table 1. Presentation of the nine stimuli names contained in each set of stimuli.

The number of elements was selected in order to avoid a target selection time that was too long, as the aim of this study was to validate the different sets of stimuli under RSVP for communication purposes. In previous studies with this kind of paradigm, an even smaller number of elements has been used to validate hypotheses (Chen et al., 2016; Fernández-Rodríguez et al., 2019c).

The duration of each stimulus presentation was equal to 187.5 ms and the inter stimuli interval (ISI) was equal to 93.75 ms. Therefore, the stimulus onset asynchrony (SOA) had a duration equal to 281.25 ms. The time for completing a sequence (i.e., single presentation or flashing of every stimuli) was 2.44 s. The pause time between one selection and the start of the next (i.e., between completed sets of sequences) was equal to 5 s.

The flashing stimuli were presented in the center of the screen. The dimensions regarding the type of stimuli were as follows: letters, around 3 × 4 cm; faces, around 6 × 8.5 cm; and Pictures, around 12 × 8.5 cm.

Procedure

A within-subject design was used, so that all users went through all experimental conditions. The experiment was carried out in one session. The order of the paradigms was counterbalanced across participants in order to prevent any undesired effects, such as learning or fatigue. Each condition consisted of two parts: (i) an initial calibration task to obtain the specific signal patterns associated with each user and (ii) an online task in which the user actually controlled the interface. Therefore, the main difference between both tasks was that in the first task the user did not receive any feedback.

For both phases, the task was to write different four-letter words. In the case of the calibration phase, the participant had to write two French words (“MARE,” pond in English, and “CLOS,” enclosed plot in English), so the total number of selections for this task was 8 letters. On the other hand, for the online phase, the user had to write three French words (“MALE,” male in English, “ROSE,” pink in English, and “BOLS,” bowl in English), so the number of selections would be 12 letters. Participants were told during the pause between selections which image (famous face or neutral picture) or letter they have to focus on in the next run. They were not asked to memorize the sets of stimuli used in the experiment (letter, face and picture related) as the purpose of this study was to test the effect of this type of stimuli in a preliminary RSVP-based speller. A short break between words (variable at the request of the user) was employed. The number of sequences (i.e., the number of times that each stimulus –target and non-target– was presented) was pre-fixed to 6 in the calibration task and adapted in the online phase depending on the user performance in the calibration phase. The number of sequences selected for the online task was two trials more than the minimum number of trials required to obtain 100% accuracy in the calibration phase.

At the end of the session, the user had to complete a questionnaire regarding his/her experience during the control of the paradigm.

Evaluation

Four parameters were used to evaluate the effect of the RSVP paradigm and stimulus type on the performance: (i) the accuracy in the calibration and online phases, (ii) the information transfer rate (ITR) (Wolpaw et al., 1998) in the calibration and online phases, (iii) the analysis of the event-related waveform during the calibration phase, and (iv) a subjective questionnaire.

Accuracy was defined as the number of correctly predicted selections divided by the total number of predicted selections, multiplied by 100. While for the online task this last definition was applied, for the calibration phase, the accuracy percentage was computed by the signal classifier after the classification of the word using the data from each sequence. The SWLDA classification algorithm applied was the one proposed by BCI2000.

The ITR (bits/min) is an objective measure to determine the communication speed of the system. This parameter considers accuracy, the number of elements available in the interface and time to select one element:

I T R = \frac{{l o g}_{2} N + P {l o g}_{2} P + (1 - P) {l o g}_{2} \frac{1 - P}{N - 1}}{T}

where P is the accuracy of the system, N is the number of elements available at the interface and T is the time needed to complete a trial (i.e., select an element).

The ITR was calculated similarly to the accuracy for both the calibration and the online tasks. It should be noted that the pause between selections was not considered when calculating the ITR.

The grand average of the ERP waveforms (from 0 to 800 ms) was analyzed in order to evaluate how the three different stimuli types affected the waveforms of the target, non-target and amplitude difference between target and non-target stimuli. In addition, to carry out a more exhaustive analysis concerning the ERP components frequently used in a BCI, a grand average topography was also carried out for target, non-target and amplitude difference between target and non-target stimuli. Next components were included in the topographical analyses: P100 (60–110 ms), N170 (110–180 ms), P300 (450–520 ms), and N400 (520–570 ms). These topographical maps were statistically compared between conditions. The interval time for each component were chosen according to previous literature and the specific EEG signal obtained in the present study (e.g., Tanaka, 2018; Mijani et al., 2019), except for the N400 component which was selected only according the EEG signal obtained. This last issue is discussed in the Discussion section.

To perform these analyses (i.e., comparison between conditions regarding the grand average of ERP waveforms and the grand average of topography), a baseline from −200 to 0 ms was used for the electrodes, and this was low-pass filtered at 30 Hz. Statistical analyses were carried out using EEGLAB (Delorme and Makeig, 2004), with which a false discovery rate (FDR) correction was applied.

Finally, a subjective questionnaire –specially configured for this experiment– was applied to investigate the experience of the users during the control of the spellers. This questionnaire required that the users scored the different conditions from 0 to 10 using a visual analog scale (VAS) according to the following dimensions: level of fatigue (fatigue), complexity of use (complex), level of speed felt during presentation of the stimuli (speed) and level of stress (stress). Where 0 is the lowest value and 10 is the highest for the fatigue, complex and stress dimensions. For the case of the speed dimension, 0 would mean that the interface had an adequate speed of stimuli presentation, and 10 would mean that the speed of the stimuli presentation was too fast.

Results

In this section, the different results are presented in different sections. First, the results of the calibration task (i.e., performance metrics and ERP waveforms) are presented, followed by the performance metrics of the online phase, and finally, the subjective questionnaire analysis.

Calibration Task

Performance Metrics

In order to find out if there were any significant differences between the different conditions, a Student's t-test was performed for repeated samples for each of the sequences. The accuracy (Figure 2) did not show significant differences for any sequence. However, the variable ITR (Figure 3) showed significant differences for the first sequence between conditions WL and NP [t₍₁₀₎ = 2.24; p = 0.049] (Supplementary Table 1). Likewise, some marginally significant differences were revealed when the average accuracy and ITR of all sequences were calculated (WL, 90.74 ± 5.44% and 20.95 ± 3.88 bits/min; FF, 94.32 ± 3.5% and 23.75 ± 3.34 bits/min; NP, 93.39 ± 4.36% and 23.23 ± 4 bits/min). Specifically, WL was observed to offer a marginally significant worst performance than FF [accuracy, t₍₁₀₎ = 2.161; p = 0.056; ITR, t₍₁₀₎ = 2.175; p = 0.055] and NP [ITR, t₍₁₀₎ = 1.89; p = 0.088].

FIGURE 2

Figure 2. Accuracy (mean ± standard error) of each condition (WL, white letters; FF, familiar faces; NP, neutral pictures) as a function of the number of sequences during the calibration task.

FIGURE 3

Figure 3. Information transfer rate (ITR, mean ± standard error) of each condition (WL, white letters; FF, familiar faces; NP, neutral pictures) as a function of the number of sequences during the calibration task.

ERP Waveform

Regarding the grand average ERP waveform, the statistical analyses showed significant differences between conditions at an early time interval (around 80–110 ms) for target stimuli in Cz and PO7 (Figure 4).

FIGURE 4

Figure 4. Grand average ERP waveform for target, non-target and amplitude difference between target and non-target stimuli signals in all used channels (Fz, Cz, Pz, Oz, P3, P4, PO7, and PO8) for the three conditions: white letters (WL), familiar faces (FF), and neutral pictures (NP). These plots were obtained from the EEG data recorded during the calibration phase.

On the other hand, regarding the grand average topography of the P100, N170, P300, and N400 components, only the P100 (60–110 ms) component showed significant differences in channels Cz, Pz, Oz, P3, P4, PO7, and PO8 for the target stimuli (Figure 5). These differences could indicate a difference in early processing depending on the type of visual stimulus. Specifically, it appeared that the FF condition obtained lower grand average ERP amplitude values than those obtained by the WL and NP conditions (Figure 4).

FIGURE 5

Figure 5. Topographical scalp map of each condition (WL, white letters; FF, familiar faces; NP, neutral pictures) for the next components: P100 (60–110 ms), N170 (110–180 ms), P300 (450–520 ms), and N400 (520–570 ms). These plots were obtained from the EEG data recorded during the calibration phase.

Online Task

The accuracy and ITR results achieved, as well as the number of sequences used by each participant in the online task, are shown in Table 2. In regard to the accuracy obtained for the online task (second main column of Table 2), the Student's t-test between conditions showed no significant differences between NP and the rest of conditions [NP vs. WL, t₍₁₀₎ = 0.183; p = 0.859; NP vs. FF, t₍₁₀₎ = 0.957; p = 0.361]. However, a comparison between WL and FF conditions showed a trend close to significance [t₍₁₀₎ = 1.961; p = 0.078]. On the other hand, for the ITR (third main column of Table 2), significant differences were found between the WL and FF conditions [t₍₁₀₎ = 2.973; p = 0.014], but not between WL and NP [t₍₁₀₎ = 0.595; p = 0.565] nor NP and FF [t₍₁₀₎ = 1.261; p = 0.236].

TABLE 2

Table 2. Accuracy (%), ITR (bits/min) and number of sequences used (mean ± standard deviation) of the three conditions (WL, white letters; FF, familiar faces; NP, neutral pictures) during the online task.

Subjective Questionnaires

With reference to the results obtained in the questionnaire (Figure 6), the condition NP, compared to WL, was found to be associated with significantly less fatigue [t₍₁₀₎ = 2.262; p = 0.047] and more appropriate speed presentation –speed– [t₍₁₀₎ = 3.13; p = 0.011]. Note that the comparison of FF and NP was near nominal significance for speed [t₍₁₀₎ = 2.085; p = 0.064]. All statistical comparisons made between conditions for the subjective questionnaires can be observed in Supplementary Table 2.

FIGURE 6

Figure 6. Scores (mean ± standard error) of each condition (WL, white letters; FF, familiar faces; NP, neutral pictures) for the variables collected in the subjective questionnaire.

Discussion

In this study we tested different kinds of stimuli –white letters (WL), famous faces (FF) and neutral picture (NP)– under a rapid serial visual presentation (RSVP) paradigm to analyse the system performance in terms of classification accuracy, information transfer rate (ITR), ERP waveform and user experience (fatigue, complexity, speed, and stress level). Main results showed that FF and NP might produce, respectively, better performance and better user experience compared to WL. These results suggest that the stimuli proposed (FF and NP) could enhance the system performance, and thus communication, of this type of gaze-independent BCI.