Prediction of Human-Computer Interaction Intention Based on Eye Movement and Electroencephalograph Characteristics

Qu, Jue; Guo, Hao; Wang, Wei; Dang, Sina

doi:10.3389/fpsyg.2022.816127

ORIGINAL RESEARCH article

Front. Psychol., 12 April 2022

Sec. Decision Neuroscience

Volume 13 - 2022 | https://doi.org/10.3389/fpsyg.2022.816127

This article is part of the Research TopicHuman Decision-Making Behaviors in Engineering and Management: A Neuropsychological PerspectiveView all 18 articles

Prediction of Human-Computer Interaction Intention Based on Eye Movement and Electroencephalograph Characteristics

Jue Qu^1,2

Hao Guo²

Wei Wang^2*

Sina Dang²

¹School of Aeronautics, Northwestern Polytechnical University, Xi’an, China
²Air and Missile Defense College, Air Force Engineering University, Xi’an, China

In order to solve the problem of unsmooth and inefficient human-computer interaction process in the information age, a method for human-computer interaction intention prediction based on electroencephalograph (EEG) signals and eye movement signals is proposed. This approach is different from previous methods where researchers predict using data from human-computer interaction and a single physiological signal. This method uses the eye movements and EEG signals that clearly characterized the interaction intention as the prediction basis. In addition, this approach is not only tested with multiple human-computer interaction intentions, but also takes into account the operator in different cognitive states. The experimental results show that this method has some advantages over the methods proposed by other researchers. In Experiment 1, using the eye movement signal fixation point abscissa Position X (PX), fixation point ordinate Position Y (PY), and saccade amplitude (SA) to judge the interaction intention, the accuracy reached 92%, In experiment 2, only relying on the pupil diameter, pupil size (PS) and fixed time, fixed time (FD) of eye movement signals can not achieve higher accuracy of the operator’s cognitive state, so EEG signals are added. The cognitive state was identified separately by combining the screened EEG parameters Rα/β with the eye movement signal pupil diameter and fixation time, with an accuracy of 91.67%. The experimental combination of eye movement and EEG signal features can be used to predict the operator’s interaction intention and cognitive state.

Introduction

As an important part of intelligent human-computer interaction, human-computer interaction intention prediction of the operator according to fuzzy interaction information, and through data mining and analysis can provide cooperative services for operators, so as to improve operator operation efficiency, reduce operational errors, and improve task completion (Dupret and Piwowarski, 2008; Li et al., 2010). At present, intention prediction has been applied in aviation flight operation, weapons and equipment operation, computerized numerical control (CNC) machine tool operation, manufacturing operation system and many other fields, used to solve the operational efficiency and task completion problems of heavy operator interaction task, high information complexity, and large physiological and psychological load (Ganglei, 2021b).

Human-computer interaction intention prediction mainly includes behavioral intention prediction and cognitive intention prediction of operators (Shen et al., 2011; Ganglei, 2021a). The main method of intention prediction is to conduct interaction intention identification by collecting human-computer interaction data and analyzing the data, thus achieving the purpose of interaction intention prediction (Li et al., 2008; Teevan et al., 2008).

Intention is a mental state, a plan or reaction tendency to future behavior. An action is an action executed by an agent that points to its target state, which is envisaged according to the target state it ultimately wants to achieve, even if it may not be achieved in some cases. Cognition is a term referring to the mental processes involved in gaining knowledge and comprehension. In other words, it is the process of information processing of external things acting on people’s sensory organs. It includes feeling, attention, memory, thinking, and other psychological phenomena. Habitually, cognition corresponds to emotion and will.

In terms of behavioral intention prediction, an operation behavior analysis method of inspection robot based on Bayesian network was proposed by Tang et al. (2021). Although the proposed method realizes the function of rapid reasoning about the operation behavior intention of inspection robot, the realized human-computer interaction between robots does not study the human-computer interaction between operators and computers, and the established inspection system for inspection robot operation was subjective. A method of behavior intention recognition for target grabbing was proposed by Zhao et al. (2019). By taking the trajectory of human upper limbs as the main criterion, a user behavior intention model was established. The research on the intention of human-computer interaction between people and computers is realized. This behavior intention integrates the user’s upper limb trajectory into the prediction model, but there is some subjectivity in the modeling process. A multi-information fusion network architecture of human motion intention recognition combining eye movement information, position and attitude information, and scene video information was proposed by Zhang et al. (2021), which effectively reduced the subjectivity of prediction, but the research on user’s behavior intention was relatively simple. Action prediction for a full-arm prosthesis using eye gaze data was proposed by Krausz et al. (2020). A system for controlling the up-and-down movement of artificial limb by using single-channel electroencephalograph (EEG) signal was developed by Haggag et al. (2015), which is convenient and accurate to operate. Yao et al. (2018) combined EEG and eye movement signals to identify different action imagination modes of the same limb, focusing on verifying that the recognition accuracy of EEG combined with eye movement signals is higher than that of a single EEG signal.

The EEG signals are often used in psychological and medical research (Caixia et al., 2021), and also in cognitive intention prediction. The operator’s cognitive intention was successfully predicted by Lu and Jia (2015) and others by visualizing the eye movement data collected during the experiment and using four learning algorithms. A set of eye movement index system to evaluate cognitive load was established by Chen et al. (2011), which realized the prediction of operators’ cognitive intention. Ahern and Beatty (1979) found that with the increase of task difficulty and cognitive load, the pupil diameter will increase. Singh et al. (2020) realized the recognition of online operators’ cognitive intention by using the eye fixation characteristics. By combining EEG signals and eye movement signals to identify the emotions of operators in human-computer interaction system, the components of EEG signals and eye movement signals that have great influence on emotion identification were identified by Lu (2017). Park et al. (2014) explored the influence of EEG and eye movement signals on operators’ implicit interaction intention in the process of interaction, and the experimental results showed that the combination of EEG and eye movement signals was more accurate than a single physiological signal.

In conclusion, intention prediction using behavioral data and physiological indicators is the main research direction of human-computer interaction intention prediction. Behavioral data are predicted through the operator’s past operation experience and do not consider the randomness, universality, and real-time data processing during the operation, so the established intention prediction method has certain subjectivity. Therefore, physiological metrics are commonly applied to behavioral prediction and cognitive prediction. The existing indicators mainly focus on the study of eye movement indicators and EEG indicators. The intention prediction model established based on this one can objectively identify the operation intention of the operator and establish a more accurate intention prediction model.

In recent years, many scholars have also combined eye movement and EEG signals to explore the human-computer interaction intention of operators. Although Wei et al. (2021) also studied the prediction of human-computer interaction intention by combining EEG and eye movement characteristic signals, they focused on demonstrating that the prediction of human-computer interaction intention by using eye movement and EEG characteristic data is more accurate than that by using eye movement data alone, and did not study the prediction performance of specific eye movement and EEG characteristic indicators and their combinations.

This article integrates eye movement and EEG indexes, identifies real-time interaction intention for specific task types and task difficulty, selects significant difference indicators by one-way ANOVA, selects indexes by support vector machine (SVM), constructs task type and difficulty, and compares the prediction results of the prediction method.

Methodology

Subjects and Equipment

Subjects

A total of 50 college students (25 males and 25 females), aged 19 years old, ∼25 years old, right-handed, and no cognitive impairment were recruited. No central nervous system (CNS) diseases were found in the examination and the EEG showed no abnormalities. Both naked visual acuity and corrected visual acuity were above 5.0. The subjects should rest fully before the start of the experiment to avoid strong reactions and maintain emotional stability. At the same time, the hair should be washed clean and cut short. Medicine should not be taken within 24 h before the start of the experiment. Drinking tea and coffee was not permitted so that it does not to affect the reliability of relevant physiological parameters during the experiment.

Experimental Equipment

The experimental equipment mainly includes SMI-RED eye motor as shown in Figure 1A. The maximum sampling frequency is 250 Hz. Neuroscan-NuAmps EEG, as shown in Figure 1B, with a maximum sampling frequency of 1,000 Hz, is used to collect EEG signals during the experiment.

FIGURE 1

Figure 1. Physiological signals acquisition equipment. (A) Eye movement equipment. (B) Electroencephalograph (EEG).

Experimental Design About Task Type and Difficulty

In order to reasonably summarize the operator’s operation state in the process of human-computer interaction, this article defines the operator’s operation type and state as five operation types and three operation states in the experiment, combining with the human-computer interaction process in important and complicated fields such as air traffic control.

Task Type Setting

During the experiment, the subject needs to complete the designated operational task in accordance with the requirements of the experiment. In Experiment 1, the subjects need to complete five different experimental tasks. In order to reasonably sum up the operator’s intention of human-computer interaction in the process of operating the computer, this article defines the operator’s operation tasks in the process of human-computer interaction as five types as shown in Figure 2, combining with the human-computer interaction process in important and complicated fields such as air traffic control. The operation interface is shown in Figure 3; the top left ribbon (F1) is the target search task interface, and the induced interaction intention is a target search; the top right ribbon (F2) is the table query task interface, and the induced interaction intention is a table query; the lower left ribbon (F3) is the icon clicking the task interface, and the induced interaction intention is the icon clicks; the lower right row (F4) is the status tracking task interface, and the induced interaction intention is the target tracking. The whole scene (F0) is the monitoring alert task interface, and the induced interaction intention is monitoring alert.

FIGURE 2

Figure 2. Five kinds of human-computer interaction tasks.

FIGURE 3

Figure 3. Experimental interface.

“Monitoring alert” means that the operator does not need to complete any human-computer interaction tasks, but only needs to pay attention to the changes of main parameters on the interface and monitor whether there is any abnormality.

“Target search” refers to the process that the operator needs to find out a specific target in the background containing interference icons.

“Table query” refers to the process that the operator needs to query the required information in the table containing the target status parameters.

“Icon click” refers to the process that operators need to click an icon button to trigger corresponding instructions and complete specific operations.

“State tracking” refers to the process that operators need to pay real-time attention to the specific parameters of specific targets and judge whether the operation time is ripe or not.

Difficulty Gradient Design Based on Task Type

Each of the five tasks can be divided into three task difficulties.

In the second experiment, the subjects need to perform tasks in three special cognitive states, as shown in Figure 4.

FIGURE 4

Figure 4. Three operating states of human-computer interaction.

“Out-of-loop state” refers to the phenomenon that the operator may lose attention during long-term monotonous and boring interactive tasks, which can also be understood as what we usually call “distracted.”

“Quiet state” refers to the state in which the operator normally performs daily interactive tasks, which is similar to the “monitoring alert” mentioned in Experiment 1.

“Stress state” refers to the state in which the operator is at a loss when he/she is under great pressure and when he/she is in an abnormal situation during the execution of a stressful task.

The task interface of the second experiment induced a cognitive state as shown in Figure 5, which can be divided into four regions as the “radar monitoring task” as the background. The top left is the target search area, in which six kinds of targets move randomly, and concentric circles and angle lines can roughly show the distance and orientation information of targets. On the upper right is the state parameter area. All targets in the target search area can find accurate state parameters in the table. You can turn pages with the mouse. At the bottom left is the legend area, which shows the category information of icons in the target search area. On the lower right is the operation area, where the subjects perform corresponding operations according to the prompts in this area, and click the relevant buttons. The experimental interface under stress is shown in Figure 6.

FIGURE 5

Figure 5. Normal state experimental interface.

FIGURE 6

Figure 6. Stress state experimental interface.

Experimental Process

In the first experiment, in order to collect physiological data with five interaction intentions, all the subjects need to complete five experimental tasks according to the experimental flow shown in Figure 7.

FIGURE 7

Figure 7. Experimental process.

Before starting the experiment, first introduce the experimental tasks and specific requirements to the subjects, and at the same time, there should be some practice operations to ensure that the subjects have a systematic understanding of the experimental tasks, and then calibrate the eye tracker and wear the electrode cap. All the subjects need to complete five experimental tasks as follows:

Monitoring and alert interaction task: Under this task, all elements of the whole interface will change randomly, and the subjects can pay attention to the content they are interested in at will, and report the specific situation of an element in the interface in time, so as to make the subjects in the monitoring and alert interaction state. The task will be completed in about 1 min, and then the next task will be carried out after a short rest.

Target search interactive task: Under this task, the subjects need to search for specific characters in the function F1 whose character positions change randomly, and judge and report whether the number of characters listed on the right is correct, so that the subjects can be in the target search interactive state until the task of judging all the characters is completed, and then take a short rest and proceed to the next task.

Table query interaction task: Under this task, the content presented by the table in function area F2 is constantly updated. After the main test issues the query task, the subjects will try their best to find the corresponding data and report in time, in order to make the subjects in the form query interaction state, have a little rest after completing ten queries, and then carry out the next task.

Click interactive task: Under this task, the target icon that the subject needs to click will be displayed on the screen. Subjects need to complete the search and click on a large number of icons in functional area F3. The system will judge whether the subject clicks correctly, rest after clicking ten times, and then start the next task.

Interactive task of state tracking: Under this task, the position change of the target (black line segment) on the coordinate axis will be displayed in functional area F4, and the subjects are required to pay attention to the state of the target in real time, and answer the questions of the examiners about the state of the target at any time, so as to keep the subjects in the state tracking state, and the duration of this task is about 1 min.

During the experiment, the eye monitor and the electroencephalonete will automatically collect and save the physiological signals to provide the data for the subsequent analysis and processing. It is worth noting that, because one experiment lasts for a long time, the manual labeling time error generated by performing the time synchronization during signal acquisition is negligible.

Experiment 2 subjects were needed to perform the corresponding task manipulation in three different cognitive states.

The experimental task of calm state requires the participants to complete the normal operation. For example, select the target number they are interested in in the target search area, then query the specific parameters in the state parameter area, and respond to the questions of the main test timely. The task lasts about 1 min, and the next task is carried out after a little rest.

The experimental task in the off-loop state does not require any interactive operation of the subjects, but recalls the learning task in the past period, which also lasts for 1 min, and then goes on to the next task after a short rest.

Under the experimental task of stress state, the interface information is difficult to display normally, and the information is more difficult to obtain. Specifically, the target search area and the state parameter area in the experimental interface will produce the random flashing stripes as shown in Figure 6. However, at this time, the subjects still have to perform an intensive interaction task, such as finding the area of multiple specific numbered targets in the target search area, and clicking the relevant button in the operation area, respectively.

Experiment 1

Experimental Purpose

A visual interaction experiment that can induce 5 kinds of interaction intention, use eye movements and EEG instruments to collect eye movement characteristics and EEG signals under different interaction intentions, realize the discrimination of different interaction intentions through the classification algorithm, compare the classification effect of eye movement characteristics and EEG signals, and screen out better distinguishing characteristic indicators was designed.

Results

Due to the large differences in the processing methods of eye movement data and EEG signals, they should be discussed and analyzed separately. First, the eye movement indicators and EEG parameters with good classification effect were selected through differential analysis, and then the characteristic combination with the best discriminative effect was further explored.

Analysis of Eye Movement Data Processing

In the normal state, the hot maps of subjects in one set of experiments are shown in Figure 8.

FIGURE 8

Figure 8. Hot map of subjects in normal state. (A) Table query interaction task. (B) Monitoring and alert interaction task. (C) Target search interactive task. (D) Click interactive task. (E) Interactive task of state tracking.

As can be seen from Figure 8, the subject eye stay area greatly concerns the task completed. When completing a task, the subjects point will focus in the corresponding task area and focus on the data or status of the task target. In the table search task, the subjects focus on the F2 area and watch the most on the target number 0498; in the monitoring alert task, the fixation points are distributed in many areas, but ultimately in the F2 area; the target is in the F1 area, and the hot spots are relatively evenly distributed, probably because the target itself is scattered, the target will focus on the red line.

(1) Selection of characteristic components of eye motion.

The SMI RED5 eye moving instrument used in this article can measure dozens of eye movement parameters. In order to improve the efficiency of classification and identification, it is necessary to select better differentiated eye movement indicators through differential analysis. Based on the preliminary analysis results of the BeGaze software, the average pupil diameter (Average Pupil Size [APS]), fixation point abscissa mean (Average Position X [APX]), fixation point longitudinal mean (Average Position Y [APY]), average saccade amplitude (ASA), average saccade speed (Average Saccade Velocity [ASV]), and average fixation time AFD (Average Fixation Duration [AFD]) were selected for this experiment. The results of data processing by one-way ANOVA are shown in Table 1.

TABLE 1

Table 1. Analysis results of eye movement index differences.

Table 1 shows significant differences in APX [F_(4,115) = 34.59, p < 0.05], APY [F_(4,115) = 32.78, p < 0.05], and ASA [F_(4,115) = 21.84, P < 0.05], while APS [F_(4,115) = 3.83, p > 0.05], AAV [F_(4,115) = 4.95, p > 0.05], and AFD [F_(4,115) = 17.55, p > 0.05] differ significantly. Therefore, this article considers that the fixation point X coordinate PX, fixation point Y coordinate PY, and SA can be used to distinguish the interaction intention at this moment.

After the eye motion index screening is completed, it is necessary to choose the eye motion characteristic components for the interaction intention discrimination. The key is to select the number of sampled gaze points. To avoid the stochasticity of individual sampled fixation points, drawing on the treatment of literature (Fan et al., 2016), in this article, PX, PY, and SA of three consecutive sampled fixation points, namely the nine components shown in Table 2, were used as eye movement feature parameters for determining the interaction intention.

TABLE 2

Table 2. Eye movement characteristic component.

(2) Identification results of eye movement characteristics.

The training set of this experiment consists of 300 typical data (60 selected each under 5 interaction intentions). The test set consists of 200 typical data (40 selected each under 5 interaction intentions). It uses SVM algorithm and sets category label as (0,1,2,3,4). The SVM algorithm was used to train eye movement features and the parameters of the SVM algorithm were determined by the cross-contrast SVM method. Figure 9 shows the distribution diagram of 500 typical data of PX, PY, and SA.

FIGURE 9

Figure 9. A total of 500 Distribution of typical data.

As can be seen from Figure 9, under the five kinds of interaction intentions, there is a large number of cross-overlap in the range of the three kinds of indicators, which makes it difficult to effectively distinguish the five interaction intentions for a single indicator. Therefore, it needs to be more accurate through the combination of eye movement indicators. Considering that PX and PY represent the positional information of the sampled fixation points, this experiment was analyzed as a whole.

The MATLAB classification results for feature combinations of “PX and PY,” and “PX, PY, and SA” are as follows:

PX and PY:

Accuracy = 76.00%(152/200) (classification);

PX, PY, and SA:

Accuracy = 92.00% (184/200) (classification).

Figures 10A,B show the specific results of the combination discrimination classification of “PX and PY,” and “PX, PY, and SA,” respectively. The discriminant classification was shown in Table 3. Obviously, the former has a very poor classification effect on monitoring alert, seriously misjudged the monitoring alert as the other four intentions, and has a good classification effect on the other four intentions. After the addition of the saccade range SA, the discrimination accuracy of the monitoring alert is significantly improved, and the classification effect of the other four intentions has also been strengthened to a certain extent, which shows that the saccade range directly affects the inference results of the surveillance alert. At the same time, there are a few cases in which target search, table query, icon click, and status tracking are misjudged as surveillance alert. The reason may be that the location parameters PX and PY of the gaze point in the surveillance alert state overlap with the other four states.

FIGURE 10

Figure 10. The discriminant classification results for the test set. (A) “PX and PY.” (B) “PX,PY and SA.”

TABLE 3

Table 3. The discriminant classification results for PX and PY.

Analysis of the Electroencephalograph Signal Processing

(1) Preprocessing of EEG signals.

The EEG signals have strong individual variability, so the commonly used characteristic indicators are the proportion of average power to total power in each frequency segment, or the ratio of average power between different frequency bands, such as (α + β)/θ,α/β, (α + θ)/β, etc. But these indicators also have certain individual differences (Alazrai et al., 2019; Pei et al., 2018). In this paper, the collected EEG signals are first denoised by wavelet transformation, and then obtained by power spectral analysis. The average power of different frequency bands is calculated, and the corresponding ratio can be obtained after simple calculation. It is worth noting that, due to the complex processing of EEG signals, the average power of the basic time period is calculated at 0.5 s (In reference to Gu et al., 2016; Liu et al., 2021). From the experimental process, the duration of the five interaction tasks was above 1 min, so each interaction state has at least 120 sets of data that can meet the basic requirements of the classification algorithm.

The electrodes on the EEG instrument used in this experiment have been installed according to the 10–20 standardized electrode guidance method, but in order to improve the efficiency of the EEG signal processing and make the research results more generalized, the representative guides should be selected for processing and analysis. This experiment adopted the recommendations of the American EEG Number Association standards, and 14 guides including F7, F8, P7, and P8 as shown in Figure 11 were selected as the EEG signal acquisition channels.

FIGURE 11

Figure 11. Schematic location of the 14 conductive electrodes selected.

Denoising processing, power spectrum analysis of EEG signals in the middle of the target search interaction task, and the results are shown in Figure 12. It can be found that the effect of denoising is relatively ideal, and the power density curve of different channels has different degrees, but it shows a similar trend: the power density of slow wave (δ,θ) is relatively stable on the whole, while the power density of fast wave (β,γ) has large fluctuations. Meanwhile, intuitively seen from the power spectral density, the power density of the slow wave in this time period is significantly higher than that of the fast wave.

FIGURE 12

Figure 12. Preprocessing results of 14-lead EEG signals of a subject. (A) Denoising treatment. (B) Power spectrum analysis.

(2) Differential analysis of the EEG indexes.

In this experiment, 8 commonly used index parameters such as R_δ, R_θ, R_α, R_β, and R_γ and different frequency bands such as R_α/β, R_θ/β, and R_{(α +θ)/(α +β)} were selected, and one-way ANOVA was performed, and the results are shown in Table 4. The average power is obtained as shown in formula (1):

A_{α} = \frac{\int_{a}^{b} P (f) d f}{b - a} (1)

TABLE 4

Table 4. Results of EE parameters (first 5 rows:%).

Where A_α is the average power of the frequency band α, (a, b) is the upper and lower limits of the frequency band α, P (f) is the power spectral density of the frequency band α, and the average power method of the remaining frequency bands is similar to this. In addition, the meaning of the R_α/β value is the average power ratio between the frequency band α and the frequency band β, and the meaning of the R_α value is the average power ratio of the frequency band α to the total frequency band, which is equivalent to the R_{(α/(δ +θ +α +β +γ)}. The specific meaning of the other statements will not be repeated here.

From Table 4, the eight kinds of EEG parameters selected in this experiment in the difference of interaction intention is not obvious (p > 0.05), so it is difficult to realize the effective discrimination of interaction intention. The reason may be that the power ratio of different frequency bands is difficult to interpret the interaction intention. Whether other EEG features can successfully distinguish the interaction intention defined in this paper needs further exploration. As for other EEG characteristics can successfully distinguish the interaction intention defined in this article remains to be further explored. Therefore, the discrimination of interaction intention in this article is mainly based on the eye movement features, and the feature combination of “PX, PY, and SA” will be used for the real-time discrimination of interaction intent below. The discrimination dassifivation was been shown in Table 5.

TABLE 5

Table 5. The discriminant classification results for PX, PY, and SA.