Eye tracking research on readers’ interactions with multimodal texts: a mini-review

Gatcho, Al Ryanne Gabonada; Manuel, Jeremiah Paul Giron; Sarasua, Rocky James Guevarra

doi:10.3389/fcomm.2024.1482105

MINI REVIEW article

Front. Commun., 05 December 2024

Sec. Multimodality of Communication

Volume 9 - 2024 | https://doi.org/10.3389/fcomm.2024.1482105


Al Ryanne Gabonada Gatcho¹*

Jeremiah Paul Giron Manuel²

Rocky James Guevarra Sarasua³

¹School of Foreign Languages and Literature, Hunan Institute of Science and Technology, Yueyang, China
²Sekolah Pelita Harapan, Bogor, Indonesia
³Faculty of Teacher Development, Philippine Normal University, Manila, Philippines

This mini-review advocates for the role of eye-tracking research in understanding readers’ engagement with multimodal texts. Synthesizing findings from a variety of studies, the review reveals how eye-tracking gives insights into sophisticated interactions between the textual, visual, and auditory elements within reading environments that assist both cognitive processing and comprehension. Several gaps were revealed: limited demographic scope, integration of advanced technologies, and substantial impact to the area of eye tracking and multimodal literacy. Future directions must therefore include studies across diverse populations, innovative technologies, and cross-discipline research studies. These directions are critical for advancing literacy development in an increasingly multimodal digital world.

Introduction

Eye tracking has been used in numerous research domains. In marketing, it appears in consumer experience research, where viewers’ eye movements are examined at various purchase stages (Duerrschmid and Danner, 2018; Ishibashi et al., 2019). In health care, it is acknowledged for its diagnostic (Sun et al., 2022), therapeutic (Harezlak and Kasprowski, 2018), and interactive properties (Tscholl et al., 2020). In education, it facilitates the examination of learners’ experience with learning materials (Conley et al., 2020; Serrano-Mamolar et al., 2023; Susac et al., 2023). Over the years, eye tracking has come to be regarded as a tool for examining reading engagement (Child et al., 2020; Liu and Yu, 2022), comprehension (Abundis-Gutiérrez et al., 2018; Mézière et al., 2023), and atypical reading patterns associated with conditions such as dyslexia or ADHD (Klein et al., 2019).

The ever-evolving technological landscape has popularized multimodal texts, which enable learners to absorb information regardless of their learning styles (Bearne, 2012; Jancsary et al., 2016; Smith, 2012). Readers need to interact with multiple modes of information, and the development of frameworks for multimodal literacies has gained ground in this field (Serafini, 2015; Shin et al., 2020). This area demands scoping examinations, appropriate methodologies, and optimization of its function. Eye-tracking provides perspectives on how readers can interact efficiently with multimodal materials (Holmqvist et al., 2011a, 2011b). By contrast, some researchers see a tension in the use of multimodal materials for literacy, holding that such practices strain the limits of learners’ cognitive processing and thereby limit their potency (Mayer and Moreno, 2010). The insights gained from eye-tracking studies may guide the development of effective multimodal learning resources (Alemdag and Cagiltay, 2018) by enabling researchers to analyze how long readers focus on particular elements of a multimodal text (van der Sluis et al., 2018; Armfield, 2011; Schmidt-Weigand et al., 2010). Such texts often demand transitions between different types of representations and careful consideration of information presented in graphs, as is the case with the Programme for International Student Assessment (PISA) items (Susac et al., 2018; Mason et al., 2015). Mason et al. (2022) showed how visual attention patterns can be used to inform the development of instructional video content. To illustrate the potential of eye-tracking in refining multimodal materials, consider its application in multimedia learning. Mayer and Fiorella (2021) discuss how students’ eye movements in multimedia lessons can reveal whether they are efficiently coordinating attention between explanatory diagrams and narration. Such data highlight areas where learners struggle to connect visual and verbal information, prompting educators to revise the layout or sequence of these materials for enhanced learning (Wiegand et al., 2017). Thus, this mini-review surveys the body of research on eye tracking in multimodal reading to identify the arguments that drive this expanding field. It reveals gaps in research while presenting contrasting perspectives that influence current understandings. The review not only contextualizes current knowledge but also anticipates future implications.

Eye tracking in reading research

The application of eye tracking technology in reading is anchored in Just and Carpenter’s (1980) “eye-mind hypothesis,” which posits a connection between gaze location and cognitive processing. Who first studied eye movements during reading remains unsettled. However, the experiments of Louis Emile Javal in 1879 concluded that reading is a nonlinear process, since readers’ eyes exhibit a series of quick movements punctuated by brief moments of stillness on certain parts of the text (Płużyczka, 2018). In the early 20th century, Edmund Huey invented the first, albeit intrusive, eye tracker to understand reading behaviors (Walczyk et al., 2014). Following this, Buswell’s seminal work in the 1920s showed that eye movements are not smooth but composed of saccades and fixations (Wade, 2020), concepts that Javal had previously identified but that were not named during his time. Since then, several researchers have demonstrated how saccadic movements correlate with cognitive processing during reading (Rayner, 1978; Taylor, 1965).

The late 20th and early 21st centuries yielded profound insights into how readers interact with texts, obtained through eye tracking technology within conventional reading environments. For example, skilled readers have shorter fixations and longer saccades than struggling readers (Boland, 2004; Weger and Inhoff, 2006). Engaged readers display longer fixations on meaningful parts of a text, while disengaged readers demonstrate quick eye movement patterns, shorter fixations, and frequent regressions, which signify comprehension difficulty (Holmqvist et al., 2011a, 2011b; Rayner and Pollatsek, 2006). Additionally, readers with dyslexia display atypical eye movement patterns, which shed light on their difficulties with word recognition and processing speed (Hyönä and Olson, 1995; Jones et al., 2008). Learners with ADHD, on the other hand, have erratic fixations with frequent saccadic movements (Karatekin and Asarnow, 1998), which helps clarify why reading is difficult for them. On a practical note, eye tracking technology paved the way for the development of targeted interventions such as improving text readability (Goldberg and Wichansky, 2003) or integrating assistive technologies into teaching practice (van Gog and Scheiter, 2010).
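The measures named above (fixation duration, saccade length, and regressions) can be made concrete with a minimal sketch. It assumes a hypothetical input format in which each fixation has already been mapped to the index of the fixated word; the `Fixation` class and the `reading_metrics` function are illustrative names, not part of any eye tracking toolkit, and the sketch is not drawn from the studies cited here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Fixation:
    word_index: int      # position of the fixated word in the text (assumed format)
    duration_ms: float   # how long the eye rested on that word


def reading_metrics(fixations):
    """Summarize a fixation sequence with three common reading measures."""
    if len(fixations) < 2:
        raise ValueError("need at least two fixations")
    # Signed distance, in words, between each pair of consecutive fixations.
    jumps = [b.word_index - a.word_index for a, b in zip(fixations, fixations[1:])]
    forward = [j for j in jumps if j > 0]      # forward saccades
    regressions = [j for j in jumps if j < 0]  # backward jumps to earlier words
    return {
        "mean_fixation_ms": mean(f.duration_ms for f in fixations),
        "mean_forward_saccade_words": mean(forward) if forward else 0.0,
        "regression_rate": len(regressions) / len(jumps),
    }


# Made-up fixation sequence, for illustration only.
sample = [Fixation(0, 200), Fixation(2, 220), Fixation(1, 300), Fixation(4, 180)]
metrics = reading_metrics(sample)
print(metrics)
```

Under this simplified view, a pattern of shorter mean fixations and longer forward saccades would resemble the skilled-reader profile described above, while a higher regression rate would resemble the profile associated with comprehension difficulty. Real analyses depend on how fixations are detected and mapped to words, which this sketch leaves out.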

Multimodal texts and reader interaction

A multimodal text combines more than one “mode,” that is, the method of communication being used: spatial, linguistic, visual, gestural, and audio. Together these create meaning far beyond the capacity of any single mode (Moses and Reid, 2021; Sutrisno et al., 2023; Jewitt, 2013; Forceville, 2011). For example, a digital presentation can be read and viewed together with animations and voice narration (Kress, 2010). A second characteristic is interactivity: many multimodal texts, especially digital ones, let users engage with the content through clicks, links, or swipes that determine how and what they navigate through in the text (Serafini, 2014). A third characteristic of multimodal texts is non-linearity: hypertexts and websites give readers the liberty to navigate content through different pathways.

Kress and van Leeuwen (2006) explain that, unlike the reading of traditional print sources, multimodal literacy relies on a reading operation in which the reader decodes and syncretizes these diverse semiotic sources (Serafini, 2011; Forceville, 2010). For example, websites and new forms of digital media, such as digital comic strips or infographics, allow users to click on links that take them where they choose within the content, thereby actively assuming meaning-making agency (Jewitt, 2013; Bezemer and Kress, 2008). In the case of an instructional video, viewers interpret the visual demonstrations as well as the auditory instructions (Serafini, 2014). For these reasons, it is necessary to delve into the literature on reader interaction with multimodal texts. To begin with, this literature extends beyond print literacy and encompasses the various ways different people communicate within media environments (Cope and Kalantzis, 2009). With the growing incorporation of technology in educational systems, understanding how readers interact with multimodal texts can inform teaching practices so that students will be better prepared for the contemporary world (Walsh, 2010; Jewitt, 2009).

As noted by Shin (2023), “we live in a multimodal world,” and this becomes evident with the integration of various modes of communication to convey complex information. Eye tracking, according to Holsanova (2014), is a potentially useful tool for gathering accurate visual data about how readers engage with multimodal texts.

Visual and textual synchronization

Eye tracking studies have revealed that the temporal aspect of visuals within a text provides a context that primes readers’ minds for new or important information (Gegenfurtner et al., 2011; Hoffman, 2016), which facilitates better comprehension (Huth et al., 2024; Lee and Révész, 2018; Loewen and Inceoglu, 2016). These visual cues also act as cognitive anchors, aiding information retention (Pjesivac et al., 2021). Recognizable images are recalled with fewer fixations at the center during recognition phases (Borkin et al., 2015), and notably, animations have been found to enhance readers’ information recall compared with still visuals because of their sequential properties (Coskun and Cagiltay, 2022). Additionally, frequent shifts of eye focus between animated segments and text suggest a potentially fragmented reading experience (Foulsham et al., 2016).

Eye movement data reveal that visuals affect where and how long attention is held (Indrarathne and Kormos, 2017; Lee and Jung, 2021), indicate whether readers are visual learners by comparing their length of gaze on images and texts (Koć-Januchta et al., 2017), and provide insights into the strategies employed when interpreting visuals (Borkin et al., 2015). However, Huang et al. (2011) found otherwise: graph drawings in texts had minimal impact on readers’ task performance.
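Comparing the length of gaze on images and texts can be sketched in a few lines. The example assumes a hypothetical log in which each fixation has already been assigned to an area of interest (AOI) labelled "text" or "image"; the function name `dwell_and_transitions` and the data are illustrative assumptions, not the procedure of any study cited above.

```python
from collections import defaultdict


def dwell_and_transitions(fixations):
    """Total dwell time per AOI, each AOI's share of it, and AOI switches.

    fixations: iterable of (aoi_label, duration_ms) pairs in viewing order.
    """
    dwell = defaultdict(float)
    transitions = 0
    previous = None
    for aoi, duration_ms in fixations:
        dwell[aoi] += duration_ms
        # A transition is counted whenever gaze moves to a different AOI.
        if previous is not None and aoi != previous:
            transitions += 1
        previous = aoi
    total = sum(dwell.values())
    share = {aoi: t / total for aoi, t in dwell.items()} if total else {}
    return dict(dwell), share, transitions


# Made-up gaze log, for illustration only.
log = [("text", 200), ("text", 250), ("image", 300), ("text", 150), ("image", 100)]
dwell, share, transitions = dwell_and_transitions(log)
print(dwell, share, transitions)
```

The dwell share gives a rough picture of how gaze time is divided between image and text, and the transition count gives a rough picture of how often a reader switches between them. How AOIs are drawn and how fixations are assigned to them are design decisions that this sketch does not address.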

In terms of affective response, pupil dilation when viewing high-resolution images implies heightened reading interest (Brunyé et al., 2019), while multiple rapid sideways eye movements when absurd images are encountered suggest discomfort (Gregory, 2015). In scenarios where readers are locating specific information, eye tracking has revealed that visuals act as reference points, which accelerate the search (Drew et al., 2017; van der Gijp et al., 2017). Differences in how various demographic groups use visuals as search cues have also been explored (Józsa and Hámornik, 2012).

Recent eye tracking research has explored how games integrated in digital texts can be made more accessible for visually impaired people through gaze-controlled interfaces, bypassing the need for a traditional input device such as a keyboard or mouse (Deng et al., 2014; Munoz et al., 2011). Likewise, Krebs et al. (2021) and Gu et al. (2022) demonstrated that adaptive learning games designed with eye tracking feedback can improve the reading comprehension of students with dyslexia.

Texts with audio elements

Text processing may be impacted by audio integration. For example, voiced narratives caused readers to focus on text or image portions for longer (Kruger, 2012; Liu et al., 2011); explanatory audios helped readers focus on the relevant content and reduced the need to read the texts again (Conklin et al., 2020); and quiet music during passages can help people reflect more deeply (Kerchner, 2014; Holmqvist et al., 2011a, 2011b). When background music was played during brief passages, there was a decrease in visual wandering; however, lengthier passages showed the opposite pattern (Hyönä and Ekholm, 2016). Cognitive load may also be affected by variations in speech volume, tone, and tempo (Hvelplund, 2011). This can be seen when watching movies with subtitles because viewers’ eye movements change based on whether they are simultaneously exposed to dynamic audio that supports or contradicts the textual information being displayed (Kruger and Steyn, 2014).

Moreover, auditory learners benefit from audio-enhanced texts as data showed that their eyes are less strained when processing information through listening rather than reading (Conklin et al., 2020; Pellicer-Sánchez et al., 2018). However, Kruger and Steyn (2014) noted that the design of audio elements within the texts should consider diverse readers, especially those with hearing impairments or those who are easily distracted by sounds. In light of this consideration, integrating both visual and auditory stimuli is recommended.

Combinations of textual, visual and auditory modes

Eye tracking research indicates that integrating text, visuals, and audio together in multimodal materials strengthens reader engagement. Schiavo et al. (2015) developed GARY, a text-to-speech multimedia application that supports struggling readers in their progress. Similarly, the Zurich Cognitive Language Processing Corpus (ZuCo) has advanced the study of literacy and language development by combining brain and eye movement data (Hollenstein et al., 2018). While such innovations are considered a gamble, given the positive and negative effects observed in their use (Bus et al., 2015; Dobler, 2015), their potential to help improve reading comprehension can no longer be ignored.

Recent innovations in the field

Current technological developments have emerged as beneficial complementary tools for enhancing multimodal literacy development. Santos et al. (2016) demonstrated that Augmented Reality (AR), which uses multimedia information in its setting, is an effective tool for improving learners’ vocabulary. With buttons for translating, describing, and listening, the constructed environment allowed students to immerse themselves in learning new words. In the case of reading comprehension, Danaei et al. (2020) revealed that children who used AR-based literature had a better grasp of stories than those who had traditional books. Even among children with learning disabilities, AR-based learning boosted reading comprehension (Shaaban and Mohamed, 2024). Moreover, virtual reality (VR) has taken a similar position in multimodal text processing, helping learners to receive help and encouragement (Tai et al., 2020; Asad et al., 2021). These findings support both platforms as potent developers of multimedia text literacy (Liu et al., 2020; Bursali and Yilmaz, 2019). This recent progress in AR and VR has encouraged developers to enhance the eye tracking technology linked to these innovations (Dudinskaya et al., 2020). Head-mounted displays for AR and VR that incorporate eye tracking have been developed, emphasizing gaze-based interaction (Kapp et al., 2021).

On the other hand, optimizing the integration of AR remains a concern for developers, as risks to children have also been observed (Li et al., 2018; Papanastasiou et al., 2018). Similarly, VR-based implementations are costly (Kamińska et al., 2019), successful integration demands considerable labor from the teacher (Alizadeh, 2019), and there are potential mental health risks (Richter et al., 2018). These practical gaps in the literature also mean that the field of eye tracking continues to need fresh insights.

Discussion: unpacking the gaps

Eye tracking remains significant in the field of literacy, prompting developers and researchers to keep seeking substantial materials for its optimized implementation. As demonstrated by the GARY application (Schiavo et al., 2015) and the ZuCo corpus (Hollenstein et al., 2018), a variety of eye tracking-specific methods for promoting literacy are readily available. As Holsanova (2014) maintained, the technology’s current contribution to literacy cannot be dismissed. The vision of Just and Carpenter (1980) in the eye-mind hypothesis paved the way for these innovations, which have integrated the nuances of different formats throughout their development. On the other hand, despite the prevalence of eye tracking in studies, research within this specific field remains sparse.

Extensive research on eye tracking in traditional settings has produced insightful findings, particularly regarding its impact on multimodal texts (Gegenfurtner et al., 2011; Hoffman, 2016; Huth et al., 2024; Lee and Révész, 2018; Loewen and Inceoglu, 2016). Additionally, the participant demographics extend to readers with dyslexia (Hyönä and Olson, 1995; Jones et al., 2008) and children with ADHD (Karatekin and Asarnow, 1998). Focusing on eye movements (Indrarathne and Kormos, 2017; Lee and Jung, 2021), including pupil dilation (Brunyé et al., 2019) and length of gaze on images and texts (Koć-Januchta et al., 2017), this area of study has yielded interesting insights, especially for helping readers. While present advancements in eye tracking have integrated technological innovations (Krebs et al., 2021; Gu et al., 2022), the speedy development of technology demands continual findings.

Another gap observed in this field is the lack of studies on long-term effects, such as how repeated exposure to multimodal texts influences memory consolidation and cognitive development, which would inform recommendations for practical action. Pjesivac et al. (2021) and Borkin et al. (2015) both shared vital insights on visual cues but maintained a narrow focus on the recognition and retention stages. While Coskun and Cagiltay (2022) reported similar findings on the utility of visual cues for enhanced information recall, these findings remain unlinked to long-term considerations. Studies on including audio with text elements also appear to be lacking in this respect. Conklin et al. (2020), Kruger (2012), and Liu et al. (2011) demonstrated the benefits of audio for reading focus, but their studies covered only voice-assisted reading. This gap points to the field’s narrowly scoped practices, which make it difficult for the discipline to address long-term considerations in eye tracking research.

Viewing eye tracking studies through a cross-disciplinary lens reveals another need that must be met for this field to gain prominence. Technology was heavily considered in the studies reviewed (Krebs et al., 2021; Gu et al., 2022; Schiavo et al., 2015; Hollenstein et al., 2018), especially augmented and virtual reality integrations, but partnerships between literacy and psychology seem to have been overlooked. The two closest cross-disciplinary findings were the studies concerning participants with ADHD (Karatekin and Asarnow, 1998) and dyslexia (Hyönä and Olson, 1995; Jones et al., 2008), which call for fresh exploration given their dates of publication.

Recommendations and future directions

The following suggestions may benefit eye tracking research on the reading of multimodal texts.

There is a need to integrate more innovative applications into eye tracking activities and the reading of multimodal texts. Given the fast pace of the technological landscape in education, in which today’s children are raised with the most advanced gadgets, the latest developments in multimodal literacy, especially gamified ones, are critical.

The narrow participant demographics in these studies call for expanding research to include mature readers and diverse needs. Similarly, the spectrum in ADHD has developed in recent years, demanding specific interventions and approaches for a particular spectrum type. Moreover, studies concerning children with socio-emotional learning needs should also be given attention in the field of eye tracking research, and how literacy can be addressed among these children especially in reading multimodal texts.

Challenges in AR and VR technologies linked with multimodal text literacy development call for exploration in eye tracking research. Institutions can explore feasible means to reduce the costs and help communities enjoy such innovations. Moreover, workshops and training sessions may be arranged on using such advancements in eye tracking research. One may also examine the suggestions of Mayer and Moreno (2010) for decreasing the cognitive processing load that multimedia materials place on learners.

These recommendations may not be exhaustive considering that this is a mini-review. Longer term consolidation of study results through systematic reviews and meta-analyses can provide greater insights into unexplored impact of eye-tracking on literacy development and multimodal reading in the digital era.

Conclusion

This paper synthesizes eye tracking studies to explore how readers interact with multimodal texts. Findings reveal that multimodal reading requires cognitive flexibility since readers navigate interplays of content. Key strategies like prioritizing information based on its perceived importance and confirming text with visual aids are essential for comprehension. However, research gaps persist, particularly in understanding how varied populations, such as those with reading difficulties or non-native language backgrounds, engage with multimodal texts in naturalistic settings. Future research should focus on these areas to enhance instructional designs and embrace the digital evolution of literacy practices.

Author contributions

AG: Conceptualization, Supervision, Writing – original draft. JM: Writing – review & editing. RS: Writing – review & editing.

Funding

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Abundis-Gutiérrez, A., González-Becerra, V. H., Del Rio, J. M., López, M. A., Ramírez, A. A. V., Sánchez, D. O., et al. (2018). Reading comprehension and eye-tracking in college students: comparison between low-and middle-skilled readers. Psychology 9, 2972–2983. doi: 10.4236/psych.2018.915172