Benefits and Limitations of Computer Gesture Therapy for the Rehabilitation of Severe Aphasia

Roper, Abi; Marshall, Jane; Wilson, Stephanie

doi:10.3389/fnhum.2016.00595

ORIGINAL RESEARCH article

Front. Hum. Neurosci., 29 November 2016

Sec. Speech and Language

Volume 10 - 2016 | https://doi.org/10.3389/fnhum.2016.00595

This article is part of the Research Topic Technology & Communication Deficits: Latest Advancements in Diagnosis and Rehabilitation

Abi Roper¹*

Jane Marshall¹

Stephanie Wilson²

¹Division of Language and Communication Science, City, University of London, London, UK
²Centre for Human-Computer Interaction Design, City, University of London, London, UK

Aphasia intervention has made increasing use of technology in recent years. The evidence base, which is largely limited to the investigation of spoken language outcomes, indicates positive treatment effects for people with mild to moderate levels of aphasia. Outcomes for those with severe aphasia, however, are less well documented and – where reported – present less consistent gains for measures of spoken output. This study investigates the effects of a purpose-built gesture therapy technology for people with severe aphasia: GeST+. Study outcomes show significant improvement in gesture production abilities for adults with severe aphasia following computer intervention. They indicate no transfer of effects into naming gains or interactive gesture. Outcomes offer encouraging results for computer therapy methods within this hitherto under-researched population but indicate a need for further refinement of interventions in order to maximize persistence of effects and generalization into everyday communication.

Introduction

About a quarter of stroke survivors have long-term speech and language difficulties caused by aphasia (Ali et al., 2015). When problems are severe, both speech and writing may be eliminated with profound consequences for the person’s quality of life (Hilari and Byng, 2009). Rehabilitation outcomes in those with severe aphasia are also poor (Plowman et al., 2012).

Although aphasia therapy is making increasing use of technology (van de Sandt-Koenderman, 2011; Zheng et al., 2016), few applications have been designed for people with severe impairments (see van de Sandt-Koenderman et al., 2007, for an exception). There is also evidence that some language remediation tools do not benefit this group (Palmer et al., 2012). This study evaluated a bespoke computer therapy for people with severe aphasia targeting the compensatory modality of gesture.

The use of gesture can augment communication in severe aphasia (Goodwin, 2000; Parr, 2007) and has been shown to respond to therapy (Daumuller and Goldenberg, 2010; Marshall et al., 2012). However, gains from gesture therapy are often confined to practiced items, with no generalization beyond these; and, in many studies, it is not clear whether acquired gestures benefit interactive communication (Rose et al., 2013). Outcomes also vary across individuals. This may reflect the severity of the aphasia, or impairments in skills related to gesture production, such as executive function (Purdy and Koch, 2006) and praxis (Hogrefe et al., 2012).

Gestures may not simply replace speech in aphasia; they may also cue speech production (Lanyon and Rose, 2009). This is in line with the theoretical proposal that gestures play a facilitatory role, even in healthy speech production (Krauss et al., 2000). Indeed, treatments that include a gestural component have improved word retrieval in aphasia (Raymer et al., 2006; Crosson et al., 2007; Attard et al., 2013). However, the independent contribution of gesture to the treatment effect is difficult to determine. When gesture is treated in isolation, effects have not generalized to speech (Marshall et al., 2012).

Therapeutic gains in gesture production are hard won, particularly when the aphasia is severe. For example, previous studies have found that at least 3 h of therapy were needed to train each new gesture (Daumuller and Goldenberg, 2010; Marshall et al., 2012). These findings call for intensive treatment regimes. Yet such regimes are not always available, and may be associated with high rates of drop out (Brady et al., 2016).

Self-administered computer therapies can raise therapy intensity without increasing therapist demand, and in ways that may inhibit drop out. GeST (Galliers et al., 2012) is a therapy tool designed with and for people with severe aphasia in order to train a ‘vocabulary’ of everyday communicative gestures. It employs computer vision-based gesture recognition to determine whether the user has produced the correct gesture. A number of motivating features promote engagement, including opportunities for different levels of practice, applause, and a ‘gaming’ element involving a 3D virtual world.

A pilot study involving nine people with severe aphasia showed that 6 weeks' practice with GeST improved gesture production (Marshall et al., 2013). However, gains were modest and only occurred on items that had been practiced with the tool and with therapist support. Spoken naming of both trained and untrained items was explored, with no evidence of a therapy effect. Use of the acquired gestures in communication was not tested.

This paper reports a new therapy study involving GeST+. It aimed to replicate the positive findings of the pilot, with a larger sample and a stronger, quasi-randomized controlled design. The original GeST tool was augmented (GeST+) with an additional software application, to determine if this would enhance the therapy effect. A wider range of outcome measures, and longer follow up, aimed to identify the potential benefits of GeST+, including for interactive communication. Benefits for speech were also examined, through pre- and post-therapy tests of word production. Finally, we examined whether baseline tests of language, cognition, and praxis predicted therapy outcomes.

Materials and Methods

This study employed a wait-list control, quasi-randomized design. Screening and profiling assessments were administered at recruitment. Following screening, participants were allocated to either an immediate or delayed therapy group. Allocation was performed by a member of the team (SW) who was blind to screening data. This was achieved via selection of a paper label displaying either ‘immediate’ or ‘delayed’ from an opaque bag which contained a batch of 10 labels – five stating immediate and five stating delayed. Participants were allocated to their group as indicated by the label selected. The first 11 participants to enter the study were recruited in the South East of England. Allocation here was carried out as each participant entered the study. A second block of data collection was carried out in the South West of England. For logistical reasons allocation here was carried out in blocks of five cases at a time, i.e., five participants at a time were allocated to either the immediate or delayed group. This enabled data collection and therapy delivery to be carried out contemporaneously for this group of participants – to overcome logistical limitations imposed by travel requirements.
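The label draw described above amounts to sampling without replacement from a batch of ten labels. The following is a minimal sketch of that idea only; the function names and the fixed seed are illustrative assumptions, since the study used a physical draw from an opaque bag.

```python
import random


def make_label_bag(n_immediate=5, n_delayed=5):
    """Return a fresh batch of allocation labels (five of each, as in the text)."""
    return ["immediate"] * n_immediate + ["delayed"] * n_delayed


def draw_label(bag, rng):
    """Remove and return one label at random (sampling without replacement)."""
    return bag.pop(rng.randrange(len(bag)))


# The seed is an assumption, used only to make this illustration repeatable.
rng = random.Random(2016)
bag = make_label_bag()

# Drawing all ten labels exhausts the batch and yields five of each group.
allocations = [draw_label(bag, rng) for _ in range(10)]
print(allocations)
```

Because labels are not returned to the bag, each complete batch of ten draws is balanced between the two groups, whatever the order of the draws.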

Assessments were administered at four time points (T1, T2, T3, T4) – each separated by an interval of 5 weeks. Between T1 and T2, those in the immediate therapy group received 5 weeks of computer-delivered gesture therapy, supported by weekly therapist input. Participants in the delayed therapy group received no input. Both groups undertook repeated measures testing at T2. Following this, those in the delayed group received the 5-week therapy protocol, whilst those in the immediate group received no input. Repeated measures testing was carried out again at T3 for both groups. T4 testing was carried out after a further 5 weeks of no input for either group. Computer log data captured therapy practice time and system usage information.

Repeated measures testing at all time points included three main assessments: a primary assessment of gesture production in isolation and secondary assessments of gesture in interactive communication and spoken naming.

Screening and Profiling Assessments

A number of assessments were included to screen and profile participants’ abilities in language, cognition, and praxis. These were included to investigate links between such abilities and any subsequent gains made on the primary or secondary outcome measures.

Language

Four subtests from the standardized Comprehensive Aphasia Test [CAT, (Swinburn et al., 2004)] were employed to assess language. The CAT single word spoken naming assessment was used as a screening measure, with participants scoring 30% or under (i.e., with severe naming difficulties) being included in the study. Following screening, three further CAT subtests were used to profile participants’ individual language abilities: spoken single word comprehension, sentence comprehension, and written single word comprehension.

Cognition

These tests included a basic object to picture matching assessment and a standardized measure of non-verbal cognition.

Object and gesture to picture matching

This screening assessment examined participants’ ability to relate objects and gestures to both photographs and line drawings – skills which were required for completion of the experimental measures, and for successful use of GeST. It was based on a novel assessment described in Caute et al. (2013) and used to screen participants in two previous gesture studies (Marshall et al., 2012, 2013). Participants scoring 60% or less (≤6/10) for this assessment were excluded from progression to the main study.
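Taken together, the two screening thresholds (CAT spoken naming of 30% or under, and more than 60% on object and gesture to picture matching) can be expressed as a single eligibility check. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and the check covers only these two thresholds, not any other recruitment criteria.

```python
def is_eligible(naming_pct, matching_correct, matching_total=10):
    """Apply the two screening thresholds described in the text.

    naming_pct: CAT single word spoken naming score, as a percentage.
    matching_correct: items correct on object and gesture to picture matching.
    """
    # Included only with severe naming difficulties (30% or under).
    severe_naming = naming_pct <= 30
    # Excluded when scoring 60% or less; integer arithmetic avoids
    # floating-point surprises at the 6/10 boundary.
    matching_ok = matching_correct * 100 > 60 * matching_total
    return severe_naming and matching_ok


print(is_eligible(30, 7))   # naming at threshold, matching above cut-off
print(is_eligible(10, 6))   # matching at 6/10 falls in the excluded range
```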

Visuospatial assessments from the Cognitive Linguistic Quick Test

Following Nicholas et al. (2011) in assessment of cognition for adults with severe aphasia, visuospatial skill domain subtests from the Cognitive Linguistic Quick Test (CLQT; Helm-Estabrooks, 2001) were employed as profiling tests to assess participants’ non-linguistic cognition. These comprised standardized assessments of symbol deletion, symbol trails, design memory, mazes, and design generation.

Praxis

The Birmingham University Praxis Scale (BUPS, as cited in Bickerton et al., 2012 and standardized for use with both chronic and acute stroke patients) was used as a profiling assessment to examine participants’ fine and gross motor skills and core gesture abilities. The measure comprised four subtests – each supplying written prompts alongside spoken instruction and thus reducing linguistic and memory demands for participants. Subtests examined multi-step object use, gesture production, gesture recognition, and meaningless gesture imitation, as described in Bickerton et al. (2012).

Primary Gesture Assessment

The primary gesture assessment comprised gesture production from a picture. Derived from a measure employed by Marshall et al. (2013) in the pilot study of GeST, this assessment comprised 40 photograph images of individual objects pictured against a white background. Twenty of the objects presented were items trained in the study. A further 20 were items matched for lexical frequency. Participants were shown an image of an object and instructed: “Show me a gesture for this. Use your hands and your face.”

Participant gesture production was video recorded. Using a method described in Marshall et al. (2012, 2013), videos were later edited into a series of four new videos, each comprising the 40 gestures elicited but distributed across a range of time points. These videos were shown to scorers blinded to the target item, time of assessment and group allocation of the participants. Scorers were asked to identify the item being gestured on a written response sheet. The responses were then compared to the target item and scored for accuracy – with two points being awarded for each accurate identification or acceptable synonym (see Supplementary Data Table 1 for a full list of synonyms), one point for a semantically acceptable alternative (see Supplementary Data 2 for a full description of the scoring process), and 0 points for no response or where the participant indicated that they did not know the answer. A maximum score of 40 points was available for treated items and 40 points for untreated items in this assessment. To evaluate the reliability of the scores awarded for the gesture, videos for 22 of 77 gesture assessments (29%) were viewed and scored by a second scorer. Selection of these videos was distributed evenly across the participant data. The second scorer was blinded to the design of the project and the time point at which the assessment had been conducted. The overall percentage of agreement between scorers was 86%.
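The point scheme and the agreement figure can be illustrated with a short sketch. The category labels are hypothetical names for the outcomes listed above, and the agreement calculation assumes a simple item-by-item exact match between scorers; the text does not specify how the 86% figure was derived.

```python
# Points per response category, following the scheme described in the text.
# The category names themselves are illustrative assumptions.
POINTS = {
    "accurate": 2,
    "synonym": 2,
    "semantic_alternative": 1,
    "no_response": 0,
    "dont_know": 0,
}


def total_score(categories):
    """Sum the points awarded across a list of scored responses."""
    return sum(POINTS[c] for c in categories)


def percent_agreement(first, second):
    """Percentage of items on which two scorers awarded identical points.

    Assumes item-by-item exact agreement (an assumption, not the
    documented method).
    """
    if len(first) != len(second) or not first:
        raise ValueError("score lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(first, second))
    return 100.0 * matches / len(first)


# Twenty treated items, all identified accurately, reach the 40-point maximum.
print(total_score(["accurate"] * 20))
# Two scorers agreeing on three of four items.
print(percent_agreement([2, 1, 0, 2], [2, 1, 2, 2]))
```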

Interactive Gesture Assessment

Interactive gesture was assessed using a novel measure developed for the purposes of this study. Assessment comprised live gesture production for a familiar communication partner (a family member, friend, or carer in the participant’s immediate environment recruited at the same time as the participant). Participants were shown a short video clip of an everyday situation (e.g., a person answering a telephone) using a Microsoft PowerPoint presentation. Immediately following the clip, a still photograph of a relevant object from the video (e.g., a telephone) appeared against a white background and the participant was asked to gesture the object. The participant was instructed: “I’m going to show you a short video. At the end of the video is a picture. Your job is to gesture that picture to X (like a game of charades). X will try to work out who or what it is and write it down.” Two practice items were shown, followed by 12 test items. The interactive charades assessment had four alternate versions: A, B, C, and D. Each contained six treated and six untreated test items. The order of presentation was randomly assigned and was different for each version. As for the primary gesture assessment, participants were awarded two points for each accurate identification or acceptable synonym, one point for a semantically acceptable alternative, and 0 points for no response or where the participant indicated that they did not know the answer. A maximum score of 12 points was available for treated items and 12 points for untreated items in this assessment.

Naming Assessment

Naming assessment comprised spoken picture naming. Items employed were identical to those used in the primary gesture assessment (see Screening and Profiling Assessments). Participants were presented with a photograph image of an object and asked to state the name of that object. Responses were transcribed/recorded by the researcher and scored for accuracy. A maximum score of 40 points was available for this assessment. Assessment was video recorded for subsequent inter-rater reliability measures. To evaluate the reliability of the scores awarded for the naming data, videos for 14 of 77 naming assessments (18%) were viewed and scored by a researcher external to the project. These videos had been randomly selected using a computer-based randomization process. The second scorer was blinded to the design of the project and the time point at which the assessment had been conducted. A two-way, mixed method intraclass correlation (ICC) was conducted to compare outcomes from the second scorer to those reported by the primary researcher. A high degree of reliability was found between the two score sets. The average measure ICC was 0.907 with a 95% confidence interval from 0.657 to 0.972 [F(13,13) = 27.81, p < 0.001].

Therapy Protocol

Therapy was delivered to participants in their homes over a period of 5 weeks. Weeks one to four adhered to the protocol described by Marshall et al. (2013) in the GeST pilot study. Briefly, participants were presented, one at a time, with gesture videos within GeST and instructed to repeat them. Accuracy was monitored using vision-based gesture recognition and applause was supplied for each correctly produced gesture. Each week a speech and language therapist completed a familiarization exercise with the participant – introducing them to the five gestures to be practiced that week. This was followed by up to an hour’s supported practice with the computer. Participants were then asked to practice independently for around an hour each day. Week five introduced a supplementary consolidation exercise, allowing participants to practice all 20 gestures together. This cumulative practice period aimed to address the shortcoming noted in Marshall et al. (2013) that limitations in therapy gains may arise as a result of practicing successive gesture batches instead of a full set. Video clips of the individual gestures were presented, one at a time, using a PowerPoint show. Participants were instructed to copy the gesture demonstrated in the clip. We use the term GeST+ to refer to the combined application of the GeST tool and the supplementary consolidation exercise. Following completion of the 5 weeks of therapy, participants received no further access to GeST+. Additionally, those in the delayed treatment group received no access to GeST+ outside of the allotted treatment period.

Ethical Approval

Ethical approval for this study was granted by the City, University London School of Community and Health Sciences Research Ethics Committee. Following the provision of accessible written and verbal information, all participants gave written informed consent in accordance with the Declaration of Helsinki.

Hypotheses

For each outcome measure (primary or secondary), we predicted an improvement in performance following intervention and a maintenance of this effect after a further 5 weeks and (in the case of the immediately treated group) 10 weeks. It was anticipated that this effect would be greater for items treated within the intervention protocol when compared to those that were untreated. In addition, a relationship between the screening/profiling assessment scores and changes in outcome measure performance was predicted, as was a relationship between levels of GeST+ practice and changes in outcome measures.

Data Analysis

Primary and secondary outcome measures were subject to two ANOVA analyses. Unless otherwise stated, data met ANOVA assumptions. When this was not the case, log transformations were applied. The first analysis was a mixed within and between subject ANOVA conducted on data collected at T1 and T2. The within variables were time and item. The latter contrasted items that had been treated in GeST+ with items that had not been treated. The between variable was group: immediate vs. delayed. Participants in the immediate treatment group had received GeST+ therapy between T1 and T2, whereas participants in the delayed treatment group had not. Thus a treatment effect was signaled by a time by group interaction. Time by item interactions indicated whether treatment effects were specific to items practiced in GeST+.

The second analysis was a within group ANOVA conducted on the pooled data across all participants, i.e., the immediate and delayed groups combined. The variables were time and item. Time had three levels: pre-therapy (conflating T1 for immediate and T2 for delayed), post-therapy (conflating T2 for immediate and T3 for delayed), and 5 week maintenance (conflating T3 for immediate and T4 for delayed). Item had two levels: treated and untreated. Here treatment effects were indicated by a significant main effect of time, with significant planned comparisons between pre- and post-therapy. Significant comparisons between pre-therapy and maintenance suggested that changes were still evident 5 weeks post-therapy. Time by item interactions again indicated whether treatment effects were specific to items practiced in GeST.
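The pooling step relabels each group's assessment points relative to its own therapy block. The sketch below shows that relabelling only; the dictionary layout, function name, and example scores are illustrative assumptions and are not study data.

```python
# Which assessment point supplies each pooled phase, per group,
# following the conflation described in the text.
POOLED_PHASES = {
    "immediate": {"pre": "T1", "post": "T2", "maintenance": "T3"},
    "delayed": {"pre": "T2", "post": "T3", "maintenance": "T4"},
}


def pooled_scores(group, scores_by_timepoint):
    """Relabel one participant's scores as pre, post, and maintenance."""
    mapping = POOLED_PHASES[group]
    return {phase: scores_by_timepoint[tp] for phase, tp in mapping.items()}


# Hypothetical scores for one delayed-group participant (not study data).
example = {"T1": 10, "T2": 11, "T3": 18, "T4": 16}
print(pooled_scores("delayed", example))
```

Under this relabelling, the delayed group's T1 score and the immediate group's T4 score fall outside the pooled analysis, which is why the 10-week maintenance comparison is handled separately.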

Participants in the immediately treated group underwent a second maintenance assessment, 10 weeks after the end of therapy (T4). Longer-term maintenance of change in this group was assessed by paired t-tests comparing scores at T1 and T4.

Finally, correlation analyses aimed to determine whether any of the screening or profiling assessments were predictive of gains on the outcome measures. Gains were determined by subtracting the pre-therapy from the post-therapy test scores (T2–T1 for immediate; T3–T2 for delayed). GeST usage times were also correlated with gain scores to explore the influence of dose.
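As an illustration of the gain calculation, the sketch below subtracts pre-therapy from post-therapy scores and correlates the gains with a predictor. The text does not state which correlation coefficient was used, so Pearson's r is an assumption here, as are all names and example values.

```python
from math import sqrt


def gain_score(group, scores):
    """Post-therapy minus pre-therapy score (T2-T1 immediate; T3-T2 delayed)."""
    pre, post = ("T1", "T2") if group == "immediate" else ("T2", "T3")
    return scores[post] - scores[pre]


def pearson_r(xs, ys):
    """Pearson correlation coefficient (the choice of coefficient is an assumption)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical participants (not study data): group, scores, usage in hours.
participants = [
    ("immediate", {"T1": 8, "T2": 14, "T3": 13}, 10.0),
    ("delayed", {"T1": 9, "T2": 9, "T3": 17}, 15.0),
    ("immediate", {"T1": 5, "T2": 15, "T3": 14}, 20.0),
]
gains = [gain_score(group, scores) for group, scores, _ in participants]
usage = [hours for _, _, hours in participants]
print(gains, pearson_r(usage, gains))
```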

Results

Participants

Twenty-two participants were recruited. Following randomization, two opted to discontinue. Twenty participants were therefore included in the analysis. Figure 1 shows participants’ progression through the study. All participants had experienced a left hemisphere stroke with resultant severe aphasia and hemiplegia. Only one retained use of his right hand.

FIGURE 1. CONSORT diagram showing participants’ progression through the study.

Table 1 reports the demographic and screening/profiling test scores for participants in the immediate (n = 12) and delayed (n = 8) intervention groups. T-test and Mann–Whitney comparisons confirmed that there were no significant differences between the groups with respect to age, time post-stroke, or any of the test scores. All bar three participants had some experience of computers. The delayed group reported slightly higher pre-stroke computer use.

TABLE 1. Participant information.

Usage

The time spent using GeST was automatically logged. Across all participants (immediate and delayed groups combined) the mean usage time was 14 h, 50 min (range 5:20–26:50). The mean number of sessions was 52.05 (range 22–132). Usage was highest in the first week, with a mean of just over 5 h. After this, mean usage was close to 3 h per week.

Analysis 1: Mixed within and between Subject ANOVAs

T1 and T2 scores on the primary and secondary outcome measures are reported in Table 2. There were missing data on the Interactive Gesture Assessment (IGA) owing to partners being unavailable.

TABLE 2. Mean scores (SD) on the outcome measures at T1 and T2 for the immediate and delayed groups.

Primary Outcome Measure

Gesture assessment

Outcomes from the Shapiro–Wilk test indicated that T1 scores were not normally distributed. Log transformations were therefore applied to all scores. The transformed data met ANOVA assumptions, so were used in the analysis.

The mixed ANOVA revealed a main effect of item [F = 39.29 (1,18), p < 0.05; η_p² = 0.69], but no effect of time [F = 1.97 (1,18), p > 0.05, η_p² = 0.10] or group [F = 0.06 (1,18), p > 0.05, η_p² = 0.00]. The item effect indicated that treated items were gestured more successfully than untreated items. There was a significant interaction between group and time [F = 10.88 (1,18), p < 0.005; η_p² = 0.38] and between time and item [F = 7.77 (1,18), p < 0.05; η_p² = 0.30]. The former signals a treatment effect. The immediate group, who had received intervention between T1 and T2, improved, whereas the as yet untreated delayed group did not. The time by item interaction indicates that treatment gains were largely confined to items that were practiced in GeST+. The three-way interaction was not significant.
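The reported effect sizes can be cross-checked from the F ratios and degrees of freedom using the standard identity η_p² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the values quoted above; the function name and the dictionary of reported values are illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, reported partial eta squared) for each effect, all with df = (1, 18).
reported = {
    "item": (39.29, 0.69),
    "time": (1.97, 0.10),
    "group": (0.06, 0.00),
    "group x time": (10.88, 0.38),
    "time x item": (7.77, 0.30),
}

for effect, (f_value, eta_reported) in reported.items():
    eta = partial_eta_squared(f_value, 1, 18)
    print(f"{effect}: computed {eta:.2f}, reported {eta_reported:.2f}")
```

Each computed value rounds to the figure given in the text, so the F ratios and effect sizes are mutually consistent.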

Secondary Outcome Measures