The 3-Minute Psychomotor Vigilance Test Demonstrates Inadequate Convergent Validity Relative to the 10-Minute Psychomotor Vigilance Test Across Sleep Loss and Recovery

Antler, Caroline A.; Yamazaki, Erika M.; Casale, Courtney E.; Brieva, Tess E.; Goel, Namni

doi:10.3389/fnins.2022.815697

ORIGINAL RESEARCH article

Front. Neurosci., 15 February 2022

Sec. Sleep and Circadian Rhythms

Volume 16 - 2022 | https://doi.org/10.3389/fnins.2022.815697

This article is part of the Research Topic: Sleep and Circadian Rhythms in Cognitive and Physical Performance: What Do We Know and What Can We Do?

Tess E. Brieva

Biological Rhythms Research Laboratory, Department of Psychiatry and Behavioral Sciences, Rush University Medical Center, Chicago, IL, United States

The Psychomotor Vigilance Test (PVT) is a widely used behavioral attention measure, with the 10-min (PVT-10) and 3-min (PVT-3) as two commonly used versions. The PVT-3 may be comparable to the PVT-10, though its convergent validity relative to the PVT-10 has not been explicitly assessed. For the first time, we utilized repeated measures correlation (rmcorr) to evaluate intra-individual associations between PVT-10 and PVT-3 versions across total sleep deprivation (TSD), chronic sleep restriction (SR) and multiple consecutive days of recovery. Eighty-three healthy adults (mean ± SD, 34.7 ± 8.9 years; 36 females) received two baseline nights (B1-B2), five SR nights (SR1-SR5), 36 h TSD, and four recovery nights (R1-R4) between sleep loss conditions. The PVT-10 and PVT-3 were completed every 2 h during wakefulness. Rmcorr compared responses on two frequently used, sensitive PVT metrics: reaction time (RT) via response speed (1/RT) and lapses (RT > 500 ms on the PVT-10 and > 355 ms on the PVT-3) by day (e.g., B2), by study phase (e.g., SR1-SR5), and by time point (1000–2000 h). PVT 1/RT correlations were generally stronger than those for lapses. The majority of correlations (48/50 [96%] for PVT lapses and 38/50 [76%] for PVT 1/RT) were values below 0.70, indicating validity issues. Overall, the PVT-3 demonstrated inadequate convergent validity with the “gold standard” PVT-10 across two different types of sleep loss and across extended recovery. Thus, the PVT-3 is not interchangeable with the PVT-10 for assessing behavioral attention performance during sleep loss based on the design of our study and the metrics we evaluated. Our results have substantial implications for design and measure selection in laboratory and applied settings, including those involving sleep deprivation.

Introduction

One of the most commonly utilized measures in sleep research is the Psychomotor Vigilance Test (PVT), a measure of vigilant attention that requires participants to rapidly respond to visual cues randomly presented within specified interstimulus intervals (ISIs) without incorrectly responding when no stimulus is present (Dinges and Powell, 1985; Basner and Dinges, 2011). The PVT is often considered a “gold standard” measure of sleep loss deficits and it is one measure by which biomarkers or predictors of such deficits are compared (Dawson et al., 2014; Basner et al., 2015; Grandner et al., 2018; Moreno-Villanueva et al., 2018). The 10-min PVT (PVT-10) is the standard version, but more recently shorter 5-min (PVT-5) and 3-min (PVT-3) versions have been developed, particularly for applied settings that have limited time for testing (Loh et al., 2004; Basner et al., 2011).

Two published studies have directly compared performance on the PVT-10 and PVT-3 in response to sleep loss without using any other experimental manipulations: (1) the PVT-3 development study (Basner et al., 2011), which compared the PVT-10 (computer-based) and PVT-3 (handheld device-based) across total sleep deprivation (TSD) and five nights of 4 h time-in-bed (TIB) sleep restriction (SR) and (2) a validation study of smartphone-based and tablet-based 3-min PVT versions, which were compared to a laptop-based PVT-10 following 38 h TSD (Grant et al., 2017). Grant et al. (2017) reported significantly faster reaction times (RTs) and fewer lapses (PVT-10: >500 ms RT; PVT-3: >355 ms RT) on the PVT-3 relative to the PVT-10. Basner et al. (2011) also reported significantly faster RTs on the PVT-3, though they found fewer lapses on the PVT-3 only when 500 ms RT, and not 355 ms RT, was used as the lapse threshold for both PVT versions. In a study without sleep loss, Jones et al. (2018) compared performance on the PVT-10, PVT-3, and PVT-5 on the same device across 7 days in elite female basketball players and found that participants had significantly faster RTs and fewer lapses on the PVT-3 relative to the PVT-10 (and PVT-5). Additionally, a recent study involving sleep deprivation, alcohol consumption, and rest in a pressure chamber to simulate in-flight conditions compared performance on a personal computer-based PVT-10 and a handheld computer-based PVT-3 (Benderoth et al., 2021). Benderoth et al. (2021) determined that the two PVT versions had good parallel form reliability for 1/RT and lower, but still significant, correlations were found for number of lapses. Three of these studies concluded that the PVT-3 was a valid alternative to the PVT-10 (Basner et al., 2011; Grant et al., 2017; Benderoth et al., 2021), while the fourth concluded the tests were not interchangeable (Jones et al., 2018). 
Thus, further research is needed to systematically compare the PVT-10 and PVT-3 using the same device in highly controlled sleep loss studies.

Though averaging data from multiple time points may be necessary to meet various statistical assumptions, doing so can result in the loss of important data relating to changes in performance across time. Of note, the aforementioned sleep loss studies comparing the PVT-10 and PVT-3 (Basner et al., 2011; Grant et al., 2017) utilized averaged data in many of their analyses, with both studies using different numbers of averaged time points, and neither study examining time-of-day variation during baseline or recovery. As a result, any information relating to discrepancies between the measures at various time points, due to possible time-of-day variation or increased homeostatic sleep pressure (Gundel et al., 2007; Fimm et al., 2015), is missing from these studies. Thus, it is important to examine individual time points to determine time-of-day variation in performance, when comparing the PVT-10 and PVT-3.

Little is known about PVT performance across extended recovery periods (e.g., more than one consecutive recovery night) following sleep deprivation (Yamazaki et al., 2021b). Some (Lamond et al., 2008; Moreno-Villanueva et al., 2018; Yamazaki et al., 2021b, 2022b) but not all studies (Wehrens et al., 2012) have demonstrated that PVT-10 performance returns to baseline levels following one night of recovery sleep after TSD. PVT-10 performance recovery following SR is more complex, with studies reporting mixed findings; these include a failure to completely return to baseline, a delayed return to baseline requiring more than one recovery night, or a return to baseline after one recovery night (Dinges et al., 1997; Banks et al., 2010; Pejovic et al., 2013; Yamazaki et al., 2021b). PVT-3 performance also returns to baseline after one night of TSD (Yamazaki et al., 2021a), but data on PVT-3 performance across recovery periods after SR are lacking. Furthermore, no studies to date have directly compared the profile of PVT-10 and PVT-3 performance across an extended recovery period of long duration (e.g., multiple consecutive nights of 12 h) after sleep loss.

Given that prior studies found significant differences between the PVT-3 and PVT-10, that no sleep loss studies administered both versions on the same device or included an extended recovery period, that most analyses utilized averaged data, and that the PVT-3 is increasingly utilized (Basner and Rubinstein, 2011; Basner et al., 2011, 2018; Hilditch et al., 2016; Grant et al., 2017; Behrens et al., 2019; Hansen et al., 2019; Yamazaki et al., 2021a), there is a significant need for studies that compare the PVT-3 to the PVT-10 on the same device in the context of different types of commonly experienced sleep loss (TSD and SR) and with an extended recovery period. Further, no study to date has evaluated the convergent validity of the PVT-3 relative to the PVT-10 while considering repeated measurements (1) across an entire sleep deprivation study, (2) across an extended recovery period, or (3) with the measures administered on the same device.

The current study utilized the repeated measures correlation (rmcorr) technique (Bland and Altman, 1995) to examine, for the first time, the intra-individual (within-subject) association between the PVT-10 and PVT-3 across time. This statistical method estimates the common intra-individual linear relationship between PVT-10 and PVT-3 metrics, which we treated as an index of the convergent validity of the two measures. Assuming the PVT-3 and PVT-10 are comparable in their ability to assess performance and detect change across time, we hypothesized that rmcorr effect sizes for PVT lapses and PVT 1/RT would be relatively strong, and that these relationships would remain strong regardless of time of day, since the measures should comparably capture any variations in performance due to time effects. We also hypothesized that correlations would be stronger for PVT 1/RT than for PVT lapses, and stronger during sleep deprivation than during baseline or recovery for both PVT lapses and PVT 1/RT. Lastly, we hypothesized that correlation patterns for both PVT lapses and PVT 1/RT across the extended recovery period would not differ between participants exposed to TSD and those exposed to SR prior to recovery.

Materials and Methods

Participants

Eighty-three healthy adults were recruited in response to study advertisements. Participants reported habitual nightly sleep durations between 6.5 and 8.5 h, with habitual bedtimes between 2200 and 0000 h, and habitual awakenings between 0600 and 0930 h; these were confirmed via wrist actigraphy prior to study entry. Participants did not engage in habitual napping and did not present with a sleep disorder. They did not have any acute or chronic psychological and medical conditions. Participants did not take regular medications (except for oral contraceptive use in females) and were non-smokers with body mass index values between 17.3 and 30.9 kg/m². See Yamazaki et al. (2021b) for additional details on recruitment methods, inclusion and exclusion criteria, sample characteristics, general study procedures, and participant monitoring. The protocol was approved by the University of Pennsylvania’s Institutional Review Board. All participants received compensation for their participation and provided written informed consent in accordance with the Declaration of Helsinki.

Procedures

Participants engaged in a 13-day laboratory study during which they received daily checks of vital signs and symptoms by nurses (with a physician on call). The 13-day study consisted of two baseline nights (B1-B2, 10 h [2200–0800 h] and 12 h [2200–1000 h] TIB, respectively) followed by randomization to either five nights of 4 h TIB SR (SR1-SR5, 0400–0800 h, N = 41; Condition A) or 36 h TSD (wakefulness from 1000 to 2200 h the following day, N = 42; Condition B), both of which were followed by four nights of 12 h TIB (2200–1000 h) recovery sleep (R1-R4). After R1-R4, participants in the initial SR condition (Condition A) were exposed to 36 h TSD and those in the initial TSD condition (Condition B) were exposed to five nights of 4 h TIB SR. Participants were randomized in groups of four and blinded to their condition assignment until the evening after the second baseline night.

A computer-based neurobehavioral test battery was administered every 2 h during wakefulness throughout the study. Between test bouts participants were ambulatory and permitted to perform sedentary activities; however, they were not allowed to exercise. Ambient temperature was maintained between 22 and 24°C. Laboratory light levels remained constant at <50 lux during scheduled wakefulness and <1 lux during scheduled sleep periods (Yamazaki and Goel, 2020; Brieva et al., 2021; Yamazaki et al., 2021b, 2022a; Casale et al., 2022).

Neurobehavioral Measures

The computer-based neurobehavioral test battery included two widely used versions of a measure of behavioral attention: the 10-min PVT (Lim and Dinges, 2008; Basner and Dinges, 2011) and the 3-min PVT (Basner et al., 2011). Both PVT tests were administered in an environment with minimal distractions. The PVT-10 was administered before the PVT-3 during all test bouts for all participants. Participants were instructed to hit the space bar as quickly as possible after they were presented with a visual cue on the screen. Visual cues were randomly presented within specified interstimulus intervals (ISIs, or the period between the previous response and the next stimulus) specific to each measure version; the PVT-10 ISI was 2–10 s while the PVT-3 ISI was 1–4 s (Basner et al., 2011). Outcome measures were the number of lapses [RT > 500 ms on the PVT-10 and > 355 ms on the PVT-3 (Basner et al., 2011)] and response speed (mean 1/RT, henceforth referred to as 1/RT).
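As an illustration of the two outcome measures defined above, the following minimal Python sketch scores a single test bout. It is not the scoring software used in the study. The function name `pvt_outcomes`, the input format (a list of valid reaction times in milliseconds, with false starts already removed), and the expression of response speed in responses per second are assumptions made for this example; only the lapse thresholds come from the text.

```python
from statistics import mean

# Lapse thresholds in milliseconds, as stated in the text.
LAPSE_THRESHOLD_MS = {"PVT-10": 500.0, "PVT-3": 355.0}


def pvt_outcomes(rts_ms, version):
    """Return (lapse count, mean response speed) for one test bout.

    rts_ms  : valid reaction times in ms (hypothetical input format;
              false starts are assumed to have been removed already).
    version : "PVT-10" or "PVT-3", which selects the lapse threshold.

    Response speed is the mean of 1/RT with RT converted to seconds
    (an assumed unit, giving responses per second).
    """
    threshold = LAPSE_THRESHOLD_MS[version]
    # A lapse is any reaction time strictly above the threshold.
    lapses = sum(1 for rt in rts_ms if rt > threshold)
    # Mean of the reciprocals, not the reciprocal of the mean.
    speed = mean(1000.0 / rt for rt in rts_ms)
    return lapses, speed
```

Note that the same set of reaction times yields different lapse counts under the two thresholds, which is one reason lapse counts from the two versions are not directly comparable in absolute terms.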

Statistical Analysis

Although repeated measures data are inherently valuable, their analyses can be challenging due to frequent violation of the assumptions of various statistical procedures (Keselman et al., 2001; Park et al., 2009; Bakdash and Marusich, 2017). The methods for correcting these violations, such as averaging, can result in the loss of otherwise meaningful data (Bland and Altman, 1995), and conducting analyses despite violations can result in misleading or uninterpretable results (Glass et al., 1972; Hubbard, 1978; Kenny and Judd, 1986; Scariano and Davenport, 1987). As such, instead of using Pearson’s correlations, we used repeated measures correlations [rmcorr (Bakdash and Marusich, 2017; Bakdash and Marusich, 2020)], to compare PVT-10 lapses to PVT-3 lapses and to compare PVT-10 1/RT to PVT-3 1/RT. Of note, we specifically used correlational analyses because convergent validity is exclusively assessed via correlation (Chin and Yao, 2014). Rmcorr analyses were conducted by day (e.g., B2, SR1, R3, etc.), by study phase (e.g., SR1-SR5, R1-R4, etc.), and by time point (e.g., 1000 h, 1200 h, etc.) across the entire 13-day study and across recovery only (R1-R4) for Condition A and Condition B using the rmcorr R package (Bakdash and Marusich, 2020). By day analyses included data from the 1000–2000 h time points for B2 and for R1-R4. To retain as much data as possible, by day analyses for SR1-SR4 included early morning and late-night time points (e.g., 0800–0200 h the day after each night of SR). For SR5, only the 0800 h through 2000 h time points were collected given the start of R1 occurred immediately after SR5. TSD day was defined as 2200 h on the night of TSD through 2000 h the next day. By study phase analyses included all time points across each period (e.g., R1-R4). For Condition A and Condition B, the B2-R4 study phase included all time points from B2 through R4. 
The all-study days study phase included all time points from B2 through the end of TSD (2000 h) for Condition A and through the end of SR5 (2000 h) for Condition B.
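To make the logic of rmcorr concrete, the sketch below computes the coefficient from first principles in Python. It is not the rmcorr R package used for the analyses. It relies on the fact that, for a model with one intercept per participant and a single common slope, the rmcorr coefficient equals the correlation of the values after each participant's own means are subtracted. The function name `rmcorr_sketch` and the parallel-list input format are assumptions made for this example, and no p-values or confidence intervals are produced.

```python
from collections import defaultdict
from math import sqrt


def rmcorr_sketch(subjects, x, y):
    """Return (r_rm, df) for paired repeated measures.

    subjects, x, y are parallel lists (hypothetical input format):
    one entry per paired observation, e.g. PVT-10 and PVT-3 scores
    from the same test bout of the same participant.
    """
    groups = defaultdict(list)
    for s, xi, yi in zip(subjects, x, y):
        groups[s].append((xi, yi))

    sxx = syy = sxy = 0.0
    n_obs = 0
    for pairs in groups.values():
        # Center each participant on their own means so that
        # between-participant differences drop out.
        mx = sum(p[0] for p in pairs) / len(pairs)
        my = sum(p[1] for p in pairs) / len(pairs)
        for xi, yi in pairs:
            sxx += (xi - mx) ** 2
            syy += (yi - my) ** 2
            sxy += (xi - mx) * (yi - my)
        n_obs += len(pairs)

    r_rm = sxy / sqrt(sxx * syy)
    # Error df: observations minus one intercept per participant
    # minus the common slope; equals N(k - 1) - 1 when balanced.
    df = n_obs - len(groups) - 1
    return r_rm, df
```

For example, two participants whose scores fall within each person but whose overall levels differ would give a negative r_rm even though a Pearson correlation that ignores the repeated measurements would be positive.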

Rmcorr confidence intervals (CIs) were determined using bootstrapping with replacement and using 1,000 samples (Shan et al., 2021). To meet rmcorr’s linearity assumption, PVT lapses were natural log transformed [nlog(lapses + 0.5)] for the by time point analyses to account for non-linear associations apparent with visual plot inspection (Cohen et al., 2003; Bakdash and Marusich, 2017). The False Discovery Rate correction of Benjamini-Hochberg (Benjamini and Hochberg, 1995) was applied to all rmcorr p-values to account for multiplicity (Gbyl et al., 2021), but notably, this did not alter the significance of any test. Thus, unadjusted p-values are reported. Rmcorr coefficient (r_rm) magnitude was conservatively interpreted using the following ranges: 0.00–0.29, negligible; 0.30–0.49, weak; 0.50–0.69, moderate; 0.70–0.89, strong; and 0.90–1.00, very strong (Carlson and Herdman, 2010; Mukaka, 2012; Post, 2016; Fernández-Marcos et al., 2018; Schober et al., 2018; Yadav, 2018). Furthermore, as per recommendations for interpreting convergent validity coefficients (Carlson and Herdman, 2010; Post, 2016), r_rm values < 0.50 indicated the PVT-3 showed inadequate convergent validity with the PVT-10, r_rm values > 0.70 indicated adequate convergent validity between the measures and r_rm values between 0.50 and 0.70 indicated validity issues between the measures. All statistical analyses were conducted in the R software environment (R Core Team, 2020). All analyses were two-sided with a p-value < 0.05 considered statistically significant. No participants were excluded from the analyses. Pairwise deletion was used for all analyses to minimize data loss since single data points were missing at random throughout the study; the degrees of freedom (df) in Tables 1–4 serve as a proxy for the amount of data lost based on the formula df = N(k-1) – 1, where N is the total number of participants and k is the number of repeated measures per participant.
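The supporting steps described above (the lapse transformation, the Benjamini-Hochberg adjustment, the magnitude labels, and the degrees of freedom formula) can be sketched in Python as follows. This is an illustrative sketch only, not the code used in the study. All function names are hypothetical, and the handling of values that fall between the stated ranges (e.g., 0.295) is an assumption: each label is applied up to, but not including, the lower bound of the next range.

```python
from math import log


def transform_lapses(lapses):
    """Natural log transform with the 0.5 offset given in the text."""
    return log(lapses + 0.5)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def interpret_r_rm(r_rm):
    """Map |r_rm| to the magnitude labels listed in the text.

    Boundary handling between the stated ranges is an assumption.
    """
    a = abs(r_rm)
    if a < 0.30:
        return "negligible"
    if a < 0.50:
        return "weak"
    if a < 0.70:
        return "moderate"
    if a < 0.90:
        return "strong"
    return "very strong"


def rmcorr_df(n_participants, k_measures):
    """Degrees of freedom for complete data: df = N(k - 1) - 1."""
    return n_participants * (k_measures - 1) - 1
```

Comparing an observed df with `rmcorr_df(N, k)` for the planned N and k gives a rough indication of how many paired observations were lost to pairwise deletion.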

Table 1. PVT-10 and PVT-3 lapses rmcorr results by day and by study phase.

Results

Participant Characteristics

Eighty-three healthy adults (mean ± SD age, 34.7 ± 8.9 years; range, 21–50 years; 36 females [43.4%]; 72.3% African American) participated in the study, with N = 41 participants randomly assigned to Condition A (SR first) and N = 42 participants randomly assigned to Condition B (TSD first). There were no significant differences between conditions in age, BMI, chronotype, or the percentage of participants who were female or African American (Yamazaki et al., 2021b). There were also no significant differences between conditions in pre-study actigraphic sleep duration, onset, offset, or midpoint, or in baseline polysomnographic total sleep time or sleep onset latency (Yamazaki et al., 2021b).

Psychomotor Vigilance Test Lapses

Tables 1, 2 show r_rm, degrees of freedom, p-values, bootstrapped 95% CIs, and median and interquartile range (IQR) values for the PVT-3 and PVT-10 separately for the PVT lapses analyses. Median values were calculated for each value represented in the tables (i.e., 1000 h, B2, SR1-SR5, etc.) for all participants within each condition. We present medians, rather than means, since they are less susceptible to skewing by outliers and better reflect the central tendency of these data. Visualization is important for interpreting rmcorr results (Bakdash and Marusich, 2017; Schober et al., 2018), and as such, we have included select plots (Figures 1–3) as examples of the range of observed effects for each analysis type.
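The preference for medians over means can be illustrated with a short Python sketch using made-up lapse counts. The function name `summarize` and the data are hypothetical, and the quartile method ("inclusive" interpolation from Python's statistics module) is an assumption, since the text does not state how the IQR bounds were computed.

```python
from statistics import mean, median, quantiles


def summarize(values):
    """Return (mean, median, q1, q3) for a list of scores.

    Quartiles use the "inclusive" interpolation method, which is an
    assumption; other quartile definitions give slightly different
    IQR bounds.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return mean(values), median(values), q1, q3


# Hypothetical lapse counts with one extreme value.
lapses = [0, 1, 1, 2, 2, 3, 25]
avg, med, q1, q3 = summarize(lapses)
```

In this made-up sample, the single extreme value pulls the mean well above the median, whereas the median and quartiles stay close to the bulk of the data.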

Table 2. PVT-10 and PVT-3 transformed lapses rmcorr results by time point.

Figure 1. Rmcorr plots of repeated-measures correlations between 10-min Psychomotor Vigilance Test (PVT-10) and 3-min Psychomotor Vigilance Test (PVT-3) lapses by study phase for Condition A (A) and Condition B (B). Each color represents a distinct participant with each point showing performance on both measures at one time point while the corresponding line shows the rmcorr fit for that participant (Bakdash and Marusich, 2020; R Core Team, 2020). The gray dashed line represents the regression line obtained by ignoring repeated measurements and treating the data as independent observations; r_rm represents the common within-individual association (rmcorr). Rmcorr effect sizes were interpreted as follows: 0.00–0.29, negligible; 0.30–0.49, weak; 0.50–0.69, moderate; 0.70–0.89, strong; and 0.90–1.00, very strong. Included time points for study phases were as follows: sleep restriction day one from 0800 h through sleep restriction day five at 2000 h (SR1-SR5) and recovery day one from 1000 h through recovery day four at 2000 h (R1-R4).

By Study Phase

Overall, Condition B yielded stronger correlations relative to Condition A, and all the by study phase analyses for PVT lapses were significant (Table 1). The r_rm for B2-R4 was strong for Condition B and moderate for Condition A. SR1-SR5 and R1-R4 were weak for Condition A and moderate for Condition B. Interestingly, the entire study (all-study) r_rm was in the moderate range for both conditions. Figure 1 presents rmcorr plots for the SR1-SR5 and R1-R4 analyses for Condition A and Condition B.

By Day

The by day rmcorr analyses revealed a wide range of rmcorr coefficient values for PVT lapses across study days (Table 1). The only correlation that was strong was R4 for Condition B. For Condition A, only R1 demonstrated a moderate correlation. For Condition B, TSD and SR1-SR3 demonstrated moderate correlations. For Condition A, weak correlations were observed for SR1-SR5 and for TSD. For Condition B, only SR4 and SR5 demonstrated weak correlations. R2 correlations were in the negligible range for both conditions while R4 was negligible for Condition A and R3 was negligible for Condition B. Neither condition demonstrated significant correlations at B2 while R3 was non-significant for Condition A and R1 was non-significant for Condition B. Figure 2 presents B2, SR5, TSD, and R4 rmcorr plots for Condition A and Condition B. Notably, most individual lines approximate the overall regression line except for B2 for both conditions.

Figure 2. Rmcorr plots of repeated-measures correlations between 10-min Psychomotor Vigilance Test (PVT-10) and 3-min Psychomotor Vigilance Test (PVT-3) lapses by study day for Condition A (A) and Condition B (B). Each color represents a distinct participant with each point showing performance on both measures at one time point while the corresponding line shows the rmcorr fit for that participant (Bakdash and Marusich, 2020; R Core Team, 2020). The gray dashed line represents the regression line obtained by ignoring repeated measurements and treating the data as independent observations; r_rm represents the common within-individual association (rmcorr). Rmcorr effect sizes were interpreted as follows: 0.00–0.29, negligible; 0.30–0.49, weak; 0.50–0.69, moderate; 0.70–0.89, strong; and 0.90–1.00, very strong. Included time points for each day were as follows: baseline day 2 (B2) from 1000 to 2200 h; sleep restriction day 5 (SR5) from 0800 to 2000 h; total sleep deprivation (TSD) from 2200 to 2000 h; and recovery day 4 (R4) from 1000 to 2000 h.

By Time Point

The by time point rmcorr analyses for PVT lapses across the entire study (all-study) were all significant for Condition A and Condition B (Table 2). All r_rm values were moderate for Condition B, while only the 1000, 1200, and 1600 h time point correlations were moderate for Condition A (the 1800 and 2000 h time points were weak). The recovery (R1-R4) time point r_rm coefficients were weaker than the all-study time point coefficients. Across recovery for Condition A, the 1200 and 2000 h time point correlations were in the weak range while the 1000, 1600, and 1800 h time point correlations were in the negligible range or were non-significant. For Condition B, the remaining (1000 and 2000 h) time point correlations were negligible across recovery while the 1200, 1600, and 1800 h time point correlations were all non-significant. Figure 3 presents rmcorr plots for the 1800 h by time point analyses as examples of moderate, weak, and negligible r_rm correlations across the entire study and across recovery for both conditions.

Figure 3. Rmcorr plots of repeated-measures correlations between 10-min Psychomotor Vigilance Test (PVT-10) and 3-min Psychomotor Vigilance Test (PVT-3) transformed lapses at 1800 h across the entire study (All Study Days) and across only recovery days 1–4 (R1-R4) for Condition A (A) and Condition B (B). Each color represents a distinct participant with each point showing performance on both measures at one time point while the corresponding line shows the rmcorr fit for that participant (Bakdash and Marusich, 2020; R Core Team, 2020). The gray dashed line represents the regression line obtained by ignoring repeated measurements and treating the data as independent observations; r_rm represents the common within-individual association (rmcorr). Rmcorr effect sizes were interpreted as follows: 0.00–0.29, negligible; 0.30–0.49, weak; 0.50–0.69, moderate; 0.70–0.89, strong; and 0.90–1.00, very strong. Values were transformed by adding 0.5 and natural log transforming the result.

Psychomotor Vigilance Test 1/RT

Tables 3, 4 show r_rm, degrees of freedom, p-values, bootstrapped 95% CIs, and median and IQR values for the PVT-3 and PVT-10 separately for the PVT 1/RT analyses. Median values were calculated for each value represented in the tables (i.e., 1000 h, B2, SR1-SR5, etc.) for all participants within each condition. As noted in section “Psychomotor Vigilance Test Lapses,” we present medians, rather than means, since they are less susceptible to skewing by outliers and better reflect the central tendency of these data. Select rmcorr plots as examples of the range of observed effects for each analysis type are included (Figures 4–6).

Table 3. PVT-10 and PVT-3 1/RT rmcorr results by day and by study phase.

Table 4. PVT-10 and PVT-3 1/RT rmcorr results by time point.

Figure 4. Rmcorr plots of repeated-measures correlations between 10-min Psychomotor Vigilance Test (PVT-10) and 3-min Psychomotor Vigilance Test (PVT-3) response speed (1/RT) by study phase for Condition A (A) and Condition B (B). Each color represents a distinct participant with each point showing performance on both measures at one time point while the corresponding line shows the rmcorr fit for that participant (Bakdash and Marusich, 2020; R Core Team, 2020). The gray dashed line represents the regression line obtained by ignoring repeated measurements and treating the data as independent observations; r_rm represents the common within-individual association (rmcorr). Rmcorr effect sizes were interpreted as follows: 0.00–0.29, negligible; 0.30–0.49, weak; 0.50–0.69, moderate; 0.70–0.89, strong; and 0.90–1.00, very strong. Included time points were as follows: sleep restriction day one from 0800 h through sleep restriction day five at 2000 h (SR1-SR5) and recovery day one from 1000 h through recovery day four at 2000 h (R1–R4).