The Norwegian Adaptation of the Big Five Inventory-2

Føllesdal, Hallvard; Soto, Christopher J.

doi:10.3389/fpsyg.2022.858920

ORIGINAL RESEARCH article

Front. Psychol., 18 May 2022

Sec. Quantitative Psychology and Measurement

Volume 13 - 2022 | https://doi.org/10.3389/fpsyg.2022.858920

Hallvard Føllesdal¹* Christopher J. Soto²

¹Department of Organizational Behaviour, BI Norwegian Business School, Oslo, Norway
²Department of Psychology, Colby College, Waterville, ME, United States

Two studies were conducted to assess the psychometric properties of scores from the Norwegian adaptation of the Big Five Inventory-2 (BFI-2). In Study 1, the BFI-2 was translated to Norwegian and the scores from a convenience sample (N = 601) demonstrated good psychometric properties. BFI-2 scores from subsamples correlated in expected ways with self- and other ratings of the Big Five, and with self-ratings of empathic concern and perspective taking. In Study 2, after some minor improvements in translation, the psychometric properties of BFI-2 scores were assessed in a new sample (N = 409). Results from random intercept EFA of scores supported the proposed model. The psychometric properties of two shorter versions of the inventory, the BFI-2-S and BFI-2-XS, were also examined. Overall, the results suggest that the Norwegian adaptation of the BFI-2 provides reliable and valid scores.

Introduction

The Big Five model of personality describes five broad personality domains, often termed Extraversion, Agreeableness, Conscientiousness, Neuroticism, and Openness (John and Srivastava, 1999). Each domain encompasses a broad group of personality traits, or relatively stable patterns of thinking, feeling, and behaving. In personality research, the Big Five Inventory (BFI; John and Srivastava, 1999) has become one of the most frequently used measures of the Big Five. Recently, a new version of this inventory was developed: the Big Five Inventory-2 (BFI-2; Soto and John, 2017a). The BFI-2 is freely available for use in research and has been adapted to several languages, which may facilitate personality research around the world. In the present research, we assess whether the good psychometric properties of scores from the original BFI-2 extend to the Norwegian adaptation of the BFI-2.

Compared to the original BFI, the BFI-2 has been improved in several ways (for a thorough description, see Soto and John, 2017a). First, for each Big Five domain, the BFI-2 measures three facets that frequently appear in various personality hierarchies. One facet is considered factor pure, in that “previous research has identified [it] as central to its own domain and independent from the other four domains” (Soto and John, 2017a, p. 5). The two other facets are complementary facets, which means that they “are prominent in the personality literature and represented in the original BFI’s item content” (Soto and John, 2017a, p. 5). Previous research has shown that the inclusion of facet-level traits can improve the descriptive and predictive power of personality inventories (Hofstee, 1992; Hendriks et al., 1999; Paunonen and Ashton, 2001; Ashton et al., 2014). Second, to control for acquiescence, the items in each scale are content-balanced, that is, they consist of an equal number of positively and negatively keyed items. Third, new labels were introduced for two of the domains to better reflect their content: Negative Emotionality (instead of Neuroticism) and Open-Mindedness (instead of Openness). Each of the Big Five personality domains is measured by three facets, which are measured by four items each (i.e., 12 items per domain). Overall, the BFI-2 consists of 60 items, where 18 items are identical to items in the original BFI, 14 are revised BFI items, and 28 are entirely new items.
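The logic of content-balanced scales can be illustrated with a minimal sketch. It assumes a five-point rating scale and a four-item facet with two items keyed in each direction; the item numbers and responses are hypothetical and are not taken from the BFI-2 scoring key. Negatively keyed items are reversed before averaging, so a respondent who agrees with every item regardless of content ends up at the scale midpoint rather than at the high end.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 5  # five-point rating scale


def reverse(score):
    """Reverse-key a rating: 1 <-> 5, 2 <-> 4, 3 stays 3."""
    return SCALE_MIN + SCALE_MAX - score


def facet_score(responses, positive, negative):
    """Mean of a balanced scale after reversing the negatively keyed items."""
    keyed = [responses[i] for i in positive]
    keyed += [reverse(responses[i]) for i in negative]
    return mean(keyed)


# Hypothetical item numbers for one four-item facet (two keyed each way).
POSITIVE_ITEMS = [1, 2]
NEGATIVE_ITEMS = [3, 4]

# Hypothetical respondents: one agrees with everything (pure acquiescence),
# the other answers consistently with a high trait level.
yea_sayer = {1: 5, 2: 5, 3: 5, 4: 5}
consistent = {1: 5, 2: 4, 3: 1, 4: 2}

print(facet_score(yea_sayer, POSITIVE_ITEMS, NEGATIVE_ITEMS))
print(facet_score(consistent, POSITIVE_ITEMS, NEGATIVE_ITEMS))
```

Because each scale has as many negatively as positively keyed items, uniform agreement cancels out in the scale score, which is the sense in which balanced keying controls for acquiescence.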

Many studies utilizing various adaptations of the BFI-2 into different languages have provided strong support for the validity of BFI-2 scores. For instance, several studies have supported the convergent and divergent validity of scores (provided by either the full or brief versions), by relating them to a range of other personality inventories, like the BFAS, NEO PI-R, NEO-FFI, Big Five Mini-Markers, and peer ratings of the BFI-2 in English (Soto and John, 2017a); the NEO PI-R in German (Rammstedt et al., 2018); the Big Five Mini-Markers in Danish (Vedel et al., 2021); the IPIP Big Five scales in Dutch (Denissen et al., 2019); and the Big Five factor markers in Russian (Shchebetenko et al., 2019). Studies have also demonstrated that facet scores may outperform broad domain scores in the prediction of various criteria, like behavioral and psychological outcomes (Soto and John, 2017a); affective states, self-endorsed values, and satisfaction with life outcomes (Denissen et al., 2019); and educational attainment, income, health, and life satisfaction (Danner et al., 2021). Thus, a large body of evidence supports the validity of scores provided by various adaptations of the BFI-2 into different languages.

The aim of the present research was to develop a Norwegian adaptation of the BFI-2 and assess the psychometric properties of scores. In Study 1, the BFI-2 was translated to Norwegian and the psychometric properties of scores assessed. In Study 2, the good psychometric properties of BFI-2 scores were confirmed in a new sample, after some minor improvements in translation.

Study 1

The aim of Study 1 was to translate the BFI-2 into Norwegian and assess the psychometric properties of scores. That is, we wanted to assess the factor structure and reliability of scores; their convergent and divergent validity in relation to self- and others’ ratings of the Big Five; and their predictive validity in relation to empathic concern and perspective taking, two constructs that have been frequently used as indicators of empathy (Davis, 1980, 1983).

While few studies have looked at how empathic concern and perspective taking are related to facets of Big Five, several studies have examined their relationship with the broad Big Five domains (Graziano et al., 2007; Mooradian et al., 2011; Habashi et al., 2016; Melchers et al., 2016; Neumann et al., 2016; Song and Shi, 2017; Guilera et al., 2019). These studies found that empathic concern was positively and strongly related to agreeableness (r = 0.31–0.63); but also to extraversion (r = 0.05–0.29), openness (r = 0.04–0.22), conscientiousness (r = 0.05–0.22); and neuroticism (r = −0.04 to 0.17). Perspective taking on the other hand, was positively related to agreeableness (r = 0.22–0.43), openness (r = 0.20–0.30), conscientiousness (r = 0.11–0.28), and extraversion (r = 0.00–0.15), while negatively related to neuroticism (r = −0.13 to −0.33).

Based on these findings, we expect that empathic concern should be positively related to several Big Five domains, especially agreeableness and extraversion. Perspective taking should also be positively related to several Big Five domains, but negatively related to negative emotionality. In the present study, we will also examine whether the BFI-2 facets may outperform the broad domains in predicting these constructs.

Materials and Methods

Participants and Procedure

A convenience sample (N = 601) was used, consisting of 425 participants from an executive education program at a Norwegian business school, ranging in age from 25 to 65 years old (M = 42.22, SD = 7.39), mainly consisting of women (72.2% women, 25.4% men, 2.4% did not report gender, no difference in mean age between men and women); 28 Master of Science students in an organizational psychology class at a Norwegian business school, ranging from 22 to 39 years (M = 25.36, SD = 0.44, gender not reported to preserve anonymity); and 148 sales employees from a Norwegian company (neither age nor sex were reported in order to preserve anonymity). The students completed the BFI-2 in a paper and pencil version, while sales employees completed an online version. Participants and colleagues also completed other questionnaires not reported on here.

Measures

Big Five Inventory-2

The process of translating the BFI-2 into Norwegian was conducted in collaboration with the authors of the original BFI-2, who had developed detailed translation guidelines for this purpose. As some BFI-2 items are identical with items in the BFI, the translation was informed by results from a principal component analysis (PCA) of item scores from the Norwegian adaptation of the BFI (Engvik and Føllesdal, 2005) in a sample of 1,767 Norwegians (H. Engvik, personal communication, July 1, 2016). The factor loadings indicated that some item translations might be improved. The BFI-2 items were translated to Norwegian by the principal author in cooperation with a translator, and back-translated to English by a bilingual psychologist, and the final translation was reviewed by the authors of the original BFI-2. Informed by the BFI-2 authors’ experiences with adapting this measure into other languages, and the results from the PCA of item scores from the Norwegian adaptation of BFI, alternative translations of six items were included and tested out in the first version of the inventory. For instance, results from PCA of BFI data indicated that the translation of one of the items (“Er selvhevdende” [“Has an assertive personality”]) loaded most strongly (and negatively) on Agreeableness, rather than on Extraversion, which was the intended domain. An alternative translation of this item was therefore included (“Er selvsikker, gjør seg gjeldende”). Likewise, alternative translations of five other items were included to be tested out. A five-point Likert scale was used, with the labels helt uenig [totally disagree], litt uenig [somewhat disagree], nøytral/ingen oppfatning [neutral/no opinion], litt enig [somewhat agree], and helt enig [totally agree].

Big Five Inventory

Five days after completing the BFI-2, a subsample of executive students (n = 209) completed the 44-item Norwegian adaptation of BFI (John and Srivastava, 1999), as part of a larger research project. We utilized these data to obtain preliminary evidence for the convergent and divergent validity of BFI-2 scores. Although the BFI is the precursor of the BFI-2, only about one-third of the items are identical to items in the BFI, while two-thirds of the items in BFI-2 are new or revised BFI items. In Norway, the BFI is frequently used to measure the Big Five, and it has been found to provide scores with good psychometric properties (Engvik and Føllesdal, 2005). A 7-point rating scale was used (which is standard in the Norwegian adaptation of BFI), ranging from helt uenig [totally disagree] to helt enig [totally agree], with no labels for the scale points in between. In the present sample, Cronbach’s alpha for the domain scores were 0.84 (Extraversion), 0.73 (Agreeableness), 0.78 (Conscientiousness), 0.82 (Neuroticism), and 0.84 (Openness).
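For readers unfamiliar with the reliability coefficient reported here, Cronbach's alpha can be computed from a respondents-by-items matrix as k/(k − 1) × (1 − sum of item variances / variance of the total score). The following is a minimal sketch using only the Python standard library; the toy data are hypothetical, and the function assumes that negatively keyed items have already been reversed.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for keyed item scores.

    rows: one list of item scores per respondent (all of equal length).
    """
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: four respondents, two items.
toy = [[1, 2], [2, 1], [3, 3], [4, 5]]
print(round(cronbach_alpha(toy), 3))
```

Alpha rises with the number of items and with the average inter-item covariance, which is one reason short facet scales tend to yield lower values than the longer domain scales.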

Big Five Inventory-20

A subsample of executive students (n = 279) was also rated on the BFI-20 by an average of 4.5 colleagues (n = 1246) at work. The participants worked in different organizations, and each participant recruited raters among their colleagues (supervisor, subordinates, and same-level colleagues). These data were also collected as part of a larger research project but utilized in the present study to provide preliminary evidence of convergent and divergent validity of BFI-2 scores. The BFI-20 is a brief 20-item version of the Norwegian adaptation of the BFI, which has been demonstrated to provide scores with adequate structural and predictive validity, and reliability coefficients in the range of 0.57–0.78 (Engvik and Clausen, 2011). Only four out of 20 items in the BFI-20 are identical to BFI-2 items, while six of the items in BFI-20 were slightly revised in the BFI-2. A 7-point rating scale was used, ranging from helt uenig [totally disagree] to helt enig [totally agree], with no labels for the scale points in between.

Empathic Concern and Perspective Taking

Immediately after completing the BFI-2, 220 of the participants completed two seven-item scales selected from the Interpersonal Reactivity Index (Davis, 1980). These data were also collected as part of a larger research project but utilized in the present study to examine the predictive validity of BFI-2 scores. The selected scales, Empathic Concern and Perspective Taking, are frequently used as measures of empathy. Empathic Concern “assesses ‘other-oriented’ feelings of sympathy and concern for unfortunate others” (Davis, 1983, p. 114), and an example item is “I often have tender, concerned feelings for people less fortunate than me.” Perspective Taking measures “the tendency to spontaneously adopt the psychological point of view of others” (Davis, 1983, pp. 113–114), and an example item is “I believe that there are two sides to every question and try to look at them both.” A 5-point rating scale was used, ranging from passer ikke [inaccurate] to passer helt [accurate], with no labels for the scale points in between. Cronbach’s alphas in the present sample were 0.77 and 0.76 for Empathic Concern and Perspective Taking, respectively, and the two scale scores were only modestly correlated (r = 0.23, p = 0.000).

Results and Discussion

Results from reliability analyses and item-total correlations of the 66 candidate items were used to select the final set of 60 items for the Norwegian BFI-2. After selecting this final item set, we conducted identical analyses to Soto and John (2017a) to be able to compare the psychometric properties of scores with those reported for the original, English-language BFI-2 (for an explanation of the rationale for the various analyses, see Soto and John, 2017a).

In order to assess the structure in the BFI-2 scores, a Principal Component Analysis (PCA) was performed. Although PCA may not be optimally suited to assess the latent structure in scores (Conway and Huffcutt, 2003), this type of analysis was chosen in order to enable comparison with results reported for the original BFI-2 (Soto and John, 2017a). The PCA was conducted on item scores after within-person centering, in order to control for acquiescence. That is, each item score for each person was centered around their within-person mean across all 60 items (for an explanation of this approach, see Soto and John, 2017a). PCA with varimax rotation, requesting five components, revealed that 59 of the 60 item scores had their highest loading on the intended Big Five component (Table 1). Moreover, a PCA with varimax rotation of the 15 facet scores (Table 2) showed that all facets had their highest loading on the intended Big Five domains. Cronbach’s alpha ranged from 0.79 to 0.86 for the domain scores (Table 3) and from 0.57 to 0.77 for the facet scores (Table 4). In order to assess the similarity of the principal components obtained in the present study with those reported for the original BFI-2 (i.e., for the Internet sample, Soto and John, 2017a, p. 12), Tucker’s factor congruence was estimated for pairs of corresponding components, using the R psych package (Revelle, 2021). A congruence coefficient above 0.95 indicates that components can be considered equal (Lorenzo-Seva and ten Berge, 2006). The coefficients were 0.96–0.97 for components derived from item scores, and 0.98–0.99 for components derived from the facets. These findings indicate that the components derived from scores from the Norwegian adaptation of the BFI-2 can be considered equivalent to corresponding components in the original BFI-2. Overall, the psychometric properties are highly similar to those reported for the original BFI-2, and the structural validity of scores was supported.
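Two of the steps described above, within-person centering and Tucker's congruence coefficient, are simple enough to sketch directly. The code below is an illustration in plain Python, not the R psych routine used in the study; the loading vectors and responses are hypothetical. Tucker's coefficient for two loading vectors x and y is sum(x·y) / sqrt(sum(x²) · sum(y²)).

```python
from math import sqrt


def center_within_person(responses):
    """Subtract a respondent's own mean across all items from each item score."""
    person_mean = sum(responses) / len(responses)
    return [x - person_mean for x in responses]


def tucker_congruence(x, y):
    """Tucker's congruence coefficient between two loading vectors."""
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator


# Hypothetical responses of one person to six items on a 1-5 scale.
centered = center_within_person([5, 4, 4, 5, 3, 5])

# Hypothetical loadings of four items on one component in two samples.
loadings_a = [0.70, 0.65, 0.10, -0.05]
loadings_b = [0.68, 0.60, 0.15, 0.02]
phi = tucker_congruence(loadings_a, loadings_b)

print(centered)
print(phi > 0.95)  # compare against the 0.95 threshold cited in the text
```

Unlike a Pearson correlation, the congruence coefficient does not subtract the mean of the loadings, so it is sensitive to the overall level of the loadings as well as to their pattern.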

Table 1. Loadings from a principal component analysis of the 60 within-person centered Norwegian BFI-2 items (Study 1 and 2).

Table 2. Loadings from a principal component analysis of scores from the 15 BFI-2 facets (Study 1 and 2).

Table 3. Reliability estimates (with confidence intervals) and intercorrelations for scores from BFI-2 domains (Study 1 and 2).

Table 4. Reliability, descriptive statistics, and intercorrelations for scores from BFI-2 facets (Study 1 and 2).

Convergent and Divergent Validity

As mentioned previously, both self-ratings with the BFI and other ratings with the BFI-20 were collected for a subsample of executive students as part of a larger research project. In the present study, these data were utilized to provide preliminary evidence of convergent and divergent validity of scores from the Norwegian adaptation of BFI-2. The results are presented in Table 5. First, we examined the correlations between BFI-2 scores and self-ratings of personality with the BFI. The BFI-2 domain scores were strongly related to corresponding self-rated BFI domain scores, with correlation coefficients ranging from 0.72 to 0.83, averaging 0.77. Moreover, the BFI-2 domain scores were weakly related to non-corresponding self-rated BFI domain scores, with correlations ranging from −0.25 to 0.24, and absolute correlations averaging 0.13. The average correlation between corresponding domains of 0.77 was somewhat lower than the corresponding average correlation of 0.92 reported by Soto and John (2017a). In their study, however, the BFI and BFI-2 were administered together, while in the present study these questionnaires were administered 5 days apart, which may have attenuated the correlations between the scores. Moreover, in the present study, a 7-point scale was used with the BFI, with rating labels on the endpoints only, which is standard for the Norwegian adaptation of the BFI.

Next, we examined the correlations between BFI-2 scores and colleagues’ ratings with the BFI-20. Due to the nested nature of the data (raters nested within participants), a multilevel model was specified and analyzed in Mplus 8.7 using manifest personality scores. The intraclass correlations (ICC = 0.27–0.44) indicated that a substantial amount of variance in personality ratings was due to differences among rated targets, supporting the decision to use a multilevel model. The relationships between self-ratings and colleagues’ ratings of personality were assessed on the between-group level in the model, as colleagues are nested within participants. The results are presented in Table 5. As expected, the self-rated BFI-2 domain scores correlated most strongly with the corresponding domain scores rated by colleagues (average r = 0.47, range = 0.38–0.53), and more weakly with the non-corresponding domains rated by colleagues (average |r| = 0.13, range −0.27 to 0.32). Overall, the pattern of correlations between self and other ratings of the Big Five supports the convergent and divergent validity of BFI-2 scores.
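The intraclass correlation expresses the share of rating variance that lies between targets rather than between raters of the same target. As a rough illustration only, the sketch below computes ICC(1) from a one-way random-effects ANOVA. It assumes an equal number of raters per target, which the present data do not have, and it is not the model-based estimate that Mplus reports; the ratings are hypothetical.

```python
from statistics import mean


def icc1(groups):
    """ICC(1) from a one-way ANOVA; assumes equal group sizes.

    groups: one list of ratings per rated target.
    """
    k = len(groups[0])  # raters per target
    n = len(groups)     # number of targets
    if any(len(g) != k for g in groups):
        raise ValueError("this sketch assumes the same number of raters per target")
    grand_mean = mean([x for g in groups for x in g])
    # Mean square between targets and mean square within targets.
    ms_between = k * sum((mean(g) - grand_mean) ** 2 for g in groups) / (n - 1)
    ms_within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical ratings: three targets, each rated by two colleagues.
ratings = [[4, 5], [2, 3], [6, 7]]
print(round(icc1(ratings), 3))
```

An ICC well above zero means that ratings of the same target cluster together, so treating individual ratings as independent observations would understate standard errors; this is the usual motivation for a multilevel model.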

Table 5. BFI-2 scores and their correlations with self-ratings on BFI and colleagues’ ratings on BFI-20.

Predicting Empathic Concern and Perspective Taking

To further assess the validity of scores, we examined how well the BFI-2 facet and domain scores predicted self-ratings of Empathic Concern and Perspective Taking, and whether the BFI-2 facet scores could outperform the broad domains. As facets generally provide scores with lower reliability than domain scores (due to fewer items), it may be challenging to compare their predictive validity. Therefore, both facet and domain scores were corrected for measurement error by modeling them as latent variables in Mplus, fixing the residual variance of each variable x to variance_x × (1 − reliability_x), based on the estimated reliability of scores in the present sample (Bollen, 1989). The results are presented in Table 6; in the following text, the results for the corrected (latent) variable scores are reported.
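The effect of fixing the residual variance in this way can be illustrated numerically. The sketch below assumes that each latent variable has a single observed indicator with a loading fixed to 1, which is an assumption about the model setup rather than something stated in the text, and all numbers are hypothetical. Under that assumption, the correlation between two latent variables equals the classical correction for attenuation, r_xy / sqrt(reliability_x × reliability_y).

```python
from math import sqrt


def fixed_residual_variance(variance, reliability):
    """Error variance implied by a reliability estimate: var * (1 - rel)."""
    return variance * (1.0 - reliability)


def latent_correlation(cov_xy, var_x, var_y, rel_x, rel_y):
    """Correlation between latent variables after removing error variance."""
    true_var_x = var_x - fixed_residual_variance(var_x, rel_x)
    true_var_y = var_y - fixed_residual_variance(var_y, rel_y)
    return cov_xy / sqrt(true_var_x * true_var_y)


# Hypothetical observed variances, reliabilities, and correlation.
var_x, var_y = 0.50, 0.40
rel_x, rel_y = 0.70, 0.80
r_observed = 0.30
cov_xy = r_observed * sqrt(var_x * var_y)

r_latent = latent_correlation(cov_xy, var_x, var_y, rel_x, rel_y)
r_classical = r_observed / sqrt(rel_x * rel_y)

print(round(r_latent, 3), round(r_classical, 3))
```

Because the correction divides by the square root of the reliabilities, less reliable scores (such as four-item facets) are adjusted upward more than more reliable ones (such as 12-item domains), which is what puts the two on a comparable footing.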

Table 6. Self-ratings of BFI-2 and correlations with empathic concern and perspective taking.

The pattern of correlations between BFI-2 domain scores and Empathic Concern was in line with previous studies. That is, among the Big Five domains, Empathic Concern correlated most strongly with Agreeableness (Graziano et al., 2007; Mooradian et al., 2011; Melchers et al., 2016; Neumann et al., 2016; Song and Shi, 2017; Guilera et al., 2019). This relationship, however, seems to be mostly driven by the facet Compassion, as this correlation (r = 0.71) was substantially higher than for overall Agreeableness (r = 0.40). This is reasonable, as Compassion and Empathic Concern are conceptually very similar constructs, and because Compassion is considered a factor-pure facet of Agreeableness (Soto and John, 2017a). Empathic Concern was also positively correlated with the domain scores for Negative Emotionality and Open-Mindedness, in line with findings reported by Song and Shi (2017). Moreover, all facets within these domains correlated positively with Empathic Concern. For Extraversion, the correlation with Empathic Concern was not significant, in contrast to findings reported in previous studies (Mooradian et al., 2011; Melchers et al., 2016; Neumann et al., 2016; Guilera et al., 2019). At the facet level, however, scores from Energy Level were positively and significantly correlated with Empathic Concern, which underscores the importance of measuring facets in addition to domain scores.

The BFI-2 facets also seem to outperform the broad domains in predicting Empathic Concern. For four of the Big Five domains (all except Negative Emotionality), the correlations with Empathic Concern were stronger for facet scores than for domain scores. Moreover, the Big Five explained 50% of the variance (R²_Adj. = 0.498, p = 0.000) in Empathic Concern, with Extraversion (β = 0.20, p = 0.008), Agreeableness (β = 0.57, p = 0.000), Negative Emotionality (β = 0.55, p = 0.000), and Open-Mindedness (β = 0.17, p = 0.017) as significant predictors. However, when using only one facet score from each domain as predictors (the facet score from each domain that was most strongly, and significantly, correlated with Empathic Concern), the four facets (Energy Level, Compassion, Anxiety, and Aesthetic Sensitivity) explained 56% of the variance (R²_Adj. = 0.555, p = 0.000), with Compassion (β = 0.61, p = 0.000) and Anxiety (β = 0.20, p = 0.001) as significant predictors. It is important to note, however, that such an analysis, where we select as predictors the facets that are most strongly correlated with the outcome, may capitalize on chance and inflate the estimated explained variance (Ones and Viswesvaran, 1996). Future studies should therefore try to replicate these findings.

Turning to Perspective Taking, the correlation pattern with BFI-2 domain scores was also in line with previous studies. The scores from Perspective Taking were most strongly correlated with Agreeableness, as has been found in previous studies (Mooradian et al., 2011; Melchers et al., 2016; Neumann et al., 2016; Song and Shi, 2017). In contrast to Empathic Concern (which correlated most strongly with one facet within Agreeableness), Perspective Taking correlated positively with all three facets within Agreeableness. Moreover, Perspective Taking was negatively correlated with Negative Emotionality and positively correlated with Open-mindedness, as found in previous studies (Mooradian et al., 2011; Melchers et al., 2016; Song and Shi, 2017). Perspective Taking was also uncorrelated with the domain scores of Extraversion and Conscientiousness, while positively correlated with one facet score within each of these domains (Energy Level and Responsibility, respectively).

The scores from BFI-2 facets also seem to outperform broad domains in predicting Perspective Taking. That is, for all domains, except Agreeableness, the facet scores provided higher correlations than the domain scores. Regression analyses revealed that the Big Five explained 28% of the variance (R²_Adj. = 0.283, p = 0.000) with Agreeableness (β = 0.21, p = 0.000) and Open-Mindedness (β = 0.20, p = 0.000) as the only significant predictors. When using only the one strongest facet from each domain as a predictor, the five facets (Energy Level, Respectfulness, Responsibility, Depression, and Intellectual Curiosity) together explained 39% of the variance (R²_Adj. = 0.39, p = 0.000), with Respectfulness (β = 0.37, p = 0.001) and Intellectual Curiosity (β = 0.55, p = 0.000) as significant predictors.

Overall, the results suggest that the BFI-2 scores predict empathic concern and perspective taking in expected ways, supporting the construct validity of BFI-2 scores. Moreover, facet scores seem to be more important predictors than domain scores, but this pattern was not entirely consistent. For instance, the facets Compassion and Intellectual Curiosity outperformed their respective broad domains (Agreeableness and Open-Mindedness) in predicting Empathic Concern and Perspective Taking, respectively. For Negative Emotionality and Agreeableness, however, the domain scores outperformed the respective facet scores in predicting Empathic Concern and Perspective Taking, respectively. This illustrates that facets may be more important than domain scores in some instances, and not in others, which might be due to the degree of conceptual correspondence between predictor and criterion (for a discussion, see e.g., Judge et al., 2013). Overall, however, a faceted approach may be important in informing us about which aspects of personality are important for understanding and predicting empathy.

Taken together, the results of Study 1 suggest that the Norwegian adaptation of the BFI-2 provides scores with good psychometric properties. The proposed factor structure was supported, and the scales provided scores with adequate reliability, which correlated as expected with both self- and other ratings of the Big Five. Moreover, the scores correlated as expected with self-ratings of empathic concern and perspective taking. Finally, the facet scores generally outperformed the broad domain scores in predicting empathic concern and perspective taking. Overall, these findings support the construct validity of the BFI-2 scores.

Some minor issues, however, may be noted. First, while completing the BFI-2, some younger students reported that they did not understand the meaning of one of the words in item 28 [“skjødesløs” (careless)]. Thus, one may question the validity of scores from this scale in younger samples. Second, a closer look at the distribution of item scores revealed that item 13 provided scores with a relatively high mean (4.78) and a very large kurtosis (11.05), which is not optimal. Third, for 25 of the 60 items, the modal value was identical to the endpoints of the rating scale (either 1 or 5), suggesting that these items provide extreme scores and may not optimally differentiate among individuals. These issues were addressed in Study 2.

Study 2

The aim of Study 2 was to assess the psychometric properties of scores from the Norwegian adaptation of BFI-2 in a new sample, after some slight improvements based on findings in Study 1. First, small revisions of items 13 and 28 were tested out in smaller samples before a final translation was selected for inclusion in the Norwegian adaptation of the BFI-2. That is, item 13 was rephrased from “Er til å stole på, stødig” to “Er pliktoppfyllende, gjør som avtalt” and item 28 was rephrased from “Kan være litt skjødesløs” to “Kan være litt slurvete, likeglad.” Second, in order to reduce the extreme scores on some of the items, the endpoint labels on the rating scale were rephrased to be more similar in meaning to the labels in the original BFI-2. That is, in Study 1 the endpoint labels helt uenig [totally disagree] and helt enig [totally agree] were used. These labels are commonly used in Norwegian questionnaires, and are also used in the Norwegian adaptation of the BFI (Engvik and Føllesdal, 2005). One may question, however, whether these labels express the same strong levels of disagreement and agreement as the labels in the original BFI-2, i.e., strongly disagree and strongly agree. The labels were therefore rephrased to svært uenig [strongly disagree] and svært enig [strongly agree], respectively.

One aim of the present study was therefore to see if a slight improvement in the translation of the endpoints of the rating scale might lead to less extreme scale scores. A second aim was to try to replicate the good psychometric properties of the final Norwegian adaptation of BFI-2 in a new sample. A third aim was to examine the preliminary psychometric properties of two shorter versions of the inventory, the BFI-2-S and BFI-2-XS (Soto and John, 2017b).