- Department of Marketing, Innovation and Organisation, Ghent University, Ghent, Belgium
Accurately estimating consumers’ willingness-to-pay (WTP) is crucial to product design, pricing decisions, and the design of competitive marketing strategies. However, traditional self-report measures of WTP are susceptible to many reporting biases, including tactical responding or an inability to make accurate estimates. Importantly, appraisals also occur automatically (i.e., in the absence of substantial time, intention, awareness, and/or cognitive resources), and implicit measures used to capture automatic appraisals are less susceptible to the reporting biases that can affect self-report measures. However, the only existing implicit measure for assessing automatic price appraisals (the Task Rule Congruency paradigm, ‘TRC’) is impractical because of the large number of trials and the time it requires. Accordingly, here we introduce the Implicit Attribute Classification Task (IMPACT), test its effectiveness for the measurement of automatic price appraisals (Study 1), and directly compare its effectiveness and utility with that of the TRC (Study 2). We find that the IMPACT is an efficient measure of automatic price appraisals, that it produces considerably larger effects than the TRC, and that it does so while substantially shortening the procedure. We also discuss how IMPACT scores can be used to derive an implicit measure of willingness to pay. Our findings make a substantial contribution to both research and practice by providing an effective tool that facilitates, for the first time, an efficient exploration of implicit WTP.
Introduction
Willingness-to-pay (WTP) refers to the maximum price at which consumers will buy a product or service (Varian, 1992). Accurately gauging this price point is crucial, as WTP measures play a key role in pricing decisions, competitive strategies, and product development. Both direct [e.g., the Van Westendorp Price Sensitivity Meter (Van Westendorp, 1976)] and more subtle (e.g., choice-based conjoint analysis) WTP measures rely on explicit consumer price evaluations or explicitly stated preferences (Breidert et al., 2006; Miller et al., 2011). However, this approach is prone to a number of reporting biases and can lead to inaccurate estimations, as consumers may not always want to accurately volunteer this information, may respond strategically, or may simply be unable to access or express their willingness-to-pay (Podsakoff et al., 2012; Wang and Ge, 2016). For instance, a recent meta-analysis demonstrated that traditional measures of WTP tend to overestimate consumers’ real WTP (Schmidt and Bijmolt, 2020), limiting their utility as tools for price optimisation.
Dezwaef et al. (2019) recently suggested that one way to circumvent these limitations would be to employ a measure that derives willingness-to-pay from automatic processes [i.e., spontaneous evaluative processes that can be characterised as fast, efficient, unintentional, uncontrolled and/or unconscious (Moors and De Houwer, 2006)], as opposed to explicit processes (i.e., deliberated, self-reported evaluations). Specifically, Dezwaef et al. (2019) argued that comparisons between a given price and an internal (typically memory-based) reference point, such as the price of the item at past purchases, occur automatically and that these automatic comparisons inform consumers’ WTP. Accordingly, these automatic comparisons can be leveraged as a starting point to assess automatic price appraisals and in turn arrive at an estimate of implicit WTP.
To provide first proof-of-principle, Dezwaef et al. (2019) used a task-rule congruency paradigm (‘TRC’) to test whether price evaluations occur automatically. The task-rule congruency effect occurs when two tasks share the same stimuli and mode of response (commonly a key-press) and the automatic activation of a response rule that had been learned in one task interferes with performance on a second task (Kiesel et al., 2010; Liefooghe et al., 2012). Specifically, performance is facilitated if the active response rule and the irrelevant response rule map onto each other (i.e., are compatible) and hindered if the active response rule and the irrelevant response rule do not map onto each other (i.e., are incompatible) (Meiran and Kessler, 2008). For instance, participants may be presented with a set of word-stimuli that differ in terms of colour (green vs. yellow) and font style (italic vs. bold). A first task may require the stimuli to be classified according to colour, whereas a second task may require the stimuli to be classified according to font style. In both tasks, participants classify the stimuli by pressing either a left-hand key (for ‘green’ in task one and ‘italic’ in task two), or a right-hand key (for ‘yellow’ in task one and ‘bold’ in task two). Because each response key now shares two meanings (e.g., ‘green’ and ‘italic’ for the left-hand key), response speed will be facilitated if the response rule of the irrelevant task is compatible with the response rule of the relevant task. Based on this logic, Dezwaef et al. (2019) tested whether an evaluative response to prices is automatically activated during a second, non-evaluative categorisation task, even if the evaluative response rule is no longer relevant to the task instructions.
To demonstrate this principle, Dezwaef et al. (2019) tested whether they could use this method to record known pricing evaluations of everyday grocery items. Given consumers’ familiarity with grocery prices [product familiarity was also confirmed in a pre-test by Dezwaef et al. (2019)], manipulating the prices of these items to be either cheaper or more expensive than retail prices should elicit a corresponding response in participants. Specifically, the prices were manipulated across seven intervals (−70%, −40%, −10%, 0%, +10%, +40%, +70%). In line with this manipulation, if the measure is working as expected, the recorded attitudes should follow a linear trend aligned with these intervals.
In the task-rule congruency paradigm used by Dezwaef et al. (2019), the procedure consisted of two consecutive single-task phases. First, in an evaluative phase, participants were required to evaluate a series of product-price combinations as ‘cheap’ or ‘expensive’ by pressing a left-hand key (cheap) or a right-hand key (expensive). This phase was used to train participants to map the evaluations ‘cheap’ and ‘expensive’ onto the corresponding keys. Next, participants completed a categorisation phase, which again presented a series of product-price combinations. Using the same keys as in the evaluation phase, participants were instructed to categorise the price stimuli based on parity (Study 1) or font style (Study 2), and no longer to evaluate the prices as cheap or expensive. During this phase, it was expected that the (now irrelevant) task rule learned in the evaluation phase would interfere with performance on the categorisation task. Specifically, if the required categorisation response was compatible with the evaluative response, response speed and accuracy were expected to be facilitated. In contrast, if the required categorisation response was incompatible with the evaluative response, response speed and accuracy were expected to be hindered.
Dezwaef et al. (2019) predicted that large discrepancies between the tested price and the internal reference point (here: common retail prices) would lead to greater task-rule congruency interference compared to small discrepancies. In line with this prediction, they observed that the task-rule congruency effects were larger when the tested price was unambiguously more expensive or unambiguously cheaper compared to the participants’ internal reference point, and that the effects were smaller the closer the tested price approached the internal reference point. These findings by Dezwaef et al. (2019) are a potentially valuable first demonstration of the automatic processing of price evaluations. However, Dezwaef et al. (2019) emphasized the need for a shorter and less resource-intensive procedure in order for this approach of measuring price appraisals to be easily used in further research and applied to marketing practices.
Following this advice, we developed a new measure of automatic price appraisals based on the Implicit Attribute Classification Task (“IMPACT”, Altenburg and Spruyt, 2022). Similar to the TRC paradigm, the IMPACT is an implicit measure that draws on task-rule congruency principles to assess automatic relational processing of stimulus pairs. Specifically, each trial of the standard IMPACT consists of the presentation of a stimulus compound comprising a target (e.g., fries) and an attribute (e.g., healthy), followed by one of three task cues that indicates either an evaluative (cue: ???) or non-evaluative (cues: YES, NO) task rule. During the evaluative trials, participants are required to indicate whether the attribute applies to the target by pressing either a left-hand or right-hand key. During the non-evaluative trials, participants are required to press the left-hand or right-hand key depending on whether the cue “YES” or “NO” appears, that is, regardless of their evaluation of the match between attribute and target. The evaluative and non-evaluative task trials are presented in a mixed-task block. Because the task rule changes but the response keys stay consistent, there are two meanings mapped onto each of the response keys (e.g., the cue “YES” and ‘the attribute does apply to the target’ for the left-hand key; and the cue “NO” and ‘the attribute does not apply to the target’ for the right-hand key). Due to this shared mapping, response speed and accuracy will be facilitated if the required response in the non-evaluative trials (e.g., press left for the cue “YES”) is congruent with the response rule learned in the evaluative trials (e.g., press left if the target is healthy). In turn, if the required response in the non-evaluative trials is incongruent with the response rule learned in the evaluative trials, response speed and accuracy will be hindered. Following this procedure, automatic appraisals of a target can be derived from response latencies and accuracy rates. Importantly, the flexible trial composition allows the IMPACT to be adapted for a wide range of applications, such as, in the present context, assessing the relationship between a product and a range of prices.
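To make the congruency logic concrete, the following minimal sketch (ours, not part of the original IMPACT software) labels a non-evaluative trial as congruent or incongruent. The trial representation and the key assignment are assumptions introduced purely for illustration.

```python
# Minimal sketch: labeling a non-evaluative IMPACT trial as congruent or
# incongruent. Assumes (hypothetically) that the left-hand key means both
# "YES" and "the attribute applies to the target".

def is_congruent(cue: str, attribute_applies: bool) -> bool:
    """A non-evaluative trial is congruent when the key required by the cue matches
    the key the evaluative rule would require for this target-attribute pair."""
    required_key = "left" if cue == "YES" else "right"           # non-evaluative rule (cue-based)
    evaluative_key = "left" if attribute_applies else "right"    # rule learned in evaluative trials
    return required_key == evaluative_key

# Example: the compound "fries - healthy" followed by the cue "YES" requires a
# left-hand press, but the evaluative rule points to the right-hand key
# (healthy does not apply to fries), so the trial is incongruent.
print(is_congruent("YES", attribute_applies=False))  # False -> incongruent
```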
There are several reasons to predict that an adapted version of the IMPACT would outperform the procedure used by Dezwaef et al. (2019). First, the evaluative trials and non-evaluative trials are presented in an intermixed, random order instead of a sequential, blocked order [as used by Dezwaef et al. (2019)]. Previous research has demonstrated that the task-rule congruency effect on reaction times is commonly larger in task-switching designs, such as that used in the IMPACT, than in single-task blocks [as used by Dezwaef et al. (2019)] (Meiran, 2000; Meiran, 2005). Second, previous research using the IMPACT has demonstrated its effectiveness in recording automatic appraisals based on only a limited number of trials (Altenburg and Spruyt, 2022), suggesting that the IMPACT will likely be a less resource-intensive procedure than that of Dezwaef et al. (2019). Lastly, the IMPACT is simple to administer online and therefore practical for targeting a wide range of specific consumer segments. Dezwaef et al. (2019) emphasized that, ideally, an implicit measure of willingness to pay should be suitable for online testing because this would allow researchers to more easily reach a wider range of specific consumer segments, and pointed out that the procedure used in their study may be too time-consuming to be an effective online tool. Accordingly, we propose an adapted version of the IMPACT and, closely following the methods put forward by Dezwaef et al. (2019), assess its effectiveness as an implicit measure of automatic price appraisals.
Outline of the pricing-IMPACT
In each trial of the pricing IMPACT, participants are briefly presented with a product-price compound that disappears after 1,500 ms, followed directly by a task cue. Because the IMPACT consists of intermixed evaluative trials and non-evaluative categorisation trials, the task cue is used to indicate to participants which task they are required to perform. In evaluative trials (indicated by the cue “???”), participants are asked to indicate whether the presented product-price compounds are cheap or expensive by pressing either a left-hand or a right-hand key. These trials are used to train participants to map the evaluations “cheap” and “expensive” onto the respective keys and are not used for analysis. In non-evaluative categorisation trials, participants are asked to simply press the left-hand or right-hand key depending on whether the cue ‘CHEAP’ or the cue ‘EXPENSIVE’ appears, regardless of their personal evaluation of the product-price combination (see Figure 1 for a schematic of the procedure). These trials are used to infer implicit attitudes based on task-rule congruency effects (i.e., the interference of the response rule learned in the evaluative trials with performance on the non-evaluative categorisation trials, see p. 2–3).
Figure 1. Trial types presented in the pricing IMPACT. Trials are presented in randomised, intermixed order. Both trial types start with a black screen with a random presentation time between 500 and 1,500 ms, followed by a product-price pairing that disappears after 1,500 ms. Following the product-price pairing, a task cue is presented until a key press is registered. The Evaluative trials (left) are indicated by three question marks and participants are asked to press a key to indicate whether the product-price pair is cheap or expensive. These trials are used to train participants to map the evaluations “cheap” and “expensive” onto the respective keys. In non-evaluative categorisation trials (right), participants are asked to press the left-hand or right-hand key depending on whether the cue ‘CHEAP’ or the cue ‘EXPENSIVE’ appears, regardless of their evaluation of the product-price pair. These trials are used to infer implicit attitudes based on interference effects. Note that, due to potential copyright concerns, Figure 1 illustrates the IMPACT procedure using images of generic (i.e., unbranded) products. The actual stimuli used in Experiment 1 are publicly available at https://osf.io/w4kvz/.
Although the non-evaluative categorisation task does not require participants to process the product-price compounds (as the response is dependent only on the cue), the relational processing of the product and price compounds is promoted across all trials because participants cannot anticipate whether an evaluative or non-evaluative task cue will appear. Accordingly, response times are expected to be fast if the evaluative and non-evaluative response are compatible and slow if the evaluative and non-evaluative response are incompatible.
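For illustration, the sketch below (not the authors’ PsychoPy code) assembles a single pricing-IMPACT trial with the timing parameters described above; the product and price values are placeholders, and the dictionary representation is an assumption made for clarity.

```python
# Minimal sketch of the timeline of one pricing-IMPACT trial. Durations in ms.
import random

def build_trial(product: str, price: str, cue: str) -> dict:
    """cue is '???' for evaluative trials, or 'CHEAP' / 'EXPENSIVE' for categorisation trials."""
    return {
        "blank_screen_ms": random.randint(500, 1500),  # inter-trial interval, randomly varied
        "compound": (product, price),                  # product-price pairing, shown for 1,500 ms
        "compound_duration_ms": 1500,
        "cue": cue,                                    # presented until a key press is registered
        "is_evaluative": cue == "???",
    }

print(build_trial("coffee", "4.99 EUR", "CHEAP"))      # hypothetical example values
```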
Experiment 1
Experiment 1 tested the effectiveness of the pricing IMPACT at recording automatic price appraisals. Modeling this experiment closely on the materials put forward by Dezwaef et al. (2019), we tested automatic price appraisals of common supermarket products on prices ranging from unambiguously cheap to unambiguously expensive.
Method
Ethical statement
This study complied with the Research Code of Ethics of the Faculty of Economics and Business Administration, Ghent University. Participants were informed that their participation was voluntary, that they could end their participation at any time, and that their data would be processed, stored, and reported anonymously. Participants then gave written informed consent prior to participation.
Participants
One hundred forty-two volunteers initially took part in the study. All data were collected between May 1st and May 18th, 2022. Recruitment took place online and relied on a Belgian convenience sample. Eight participants started or completed the task more than once; in these cases, only the data from the first participation were retained and subsequent repetitions were discarded. Six respondents did not proceed past the instructions, and another thirty-five respondents did not complete the full experiment and were excluded from analysis. Four participants were excluded because of excessive error rates in the IMPACT (43.75–53.57%). Additionally, eight participants were excluded because an excessive percentage of their trials had to be replaced due to error rates and extreme response latencies taken together (41.96–50.89%). In line with data exclusion standards previously reported by the authors, we defined error rates and trial-replacement rates of 41.96% and above as excessive, based on the binomial probability of observing a given number of correct responses if a respondent was in fact performing at chance level. One further participant was excluded because their mean response speed (1,465 ms) deviated more than 2.5 SD from the sample’s mean response speed (cut-off value: 1,396 ms). A final sample of N = 88 was retained for analysis (45 women, 41 men, 2 did not report their gender; Mage = 35.72 years, Mdnage = 28.5 years, Rangeage = 20–75 years).
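For readers who wish to apply comparable exclusion rules, the following sketch illustrates how such cut-offs could be derived. The number of critical trials, the alpha level, and the example latencies are assumptions for illustration; they are not guaranteed to reproduce the exact thresholds reported above.

```python
# Minimal sketch: deriving a chance-level exclusion threshold and a latency cut-off.
import numpy as np
from scipy.stats import binom

n_trials, alpha = 112, 0.05  # hypothetical parameters: 112 critical trials, one-tailed alpha of .05

# Smallest number of correct responses for which chance-level responding (p = .5)
# can still be rejected; error / replacement rates above the complement are "excessive".
k_min = next(k for k in range(n_trials + 1) if binom.sf(k - 1, n_trials, 0.5) < alpha)
max_error_rate = (n_trials - k_min) / n_trials
print(f"exclude participants with error / replacement rates above {max_error_rate:.2%}")

# Latency-based exclusion: mean response time more than 2.5 SD above the sample mean.
latencies = np.array([612.0, 744.0, 688.0, 903.0, 1465.0])  # hypothetical example latencies (ms)
cutoff = latencies.mean() + 2.5 * latencies.std(ddof=1)
print("cut-off (ms):", round(cutoff))
```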
Materials and procedure
Stimuli and price points
The product stimuli used in this study were the same thirty-six products selected by Dezwaef et al. (2019) based on a pre-test with a Belgian sample, which ensured that participants would be familiar with the products and their average retail prices. Also in line with Dezwaef et al. (2019), twenty-eight of these thirty-six products were shown during the evaluation trials and eight were shown in the non-evaluative measurement trials. To ensure that participants would learn the required evaluative mapping of cheap and expensive during the evaluative trials, the price points shown with the products in these trials were calculated to be unambiguously expensive (70% above the current retail price) or unambiguously cheap (70% below the current retail price; retail prices as indicated on the website of Carrefour Belgium, data retrieved on April 7th, 2022). In the non-evaluative measurement trials, the prices varied across seven price points, ranging from unambiguously cheap to unambiguously expensive. Specifically, for each product we calculated six price points (−70%, −40%, −10%, +10%, +40%, +70%) relative to the actual retail price, which served as the seventh price point (0%). The exact stimuli and price points used in this study are publicly available at https://osf.io/w4kvz/.
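The price points can be derived directly from a product’s retail price, as in the following sketch; the retail price shown is a hypothetical example, not one of the study stimuli.

```python
# Minimal sketch: deriving the seven tested price points from a product's retail price.
retail_price = 2.49                                   # hypothetical retail price in EUR
intervals = [-70, -40, -10, 0, 10, 40, 70]            # percent deviation from the retail price
price_points = {f"{p:+d}%": round(retail_price * (1 + p / 100), 2) for p in intervals}
print(price_points)
# {'-70%': 0.75, '-40%': 1.49, '-10%': 2.24, '+0%': 2.49, '+10%': 2.74, '+40%': 3.49, '+70%': 4.23}
```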
Procedure
Participants accessed the study via desktop devices; participation on mobile devices was not possible. Participants were first asked about their age, gender, occupation, average net income, and frequency of grocery shopping. Next, they were forwarded to the pricing IMPACT.
Pricing IMPACT
The pricing IMPACT was built using PsychoPy software (Peirce et al., 2019) and administered via the PsychoPy launching platform Pavlovia.org. Participants were instructed to use the E-key and the I-key to (1) evaluate the shown product-price combination as cheap or expensive if the “???” cue appeared, or (2) categorise the cue words CHEAP and EXPENSIVE, regardless of their personal evaluation. The response assignment of the E- and the I-key was counterbalanced: for half of the participants, the evaluation ‘cheap’ and the cue word CHEAP were mapped onto the E-key, whereas the evaluation ‘expensive’ and the cue word EXPENSIVE were mapped onto the I-key. The reverse mapping was used for the other half of participants. Participants were encouraged to respond as quickly as possible while keeping a low error rate. To familiarise participants with the task, they first completed eight practice trials, followed by a reminder of the instructions.
Next, participants completed a measurement block that consisted of the intermixed evaluative trials and non-evaluative measurement trials. For the evaluative trials, each of the twenty-eight products was displayed once with the unambiguously high price and once with the unambiguously low price, resulting in a total of 56 evaluative trials. For the non-evaluative measurement trials, each combination of the eight products and the seven price points was displayed once with the cue “CHEAP” and once with the cue “EXPENSIVE”, resulting in a total of 112 non-evaluative measurement trials. Altogether, participants completed one measurement block, i.e., a total of 168 trials presented in randomised order. The inter-trial interval varied randomly between 500 ms and 1,500 ms. The product-price compounds were always presented for 1,500 ms before they disappeared and one of the task cues appeared. The task cue was always displayed until a key press was registered. In both the evaluative and non-evaluative trials, participants were presented with a red “X” if a wrong key press was registered. This error feedback remained on screen until the respondent corrected their response.
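As an illustration of the block composition described above, the following sketch (not the authors’ PsychoPy implementation) generates the 168-trial measurement block; the product names are placeholders.

```python
# Minimal sketch: composing the 168-trial measurement block of the pricing IMPACT.
import random

evaluation_products = [f"eval_product_{i}" for i in range(1, 29)]   # 28 products (evaluative trials)
measurement_products = [f"meas_product_{i}" for i in range(1, 9)]   # 8 products (measurement trials)
price_points = [-70, -40, -10, 0, 10, 40, 70]                       # % deviation from retail price

# 28 products x {unambiguously cheap, unambiguously expensive} = 56 evaluative trials (cue "???")
evaluative = [(p, extreme, "???") for p in evaluation_products for extreme in (-70, 70)]

# 8 products x 7 price points x 2 cues = 112 non-evaluative measurement trials
categorisation = [(p, pp, cue) for p in measurement_products
                  for pp in price_points for cue in ("CHEAP", "EXPENSIVE")]

trials = evaluative + categorisation
random.shuffle(trials)                                 # presented in randomised, intermixed order
print(len(evaluative), len(categorisation), len(trials))  # 56 112 168
```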
Results
The data were prepared and aggregated in the RStudio environment version 2022.07.1.554 (R Studio Team, n.d.) using R version 4.1.2 (2021-11-01) (R Core Team, 2021) and the packages tidyverse 1.3.1 (Wickham et al., 2019), gtools 3.9.2 (Warnes et al., n.d.), and stringr 1.4.0 (Wickham, n.d.). Analyses were carried out using IBM SPSS Statistics for Windows, Version 27. The data of this experiment are publicly available at https://osf.io/w4kvz/.
Data preparation
For the calculation of the D scores, only the critical trials (i.e., categorisation trials with the cues “EXPENSIVE” or “CHEAP”) were used. All data preparation decisions were based on consistent standards applied by the authors (Altenburg, 2023) and empirically derived recommendations described by Greenwald, Nosek, and Banaji (Greenwald et al., 2003). Trials with extreme response latencies (below 300 ms or above 5,000 ms) were winsorised, so that trials with a response latency below 300 ms were replaced with 300 ms, and trials with a response latency above 5,000 ms were replaced with 5,000 ms. In line with recommendations by Greenwald et al. (2003), error trials were excluded and instead replaced by the participant’s mean response latency based on correct trials plus a penalty of 600 ms. One D score for each price point was then calculated following the standard D score formula by Greenwald et al. (2003). This formula divides the differences in mean response latencies on compatible and incompatible trials by their pooled standard deviation. Because compatibility is reversed for negative and positive prices in this study (e.g., the cue “CHEAP” is compatible with price points below the retail price, but incompatible with price points above the retail price), we reversed compatibility scoring at the 0% price point. That is, to preserve a uniform directionality of the D scores, for each price point we subtracted the mean response latency on trials with the cue “EXPENSIVE” (i.e., trials that were incompatible for all price points below the retail price, and compatible for all price points above the retail price) from the mean response latency on trials with the cue “CHEAP” (i.e., trials that were compatible for all price points below the retail price, and incompatible for all price points above the retail price) and divided this by the pooled standard deviation. This always resulted in a D score below zero if a price was appraised as cheap, and in a D score above zero if a price was appraised as expensive. Following this procedure, we derived one D score for each of the seven price points.
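For illustration, the following sketch implements the described D-score computation for one participant at one price point in Python rather than the R/SPSS pipeline used here; the column names, the ordering of the preparation steps, and the exact definition of the pooled standard deviation are assumptions.

```python
# Minimal sketch of the D-score computation for one participant at one price point.
import numpy as np
import pandas as pd

def d_score(trials: pd.DataFrame, penalty_ms: float = 600.0) -> float:
    """trials: critical trials for ONE participant at ONE price point, with (assumed)
    columns 'rt' (ms), 'correct' (bool), and 'cue' ('CHEAP' or 'EXPENSIVE')."""
    t = trials.copy()
    t["rt"] = t["rt"].clip(lower=300, upper=5000)               # winsorise extreme latencies
    mean_correct = t.loc[t["correct"], "rt"].mean()
    t.loc[~t["correct"], "rt"] = mean_correct + penalty_ms      # error penalty (Greenwald et al., 2003)
    mean_cheap = t.loc[t["cue"] == "CHEAP", "rt"].mean()
    mean_expensive = t.loc[t["cue"] == "EXPENSIVE", "rt"].mean()
    pooled_sd = t["rt"].std(ddof=1)                             # SD across both cue conditions
    return (mean_cheap - mean_expensive) / pooled_sd            # < 0: appraised as cheap; > 0: expensive

# Hypothetical example: faster responses to the CHEAP cue, so the price is appraised as cheap.
example = pd.DataFrame({
    "rt":      [640, 702, 655, 910, 980, 1050, 995, 120],
    "correct": [True, True, True, True, True, True, True, False],
    "cue":     ["CHEAP"] * 4 + ["EXPENSIVE"] * 4,
})
print(round(d_score(example), 2))  # negative value -> appraised as cheap
```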
Internal consistency
We estimated an overall internal consistency score using Spearman-Brown corrected mean split-half correlations (Pronk et al., 2022). For each participant, the compatible trials and the incompatible trials were each split into random halves. We calculated one D score using the first halves of the compatible and incompatible trials, and a second D score using the second halves of the compatible and incompatible trials. Next, the D scores of the two halves were correlated at the group level. This process was repeated one hundred times, and a consistency score was derived from the mean correlation using the Spearman-Brown prediction formula. This resulted in an internal consistency estimate of Rsb = 0.42 (see footnote 1).
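The following sketch (not the authors’ code) illustrates this permutation-based split-half procedure with Spearman-Brown correction. The data structure and the `compute_d` helper, which stands in for the D-score computation described above, are assumptions.

```python
# Minimal sketch: permutation-based split-half reliability with Spearman-Brown correction.
import numpy as np

def split_half_reliability(participant_trials: dict, compute_d, n_splits: int = 100,
                           rng: np.random.Generator = np.random.default_rng(0)) -> float:
    """participant_trials maps each participant to a (compatible, incompatible) pair of
    latency arrays; compute_d(compatible, incompatible) returns a D score (assumptions)."""
    correlations = []
    for _ in range(n_splits):
        d_first, d_second = [], []
        for compatible, incompatible in participant_trials.values():
            comp = rng.permutation(compatible)                   # random split of compatible trials
            incomp = rng.permutation(incompatible)               # random split of incompatible trials
            half = lambda x: (x[: len(x) // 2], x[len(x) // 2:])
            c1, c2 = half(comp)
            i1, i2 = half(incomp)
            d_first.append(compute_d(c1, i1))                    # D score from the first halves
            d_second.append(compute_d(c2, i2))                   # D score from the second halves
        correlations.append(np.corrcoef(d_first, d_second)[0, 1])  # correlation at the group level
    r = float(np.mean(correlations))
    return 2 * r / (1 + r)                                       # Spearman-Brown prediction formula
```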
Evaluative trials
On average, participants identified 88% of the unambiguously cheap and 83% of the unambiguously expensive prices correctly during the evaluative trials. This suggests that respondents understood the evaluative task instructions and were largely able to identify unambiguous prices.
Categorisation trials
On average, participants responded correctly on 92% of all categorisation trials (95% of compatible trials, 89% of incompatible trials). This suggests that respondents understood the non-evaluative categorisation task instructions.
Pricing appraisals
D scores below zero indicated that a price was appraised as cheap, whereas D scores above zero indicated that a price was appraised as expensive. Moreover, the greater the distance between the tested price and the reference point, the greater the absolute value of the score. Accordingly, if the IMPACT recorded the expected automatic price appraisals, the observed pattern of the D scores should describe a positive linear trend from the lowest negative value at the −70% price point to the highest positive value at the +70% price point, whereby the reference point (0%) should be close to zero.
To test for this pattern of results, we analysed the price-point D scores using a one-way repeated measures ANOVA with seven levels. Mauchly’s test indicated that the assumption of sphericity was violated (p < 0.001), therefore a Greenhouse–Geisser correction was applied. Results showed that there was a statistically significant effect of price point, F(4.9, 429.1) = 34.30, p < 0.001, ηp2 = 0.28. As shown in Table 1, the mean D scores followed the expected positive linear trend, whereby the lowest negative score was observed at the −70% price point, an almost neutral score was observed at the 0% price point, and the highest positive value was observed at the 70% price point. Trend analysis confirmed that this was a statistically significant linear trend, F (1, 78) = 106.02, p < 0.001, ηp2 = 0.55.
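For illustration, the following sketch (in Python rather than the SPSS analysis used here) shows how the within-subject linear trend can be tested with an orthogonal polynomial contrast for seven equally spaced levels; the data are synthetic and serve only to demonstrate the logic, not to reproduce the reported results.

```python
# Minimal sketch: within-subject linear trend test on price-point D scores.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_means = np.array([-0.6, -0.4, -0.1, 0.0, 0.2, 0.4, 0.6])    # illustrative pattern only
d_scores = true_means + rng.normal(0, 0.4, size=(88, 7))         # synthetic participants x price points

linear_weights = np.array([-3, -2, -1, 0, 1, 2, 3])              # linear contrast, levels -70% ... +70%
contrast = d_scores @ linear_weights                             # one contrast value per participant
res = stats.ttest_1samp(contrast, 0.0)                           # H0: no linear trend
print(f"F(1, {len(contrast) - 1}) = {res.statistic**2:.2f}, p = {res.pvalue:.3g}")  # F = t^2 for a 1-df contrast
```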
Discussion
The findings of Experiment 1 suggest that price appraisals occur automatically and that the IMPACT is a suitable tool to capture these automatic price appraisals. Interestingly, we found that participants perceived both the current retail price (0%) and the ambiguously cheap price (−10%) as virtually identical, and (almost) neutral. This is particularly interesting because (1) increasing the price by 10% did invoke a considerable difference in the automatic price appraisal, suggesting that a 10% increment is generally large enough to provoke a measurable change, and (2) the period shortly prior to the data collection, starting in late 2021, was characterised by a steep worldwide increase in inflation rates, with the Belgian Consumer Price Index having increased by 8.97% at the time of data collection (May 2022) compared to the previous year (Triami Media BV, 2023). The finding that the −10% price point was perceived as neutral, rather than cheap, is consistent with this rapid price increase, potentially suggesting that relatively recent prices, although now out of date, may still hold some influence as an internal reference point.
Importantly, the observation that the IMPACT was able to (1) record the automatic price appraisals, (2) produce large effect sizes, and (3) considerably shorten the task procedure compared to that of the TRC (the exact TRC procedure is described in Experiment 2) demonstrates, in its own right, the added value and advantages of the IMPACT over the TRC. To directly investigate this issue, in Experiment 2 we tested how the pricing IMPACT performs in a direct comparison with the TRC paradigm used by Dezwaef et al. (2019).
Experiment 2
In order to directly compare the IMPACT with the TRC, we replicated the method and procedures from Experiment 1 using a between-subjects design in which participants completed either the IMPACT or the TRC. Dezwaef et al. (2019) used two different categorisation tasks, whereby participants were required to categorise prices based on parity in Study 1 and based on font (italic/regular) in Study 2. Because the second study produced a larger effect, we used the version of the TRC in which the categorisation task was based on the font of the price. Additionally, we shortened the relatively long fixation (from 1,000 ms to 500 ms) and stimulus onset asynchrony (from 2,500 ms to 1,500 ms) used by Dezwaef et al. (2019) to more closely reflect the parameters used in the IMPACT (the IMPACT has no fixation cross but an inter-trial interval that varies randomly between 500 and 1,500 ms, and a stimulus onset asynchrony of 1,500 ms). These choices were made deliberately in order to (1) maximise power for the TRC and (2) avoid confounding the potential effect of measurement type with that of differing fixation and stimulus timing. Figure 2 depicts a schematic of the TRC procedure.
Figure 2. Example of trials in the evaluative and categorisation phases of Dezwaef et al.’s (2019) TRC paradigm. Participants first completed the evaluation phase. Trials started with the presentation of a fixation cross (500 ms), followed by a product. After 1,500 ms, the product disappeared and a price was presented until a key press was registered. In the evaluative phase, participants were asked to indicate as fast as possible whether they evaluated the product as cheap (e.g., press the E-key) or expensive (e.g., press the I-key). Next, participants completed the categorisation phase. The trial composition was identical to that in the evaluation phase; however, participants were now asked to indicate as fast as possible whether the price was printed in italic (e.g., press the E-key) or regular (e.g., press the I-key) font. For both phases, the keys used to respond remained the same. Note that, due to potential copyright concerns, Figure 2 illustrates the TRC procedure using images of generic (i.e., unbranded) products. The actual stimuli used in Experiment 2 are publicly available at https://osf.io/w4kvz/.
Method
Ethical statement
This study complied with the Research Code of Ethics of the Faculty of Economics and Business Administration, Ghent University. Participants were informed that their participation was voluntary, that they could end their participation at any time, and that their data would be processed, stored, and reported anonymously. Participants then gave informed consent prior to participation.
Participants
Two hundred nineteen students of the Department of Marketing, Innovation, and Organisation at Ghent University took part in the study. Recruitment took place online. All data were collected between March 28th and March 31st, 2023. Participation was compensated with course credits and the maximum sample size was determined by the number of students enrolled in the modules relevant to the course credit. Eleven participants completed the task more than once; in these cases, only the data from the first participation were retained and subsequent repetitions were discarded. Additionally, in the TRC condition, two participants were excluded as their mean response latencies (1,592 ms and 2,553 ms) deviated more than 2.5 SD from the condition’s mean latency (cut-off value: 1,335 ms). One participant was excluded because an excessive percentage of their trials had to be replaced due to error rates and extreme response latencies taken together (52.68%). We defined trial-replacement rates of 44.64% and above as excessive, based on the binomial probability of observing a given number of correct responses if a respondent was in fact performing at chance level. In the IMPACT condition, one participant was removed as their mean response latency (1,825 ms) deviated more than 2.5 SD from the condition’s mean latency (cut-off value: 1,029 ms). A sample of N = 215 (nTRC = 104, nIMPACT = 111) was retained for analysis (102 women, 113 men; Mage = 20.5 years, Mdnage = 20 years, Rangeage = 19–25 years).
Materials and procedure
The stimuli and procedure used in Experiment 2 were identical to those in Experiment 1, with the exceptions that (1) participants were assigned at random to complete either the IMPACT or the TRC, (2) the prices were adjusted to reflect inflation in retail prices (retail prices as indicated on the website of Carrefour Belgium, data retrieved on March 1st, 2023), and (3) six product stimuli (four in the evaluation phase, two in the categorisation phase) were replaced with items similar to those used in Experiment 1 because, at the time of testing, the original items had been discontinued by the retailer. The exact stimuli and price points used in this study are publicly available at https://osf.io/w4kvz/.
IMPACT
The IMPACT procedure was identical to that described in Experiment 1.
TRC
Participants first completed a block of the evaluative trials. Following the procedure used by Dezwaef et al. (2019), each of the twenty-eight products used in the evaluation phase was displayed twice with the unambiguously high price (+70%) and twice with the unambiguously low price (−70%), resulting in a total of 112 evaluative trials (presented in a random order). Next, the categorisation phase consisted of two separate blocks. In each block, each combination of the eight products used in the categorisation phase and the seven price points was shown in randomised order once with the price printed in italics and once printed in regular font. Additionally, a total of eight “refresher trials” (i.e., trials in which the participants were instructed to respond according to the task rule learned in the evaluation phase) were presented pseudo-randomly, one in each quarter of the two blocks. This procedure resulted in a total of 232 trials in the categorisation phase, divided across two separate blocks. Taken together, participants completed a total of 344 trials across the evaluation and categorisation phases.
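As an illustration of the categorisation-phase composition described above, the following sketch reproduces the trial counts; product names are placeholders.

```python
# Minimal sketch: trial counts of the TRC categorisation phase used in Experiment 2.
products = [f"product_{i}" for i in range(1, 9)]      # 8 categorisation-phase products (placeholders)
price_points = [-70, -40, -10, 0, 10, 40, 70]         # % deviation from the retail price

# 8 products x 7 price points x 2 fonts = 112 trials per block
block = [(p, pp, font) for p in products for pp in price_points for font in ("italic", "regular")]

n_refreshers = 8                                      # one in each quarter of the two blocks
total_categorisation = 2 * len(block) + n_refreshers
print(len(block), total_categorisation)               # 112 232
```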
Results
The data were prepared and aggregated in the RStudio environment version 2022.07.1.554 (R Studio Team, n.d.) using R version 4.1.2 (2021-11-01) (R Core Team, 2021) and the packages tidyverse 1.3.1 (Wickham et al., 2019), gtools 3.9.2 (Warnes et al., n.d.), and stringr 1.4.0 (Wickham, n.d.). Analyses were carried out using IBM SPSS Statistics for Windows, Version 27. The data of this experiment are publicly available at https://osf.io/w4kvz/.
Data preparation
For the calculation of the D scores only the critical trials were used (i.e., the categorisation trials for the IMPACT, and the categorisation phase trials for the TRC). Data preparation for both the IMPACT and the TRC was identical to that described in Experiment 1.
Internal consistency
The procedure used to estimate internal consistency was identical to that used in Experiment 1. For the IMPACT, this resulted in an overall internal consistency estimate of Rsb = 0.36. For the TRC, this resulted in an overall internal consistency estimate of Rsb = 0.20 (see footnote 2).
Evaluative trials
In the IMPACT, participants identified 88% of the unambiguously cheap and 88% of the unambiguously expensive prices correctly during the evaluative trials. In the TRC, participants identified 94% of the unambiguously cheap and 93% of the unambiguously expensive prices correctly during the evaluative trials. This suggests that in both tasks, respondents understood the evaluative task instructions and were largely able to identify unambiguous prices.
Categorisation trials
In the IMPACT, participants responded correctly on 92% of all categorisation trials (96% of compatible trials, 89% of incompatible trials). In the TRC, participants responded correctly on 93% of all categorisation trials (94% of compatible trials, 92% of incompatible trials). This suggests that respondents understood the non-evaluative categorisation task instructions.
Pricing appraisals
For both the IMPACT and the TRC condition, we analysed the data using a repeated measures ANOVA, each with D scores as the dependent variable and price point as the within-subject factor. Because Mauchly’s test indicated that the assumption of sphericity was violated in the TRC condition, a Greenhouse–Geisser correction was applied.
Mean D scores for both the IMPACT and the TRC increased from the lowest price point to the highest price point (see Table 2), whereby this pattern was more pronounced in the IMPACT condition compared to the TRC condition. The results of the ANOVA confirmed a statistically significant effect of price point for the IMPACT, F (6, 660) = 94.80, p < 0.001, ηp2 = 0.46. Trend analysis confirmed that there was a statistically significant linear trend, F (1, 110) = 357.46, p < 0.001, ηp2 = 0.77. For the TRC, the ANOVA did not detect a significant effect of price point, F (6, 554.25) = 1.42, p = 0.20, ηp2 = 0.01.
To directly test whether the effect of price point differed significantly between tasks, we ran an additional analysis in which data from both tasks were entered into one 2 (task) x 7 (price point) ANOVA, with task (IMPACT versus TRC) included as an additional between subject factor. The task by price point interaction was significant in this analysis, F (5.56, 1197.29) = 46.96, p < 0.001, ηp2 = 0.18. This pattern of results is illustrated in Figure 3.
Discussion
The findings of Experiment 2 directly replicate the pattern of results found in Experiment 1, and demonstrate that the IMPACT is an effective tool for capturing automatic price appraisals. Similar to Experiment 1, in Experiment 2, we observed an inflection point in the linear trend at the −10% price point. However, there are some differences in the findings of Experiment 1 and 2 that are worth noting here.
In Experiment 1, both the −10% and 0% price points exhibited relatively neutral, virtually identical D scores (0.05 and 0.04, respectively), indicating that participants did not, on average, discriminate between the two price points and that both were appraised as relatively neutral. In line with this observation, we noted that this inflection aligned with a rapid (almost) 10% increase in the Belgian Consumer Price Index (Triami Media BV, 2023), potentially suggesting that recent prices, in addition to the current prices and despite being outdated, could still hold some influence as an internal reference point.
In Experiment 2, we encountered a similar inflection at the −10% price point, yet the −10% and 0% price points demonstrated a notable difference in D scores. Here, we observed a consistent increase in D scores from −10% (0.03) to 0% (0.13) to +10% (0.23). Although this follows a linear trend, the trend line is considerably flatter and the changes in D scores are considerably smaller compared to those observed between the larger increments such as +/−40% and +/−70%. This disparity is logical given that the −/+10% increments are considerably smaller. Accordingly, interrogating how sensitive the IMPACT is to small pricing intervals would be an important direction for future research. We revisit this point in the general discussion.
Importantly, the IMPACT produced effects that were considerably larger than those of the TRC, and responses on the IMPACT showed higher internal consistency than those on the TRC. Indeed, although the TRC used here was designed specifically to maximise its power to detect participants’ automatic price appraisals, it did not detect significant differences across the different price points.
General discussion
Across two experiments, we assessed the effectiveness of the IMPACT as an implicit measure of automatic price appraisals. To investigate this issue, we followed the procedure outlined by Dezwaef et al. (2019) and tested automatic price appraisals of familiar products when the price was unambiguously more expensive (+40% and +70%) or cheaper (−40% and −70%), and when the price was ambiguously more expensive (+10%) or cheaper (−10%) compared to participants’ reference point (i.e., the current retail price). Using the IMPACT, the results of both Experiments 1 and 2 demonstrated that (1) the reference point was appraised as virtually neutral, (2) unambiguously cheap prices were automatically appraised as cheap and unambiguously expensive prices were automatically appraised as expensive, and (3) ambiguously cheap and expensive prices were centred closer around the reference point than the unambiguous price points. While these findings align very closely with those reported by Dezwaef et al. (2019) and suggest that the IMPACT is a suitable procedure for measuring automatic price appraisals, we were unable to replicate the same pattern of results using the TRC. It should be noted here that the TRC used in Experiment 2 differed from that used by Dezwaef et al. (2019) in two ways. Specifically, we shortened both the fixation timing and the stimulus onset asynchrony to match those used by the IMPACT in order to make the two paradigms comparable (i.e., to avoid confounding measurement type with fixation timing and stimulus onset asynchrony). Importantly, however, these changes to the TRC are unlikely to explain why the TRC did not show a significant effect in our experiment. The IMPACT and TRC are both underpinned by the same task-rule congruency principles, and the findings we obtained with the IMPACT demonstrate that the timings we used were sufficient for participants to process the stimuli effectively.
To ensure that participants learned the evaluative task rule, participants in Dezwaef et al.’s (2019) study completed a total of 112 trials in the evaluative phase. In the categorisation phase of Dezwaef et al.’s study, each product-price combination was presented four times (a total of 224 trials). Thus, the procedure used by Dezwaef et al. (2019) included a total of 336 trials and was reported to last approximately 30 minutes. Using the IMPACT, we were able to considerably shorten this procedure and demonstrate the expected pattern of results with a total of 176 trials (8 practice trials, 168 test trials). On average, this procedure took participants just under 10 minutes to complete. Dezwaef et al. (2019) noted that assessing automatic price appraisals using their methods may be impractical because it is time-intensive and requires a large number of trials. Our data suggest that using the IMPACT would allow both the time and the number of trials required to assess automatic price appraisals to be reduced considerably. Importantly, it has been demonstrated that many effects in reaction time experiments diminish with practice (Miller, 2023). It is therefore possible that the strength of the task-rule congruency effect in the TRC may have declined with increasing practice, and that the TRC might perform better (i.e., record more pronounced differences in D scores between the individual price points) if fewer trials were considered. To test this, we calculated D scores for the TRC based only on the first half of critical trials (resulting in the inclusion of 112 trials, i.e., the same number of critical trials that was also used in the IMPACT) and repeated the analyses. We found that the differences in D scores we observed using half the trials of the TRC became even less pronounced compared to our results reported in Experiment 2 (Experiment 2: p = 0.20, ηp2 = 0.01 | 50% of trials: p = 0.57, ηp2 = 0.008).
Dezwaef et al. (2019) emphasized that, ideally, an implicit measure of willingness to pay should be suitable for testing online (i.e., can be completed remotely via the internet) because it would allow researchers to more easily reach a wider range of specific consumer segments. Our study took place online and produced results that closely align with those of Dezwaef et al.’s (2019) laboratory study. Thus, our findings suggest that the IMPACT is not only an effective tool for collecting automatic pricing appraisals that requires fewer trials than the method used by Dezwaef et al. (2019), but is also suitable for the online collection of automatic pricing appraisals. Although we report that forty-one participants did not complete Experiment 1, it is unlikely that this high drop-out rate was a consequence of the IMPACT procedure being too lengthy, as twenty-one of the participants who dropped out of the study did so before completing the eight practice trials. Instead, we suggest that the absence of an incentive for participation may have affected participants’ motivation to complete the study. In line with this reasoning, in the presence of an incentive all participants completed Experiment 2. Thus, we suggest that the IMPACT may be a useful tool for collecting automatic pricing appraisals online, addressing the need for such a measure previously highlighted by Dezwaef et al. (2019).
Finally, it should be noted that, although responses on the IMPACT showed higher internal consistency than those on the TRC, the internal consistency estimates we report for the IMPACT are lower than those typically seen for self-report measures. This issue is not unique to the IMPACT (i.e., it is also seen with other implicit measures), and the usefulness of applying the internal consistency benchmarks used for explicit measures to responses on implicit measures has previously been called into question (Bosson et al., 2000; Teige et al., 2004; Gawronski and De Houwer, 2014; Meissner et al., 2019). However, we acknowledge that improving the internal consistency of responses on the IMPACT can only have benefits for researchers who use it. One way to improve the internal consistency of the IMPACT would be to increase the number of observations for each product and price combination. This could be achieved without making the test longer by, for example, reducing the number of individual price points assessed within one test. Regardless of the issue of internal consistency, the fact that the IMPACT scores related to the price points in a meaningful way and produced large effect sizes in both experiments clearly demonstrates the validity of the pricing IMPACT for measuring automatic price appraisals.
The purpose of an effective WTP measure is to pinpoint the optimal price point at which profit is maximised through ideal sales volume. Because a neutral score on the automatic price measure indicates that a price was appraised neither as cheap nor as expensive, Dezwaef et al. (2019) suggested that an effective strategy to derive the optimal price point might be to (1) apply direct measures to identify a narrow price interval within which the optimal price point lies, and then (2) use an automatic price measure to pinpoint the optimal price. Elaborating on this approach, we suggest that the automatic price appraisals may be converted into WTP estimates following the same logic as that used by the Van Westendorp price sensitivity meter (Van Westendorp, 1976). Specifically, in the price sensitivity meter, respondents are asked to indicate at which price they would perceive a product or service to be (1) too cheap to be of quality, (2) a bargain, (3) starting to be expensive, and (4) too expensive to consider a purchase. Using a line graph that indicates the proportion of respondents on the Y-axis and the range of prices on the X-axis, these data are then plotted as four lines showing the cumulative proportion of respondents who consider any given price on the X-axis too cheap, cheap, expensive, or too expensive. The intersection of the ‘too cheap’ and the ‘too expensive’ lines is considered the optimal price point, as it indicates the price at which dissatisfaction in either direction is minimised. Applying this logic to the automatic price appraisals, for each price point that was measured by the IMPACT we plotted the proportion of participants that perceived the price as cheap and the proportion of participants that perceived the price as expensive. The optimal price point is indicated by the intersection of these two proportions (see Figure 4 for an illustration of this approach).
Figure 4. Using IMPACT data from Experiment 2 to derive a WTP estimate. The lines plot the proportion of participants that perceived each tested price point as cheap (solid line) or expensive (dotted line). The intersection of the two proportion lines marks the optimal price point.
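For illustration, the following sketch (not the authors’ analysis code) derives such an intersection point from per-participant D scores by linear interpolation between adjacent price points; the data are synthetic and serve only to demonstrate the logic.

```python
# Minimal sketch: locating the intersection of the "cheap" and "expensive" proportion curves.
import numpy as np

price_points = np.array([-70, -40, -10, 0, 10, 40, 70])              # % deviation from retail price
rng = np.random.default_rng(2)
d_scores = np.linspace(-0.6, 0.6, 7) + rng.normal(0, 0.4, (100, 7))  # synthetic participants x price points

prop_cheap = (d_scores < 0).mean(axis=0)       # share of participants appraising each price as cheap
prop_expensive = (d_scores > 0).mean(axis=0)   # share appraising each price as expensive

gap = prop_expensive - prop_cheap              # the curves intersect where this difference changes sign
i = np.flatnonzero(np.diff(np.sign(gap)))[0]   # last price point before the sign change
x0, x1, g0, g1 = price_points[i], price_points[i + 1], gap[i], gap[i + 1]
optimal = x0 - g0 * (x1 - x0) / (g1 - g0)      # linear interpolation to gap == 0
print(f"estimated optimal price point: {optimal:.1f}% relative to the current retail price")
```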
The suggested approach of first using direct measures to identify a narrow price interval within which the optimal price point lies, and then applying an automatic price measure to pinpoint the optimal price, will by default exclude extreme prices. Future research should therefore establish (1) whether the IMPACT can reliably discern price appraisals when the increments between the tested price points are small (e.g., 5% increments) and (2) to what degree the WTP estimates derived from this combined approach translate into buying behaviour. Additionally, although our studies used implicit measures to assess WTP, other methods such as conjoint analysis are more commonly used in industry. Consequently, we note here that a direct comparison between the performance of implicit measures, such as the IMPACT, and that of methods more widely used in industry is needed to establish whether implicit methods actually have added value for estimating WTP. Such a comparison might also allow researchers to identify the circumstances and/or product types for which different methods for assessing WTP are particularly effective. Finally, in the current studies, there were (by design) too few trials of each trial type per product for D scores to be computed for individual products (as opposed to grouping the data preparation by price point). However, with both paradigms it would be possible, with the same number of trials, to compare D scores by product if fewer products were tested. Investigating the exact differential added value of WTP estimates derived from implicit measures for various products or product categories may clarify which products are particularly well suited to implicit-measure-derived WTP estimates.
Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found at: https://osf.io/w4kvz/.
Ethics statement
The studies involving humans were approved by the Research Code of Ethics of the Faculty of Economics and Business Administration, Ghent University. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.
Author contributions
DA: Conceptualization, Data curation, Formal analysis, Methodology, Project administration, Software, Visualization, Writing – original draft. AS: Conceptualization, Data curation, Funding acquisition, Methodology, Supervision, Writing – review & editing.
Funding
The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This research was supported by Grant BOF.24Y.2019.0006.01 of Ghent University, awarded to AS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Acknowledgments
We thank J. Dezwaef and colleagues for providing the stimuli used in this work.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Footnotes
1. ^Note that the internal consistency we report here is likely confounded by the fact that scores were aggregated across products, as the setup in this experiment does not allow aggregation at the individual product level (we further address this issue in our Discussion). This type of aggregation likely resulted in an underestimation of the true internal consistency because different products at the same price point might be perceived differently by the participant (e.g., a small price reduction of 10% might be perceived as cheap for product A but not for product B). Accordingly, it can be expected that unambiguous prices (+/−40% and +/−70%; i.e., prices that are more uniformly perceived as expensive or cheap across product types) should produce more consistent responses compared to ambiguous prices (0% and +/−10%; i.e., prices that are less uniformly perceived as expensive or cheap across product types). Internal consistency scores obtained from computing separate scores for ambiguous price points (0% and +/−10%; Rsb = −0.03) and unambiguous price points (+/−40% and +/−70%; Rsb = 0.55) were in line with this reasoning.
2. ^As noted in Experiment 1, the internal consistency we report here is likely confounded by the fact that scores were aggregated across products, because different products at the same price point might be perceived differently by the participant (e.g., a small price reduction of 10% might be perceived as cheap for product A but not for product B). Accordingly, it can be expected that unambiguous prices (+/−40% and +/−70%; i.e., prices that are more uniformly perceived as expensive or cheap across product types) should produce more consistent responses compared to ambiguous prices (0% and +/−10%; i.e., prices that are less uniformly perceived as expensive or cheap across product types). As in Experiment 1, the internal consistency scores obtained from computing separate scores for ambiguous price points (0% and +/−10%) and unambiguous price points (+/−40% and +/−70%) were again in line with this reasoning for both the IMPACT (ambiguous prices: Rsb = −0.03; unambiguous prices: Rsb = 0.51) and the TRC (ambiguous prices: Rsb = −0.02; unambiguous prices: Rsb = 0.40).
References
Altenburg, D. (2023). Introducing nuance to the measurement of consumers’ automatic product appraisals: validation and applications of the Implicit Attribute Classification Task ('IMPACT’). Ghent, Belgium: Ghent University, Faculty of Economics and Business Administration.
Altenburg, D., and Spruyt, A. (2022). Predicting meat consumption from concurrent, automatic appraisals: introducing nuance to product appraisals. Appetite 170:105847. doi: 10.1016/j.appet.2021.105847
Bosson, J. K., Swann, W. B. Jr., and Pennebaker, J. W. (2000). Stalking the perfect measure of implicit self-esteem: the blind men and the elephant revisited? J. Pers. Soc. Psychol. 79, 631–643. doi: 10.1037/0022-3514.79.4.631
Breidert, C., Hahsler, M., and Reutterer, T. (2006). A review of methods for measuring willingness-to-pay. Innov. Mark. 2:4.
Dezwaef, J., Cracco, E., Demanet, J., Desmet, T., and Brass, M. (2019). Beyond asking: exploring the use of automatic price evaluations to implicitly estimate consumers’ willingness-to-pay. Plos One 14:e0219251. doi: 10.1371/journal.pone.0219251
Gawronski, B., and De Houwer, J. (2014). “Implicit measures in social and personality psychology” in Handbook of research methods in social and personality psychology. eds. H. T. Reis and C. M. Judd (Cambridge: Cambridge University Press).
Greenwald, A. G., Nosek, B. A., and Banaji, M. R. (2003). Understanding and using the implicit association test: I. An improved scoring algorithm. J. Pers. Soc. Psychol. 85, 197–216. doi: 10.1037/0022-3514.85.2.197
Kiesel, A., Steinhauser, M., Wendt, M., Falkenstein, M., Jost, K., Philipp, A. M., et al. (2010). Control and interference in task switching—a review. Psychol. Bull. 136, 849–874. doi: 10.1037/a0019842
Liefooghe, B., Wenke, D., and De Houwer, J. (2012). Instruction-based task-rule congruency effects. J. Exp. Psychol. Learn. Mem. Cogn. 38, 1325–1335. doi: 10.1037/a0028148
Meiran, N. (2000). Modeling cognitive control in task-switching. Psychol. Res. 63, 234–249. doi: 10.1007/s004269900004
Meiran, N. (2005). Task rule-congruency and Simon-like effects in switching between spatial tasks. Q. J. Exp. Psychol. A. 58, 1023–1041. doi: 10.1080/02724980443000421
Meiran, N., and Kessler, Y. (2008). The task rule congruency effect in task switching reflects activated long-term memory. J. Exp. Psychol. Hum. Percept. Perform. 34, 137–157. doi: 10.1037/0096-1523.34.1.137
Meissner, F., Grigutsch, L. A., Koranyi, N., Müller, F., and Rothermund, K. (2019). Predicting behavior with implicit measures: disillusioning findings, reasonable explanations, and sophisticated solutions. Front. Psychol. 10:2483. doi: 10.3389/fpsyg.2019.02483
Miller, J. (2023). How many participants? How many trials? Maximizing the power of reaction time studies. Behav. Res. Methods, 1–24. doi: 10.3758/s13428-023-02155-9
Miller, K. M., Hofstetter, R., Krohmer, H., and Zhang, Z. J. (2011). How should consumers’ willingness to pay be measured? An empirical comparison of state-of-the-art approaches. J. Mark. Res. 48, 172–184. doi: 10.1509/jmkr.48.1.172
Moors, A., and De Houwer, J. (2006). Automaticity: a theoretical and conceptual analysis. Psychol. Bull. 132, 297–326. doi: 10.1037/0033-2909.132.2.297
Peirce, J., Gray, J. R., Simpson, S., MacAskill, M., Höchenberger, R., Sogo, H., et al. (2019). PsychoPy2: experiments in behavior made easy. Behav. Res. Methods 51, 195–203. doi: 10.3758/s13428-018-01193-y
Podsakoff, P. M., MacKenzie, S. B., and Podsakoff, N. P. (2012). Sources of method Bias in social science research and recommendations on how to control it. Annu. Rev. Psychol. 63, 539–569. doi: 10.1146/annurev-psych-120710-100452
Pronk, T., Molenaar, D., Wiers, R. W., and Murre, J. (2022). Methods to split cognitive task data for estimating split-half reliability: a comprehensive review and systematic assessment. Psychon. Bull. Rev. 29, 44–54. doi: 10.3758/s13423-021-01948-3
R Core Team (2021). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Available at: https://www.R-project.org/
R Studio Team. RStudio: Integrated Development Environment for R. RStudio. Version 2022.07.1.554. Available at: http://www.rstudio.com/.
Schmidt, J., and Bijmolt, T. H. A. (2020). Accurately measuring willingness to pay for consumer goods: a meta-analysis of the hypothetical bias. J. Acad. Mark. Sci. 48, 499–518. doi: 10.1007/s11747-019-00666-6
Teige, S., Schnabel, K., Banse, R., and Asendorpf, J. B. (2004). Assessment of multiple implicit self-concept dimensions using the extrinsic affective Simon task (EAST). Eur. J. Personal. 18, 495–520. doi: 10.1002/per.531
Triami Media BV (2023). Inflation Belgium 2022 [Internet]. Utrecht (NL): Triami Media BV. Available at: https://www.inflation.eu/en/inflation-rates/belgium/historic-inflation/cpi-inflation-belgium-2022.aspx
Van Westendorp, P. (1976). NSS Price sensitivity meter (PSM) - a new approach to study consumer-perception of prices. Proceedings of the 29th ESOMAR congress.
Wang, J., and Ge, J. (2016). Alternative approaches to treat respondent uncertainty in contingent willingness to pay estimation. China Agric. Econ. Rev. 8, 412–429. doi: 10.1108/CAER-11-2015-0153
Warnes, G., Bolker, B., and Lumley, T. gtools: Various R Programming Tools. Version 3.9.2. Available at: https://CRAN.R-project.org/package=gtools
Wickham, H. Stringr: simple, Consistent Wrappers for Common String Operations. Version 1.4.0. Available at: https://stringr.tidyverse.org, https://github.com/tidyverse/stringr
Keywords: automatic cognition, implicit measures, price appraisals, implicit attitude, willingness-to-pay (WTP)
Citation: Altenburg D and Spruyt A (2024) Automatic price appraisals: why they matter and how to measure them. Front. Psychol. 15:1359007. doi: 10.3389/fpsyg.2024.1359007
Edited by:
Markus Kiefer, University of Ulm, Germany
Reviewed by:
Nathaniel James Siebert Ashby, Harrisburg University of Science and Technology, United States
Chris Janiszewski, University of Florida, United States
Alexander Berger, University of Ulm, Germany
Copyright © 2024 Altenburg and Spruyt. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Daria Altenburg, Daria.Altenburg@UGent.be