The Nature and Nurture of Congenital Amusia: A Twin Case Study

Pfeifer, Jasmin; Hamann, Silke

doi:10.3389/fnbeh.2018.00120

ORIGINAL RESEARCH article

Front. Behav. Neurosci., 25 June 2018

Sec. Pathological Conditions

Volume 12 - 2018 | https://doi.org/10.3389/fnbeh.2018.00120

This article is part of the Research Topic Music Cognition II.

Jasmin Pfeifer¹,²*

Silke Hamann¹

¹Phonetics Laboratory, Amsterdam Center for Language and Communication, University of Amsterdam, Amsterdam, Netherlands
²Institute for Language and Information, Heinrich-Heine University, Düsseldorf, Germany

In this article, we report the first documented case of congenital amusia in dizygotic twins. The female twin pair was 27 years old at the time of testing, with normal hearing and above-average intelligence. Both had formal music lessons from the age of 8–12 and were exposed to music in their childhood. Using the Montreal Battery of Evaluation of Amusia (Peretz et al., 2003), one twin was diagnosed as amusic, with a pitch perception as well as a rhythm perception deficit, while the other twin had normal pitch and rhythm perception. We conducted a large battery of tests assessing the performance of the twins in music, pitch perception and memory, language perception and spatial processing. Both showed an identical albeit low pitch memory span of 3.5 tones and an impaired performance on a beat alignment task, yet the non-amusic twin outperformed the amusic twin in three other musical and all language-related tasks. The twins also differed significantly in their performance on one of two spatial tasks (visualization), with the non-amusic twin outperforming the amusic twin (83% vs. 20% correct). The performance of the twins is also compared to normative samples of normal and amusic participants from other studies. This twin case study highlights that congenital amusia is not due to insufficient exposure to music in childhood: The exposure to music of the twin pair was as comparable as it can be for two individuals. This study also indicates that there is an association between amusia and a spatial processing deficit (see Douglas and Bilkey, 2007; contra Tillmann et al., 2010; Williamson et al., 2011) and that more research is needed in this area.

Introduction

Congenital amusia is an innate disorder that has been shown to have a negative influence on pitch perception (Peretz et al., 2002; Foxton et al., 2004; Stewart, 2008), with a co-occurring deficit in rhythm perception in about 50% of the cases (Pfeifer and Hamann, 2015). This congenital variety of amusia is neither caused by a hearing deficiency nor by any form of brain damage or intellectual impairment (Ayotte et al., 2002) and causes persistent, lifelong impairments in the musical (Stewart, 2008), or more broadly, auditory domain. While congenital amusia had long been reported to affect only the musical domain (Peretz, 2001; Ayotte et al., 2002; Peretz et al., 2002), many recent studies have shown that different areas of speech perception are also affected, such as the perception of intonation (Patel et al., 2008; Liu et al., 2010; Hamann et al., 2012), of tone in languages that employ tone differences distinctively (Tillmann et al., 2011; Liu et al., 2012, 2015a,b), the perception of vowels (Huang et al., 2016; Zhang et al., 2017) and of emotional prosody in language (Thompson et al., 2012; Lolli et al., 2015).

The prevalence of the disorder is estimated to range between 1.5% (Peretz and Vuvan, 2017) and 4% (Kalmus and Fry, 1980) of the general population. Because of its clustering within families, documented in the first and so far only familial aggregation study by Peretz et al. (2007), congenital amusia has been hypothesized to have a genetic component. Peretz et al. (2007) studied 13 amusics from nine families and calculated a sibling recurrence risk ratio (the risk of manifestation, given that a sibling is affected, relative to the prevalence in the general population; Risch, 1990) of λ_s = 10.8. This ratio is in the same order of magnitude as the heritability of specific language impairments and of absolute pitch. Based on these numbers, recent studies consider it very likely that congenital amusia has a hereditary component (Peretz et al., 2007; Gingras et al., 2015; Peretz and Vuvan, 2017). However, familial aggregation could be simply due to shared family environment (in the case of congenital amusia, e.g., non-exposure to music within a family). Such environmental factors can only be reliably separated from genetic effects in twin studies, which have been employed successfully to test the heritability of pitch processing in general. Drayna et al. (2001), for instance, compared musically non-preselected monozygotic (N = 136) and dizygotic (N = 148) twin pairs using the Distorted Tunes Test (DTT; Kalmus and Fry, 1980) in a large-scale study. The heritability of pitch processing as estimated by their genetic model fitting was 71%, and they found a high correlation (0.67) in liability within monozygotic twin pairs and a medium one (0.44) for dizygotic twin pairs. A newer twin study on general pitch and rhythm perception (Seesjärvi et al., 2016) used three subtests from an online musicality test (Peretz et al., 2008) with 69 monozygotic and 44 dizygotic twin pairs to compare genetic and environmental effects. The correlations of scores within the twin pairs on the scale test were comparable to those of Drayna et al. (2001), with a high correlation (0.58) for monozygotic and a medium one (0.38) for dizygotic twin pairs. On the out-of-key test, a high correlation was found for both twin groups (0.63 monozygotic and 0.67 dizygotic) and on the off-beat test only a medium correlation (0.31) for monozygotic twin pairs. Mosing et al. (2014) tested a large sample of 2568 Swedish twins with a rhythm, a melody and a pitch task. They found similarly high correlations of 0.57 for melody and 0.48 for pitch in monozygotic twins but lower correlations of 0.32 for melody and 0.29 for pitch in dizygotic twins.

A pitfall of utilizing such twin studies in congenital amusia research is the sample size. The recruitment of amusic participants in general is already difficult, while the recruitment of a sufficiently sized pool of amusic twin pairs is nearly impossible. Most amusia studies have small sample sizes, and some are single-subject studies, e.g., Peretz et al. (2002), reporting the first case of amusia, or Lebrun et al. (2012), reporting the first case of amusia in a child.

In the present study, we report the first documented case of congenital amusia in a dizygotic twin. With these twins, we conducted a large battery of tests assessing their musicality, pitch perception and pitch memory, language perception and spatial abilities in order to determine a possible genetic impact of amusia on these abilities. An overview can be found in Table 1. To assess music perception, we used not only the Montreal Battery of Evaluation of Amusia (MBEA; Peretz et al., 2003), which has been criticized lately (Henry and McAuley, 2013; Pfeifer and Hamann, 2015), but also the Goldsmiths Musical Sophistication Index (Gold-MSI; Müllensiefen et al., 2014). The Gold-MSI has never been conducted with amusics to our knowledge, so our twin pair will be compared to available norm samples. We thereby hope to obtain a broader perspective on the musical abilities and disabilities of our amusic twin in comparison to the non-amusic twin. We also included pitch perception tasks, as these are now widely used to determine amusics’ pitch thresholds, and memory span tasks to investigate possibly different memory capacities of the twins. In addition, we wanted to assess the twins’ language perception, as an increasing body of literature points to deficits in speech perception as well. We decided to also include tests on spatial abilities, as deficits in spatial processing by amusics have been found by Douglas and Bilkey (2007). Douglas and Bilkey used a classic Mental Rotation task (Shepard and Metzler, 1971) with line drawings of two three-dimensional objects that had to be compared, and amusics showed significantly higher error rates on this task. Later tests failed to replicate these findings. Tillmann et al. (2010) utilized the same Mental Rotation task but with 160 trials instead of the 20 employed by Douglas and Bilkey. In addition, they used a bisection task in which the midpoint of a straight line or a string of letters has to be marked. They found no difference between controls’ and amusics’ accuracy or reaction time on either task. Williamson et al. (2011) again used a version of the Mental Rotation task and two further tasks assessing memory for sequences of spatial location (Milner, 1971) and memory for visual patterns (Della Sala et al., 1997). No difference in accuracy between amusics (N = 14) and controls (N = 14) on any of these tasks was found. However, a subgroup of amusics with the most severe pitch perception deficits exhibited slower reaction times on the Mental Rotation task. Peretz and colleagues (Peretz et al., 2008; Peretz and Vuvan, 2017) report that amusia and visuo-spatial deficits are associated, though this is solely based on self-report questionnaire data.

TABLE 1

Table 1. Overview of the assessed abilities and the utilized tests with references.

Materials and Methods

Procedure

First, we assessed the twins with the Montreal Battery of Evaluation of Amusia (Peretz et al., 2003) and a questionnaire about educational, musical and demographic background. In addition, we assessed the twins’ hearing and their intelligence. In order to further ascertain the differences and similarities in their musical, pitch perception and memory, language and spatial abilities, we then conducted a number of additional tests, listed in Table 1.

All experiments were conducted at the University of Düsseldorf in the phonetics laboratory in a sound-insulated booth. All experiments were programmed in Praat (Boersma and Weenink, 2016) unless otherwise mentioned, and auditory stimuli were presented over AKG K 601 headphones on a Windows XP computer. All data were collected in accordance with the Declaration of Helsinki. Both participants gave informed written consent to participate in this study and received a small monetary reimbursement for their time. Both participants completed all tests over the course of several days. The twins took the same tests on the same days, right after each other, so that they did not have the possibility to exchange information on the tasks before both had completed them.

Participants

The female twins were 27 years old at the time of testing with no history of psychiatric or hearing disorders. They grew up together in the same household with one younger male sibling and attended primary and secondary school and their undergraduate program in linguistics together. They had music lessons (flute) from the age of 8–12 and had the same exposure to music in their childhood and adolescence. The parents of the twins still live together. The mother does not show signs of amusia and seems to enjoy music. The father, however, has a severe hearing deficit in both ears that has been present since childhood due to a measles infection, and he uses hearing aids. He therefore had no normal exposure to music in childhood. Due to his severe hearing impairment, we could not test him for amusia and cannot make any statement as to whether he might be amusic or not.

For the diagnosis of the twins, the MBEA (Peretz et al., 2003) and a questionnaire were used (the latter is described in detail in Pfeifer and Hamann, 2015: pages 9–11). Their scores on the MBEA are given in Table 2. One twin, called A in the following, falls below the cut-off scores by Peretz et al. (2003) on the first four subtests, exhibiting a pitch and a rhythm perception deficit. The other twin, called C in the following, stays well above the cut-off scores on all subtests. A further analysis of the MBEA results with signal detection theory (SDT; Green and Swets, 1966; MacMillan and Creelman, 2005) was carried out, as the SDT measure d′ is bias-free and reflects participants’ discriminatory ability independently of response bias. The twins show clearly distinct discriminatory abilities, with C having much higher scores, i.e., being able to discriminate much better between stimuli than A, in all but the Meter subtest, where A is slightly better than her non-amusic twin sister. The d′ scores for the Meter subtest are rather low for both twins, which reflects the problematic nature of this subtest (see Pfeifer and Hamann, 2015, for details).

TABLE 2

Table 2. Montreal Battery of Evaluation of Amusia (MBEA) scores of the twins, given as the sum of correct responses out of 30 (cut-off scores by Peretz et al. (2003) in brackets), and d′ scores.

The answers to the questionnaire confirmed the results obtained by the MBEA.

Both twins have normal hearing, defined as a mean hearing level of 20 dB or less in both ears (tested with pure tone audiometry at 250, 500, 1000, 2000, 3000, 4000, 6000 and 8000 Hz). The twins’ intelligence was assessed using the German version of the Hamburger Wechsler Adult Intelligence Scale (HAWIE; Wechsler, 1964). Both twins exhibited above-average intelligence scores, placing them in the highest 2% of scores. The non-amusic twin C achieved a global score of 132 IQ points (verbal 111, action 139) and the amusic twin A a global score of 138 IQ points (verbal 124, action 136). Both reached similar scores on all subtests with the exception of the digit span subtest, where A had problems in comparison to her twin.

Further Musical Abilities

In addition to the MBEA, we also employed the Goldsmiths Musical Sophistication Index (Gold-MSI; Müllensiefen et al., 2014) to further assess the musical performance of our twin pair. We tested them with four of the five parts of the Gold-MSI: a self-report questionnaire (in its German version, see Schaal et al., 2014; Fiedler and Müllensiefen, 2015), a genre sorting task (Gold-Genre), a melody memory task (Gold-Melody), and a beat alignment perception task (Gold-BAT). The Gold-Genre task consists of 16 musical excerpts, each 800 ms long, without lyrics or vocals. The excerpts are taken from four different genres (pop, rock, jazz and hip-hop) and participants have to group them into four categories without being told what the categories are. The Gold-Melody task consists of 13 melody pairs that have to be compared. Each melody is between 10 and 16 notes long, and the second melody is always transposed to a different key to test memory for a melody’s interval structure rather than absolute pitch. The two melodies are either the same (except for the key transposition, of which subjects are informed) or the second melody contains an alteration. The Gold-BAT task is based on the Beat Alignment Test by Iversen and Patel (2008) and investigates beat-based processing. The test consists of 12 melodies from three different genres, and a beat track is superimposed on every melody. The participant’s task is to judge whether the beat track is on the beat of the music or not.

Pitch Perception and Pitch Memory Abilities

We employed two tasks previously used by Williamson and Stewart (2010) to investigate the auditory pitch perception abilities of our two participants. The pitch detection task measures the threshold for the detection of a pitch change, while the pitch direction task measures the threshold for discriminating pitch direction. Both are two-alternative forced choice AXB tasks employing an adaptive two-up-one-down staircase procedure. Every trial consisted of three consecutive tones, each 600 ms long. In the pitch detection task, the target tone was a pitch glide centered around 500 Hz, while the two non-target tones were steady-state tones with a frequency of 500 Hz. In the pitch direction task, all three tones were pitch glides centered around 500 Hz. The target tone was a glide in the opposite direction to the two non-target tones. The task was to identify which tone was different: the first or the last. Each task started with a pitch difference of six semitones. When participants gave two consecutive correct answers, they advanced a level, and the pitch difference became smaller. When they made one mistake, they went one level down and the pitch difference became larger. Each task ended after 15 level changes. To increase the precision of threshold determination, variable pitch step sizes were used. For the first five level changes, the change consisted of one semitone. For level changes 6–9, a change of 0.2 semitones was used, and for level changes 10–15 a change of 0.05 semitones. The last 10 trials were averaged to compute the perceptual threshold of the participants.
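The staircase rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Praat script used in the study: the function names, the `respond` callback standing in for a participant, and the `floor` that keeps the pitch difference positive are assumptions introduced here, as is the reading that the threshold is the mean pitch difference presented on the last 10 trials.

```python
def step_size(change_index):
    """Step size in semitones for the n-th level change (1-based), as in the text."""
    if change_index <= 5:
        return 1.0
    if change_index <= 9:
        return 0.2
    return 0.05


def run_pitch_staircase(respond, start=6.0, max_changes=15, floor=0.01):
    """Run one adaptive track.

    respond(diff) is a hypothetical callback returning True for a correct answer
    at a pitch difference of `diff` semitones. `floor` is an assumption made here
    so that the difference never reaches zero or goes negative.
    """
    diff = start
    history = []      # pitch difference presented on each trial
    changes = 0       # number of level changes so far
    streak = 0        # consecutive correct answers at the current level
    while changes < max_changes:
        history.append(diff)
        if respond(diff):
            streak += 1
            if streak == 2:                      # two correct in a row: harder
                changes += 1
                diff = max(floor, diff - step_size(changes))
                streak = 0
        else:                                    # one mistake: easier
            changes += 1
            diff = diff + step_size(changes)
            streak = 0
    last_trials = history[-10:]
    threshold = sum(last_trials) / len(last_trials)
    return threshold, history
```

A simulated listener can be passed in as `respond`, for example `lambda diff: diff > 0.5` for an idealized participant who is always correct above half a semitone.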

We also included a test assessing participants’ short-term memory for auditory as well as visual sequences with a two-alternative forced choice design (Williamson and Stewart, 2010; Schaal et al., 2015). The auditory stimuli were 10 sine wave tones with a duration of 500 ms and with fundamental frequencies ranging from 262 Hz to 741 Hz in whole tone steps. The visual stimuli were 10 Devanagari letters presented for 500 ms in black on a white background. The procedure was the same for both types of stimuli: 500 ms of silence or a blank screen were followed by two successive, equally long sequences of tones or letters. The two sequences in a trial were either identical or the position of two tones/visual signs was switched in one of the sequences. The participants’ task was to determine whether the two sequences were identical or different. The same two-up-one-down staircase procedure described above for the pitch perception thresholds was employed, and the difficulty advanced, i.e., the sequences became longer after two consecutive correct answers and shorter after one incorrect answer. Each task was terminated after four incorrect answers. The last 10 trials of each task were used to calculate participants’ memory span, indicating the (auditory or visual) memory load they can store in each domain.
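The memory span procedure differs from the pitch staircase in what is adapted (sequence length) and in the stopping rule (four incorrect answers). The following Python sketch illustrates that logic under stated assumptions: the starting length, the minimum length of two items, the `max_trials` safety cap and the `respond` callback are all hypothetical and are not taken from the study.

```python
def run_span_task(respond, start_length=2, min_length=2, max_errors=4, max_trials=200):
    """Adaptive span task: sequences get longer after two consecutive correct
    answers and shorter after one incorrect answer; stops after `max_errors` errors.

    respond(length) is a hypothetical callback returning True for a correct
    same/different judgment on sequences of `length` items. `start_length`,
    `min_length` and `max_trials` are assumptions made for this sketch.
    """
    length = start_length
    history = []      # sequence length presented on each trial
    streak = 0
    errors = 0
    while errors < max_errors and len(history) < max_trials:
        history.append(length)
        if respond(length):
            streak += 1
            if streak == 2:
                length += 1
                streak = 0
        else:
            errors += 1
            streak = 0
            length = max(min_length, length - 1)
    last_trials = history[-10:]
    span = sum(last_trials) / len(last_trials)   # mean length over the last 10 trials
    return span, history
```

Averaging over the last 10 trials is what makes fractional spans possible, since the track oscillates between neighboring sequence lengths near the participant's limit.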

Language Perception Abilities

Intonation Perception

To test the intonation perception of our twin pair, we used the AX same-different discrimination task and stimuli from Hamann et al. (2012), which were in turn based on the study by Patel et al. (2008). The stimulus pairs were based on recordings of four German statement-question pairs spoken by a male native speaker. Each pair was identical but for the final intonation contour, i.e., statements exhibiting a falling pattern and the corresponding echo questions a rising pattern. The intonation contour of statements was manipulated downwards in seven steps of one semitone each, while the intonation contour of questions was manipulated upwards in the same way. Stimulus pairs consisted of the original statement or question followed either by one of the downward or upward manipulations or the original again, resulting in 112 stimulus pairs. Participants had to indicate for each pair whether the two were identical or not. We also included sinusoidal wave analogs (similar to Patel et al., 2008) that did not contain any linguistic material but were solely based on the intonation contour of the speech stimuli. These were manipulated and paired in the same way as the speech stimuli. The test was scored by calculating three different performance measures: hit rate, percentage correct and d′. Hit rate is solely based on answers to stimulus pairs where A differs from X, which are considered a hit when they are correctly identified as different. Percentage correct is the sum of both hits and correct rejections (stimulus pairs where A and X are the same and which are correctly identified as same) in relation to all answers.
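The three performance measures can be illustrated with a short Python sketch. It assumes the basic yes/no formula d′ = z(H) − z(FA); same-different designs are sometimes analysed with other SDT models, and the text does not state which model was applied. The log-linear correction that keeps the z-scores finite at rates of 0 or 1 is likewise an assumption, and the function name is hypothetical.

```python
from statistics import NormalDist


def score_same_different(hits, misses, false_alarms, correct_rejections):
    """Return (hit rate, percentage correct, d') from raw response counts.

    hits / misses: 'different' pairs answered 'different' / 'same'.
    false_alarms / correct_rejections: 'same' pairs answered 'different' / 'same'.
    """
    n_different = hits + misses
    n_same = false_alarms + correct_rejections

    hit_rate = hits / n_different
    pct_correct = 100 * (hits + correct_rejections) / (n_different + n_same)

    # Log-linear correction (an assumption of this sketch): avoids infinite
    # z-scores when a rate is exactly 0 or 1.
    h = (hits + 0.5) / (n_different + 1)
    f = (false_alarms + 0.5) / (n_same + 1)

    z = NormalDist().inv_cdf          # inverse of the standard normal CDF
    d_prime = z(h) - z(f)
    return hit_rate, pct_correct, d_prime
```

Unlike hit rate, d′ also takes false alarms into account, so a participant who simply answers "different" on most trials does not obtain a high score.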

Vowel Perception

The second language-related task consists of an AXB forced-choice discrimination task with vowel stimuli. We used isolated synthetic vowels based on auditory properties of the natural German vowels /ε/ and /e:/, where /e:/ is 110 ms long with a first formant (F1) of 350 Hz and a second formant (F2) of 2157 Hz, and /ε/ is 60 ms long with an F1 of 524 Hz and an F2 of 1869 Hz (based on Jessen, 1993). On the basis of these vowels we created four continua with seven steps each, depicted as the four sides of the rectangle in Figure 1. For each AXB trial, A and B were the endpoints of one continuum (one side of the rectangle), and X could either be one of the two endpoints or one of the five vowels in-between. The trials were offered with two different inter-stimulus intervals (ISIs) of either 0.2 s or 1.2 s (Werker and Logan, 1985; Williamson and Stewart, 2010). Each trial was repeated five times throughout the experiment.
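The layout of the four continua can be sketched as follows. The sketch assumes that the rectangle spans a duration dimension and a spectral (F1/F2) dimension, so that two sides vary only duration and two sides vary only the formants, and that the seven steps are spaced linearly in ms and Hz. Neither the step spacing nor the synthesis procedure is specified in the text, and all names in the code are hypothetical.

```python
def linspace(start, end, steps):
    """Evenly spaced values from start to end, inclusive."""
    return [start + (end - start) * i / (steps - 1) for i in range(steps)]


def continuum(a, b, steps=7):
    """Interpolate every parameter linearly between corner a and corner b."""
    keys = ("dur_ms", "f1_hz", "f2_hz")
    columns = {k: linspace(a[k], b[k], steps) for k in keys}
    return [{k: columns[k][i] for k in keys} for i in range(steps)]


# Spectral and durational endpoints reported in the text.
SPEC_E_SHORT = {"f1_hz": 524, "f2_hz": 1869}   # spectral quality of /ε/
SPEC_E_LONG = {"f1_hz": 350, "f2_hz": 2157}    # spectral quality of /e:/
DUR_SHORT, DUR_LONG = 60, 110                  # durations in ms

# The four corners of the rectangle (assumed layout).
corners = [
    {"dur_ms": DUR_SHORT, **SPEC_E_SHORT},     # natural /ε/
    {"dur_ms": DUR_LONG, **SPEC_E_SHORT},      # /ε/ quality, long
    {"dur_ms": DUR_LONG, **SPEC_E_LONG},       # natural /e:/
    {"dur_ms": DUR_SHORT, **SPEC_E_LONG},      # /e:/ quality, short
]

# Each side of the rectangle is one seven-step continuum.
continua = [continuum(corners[i], corners[(i + 1) % 4]) for i in range(4)]
```

In each AXB trial, A and B would then be `side[0]` and `side[-1]` of one continuum, and X any of its seven steps.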

FIGURE 1

Figure 1. Spectral and durational values of vowel stimuli.

The vowel perception task was scored by calculating the percentage of how often participants perceived X correctly as category A (where the answer was considered correct when X was either identical to A or one of the three stimuli close to A on the continuum in question). Based on this measure we calculated d′ values.

Spatial Abilities

The Mental Rotation task used in previous studies to test amusics’ spatial abilities has been argued to be rather complex and to rely on different cognitive processes (Williamson et al., 2011). We therefore decided to employ the Object Perspective Taking Test (Hegarty and Waller, 2004) and the Santa Barbara Solids Test (Cohen and Hegarty, 2012) instead. These two tests were chosen as they differentiate between spatial orientation abilities, tested with the Object Perspective Taking Test, and spatial visualization abilities, tested with the Santa Barbara Solids Test.

In the Object Perspective Taking Test, the participant is asked to imagine the angles at which several objects stand in relation to each other from different perspectives, providing a test of egocentric spatial transformations. The test was administered in a paper-and-pencil version and contained 12 items. Each item consists of a map in the top half of the page, in which seven objects are arranged. Participants are asked to imagine being at the position of one object, facing a second one, and having to point to a third object. At the bottom of the page is a circle; the first object is always located in its center, with an arrow pointing vertically up to the second object. Participants have to draw a second arrow from the center of the circle outwards to the position of the third target object, thereby making an egocentric transformation. Participants are prevented from rotating the paper, so as not to make the task easier. The perspective change on every item is at least 90 degrees. Each item is scored by calculating the deviation from the correct direction in degrees. The overall score on the test is the average deviation across all items.
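The scoring rule amounts to an absolute angular difference that wraps around the circle, averaged over items. A minimal Python sketch, assuming angles are measured in degrees from the upward-pointing arrow and that every item received a response (the function names are hypothetical, and the handling of unanswered items is not covered):

```python
def angular_deviation(response_deg, correct_deg):
    """Smallest absolute angle between two directions, in the range 0 to 180 degrees."""
    diff = abs(response_deg - correct_deg) % 360
    return min(diff, 360 - diff)


def perspective_taking_score(responses_deg, correct_deg):
    """Average angular deviation across all items (lower is better)."""
    deviations = [angular_deviation(r, c) for r, c in zip(responses_deg, correct_deg)]
    return sum(deviations) / len(deviations)
```

The wrap-around matters because a response of 350 degrees to a target at 10 degrees is off by 20 degrees, not by 340.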

The Santa Barbara Solids Test was also administered in a paper-and-pencil version containing 30 items. Each item consists of a three-dimensional geometric object that is sliced by a plane. Participants are asked to imagine looking at the two-dimensional cross-section of the geometric object caused by the plane. The stimuli vary in complexity along two factors: the complexity of the geometric shape and the orientation of the cutting plane. Half of the items have planes that are vertical or horizontal to the main axis of the shape, and the other half have planes that are diagonal to this axis. Participants are given four answer choices, depicted as possible cross-sections. The answers include one egocentric distracter that represents the shape that a participant who fails to change her perspective would choose, providing a way to determine whether a perspective change away from the egocentric view was made; see the example in Figure 2.

FIGURE 2

Figure 2. Example item from the Santa Barbara Solids Test (Cohen and Hegarty, 2012: 869). The top depicts a three-dimensional object and a plane cutting this object vertically, the bottom displays four cross-sections as answer choices ((c) being the correct answer, and (d) the distracter without change in view perspective).

The Santa Barbara Solids Test is scored by counting the number of correct responses and calculating the percentage correct.

Results

The pitch perception and memory tasks as well as the language perception tasks have previously been used with amusics, and the performance of the twin pair is compared to those samples. The Gold-MSI has never been conducted with amusics, therefore no cut-off scores for amusics are available. However, Müllensiefen et al. (2014) provide data norms based on 147,636 participants, to which we compared our two subjects. Similarly, the spatial tasks have not been administered to amusics before, and we compared the twins’ performance to the data norms by Hegarty and Waller (2004) for the Object Perspective Taking Test (based on 62 participants) and the norms by Cohen and Hegarty (2012) for the Santa Barbara Solids Test (223 participants).