Musical Creativity and Depth of Implicit Knowledge: Spectral and Temporal Individualities in Improvisation

Daikoku, Tatsuya

doi:10.3389/fncom.2018.00089

ORIGINAL RESEARCH article

Front. Comput. Neurosci., 13 November 2018

Volume 12 - 2018 | https://doi.org/10.3389/fncom.2018.00089

This article is part of the Research Topic Brain-inspired Machine Learning and Computation for Brain-Behavior Analysis


Tatsuya Daikoku*

Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany

It has been suggested that musical creativity is mainly formed by implicit knowledge. However, the types of spectro-temporal features and the depth of implicit knowledge that form individuality in improvisation are unknown. Using various-order Markov models of implicit statistical learning, this study investigated spectro-temporal statistics among musicians. First, the results suggested that lower-order models of implicit knowledge represented general characteristics shared among musicians, whereas higher-order models detected specific characteristics unique to each musician. Second, individuality may essentially be formed by pitch but not rhythm, whereas rhythm may allow the individuality of pitch to strengthen. Third, time-course variation of musical creativity formed by implicit knowledge and uncertainty (i.e., entropy) may occur over a musician's lifetime. Together, these findings suggest that the individuality of improvisational creativity may be formed by deeper, but not superficial, implicit knowledge of pitch, and that rhythm may allow the individuality of pitch to strengthen. The individuality of creativity may shift over a musician's lifetime via experience and training.

Introduction

Implicit Knowledge and Creativity in Brain

The brain models external phenomena as a hierarchy of statistical dynamical systems, which encode causal chain structure in the sensorium (Friston et al., 2006; Friston and Kiebel, 2009; Friston, 2010) to maintain low entropy and free energy in the brain (von Helmholtz, 1909), and predicts a future state based on the internalized stochastic model to minimize sensory reaction and optimize motor action regardless of consciousness (Friston, 2005). This prediction is associated with the brain's implicit, domain-general, and innate system, called implicit learning or statistical learning (Reber, 1967; Saffran et al., 1996; Cleeremans et al., 1998; Perruchet and Pacton, 2006), in which the brain automatically calculates transitional probabilities (TPs) of sequential phenomena and grasps information dynamics. The terms implicit learning and statistical learning have been used interchangeably and are regarded as the same phenomenon (Perruchet and Pacton, 2006). Because statistical learning and the resulting knowledge are implicit, humans are unaware of exactly what they learn (Daikoku et al., 2014). Nonetheless, neurophysiological and behavioral responses disclose implicit learning effects (François and Schön, 2011; François et al., 2013; Daikoku et al., 2015, 2016, 2017a,c,d; Koelsch et al., 2016; Yumoto and Daikoku, 2016, 2018; Daikoku and Yumoto, 2017). When the brain implicitly encodes the TP distributions that are inherent in dynamical phenomena, several consequences follow automatically: the brain expects a probable future state with a higher TP, which facilitates optimization of performance based on the encoded statistics even though the knowledge cannot be described (Broadbent, 1977; Berry and Broadbent, 1984; Green and Hecht, 1992; Williams, 2005; Rebuschat and Williams, 2012), and it inhibits neurophysiological responses to predictable external stimuli for the efficiency and low entropy of neural processing based on predictive coding (Daikoku, 2018b).
Implicit knowledge has been considered to contribute to many types of mental representation: the comprehension and production of complex structural information such as music and language (Rohrmeier and Rebuschat, 2012), intuitive decision-making (Berry and Dienes, 1993; Reber, 1993; Perkovic and Orquin, 2017), auditory-motor planning (Pearce et al., 2010a,b; Norgaard, 2014), and creativity (Wiggins, 2018) involved in musical composition (Pearce and Wiggins, 2012; Daikoku, 2018a) and musical improvisation (Norgaard, 2014). Additionally, compared to language (Chomsky, 1957; Jackendoff and Lerdahl, 2006), several studies suggest that musical representation, including tonality, is mainly formed by tacit knowledge (Deliège et al., 1996; Deliège, 2001; Bigand and Poulin-Charronnat, 2006; Ettlinger et al., 2011; Koelsch, 2011; Huron, 2012). Thus, it is widely accepted that implicit knowledge gives rise to a sense of intuition, spontaneous behavior, and skill acquisition based on procedural learning, and that it is closely tied to musical production such as intuitive creativity, composition, and playing.

Particularly in musical improvisation, musicians are forced to express intuitive creativity and immediately play their own music based on long-term training associated with procedural and implicit learning (Clark and Squire, 1998; Ullman, 2001; Paradis, 2004; De Jong, 2005; Ellis, 2009; Müller et al., 2016). Thus, compared to other types of musical composition in which a composer deliberates and refines a composition scheme for a long time based on musical theory, the performance of musical improvisation is intimately bound to implicit knowledge because of the necessity of intuitive decision-making (Berry and Dienes, 1993; Reber, 1993; Perkovic and Orquin, 2017) and auditory-motor planning based on procedural knowledge (Pearce et al., 2010a,b; Norgaard, 2014). This suggests that the stochastic distribution calculated from musical improvisation may represent the musicians' implicit and statistical knowledge and the individual creativity in music that has been developed via implicit learning. Few studies have investigated the relationship between musical improvisation and implicit knowledge. Here, this study proposed a computational model of improvisational creativity based on the framework of implicit statistical learning.

Computational Model of Musical Creativity

Computational models are often used to understand general music acquisition (Cilibrasi et al., 2004; Backer and van Kranenburg, 2005; Albrecht and Huron, 2012; Ito, 2012; Prince and Schmuckler, 2012; Albrecht and Shanahan, 2013; London, 2013), entropy-based music prediction (Manzara et al., 1992; Ian et al., 1994; Reis, 1999; Pearce and Wiggins, 2006; Cox, 2010), implicit learning, and the mental representation of implicit knowledge (Dubnov, 2010; Wang, 2010; Rohrmeier and Rebuschat, 2012). In particular, Competitive Chunker (Servan-Schreiber and Anderson, 1990), PARSER (Perruchet and Vinter, 1998), Information Dynamics of Music (IDyOM) (Pearce, 2005; Pearce and Wiggins, 2012), and n-gram models (Pearce and Wiggins, 2004) underpin the hypothesis that music is acquired by extracting and concatenating chunks, which is a main theory of implicit learning and statistical learning. Although experimental approaches are necessary for understanding the real-world brain's function in music acquisition, modeling approaches partially outperform experimental approaches under conditions that are impossible to replicate experimentally. For example, they can directly examine large amounts of real-world music and time-course variation over long time periods (Daikoku, 2018a). Most experimental approaches use specific paradigms that are ecologically unrealistic and focus on specific types of short-term learning effects (e.g., chord perception, prediction, and timing). Additionally, some modeling approaches calculate statistics in music and devise models, evaluate the validity of these models by neurophysiological and behavioral experiments, and suggest novel tasks for neural and behavioral experiments (Potter et al., 2007; Pearce et al., 2010a,b; Pearce and Wiggins, 2012). A combination of the two approaches is preferable because each can complement the weak points of the other (Daikoku, 2018b).

The n-gram models, which correspond to various-order Markov models (Markov, 1971), calculate the TPs of sequences by chopping them into short fragments (n-grams) up to a size of n, and are frequently used in both experimental and computational approaches (Pearce and Wiggins, 2004; Daikoku, 2018b). Online musical production, however, is not the mere chopping of sequences into fragments of a single length; rather, it is a dynamical prediction that maintains an aesthetic melody through sequences of various lengths, temporal and spectral features, and harmony, all of which interact with each other (Lerdahl and Jackendoff, 1983; Hauser et al., 2002; Jackendoff and Lerdahl, 2006). That is, musical production is not restricted to a single stream of events or a single hierarchy; rather, it involves interaction among various hierarchical structures. Previous computational (Conklin and Witten, 1995; Pearce and Wiggins, 2012) and neural studies (Daikoku and Yumoto, 2017) expanded the n-gram method to model the interaction of parallel streams and enhanced its predictive power. However, a model that suffices to explain musical creativity has yet to be devised. Nonetheless, nth-order Markov models could explain that prediction occurs continually with each state of a sequence and that the entropy in the brain (i.e., the average surprise of outcomes sampled from a probability distribution, Applebaum, 2008) gradually decreases with exposure to musical sequences. Thus, the TP distribution sampled from music based on nth-order Markov models may reflect the characteristics of a composer's superficial-to-deep implicit knowledge: a high-probability transition in music may be one that a composer is more likely to predict and choose based on the latest n states, compared to a low-probability transition. This notion has also been neurophysiologically demonstrated in our previous studies (Daikoku et al., 2017b).
The model has also been applied to develop artificial intelligence that gives computers learning and decision-making abilities similar to those of the human brain, such as automatic composition systems (Raphael and Stoddard, 2004; Eigenfeldt, 2010; Boenn et al., 2012) and natural language processing (Brent, 1999; Manning and Schütze, 1999). Thus, the Markov model is used in the interdisciplinary realms of neuroscience, behavioral science, engineering, and informatics.

Temporal and Spectral Feature in Musical Creativity

Temporal and spectral features are important pieces of information that configure the characteristics of each type of music (e.g., individuality, genre, and culture). Additionally, the two types of information are not independent of each other, but rather they closely interact. Thus, the relationship between temporal (i.e., rhythm) and spectral (i.e., melody) structures is a major question in understanding musical creativity. Some researchers indicated that humans cannot learn temporal structure independent of spectral structure (Buchner and Steffens, 2001; Shin and Ivry, 2002; O'Reilly et al., 2008), whereas other researchers demonstrated temporal implicit learning independent of pitch information (Salidis, 2001; Ullén and Bengtsson, 2003; Karabanov and Ullén, 2008; Brandon et al., 2012) and vice versa (Daikoku et al., 2017d). Additionally, neurophysiological and psychological studies suggested that humans can learn relative rather than absolute temporal and spectral patterns (Daikoku et al., 2014, 2015). Thus, the relationship between temporal and spectral features in musical creativity and implicit learning remains controversial. To the best of my knowledge, there are no integrated models that cover temporal and spectral features in musical creativity. The present study is the first to provide implicit-learning models that unify temporal and spectral features in musical improvisation. Additionally, this study investigated which information (spectral and temporal) and hierarchy (1st to 6th orders) represent the individualities of creativity. To comprehensively understand how musical creativity occurs in the human brain and how temporal and spectral features are integrated to constitute musical individuality, it is necessary to investigate the relationships between the spectral and temporal statistics inherent in music via various-order hierarchical models.

Study Purpose

The present study aimed to investigate the statistical differences and interactions between the temporal and spectral structures in improvisation among musicians using various-order Markov models, and to examine which information (spectral and temporal) and hierarchy represent the individualities of musical creativity. The statistical characteristics of the nth-order TP distributions of the spectral (pitch) and temporal (pitch length and rest) sequences in improvisational music were investigated. It was hypothesized that there were general statistical characteristics shared among musicians and specific statistical characteristics unique to each musician in both spectral and temporal sequences. Additionally, it was hypothesized that the detectability of these characteristics depends on hierarchy. If so, individuality may depend on the depth of implicit knowledge. Furthermore, the chronological time-course variations of the entropies (uncertainty) and the predictability of each tone sequence were examined. It was hypothesized that implicit knowledge in music gradually shifts over a composer's lifetime. The present study is the first to provide findings on which information (spectral and temporal) and hierarchy (1st to 6th orders) represent the individualities of musical creativity.

Methods

Music Information Extraction

The music played by William John Evans (Autumn Leaves from Portrait in Jazz, 1959; Israel from Explorations, February 1961; I Love You Porgy from Waltz for Debby, June 1961; Stella by Starlight from Conversations with Myself, 1963; Who Can I Turn To? from Bill Evans at Town Hall, 1966; Someday My Prince Will Come from the Montreux Jazz Festival, 1968; A Time for Love from Alone, 1969), Herbert Jeffrey Hancock (Cantaloupe Island from Empyrean Isles, 1964; Maiden Voyage from Flood, 1975; Someday My Prince Will Come from The Piano, 1978; Dolphin Dance from Herbie Hancock Trio'81, 1981; Thieves in the Temple from The New Standard, 1996; Cottontail from Gershwin's World, 1998; The Sorcerer from Directions in Music, 2001), and McCoy Tyner (Man from Tanganyika from Tender Moments, 1967; Folks from Echoes of a Friend, 1972; You Stepped Out of a Dream from Fly with the Wind, 1976; For Tomorrow from Inner Voice, 1977; The Habana Sun from The Legend of the Hour, 1981; Autumn Leaves from Revelations, 1988; Just in Time from Dimensions, 1984) was used in the present study. The highest pitches, including their lengths, were chosen based on the following definitions: the highest pitch that can be played at a given point in time was selected, pitches joined by slurs were counted as one, and grace notes were excluded. In addition, the rests related to the highest-pitch sequences were also extracted. This spectral and temporal information was divided into four types of sequences: (1) a pitch sequence without length and rest information (i.e., pitch sequence without rhythms); (2) a rhythm sequence without pitch information (i.e., rhythm sequence without pitches); (3) a pitch sequence with length and rest information (i.e., pitch sequence with rhythms); and (4) a rhythm sequence with pitch information (i.e., rhythm sequence with pitches).

Stochastic Calculation

Pitch Sequence Without Rhythms

For each type of pitch sequence, all pitches were numbered so that the first pitch was 0 in each transition, and an increase or decrease of one semitone was 1 or −1 relative to the first pitch, respectively. Representative examples are shown in Figure 1A. This revealed the relative pitch-interval patterns but not the absolute pitch patterns [30, 98]. This procedure was used to eliminate the effects of changes in key on transitional patterns. Interpretation of a key change depends on the musician, and it is difficult to define in an objective manner. Thus, the results in the present study may represent variation in the statistics associated with relative pitch rather than absolute pitch. According to recent neurophysiological studies, the human implicit-learning system for auditory sequences captures relative rather than absolute transition patterns. For each piece of music for each musician, the TPs of the pitch sequences were calculated as a statistic based on multi-order Markov chains. The probability of a forthcoming pitch was statistically defined by the last one to six successive pitches (i.e., first- to sixth-order Markov chains). The nth-order Markov model is based on the conditional probability of an element e_{n+1}, given the preceding n elements:

\begin{array}{lr} P(e_{n+1} \mid e_{n}) = \frac{P(e_{n+1} \cap e_{n})}{P(e_{n})} & (1) \end{array}
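As an illustration of Equation (1), the following minimal Python sketch estimates nth-order TPs from relative pitch patterns. It is not the implementation used in the study: the function names, the use of MIDI note numbers as input, and the estimation of probabilities from raw counts are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def relative_ngrams(pitches, order):
    """Yield (context, target) pairs for an nth-order model.

    Each window of order + 1 pitches is re-expressed in semitones
    relative to its first pitch, so the first element is always 0.
    Input is assumed to be MIDI note numbers (an assumption).
    """
    n = order + 1
    for i in range(len(pitches) - n + 1):
        window = pitches[i:i + n]
        rel = tuple(p - window[0] for p in window)
        yield rel[:-1], rel[-1]


def transition_probabilities(pitches, order):
    """Estimate P(target | context) from counts, as in Equation (1)."""
    counts = defaultdict(Counter)
    for context, target in relative_ngrams(pitches, order):
        counts[context][target] += 1
    return {
        context: {t: c / sum(targets.values()) for t, c in targets.items()}
        for context, targets in counts.items()
    }


# Hypothetical melody fragment (C D E D C D E) as MIDI note numbers.
melody = [60, 62, 64, 62, 60, 62, 64]
tp_first = transition_probabilities(melody, order=1)
tp_second = transition_probabilities(melody, order=2)
```

Because every window is re-zeroed on its first pitch, transposing the melody to another key leaves the estimated TPs unchanged, which is the property the relative coding is meant to provide.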

Rhythm Sequence Without Pitches

The onset time of each note was used for the analyses. Although note onsets ignore the lengths of notes and rests, this methodology can capture the most essential rhythmic features of the music [30,99]. To extract the temporal interval between adjacent notes, the onset time of the preceding note was subtracted from each onset time. Then, for each type of rhythm sequence, the second to last temporal intervals were divided by the first temporal interval. Representative examples are shown in Figure 1B. This revealed relative rhythm patterns but not absolute rhythm patterns; that is, the patterns are independent of the tempo of each piece of music. For each piece of music for each musician, the TPs of the rhythm sequences were calculated as a statistic based on multi-order Markov chains. The probability of a forthcoming temporal interval was statistically defined by the last one to six successive temporal intervals (i.e., first- to sixth-order Markov chains).
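The relative rhythm coding described above can be sketched as follows. This is a hedged illustration rather than the study's code: the function names and the choice of exact fractions for the ratios are assumptions, and onsets are assumed to be strictly increasing so that no interval is zero.

```python
from fractions import Fraction


def inter_onset_intervals(onsets):
    """Subtract the preceding onset from each onset time."""
    return [Fraction(b) - Fraction(a) for a, b in zip(onsets, onsets[1:])]


def relative_rhythm_windows(onsets, order):
    """Return windows of order + 1 intervals, each divided by the first.

    The first element of every window is therefore 1, which makes the
    pattern independent of tempo. Assumes strictly increasing onsets.
    """
    iois = inter_onset_intervals(onsets)
    n = order + 1
    windows = []
    for i in range(len(iois) - n + 1):
        window = iois[i:i + n]
        windows.append(tuple(x / window[0] for x in window))
    return windows


# Hypothetical onsets in beats, and the same rhythm at half the tempo.
onsets = [0, 1, 2, 4]
slow_onsets = [0, 2, 4, 8]
patterns = relative_rhythm_windows(onsets, order=1)
slow_patterns = relative_rhythm_windows(slow_onsets, order=1)
```

The two inputs yield identical patterns, which illustrates why dividing by the first interval removes the effect of tempo.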


Figure 1. Representative phrases of transition patterns in pitch sequences without rhythms (A), rhythm sequences without pitches (B), pitch sequences with rhythms (C), and rhythm sequences with pitches (D). The musical information was extracted by listening to recorded media and was transcribed originally for the present study.

Pitch Sequence With Rhythms

The two methodologies for pitch and rhythm sequences were combined. For each type of sequence, all pitches were numbered so that the first pitch was 0 in each transition, and an increase or decrease of one semitone was 1 or −1 relative to the first pitch, respectively. Additionally, for each type of pitch sequence, the onset time of the preceding note was subtracted from each onset time, and the second to last temporal intervals were divided by the first temporal interval. Representative examples are shown in Figure 1C. For each piece of music for each musician, the TPs of the pitch sequences with rhythms were calculated as a statistic based on multi-order Markov chains. The probability of a forthcoming pitch with temporal information was statistically defined by the last one to six successive pitches with temporal information (i.e., first- to sixth-order Markov chains). In the first-order hierarchical model of the pitch sequence with rhythms, the temporal interval was calculated as a ratio to the crotchet (i.e., quarter note), because each sequence includes only one temporal interval and the note length cannot be calculated as a relative temporal interval. Thus, the patterns of pitch sequence (p) with rhythms (r) were represented as [p] with [r].
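One possible reading of the combined coding is sketched below. The data layout (a list of pitch and onset pairs), the function name, and the `quarter` parameter are hypothetical; the sketch only builds the [p] with [r] patterns and does not claim to reproduce how the study paired contexts and targets when counting transitions.

```python
from fractions import Fraction


def pitch_rhythm_patterns(notes, order, quarter=Fraction(1)):
    """Build ([p], [r]) patterns from (pitch, onset) pairs.

    Pitches are expressed in semitones relative to the first pitch of
    each window. For order 1 there is only one interval per window, so
    it is expressed as a ratio to the quarter note; for higher orders
    the intervals are divided by the first interval of the window.
    """
    n = order + 1
    patterns = []
    for i in range(len(notes) - n + 1):
        window = notes[i:i + n]
        pitches = tuple(p - window[0][0] for p, _ in window)
        iois = [Fraction(b[1]) - Fraction(a[1])
                for a, b in zip(window, window[1:])]
        if order == 1:
            rhythm = (iois[0] / quarter,)
        else:
            rhythm = tuple(x / iois[0] for x in iois)
        patterns.append((pitches, rhythm))
    return patterns


# Hypothetical phrase: (MIDI pitch, onset in beats), quarter note = 1 beat.
phrase = [(60, 0), (62, 1), (64, 2), (65, 4)]
first_order = pitch_rhythm_patterns(phrase, order=1)
second_order = pitch_rhythm_patterns(phrase, order=2)
```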

Rhythm Sequence With Pitches

The methodologies of sequence extraction were the same as those of the pitch sequence with rhythms (see Figure 1D), whereas the TPs of the rhythm, but not pitch, sequences were calculated as a statistic based on multi-order Markov chains. The probability of a forthcoming temporal interval with pitch was statistically defined by the last one to six successive temporal intervals with pitch (i.e., first- to sixth-order Markov chains). Thus, the relative pattern of rhythm sequence (r) with pitches (p) was represented as [r] with [p].

Statistical Analysis

The TP distributions were analyzed by principal component analysis. The eigenvalue criterion was set at greater than 1. The first two components (i.e., those with the first and second highest contribution ratios) were adopted in the present study. Then, the information contents [I(e_{n+1}|e_n)] of the TPs were calculated based on information theory (Shannon, 1951). Furthermore, the conditional entropy [H(B|A)] for each order n was calculated from the information content:

\begin{array}{lr} I(e_{n+1} \mid e_{n}) = \log_{2} \frac{1}{P(e_{n+1} \mid e_{n})} \ \text{(bit)} & (2) \end{array}

\begin{array}{lr} H(B \mid A) = -\sum_{i} \sum_{j} P(a_{i}) P(b_{j} \mid a_{i}) \log_{2} P(b_{j} \mid a_{i}) \ \text{(bit)} & (3) \end{array}

where P(b_j|a_i) is the conditional probability of the sequence “a_i b_j.” The entropies were chronologically ordered based on the time course in which the music was played by each musician. The time-course variations of the entropies were analyzed by multiple regression analyses using the stepwise method. The criteria for the variance inflation factor (VIF) and condition index (CI) were set at VIF < 2 and CI < 20 to confirm that there was no multicollinearity (Cohen et al., 2003).
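Equations (2) and (3) can be illustrated with a short Python sketch. The function names are hypothetical, and estimating P(a_i) as the relative frequency of each context among the observed transitions is an assumption of this example rather than a detail stated in the text.

```python
import math
from collections import Counter, defaultdict


def information_content(p):
    """Information content in bits of a transition with probability p (Eq. 2)."""
    return math.log2(1.0 / p)


def conditional_entropy(pairs):
    """Conditional entropy H(B|A) in bits from (context, target) pairs (Eq. 3).

    P(a_i) is estimated as the share of observed transitions whose
    context is a_i (an assumption), and P(b_j | a_i) from counts.
    """
    pairs = list(pairs)
    context_counts = Counter(a for a, _ in pairs)
    joint = defaultdict(Counter)
    for a, b in pairs:
        joint[a][b] += 1
    total = len(pairs)
    entropy = 0.0
    for a, targets in joint.items():
        p_a = context_counts[a] / total
        for count in targets.values():
            p_b_given_a = count / context_counts[a]
            entropy -= p_a * p_b_given_a * math.log2(p_b_given_a)
    return entropy


# Toy data: context "a" is followed by "x" or "y" equally often,
# whereas context "b" is always followed by "x".
observations = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "x")]
h = conditional_entropy(observations)
```

In the toy data, the fully predictable context "b" contributes nothing to the entropy, whereas the context "a" contributes one bit weighted by its probability of 0.5.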

Furthermore, for each musician, the seven pieces of music were averaged for each type of sequence. The transitional patterns with the first to fifth highest TPs in each musician, which show higher predictability in each musician, were used in the regression analyses. The transitional patterns were chronologically ordered based on the time course in which the music was played by each musician. The time-course variations of the TPs were analyzed by multiple regression analyses using the stepwise method. The criteria for the variance inflation factor (VIF) and condition index (CI) were set at VIF < 2 and CI < 20 to confirm that there was no multicollinearity.

The logit transformation was applied to normalize the TPs. Then, using the transitional patterns with the first to fifth highest TPs in each musician, repeated-measures analyses of variance (ANOVAs) with a between-subject factor of player (WJ. Evans vs. HJ. Hancock vs. M. Tyner) and a within-subject factor of sequence were conducted for each hierarchy of the Markov model. When significant effects were detected, Bonferroni-corrected post-hoc tests were conducted for further analysis. The statistical significance level was set at p = 0.05 for all analyses.
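For reference, the logit transformation maps a probability p to log(p / (1 − p)). The sketch below is a minimal illustration under the assumption that the natural logarithm is used; the text does not state how TPs of exactly 0 or 1 were handled, so the sketch simply rejects them.

```python
import math


def logit(p):
    """Logit transform, log(p / (1 - p)), using the natural logarithm.

    The transform is undefined at p = 0 and p = 1; how such values were
    treated in the study is not stated, so this sketch raises an error.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("logit is undefined for p <= 0 or p >= 1")
    return math.log(p / (1.0 - p))


# Hypothetical TPs for illustration only.
example_tps = [0.2, 0.5, 0.8]
transformed = [logit(p) for p in example_tps]
```

The transform spreads probabilities near 0 and 1 over the whole real line and is symmetric around p = 0.5, which is why it is commonly applied before analyses that assume approximately normal data.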

Results