AUTHOR=von Wegner Frederic
TITLE=Partial Autoinformation to Characterize Symbolic Sequences
JOURNAL=Frontiers in Physiology
VOLUME=9
YEAR=2018
URL=https://www.frontiersin.org/journals/physiology/articles/10.3389/fphys.2018.01382
DOI=10.3389/fphys.2018.01382
ISSN=1664-042X
ABSTRACT=
An information-theoretic approach to numerically determine the Markov order of discrete stochastic processes defined over a finite state space is introduced. To measure statistical dependencies between different time points of symbolic time series, two information-theoretic measures are proposed. The first measure is time-lagged mutual information between the random variables Xn and Xn+k, representing the values of the process at time points n and n + k, respectively. The measure will be termed autoinformation, in analogy to the autocorrelation function for metric time series, but using Shannon entropy rather than linear correlation. This measure is complemented by the conditional mutual information between Xn and Xn+k, removing the influence of the intermediate values Xn+k−1, …, Xn+1. The second measure is termed partial autoinformation, in analogy to the partial autocorrelation function (PACF) in metric time series analysis. Mathematical relations with known quantities such as the entropy rate and active information storage are established. Both measures are applied to a number of examples, ranging from theoretical Markov and non-Markov processes with known stochastic properties, to models from statistical physics, and finally, to a discrete transform of an EEG data set. The combination of autoinformation and partial autoinformation yields important insights into the temporal structure of the data in all test cases. For first- and higher-order Markov processes, partial autoinformation correctly identifies the order parameter, but also suggests extended, non-Markovian effects in the examples that lack the Markov property. For three hidden Markov models (HMMs), the underlying Markov order is found. The combination of both quantities may be used as an early step in the analysis of experimental, non-metric time series and can be employed to discover higher-order Markov dependencies, non-Markovianity and periodicities in symbolic time series.