A Machine Learning Based Approach to Control Network Activity
1 University of Freiburg, Biomicrotechnology, Institute of Microsystems Engineering, Germany
2 University of Freiburg, Bernstein Center Freiburg, Germany
3 University of Freiburg, Machine Learning Lab, Germany
Motivation
Electrical stimulation of the brain is increasingly used as a strategy to alleviate the symptoms of a range of neurological disorders, and as a possible means to artificially inject information into neural circuits, e.g. towards bidirectional neural prostheses [1]. Conventionally, stimulation of neuronal networks explicitly or implicitly assumes that the response to repeated constant stimuli is predictable. The measured response, however, typically results from interaction with additional neuronal activity not controlled by the stimulus [2]. Constant stimuli are therefore not optimal to reliably induce specific responses. Yet, without suitable models of the interaction between stimulus and ongoing activity it is not possible to adjust individual stimuli such that a defined response feature is achieved optimally.
To address these challenges, we propose an autonomous closed-loop paradigm based on Reinforcement Learning (RL). The approach raises three questions: how to (1) identify and capture informative activity patterns in a quantifiable 'state' so that a well-posed control problem can be formulated, (2) find the optimal stimulation strategy for a given goal, and (3) evaluate the quality of the solution found. In this study we consider a toy control problem defined for a generic network of neurons. Our objective is to demonstrate how these questions could be addressed, and thus to apply an RL controller that autonomously adjusts stimulus settings without prior knowledge of the rules governing the interaction of electrical stimulation with ongoing activity in the network.
Material and Methods
To develop the concept and techniques, we employed generic neuronal networks in vitro as a model system. Cultured neuronal networks exhibit activity characterized by intermittent network-wide spontaneous bursts (SB), separated by periods of reduced activity. Electrical stimulation of the network between SBs also evokes bursts of action potentials (responses). For our experiments, we selected one stimulating and one recording channel. Response strength depends on the latency of the stimulus relative to the previous SB and can be described by a saturating exponential model [3]. This latency period, however, is also prone to interruption by ongoing activity. Stimulus efficacy, defined here as the response strength per SB, therefore depends on both of these opposing factors. Using phenomenological models, we show that their dynamic interplay presents a trade-off that admits a network-specific, unique optimal stimulus latency maximizing stimulus efficacy. In this study, we asked whether an RL-based controller can autonomously find the ideal balance in this trade-off: the optimal stimulus latency. An open-loop characterization of each network was used to make parametric, model-based predictions of the optimal stimulus latencies, against which the quality of the controller's learned strategy was evaluated.
Results
In order to extract the parameters of the response strength model, stimuli were first delivered at random latencies relative to SBs in an open-loop setting. A statistical model of the probability of occurrence of SBs was estimated using spontaneous activity recordings. Weighting the response strengths with the interruption probabilities yielded quasi-concave objective functions and unique optimal latencies for each of the 20 networks studied (Fig. 1A,B).
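The weighted objective described above can be sketched in a few lines. All parameter values below are hypothetical stand-ins for the per-network fits reported in the study; response-strength recovery follows a saturating exponential as in [3], and, purely for simplicity of this sketch, SB occurrence is assumed Poisson (the study estimated SB statistics empirically from spontaneous activity recordings).

```python
import math

# Hypothetical parameters; the actual values were fitted per network.
R_MAX = 1.0      # asymptotic response strength
TAU_R = 1.2      # recovery time constant of the response (s)
LAMBDA_SB = 0.4  # assumed Poisson rate of spontaneous bursts (1/s)

def response_strength(latency):
    """Saturating-exponential recovery of response strength with latency since the last SB."""
    return R_MAX * (1.0 - math.exp(-latency / TAU_R))

def p_uninterrupted(latency):
    """Probability that no SB interrupts the waiting period before the stimulus."""
    return math.exp(-LAMBDA_SB * latency)

def efficacy(latency):
    """Weighted objective: expected response strength per SB."""
    return response_strength(latency) * p_uninterrupted(latency)

# Grid search for the unique maximum of the quasi-concave objective.
latencies = [0.01 * k for k in range(1, 1001)]
t_opt = max(latencies, key=efficacy)
```

For these illustrative parameters the product of a rising response curve and a decaying survival probability peaks at roughly 1.35 s, qualitatively matching the ~1.5 s optimum shown for the example network in Fig. 1A.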
In a closed-loop session, an RL controller interacted with the network with the goal of autonomously maximizing stimulus efficacy. Learning proceeded in alternating training and testing sessions. During training, the controller explored the parameter space while in testing, the learned strategy was executed.
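The abstract does not name the specific RL algorithm used. As a minimal illustration of the alternating structure of exploration (training) and exploitation of the learned strategy (testing), the following sketch uses a simple epsilon-greedy bandit over discretized stimulus latencies; the reward function is a hypothetical noisy, quasi-concave stand-in for the measured stimulus efficacy, peaking near 1.5 s.

```python
import random

LATENCIES = [0.5 * k for k in range(1, 11)]  # candidate stimulus latencies (s)

def reward(latency, rng):
    """Hypothetical stand-in for measured stimulus efficacy (peak near 1.5 s)."""
    peak = 1.5
    return max(0.0, 1.0 - (latency - peak) ** 2 / 4) + rng.gauss(0, 0.05)

def train(n_trials=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    counts = {t: 0 for t in LATENCIES}
    values = {t: 0.0 for t in LATENCIES}  # running mean reward per latency
    for _ in range(n_trials):
        # Training: occasionally explore a random latency, otherwise exploit.
        if rng.random() < epsilon:
            t = rng.choice(LATENCIES)
        else:
            t = max(LATENCIES, key=lambda x: values[x])
        r = reward(t, rng)
        counts[t] += 1
        values[t] += (r - values[t]) / counts[t]  # incremental mean update
    # Testing: execute the learned (greedy) strategy.
    return max(LATENCIES, key=lambda x: values[x])

learned_latency = train()
```

This bandit formulation deliberately ignores the state estimation problem raised in the Motivation section; it only illustrates how a controller can converge on the efficacy-maximizing latency without a model of the network.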
Stimulus latencies learned by the controller were strongly correlated with the optimal latencies predicted from the open-loop studies (r = 0.94, p < 10^-8, n = 17 networks; Fig. 1C). Moreover, in 94.2% of sessions (n = 52, 11 networks), the percentage of interrupted events per session decreased after learning. Stimulus efficacy likewise improved in each of these networks after learning, further supporting the effectiveness of the learning algorithm (Fig. 1D).
Discussion
Closed-loop stimulation has been proposed as a promising strategy to intervene in the dynamics of pathological networks while adapting to ongoing activity. The selection of signal features to close such a loop and strategies to identify optimal stimulus settings given a desired network response remain open problems. We propose methods of RL to autonomously choose optimal control policies given a pre-defined goal. We considered a toy problem that captures some of the major challenges that a closed-loop paradigm would face in a biomedical application, i.e. in a complex, adaptive environment. Balancing the trade-off of response strengths and interruptions involves finding the dependence of response strengths on stimulus latencies and adapting at the same time to the dynamics of ongoing activity. In this study, we demonstrate the capacity of RL based techniques to address such a challenge. Using phenomenological models derived from prior studies on such networks, we independently validate the performance of the controller.
Conclusion
We show that an RL based autonomous strategy is capable of choosing optimal strategies in the context of a dynamic neuronal system. We focused on a trade-off problem: to maximize a derived feature of the response (stimulus efficacy measured as the response strength per SB) with a priori unknown value. Estimates of a unique network-specific optimal strategy can be computed for this problem. This allowed us to validate the latencies learned autonomously by the controller. Our paradigm offers the ability to learn optimal interaction strategies in the absence of complete knowledge about the network or quantitative principles defining its dynamics.
Acknowledgements
This project was supported by BrainLinks-BrainTools Cluster of Excellence (DFG-EXC 1086) and the Bernstein Focus Neurotechnology Freiburg*Tübingen (BMBF FKZ 01GQ0830).
References
[1] Raspopovic, S. et al. (2014) Sci. Transl. Med. 6, 222ra19.
[2] Arieli, A. et al. (1996) Science 273, 1868–1871.
[3] Weihberger, O. et al. (2013) J. Neurophysiol. 109, 1764–1774.
Figure Legend
(A) Fitted models of the probability of avoiding interruptions due to SBs (blue), response strengths (orange), and the resulting weighted response curve (black), shown for one example network. An optimal latency of ~1.5 s emerges in this case.
(B) The predicted objective functions of all 20 networks studied were quasi-concave, yielding a unique optimal latency in each case.
(C) Across networks, learned stimulus latencies showed a positive correlation with predicted optimal values.
(D) After learning, mean rewards increased in each network, indicative of the improvement in stimulation efficacy.
Keywords: reinforcement learning, microelectrode arrays, neuronal cultures, closed-loop stimulation
Conference: MEA Meeting 2016 | 10th International Meeting on Substrate-Integrated Electrode Arrays, Reutlingen, Germany, 28 Jun – 1 Jul 2016.
Presentation Type: Oral
Topic: MEA Meeting 2016
Citation: Kumar SS, Wülfing J, Okujeni S, Boedecker J, Riedmiller M and Egert U (2016). A Machine Learning Based Approach to Control Network Activity. Front. Neurosci. Conference Abstract: MEA Meeting 2016 | 10th International Meeting on Substrate-Integrated Electrode Arrays. doi: 10.3389/conf.fnins.2016.93.00117
Received: 22 Jun 2016; Published Online: 24 Jun 2016.
*Correspondence: Dr. Sreedhar S Kumar, University of Freiburg, Biomicrotechnology, Institute of Microsystems Engineering, Freiburg, Germany, sreedhar.kumar@imtek.uni-freiburg.de