AUTHOR=Magalhães Bruno R. C., Sterling Thomas, Hines Michael, Schürmann Felix

TITLE=Asynchronous Branch-Parallel Simulation of Detailed Neuron Models

JOURNAL=Frontiers in Neuroinformatics

VOLUME=Volume 13 - 2019

YEAR=2019

URL=https://www.frontiersin.org/journals/neuroinformatics/articles/10.3389/fninf.2019.00054

DOI=10.3389/fninf.2019.00054

ISSN=1662-5196

ABSTRACT=Simulations of the electrical activity of networks of morphologically detailed neuron models allow for a better understanding of the brain. State-of-the-art simulations describe the dynamics of ionic currents and biochemical processes within branching topological representations of the neurons. Acceleration of such simulations is possible in the weak scaling limit by modelling neurons as indivisible computation units and increasing the computing power. Strong scaling and simulations close to biological time are difficult, yet required for the study of synaptic plasticity and other use cases that require simulating neurons for long periods of time. Current methods rely on parallel Gaussian elimination, computing triangulation and substitution of many branches simultaneously.
Existing limitations are: (a) high heterogeneity of compute time per neuron leads to high computational load imbalance; and (b) difficulty in providing a computation model that fully utilises the computing resources on distributed multi-core architectures with Single Instruction Multiple Data (SIMD) capabilities.

To address these issues, we present a strategy that extracts flow dependencies between parameters of the ODEs and the algebraic solver of individual neurons. Based on the resulting dependency map, we provide three techniques for memory, communication, and computation reorganization that yield a load-balanced distributed asynchronous execution. The new computation model distributes datasets and balances computational workload across a distributed memory space, exposing tree-based parallelism in the neuron's topological structure, an embarrassingly parallel execution model of neuron subtrees, and SIMD acceleration of subtree state updates.

The capabilities of our methods are demonstrated on a prototype implementation developed on the core compute kernel of the NEURON scientific application, built on the HPX runtime system for the ParalleX execution model. Our implementation yields a fully asynchronous, distributed, and parallel simulation that accelerates simulations ranging from single neurons to medium-sized neural networks. Benchmark results display better strong scaling properties, finer-grained parallelism, and lower time to solution compared to the state of the art, on a wide range of distributed multi-core compute architectures.