
File format and library for neuroscience data and metadata

  • 1 Department Biologie II, Ludwig-Maximilians-Universität, German Neuroinformatics Node, Germany
  • 2 Universität Tübingen, Institut für Neurobiologie, Germany

The growing complexity of experiments and the increasing amount of data acquired in neuroscience and electrophysiology place ever greater demands on data and metadata management. File formats that can represent and persist such information play a key role in this process. Existing file formats are often highly domain-specific and are typically optimized for particular kinds of data, such as time series or image data, or for specific recording software or devices. Many formats can only be accessed via proprietary software, which imposes further limitations. Moreover, many existing formats offer only limited support for metadata annotation. A common, open and standardized file format that is versatile enough to represent various kinds of data together with their metadata has the potential to foster community-based tool development and to facilitate data sharing among labs.
Here we present such a file format. It is based on a well-defined data model that can represent data and metadata in various storage backends. To specify a concrete file format, we used this model to define a schema for HDF5 files (www.hdfgroup.org/HDF5/).
The data model can represent and describe multidimensional data. It supports time series, spike trains, images, image stacks and various other kinds of data. It further allows the definition of points and regions of interest, which can represent, for example, events or data segments. All data elements can be annotated with additional metadata using the odML data model (Grewe et al. 2011), which is an intrinsic part of the model specification. The data model is designed to be as flexible as possible while remaining expressive enough to provide all information needed to plot the data without human interaction, including units, sampling rates and labels. By design, the data model is not domain-specific; instead, it supports type annotations, which allow data stored in the generic model to be represented as domain-specific entities. Owing to this flexibility, the data model is compatible with many other formats and can represent data from NEO (www.neuralensemble.org/neo) or Neuroshare (www.neuroshare.org) files.
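To illustrate how such a model can make data self-describing, the following C++ sketch defines a few simplified types: a data array carrying a label, a unit, a type annotation and a descriptor for a regularly sampled axis; a tag marking a point or region of interest; and an odML-style section holding metadata. All type and member names (DataArray, SampledDimension, Tag, Section, retrieve) are hypothetical and chosen for illustration only; they are not the API of the C++ reference implementation.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified types illustrating the ideas of the data model.
// They are NOT the API of the C++ reference implementation.

// odML-style metadata: a named section holding key/value properties.
struct Section {
    std::string name;
    std::map<std::string, std::string> properties;
};

// Describes one axis of a data array that is sampled at regular intervals.
struct SampledDimension {
    std::string label;    // e.g. "time"
    std::string unit;     // e.g. "s"
    double interval;      // sampling interval, given in `unit`
    double offset = 0.0;  // position of the first sample

    // Position of sample i along this axis (e.g. its time stamp in a plot).
    double position(std::size_t i) const {
        return offset + static_cast<double>(i) * interval;
    }
};

// Generic data container; `type` lets tools treat it as a domain entity.
struct DataArray {
    std::string name;
    std::string type;                   // hypothetical type annotation
    std::string label;                  // e.g. "membrane potential"
    std::string unit;                   // e.g. "mV"
    std::vector<double> values;         // 1-D for simplicity
    SampledDimension dim;               // descriptor of the single axis
    std::shared_ptr<Section> metadata;  // optional odML-style annotation
};

// A point (extent == 0) or region of interest along a sampled axis.
struct Tag {
    std::string name;
    double position;
    double extent = 0.0;

    // Returns the values whose positions lie in [position, position + extent].
    std::vector<double> retrieve(const DataArray& array) const {
        std::vector<double> out;
        for (std::size_t i = 0; i < array.values.size(); ++i) {
            const double x = array.dim.position(i);
            if (x >= position && x <= position + extent) {
                out.push_back(array.values[i]);
            }
        }
        return out;
    }
};
```

Because the axis descriptor stores the sampling interval and unit, a plotting tool can compute the position of every sample and label both axes without further input. For example, a tag at position 1.0 s with extent 1.0 s, applied to an array sampled every 0.5 s, selects the samples at 1.0, 1.5 and 2.0 s.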
In the HDF5 format, the data model is mapped onto a rather flat hierarchy. A file consists of two main groups, one for data and one for metadata. Data and metadata are thus stored in the same file, and links can be established between the two parts. Although these files can of course be read with the standard HDF5 libraries, dedicated APIs provide more convenient access to the data at a higher level of abstraction. We therefore developed a reference implementation in C++ that can be used to integrate the format into existing tools and environments and that may serve as a guideline for implementations in other languages. For more information, see www.g-node.org/nix.
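The following C++ sketch mimics this layout with a toy in-memory structure instead of a real HDF5 file: entities are placed directly below either a data group or a metadata group, and a named link connects a data entity to its metadata section. The group names "/data" and "/metadata", the FlatFile class and its methods are assumptions made for illustration and do not reproduce the actual schema or the reference implementation. In a real HDF5 file, a comparable link could be created, for example, with the HDF5 C function H5Lcreate_soft.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Toy, in-memory stand-in for an HDF5 file, used only to illustrate the layout.
// Every group is identified by its absolute path. Group names and all
// class/method names are assumptions for illustration.
class FlatFile {
public:
    FlatFile() {
        groups_["/data"];      // top-level group holding all data entities
        groups_["/metadata"];  // top-level group holding all metadata sections
    }

    // Adds an entity directly below an existing top-level group, so the
    // hierarchy stays flat. Throws std::out_of_range for an unknown root.
    std::string add(const std::string& root, const std::string& name) {
        if (groups_.count(root) == 0) {
            throw std::out_of_range("unknown root group: " + root);
        }
        const std::string path = root + "/" + name;
        groups_[path];
        return path;
    }

    // Records a named link from one group to a target path,
    // similar in spirit to an HDF5 soft link.
    void link(const std::string& from, const std::string& linkName,
              const std::string& target) {
        groups_.at(from)[linkName] = target;
    }

    // Follows a link; returns an empty string if the group or link is missing.
    std::string resolve(const std::string& from, const std::string& linkName) const {
        const auto group = groups_.find(from);
        if (group == groups_.end()) return std::string();
        const auto it = group->second.find(linkName);
        return it == group->second.end() ? std::string() : it->second;
    }

    bool exists(const std::string& path) const { return groups_.count(path) > 0; }

private:
    // group path -> (link name -> target path)
    std::map<std::string, std::map<std::string, std::string>> groups_;
};
```

In this sketch, a data entity such as "/data/trace" can point to its metadata section "/metadata/recording" through a link named "metadata", so both parts live in one file while remaining separately organized.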

Acknowledgements

The motivation for this work arose from the activities of the Electrophysiology Task Force of the INCF Program on Standards for Datasharing. This work was supported by the German INCF Node (BMBF grants 01GQ0801 and 01GQ1302).

References

Grewe J, Wachtler T, Benda J (2011). A bottom-up approach to data annotation in neurophysiology. Front. Neuroinform. 5:16. doi: 10.3389/fninf.2011.00016

Keywords: File format, metadata, data acquisition, data management, C++, odML

Conference: Neuroinformatics 2014, Leiden, Netherlands, 25 Aug - 27 Aug, 2014.

Presentation Type: Poster, not to be considered for oral presentation

Topic: General neuroinformatics

Citation: Stoewer A, Kellner CJ, Benda J, Wachtler T and Grewe J (2014). File format and library for neuroscience data and metadata. Front. Neuroinform. Conference Abstract: Neuroinformatics 2014. doi: 10.3389/conf.fninf.2014.18.00027

Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers. They are made available through the Frontiers publishing platform as a service to conference organizers and presenters.

The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated.

Each abstract, as well as the collection of abstracts, is published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed.

For Frontiers’ terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.

Received: 04 Apr 2014; Published Online: 04 Jun 2014.

* Correspondence: Mr. Adrian Stoewer, Department Biologie II, Ludwig-Maximilians-Universität, German Neuroinformatics Node, Munich, Germany, adrian@stoewer.me