Event Abstract

The CARMEN data sharing portal project: what have we learned?

  • 1 University of Stirling, Computing Science and Mathematics, United Kingdom
  • 2 University of York, Computer Science, United Kingdom
  • 3 University of Cambridge, Applied Mathematics and Theoretical Physics, United Kingdom
  • 4 Newcastle University, Institute of Neuroscience, United Kingdom

The UK CARMEN project represents one of the first major efforts to share electrophysiological datasets, and the techniques for processing them, through a portal. It started in 2006, with the late Professor Colin Ingram as Principal Investigator and funding from the UK EPSRC, and was subsequently funded by the UK BBSRC. It has provided a gradually improving service for about five years, starting with the capability of sharing data and then adding services and workflows. It has its own internal data format, Neural Data Format (NDF): by using a service to convert proprietary dataset types to NDF, we make services and workflows that process NDF applicable to datasets from many different types of recording platform. Given the experience we have gained from running this service, what have we learned? What would we do differently if we were to start again? Is there still interest in this type of capability, or has the world moved on? We recently sent a questionnaire to all registered users of CARMEN, and we now have feedback both from active users and (perhaps equally importantly) from people who registered but did not end up using the system. In general, using the system for secure data sharing and exchange seems to have been the most popular activity. In designing the system, we were very aware that geographically distributed neuroinformaticians and neuroscientists wanted to share their datasets securely, and this seems to have been one of the system's successes. Yet had that been our only aim, we could have built a much simpler system altogether! Some users have used the system in a much more powerful way, as evidenced by the recent paper [Eglen et al 2014], but such users have been relatively few.
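The conversion approach described above follows a common "hub" design: each proprietary format needs one converter into the shared internal format, and each analysis service is written once against that format, so N formats and M services need N + M components rather than N × M format-specific tools. The Python sketch below illustrates only that idea. It is an assumption-laden toy: the `CommonDataset` class, the format names `vendor_a` and `vendor_b`, their field names, and the threshold-crossing service are all hypothetical, and none of it reproduces the actual NDF layout or CARMEN's service interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CommonDataset:
    """Hypothetical stand-in for a shared internal format (NOT the real NDF layout)."""
    sample_rate_hz: float
    channels: List[List[float]]  # channel-major: one list of samples per channel
    metadata: Dict[str, str] = field(default_factory=dict)


# Hypothetical registry: one converter per proprietary input format.
CONVERTERS: Dict[str, Callable[[dict], CommonDataset]] = {}


def converter(fmt: str):
    """Decorator that registers a converter function under a format name."""
    def register(fn: Callable[[dict], CommonDataset]) -> Callable[[dict], CommonDataset]:
        CONVERTERS[fmt] = fn
        return fn
    return register


@converter("vendor_a")
def from_vendor_a(raw: dict) -> CommonDataset:
    # Hypothetical vendor A: rate in Hz, samples already stored channel-major.
    return CommonDataset(raw["fs_hz"], raw["traces"], {"source": "vendor_a"})


@converter("vendor_b")
def from_vendor_b(raw: dict) -> CommonDataset:
    # Hypothetical vendor B: rate in kHz, samples stored row-major (one row per time point),
    # so the rows are transposed into per-channel lists.
    channels = [list(col) for col in zip(*raw["samples"])]
    return CommonDataset(raw["rate_khz"] * 1000.0, channels, {"source": "vendor_b"})


def crossing_rate_service(ds: CommonDataset, threshold: float) -> List[float]:
    """Example analysis service written once against the common format:
    upward threshold crossings per channel, in events per second."""
    duration_s = len(ds.channels[0]) / ds.sample_rate_hz
    rates = []
    for ch in ds.channels:
        crossings = sum(1 for a, b in zip(ch, ch[1:]) if a < threshold <= b)
        rates.append(crossings / duration_s)
    return rates


def run_service(fmt: str, raw: dict, threshold: float) -> List[float]:
    """Convert raw data from any registered format, then run the shared service on it."""
    return crossing_rate_service(CONVERTERS[fmt](raw), threshold)
```

For example, `run_service("vendor_a", {"fs_hz": 4.0, "traces": [[0.0, 1.0, 0.0, 1.0]]}, 0.5)` and the equivalent row-major `vendor_b` input both yield a rate of about 2 crossings per second, because the service never sees the original format. Supporting a new recording platform then means adding one converter, not rewriting every service.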
What has put users off from more sophisticated interaction with what is, in essence, a platform for extended analysis and sharing of data from many different laboratories? One issue has been speed of access. Although the network at the server end is fast, many users do not have such fast connections from their laboratories, so uploading (and indeed downloading) large datasets can be slow. There is little that CARMEN staff can do to help here, because the problem lies at the users' end and is not under CARMEN's control. Many users complained that the services and workflows were difficult to use: they found it hard to work out exactly how to use them, and even to find out which services were available. This is somewhat disappointing, since a great deal of effort went into making services usable, enabling effective search, and providing information about the services within the system. But perhaps the system is complex, and many users, accustomed to expensively developed and easier-to-use sites, did not spend the time finding out what could be done. That said, running services was clearly relatively complex, and running multiple services (and workflows) was quite difficult to organise. There does seem to be some agreement that using CARMEN services and workflows for cross-dataset analysis (i.e. on datasets from a number of sources) is of interest. However, very little of the data on CARMEN has been made public, so unless users have direct access to datasets that they can upload themselves, this type of activity has been difficult. Some users simply did not like the concept: they wanted something decentralised that could use many local machines. Some also wanted a more professionally designed “look and feel”.
Of these concerns, slow access is hard to avoid because the datasets are large; indeed, dataset size was the reason for the basic design, which brings the processing to the data rather than the other way around. As for the look and feel, we too would have liked to employ professional designers, but the budget did not stretch that far. Another suggestion was direct integration into the systems that neurophysiologists already use. This would be a great idea, but there are many such systems (although integration with Matlab, which is often used for initial data analysis, would be a possibility). In addition, the CARMEN project has been running during a period of rapid technological change on the Internet. Much of the user-facing processing was initially designed around Java applets, because they allowed us to provide secure and effective uploading and downloading. But times have moved on, and one would now expect to use a mixture of HTML5 and JavaScript for these purposes. The datasets are very complex, particularly given the multiplicity of data types in electrophysiological datasets (simple time series, excerpted sections, spikes, etc., plus the metadata describing the representation and the experiments that produced the dataset). Neural Data Format (NDF) caters for all of these. When NDF was designed, HDF5 could not really handle data in the way we needed. This is no longer the case, and if we were to redesign NDF, we would now base it on HDF5. That would be a major task, but in the meantime services that translate between HDF5 and NDF can work around the issue. It is worth noting that HDF5 alone does not solve the problem: one of the INCF Task Forces has been developing an HDF5 format for this type of application, and this work is only now nearing completion. Another aspect of technological change lies in data display.
When CARMEN started, there was no straightforward way to provide complex data display in a browser (short of a very complex Java applet), so we used a proprietary piece of software for data display. Thanks to the large expansion in the capabilities of JavaScript, this is no longer necessary. Reading over the users' comments, it appears that CARMEN, or a portal like it, remains a popular idea; however, it needs to be easy to use, both for upload/download and for running services and workflows. Documentation needs to be better and easy to find (ease of finding may be the most critical point). We are preparing a new project proposal, and we will take these issues into account.

Acknowledgements

UK EPSRC grant EP/E002331/1 and BBSRC grant BB/IO01042/1.

References

[Eglen et al 2014] A data repository and analysis framework for spontaneous neural activity recordings in developing retina, SJ Eglen, M Weeks, M Jessop, J Simonotto, T Jackson and E Sernagor, GigaScience, 3:3, 2014, doi:10.1186/2047-217X-3-3

Keywords: neuroscience data portal, data processing services, data processing workflows, data sharing, electrophysiological time series data

Conference: Neuroinformatics 2014, Leiden, Netherlands, 25 Aug - 27 Aug, 2014.

Presentation Type: Poster, not to be considered for oral presentation

Topic: Infrastructural and portal services

Citation: Smith LS, Austin J, Eglen S, Jackson T, Jessop M, Liang B, Weeks M and Sernagor E (2014). The CARMEN data sharing portal project: what have we learned?. Front. Neuroinform. Conference Abstract: Neuroinformatics 2014. doi: 10.3389/conf.fninf.2014.18.00068

Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers. They are made available through the Frontiers publishing platform as a service to conference organizers and presenters.

The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated.

Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed.

For Frontiers’ terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.

Received: 28 Apr 2014; Published Online: 04 Jun 2014.

* Correspondence: Prof. Leslie S Smith, University of Stirling, Computing Science and Mathematics, Stirling, Scotland, FK9 4LA, United Kingdom, l.s.smith@cs.stir.ac.uk