Over the last decades, the P300 Speller paradigm has been replicated in many experiments, and the collected data have been released to the public domain so that research groups, particularly those in the field of machine learning, can test and improve their algorithms and thereby increase the performance of brain-computer interface (BCI) systems. Training data are needed to learn to identify the relevant brain activity, and the more training data are available, the better the algorithms will perform. Larger datasets are therefore highly desirable, possibly obtained by merging datasets from different repositories. The main obstacle to such merging is that public datasets are released in a variety of file formats, because no standard way of sharing these data has been established. In addition, every dataset requires reading accompanying documents or scientific papers to retrieve the relevant information, which prevents automated processing. In this study, we therefore adopted a single, common file format to demonstrate the importance of such a standard and to propose which information should be stored and why.
We described our process for converting a dozen P300 Speller datasets and reported the main problems encountered while converting them to the same file format. All of the datasets are characterized by the same 6 × 6 matrix of alphanumeric symbols (letters, digits, or other characters) and by the same subset of acquired signals (8 EEG sensors at the same recording sites).
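For illustration only, the sketch below shows one way a single recording from such a dataset could be mapped to a common, self-describing representation. It is a minimal sketch, not the conversion pipeline described here: the source file name, the .mat field names, the sampling rate, the flash duration, the channel labels, and the choice of MNE-Python's FIF format as the target are all assumptions standing in for the actual common format adopted.

```python
# Minimal, hypothetical example: read one P300 Speller recording from a
# MATLAB file and store signals plus stimulus events in a single common
# container (here an MNE-Python FIF file). All field names and parameters
# below are placeholders, not the actual datasets' layout.
import numpy as np
import mne
from scipy.io import loadmat

mat = loadmat("p300_speller_subject01.mat")      # hypothetical source file
eeg = mat["signal"].astype(float).T * 1e-6       # (n_channels, n_samples), assuming values in microvolts
onsets = mat["stim_onsets"].ravel()              # stimulus onset times in seconds (assumed field)
codes = mat["stim_codes"].ravel()                # 1-12: which row/column of the 6 x 6 matrix flashed

info = mne.create_info(
    ch_names=["Fz", "Cz", "Pz", "Oz", "P3", "P4", "PO7", "PO8"],  # an 8-channel montage, assumed here
    sfreq=256.0,                                 # assumed sampling rate
    ch_types="eeg",
)
raw = mne.io.RawArray(eeg, info)
raw.set_annotations(
    mne.Annotations(
        onset=onsets,
        duration=np.full(onsets.shape, 0.1),     # assumed 100 ms flash duration
        description=[f"stim/{int(c)}" for c in codes],
    )
)
raw.save("subject01_p300_raw.fif", overwrite=True)   # one file, one common format
```

Storing the stimulus codes alongside the signals in the same file is what allows the processing of merged datasets to be automated without consulting the original documentation.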
Nearly a million stimuli were converted, corresponding to about 7,000 spelled characters from 127 subjects. The converted stimuli represent the most extensive publicly available platform for training and testing new algorithms on this specific paradigm, the P300 Speller. The platform could also make it possible to explore transfer learning procedures that reduce or eliminate the time needed to train a classifier, thereby improving the performance and accuracy of such BCI systems.