AUTHOR=Jamieson Blair , Stubbs Matt , Ramanna Sheela , Walker John , Prouse Nick , Akutsu Ryosuke , de Perio Patrick , Fedorko Wojciech TITLE=Using machine learning to improve neutron identification in water Cherenkov detectors JOURNAL=Frontiers in Big Data VOLUME=5 YEAR=2022 URL=https://www.frontiersin.org/journals/big-data/articles/10.3389/fdata.2022.978857 DOI=10.3389/fdata.2022.978857 ISSN=2624-909X ABSTRACT=

Water Cherenkov detectors such as Super-Kamiokande and the next-generation Hyper-Kamiokande are adding gadolinium to their water to improve neutron detection. Detecting neutrons in addition to the leptons produced in neutrino interactions is expected to improve the separation between neutrinos and antineutrinos and to reduce backgrounds in proton decay searches. The neutron signal itself is still small and can be confused with muon spallation and other background sources. In this paper, machine learning techniques are employed to optimize the neutron capture detection capability of the new intermediate water Cherenkov detector (IWCD) for Hyper-K. In particular, boosted decision tree (XGBoost), graph convolutional network (GCN), and dynamic graph convolutional neural network (DGCNN) models are developed and benchmarked against a statistical likelihood-based approach, achieving up to a 10% increase in classification accuracy. Characteristic features are also engineered from the datasets and analyzed using SHAP (SHapley Additive exPlanations) to provide insight into the pivotal factors influencing event type outcomes. The dataset used in this research consists of roughly 1.6 million simulated particle gun events, divided nearly evenly between neutron capture and a background electron source. The current training samples are representative only, and more realistic samples will be needed for analyses of real data. The current class split is 50/50, but the class proportions are expected to differ in the real experiment; if the imbalance in real data proves serious, resampling techniques could be used to address it.
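
The abstract mentions resampling as one possible response to class imbalance in real data. The following is a minimal sketch of one such technique, random undersampling of the majority class, and is not taken from the paper. The function name `undersample_majority`, the toy feature matrix, and the 900/100 class split are hypothetical and chosen only for illustration.

```python
import numpy as np

def undersample_majority(X, y, seed=0):
    """Randomly drop rows from larger classes until every class
    has as many rows as the smallest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    # For each class, pick n_min row indices without replacement.
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    rng.shuffle(keep)  # avoid returning the rows grouped by class
    return X[keep], y[keep]

# Hypothetical toy data: 900 background events (label 0)
# and 100 neutron captures (label 1), each with 4 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))
y = np.concatenate([np.zeros(900, dtype=int), np.ones(100, dtype=int)])

X_bal, y_bal = undersample_majority(X, y)
print(X_bal.shape, np.bincount(y_bal))
```

Undersampling discards data, so with a large simulated sample it may be acceptable, while oversampling the minority class or reweighting the loss are alternatives when events are scarce.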