AUTHOR=Kraft Kaisa , Velhonoja Otso , Eerola Tuomas , Suikkanen Sanna , Tamminen Timo , Haraguchi Lumi , Ylöstalo Pasi , Kielosto Sami , Johansson Milla , Lensu Lasse , Kälviäinen Heikki , Haario Heikki , Seppälä Jukka 

TITLE=Towards operational phytoplankton recognition with automated high-throughput imaging, near-real-time data processing, and convolutional neural networks

JOURNAL=Frontiers in Marine Science

VOLUME=Volume 9 - 2022

YEAR=2022

URL=https://www.frontiersin.org/journals/marine-science/articles/10.3389/fmars.2022.867695

DOI=10.3389/fmars.2022.867695

ISSN=2296-7745

ABSTRACT=Plankton communities form the basis of aquatic ecosystems and elucidating their role in increasingly important environmental issues is a persistent research question. Recent technological advances in automated microscopic imaging, together with cloud platforms for high-performance computing, have created possibilities for collecting and processing detailed high-frequency data on planktonic communities, opening new horizons for testing core hypotheses in aquatic ecosystems. Analyzing continuous streams of big data calls for development and deployment of novel computer vision and machine learning systems. The implementation of these analysis systems is not always straightforward with regards to operationality, and issues regarding dataflows, computing and data treatment need to be considered. 
We created a data pipeline for automated near-real-time classification of phytoplankton during remote deployment of imaging flow cytometer (Imaging FlowCytobot, IFCB). Convolutional neural network (CNN) is used to classify continuous imaging data with probability thresholds used to filter out images not belonging to our existing classes. The automated data flow and classification system were used to monitor dominating species of filamentous cyanobacteria on the coast of Finland during summer 2021. We demonstrate that good phytoplankton recognition can be achieved with transfer learning utilizing a relatively shallow, publicly available, pre-trained CNN model and fine-tuning it with community-specific phytoplankton images (overall F1-score of 0.95 for test set of our labeled image data complemented with a 50% unclassifiable image portion). This enables both fast training and low computing resource requirements for model deployment making it easy to modify and applicable in wide range of situations. The system performed well when used to classify a natural phytoplankton community over different seasons (overall F1-score 0.82 for our evaluation data set). Furthermore, we address the key challenges of image classification for varying planktonic communities and analyze the practical implications of confused classes. 
We published our labeled image data set of Baltic Sea phytoplankton community for the training of image recognition models (~63000 images in 50 classes)  to accelerate implementation of imaging systems for other brackish and freshwater communities. Our evaluation data set, 59 fully annotated samples of natural communities throughout an annual cycle, is also available for model testing purposes (~150000 images).