Monitoring bycatch of protected species is a fisheries management priority. In practice, protected species bycatch is difficult to estimate precisely or accurately with commonly used ratio estimators or parametric linear model-based methods. Machine-learning algorithms have been proposed as a means of overcoming some of the analytical hurdles in estimating protected species bycatch.
Using 17 years of set-specific bycatch data derived from 100% observer coverage of the Hawaii shallow-set longline fishery and 25 aligned environmental predictors, we evaluated a new approach for protected species bycatch estimation using Ensemble Random Forests (ERFs). We tested the ability of ERFs to predict interactions with five protected species that have varying levels of bycatch in the fishery, and we tested methods for correcting these predictions using Type I and Type II error rates from the training data. We also assessed the amount of training data needed to inform an ERF approach by mimicking the sequential addition of new data with each subsequent fishing year.
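To illustrate the kind of error-rate correction described above, the following is a minimal sketch of one standard prevalence adjustment (the Rogan-Gladen form), which rescales a predicted interaction count using Type I (false positive) and Type II (false negative) rates estimated from training data. This is an assumption made for illustration and may differ from the correction evaluated in this study; the function name, arguments, and example numbers are hypothetical.

```python
def corrected_interactions(n_sets, n_predicted, fpr, fnr):
    """Adjust a predicted interaction count for classifier error rates.

    Hypothetical helper, not the study's implementation.
    n_sets      -- number of fishing sets scored by the classifier
    n_predicted -- number of sets the classifier flagged as interactions
    fpr         -- Type I (false positive) rate from the training data
    fnr         -- Type II (false negative) rate from the training data
    """
    if n_sets <= 0:
        raise ValueError("n_sets must be positive")
    denom = 1.0 - fpr - fnr
    if denom <= 0.0:
        # The classifier is no better than chance; the correction is undefined.
        raise ValueError("fpr + fnr must be less than 1")
    apparent_rate = n_predicted / n_sets
    # Rogan-Gladen adjustment: remove expected false positives, then
    # scale up for interactions the classifier is expected to miss.
    true_rate = (apparent_rate - fpr) / denom
    # Clip to the valid range, since sampling noise can push it outside [0, 1].
    true_rate = min(1.0, max(0.0, true_rate))
    return true_rate * n_sets


# Illustrative numbers only (not taken from the study).
estimate = corrected_interactions(n_sets=1000, n_predicted=60, fpr=0.02, fnr=0.20)
print(round(estimate, 1))
```

Because the adjustment relies on error rates measured on the training data, any mismatch between training and prediction conditions carries through to the corrected estimate.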
We showed that ERF bycatch estimation was most effective for species with interaction rates greater than 2%. Error correction improved bycatch estimates for all species but introduced a tendency to regress estimates towards mean rates in the training data. Training data needs differed among species, but species with interaction rates above 2% required 7-12 years of bycatch data.
Our machine-learning approach can improve bycatch estimates for rare species, but comparisons with other approaches are needed to assess which methods perform best for hyperrare species.