Accurately predicting the competitive performance of elite athletes is an essential prerequisite for formulating competitive strategies. The women’s all-around speed skating event consists of four individual subevents, and its complex competition system makes athletes’ performance difficult to predict accurately.
The present study aims to explore the feasibility and effectiveness of machine learning algorithms for predicting performance in the women’s all-around speed skating event and to inform effective training and competition strategies.
The data used in the present study comprised 16 seasons of world-class women’s all-around speed skating competition results and were obtained from the International Skating Union (ISU). In accordance with the competition rules, distinct features were filtered using lasso regression, and a 5,000 m race model and a medal model were built using fivefold cross-validation.
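As an illustration only, the workflow described above (lasso-based feature filtering followed by a classifier evaluated with fivefold cross-validation) might be sketched in Python with scikit-learn as follows. The synthetic data, the lasso penalty `alpha`, the RBF kernel, and the function name `cross_validated_auc` are assumptions made for this sketch; they are not taken from the study, and the ISU data are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def cross_validated_auc(X, y, alpha=0.01, n_splits=5, seed=0):
    """Return per-fold AUC for lasso feature filtering followed by an SVM.

    Hypothetical helper: alpha, the kernel, and the fold seed are
    illustrative choices, not values reported by the study.
    """
    pipeline = make_pipeline(
        StandardScaler(),
        # Keep only features whose lasso coefficient is non-zero.
        SelectFromModel(Lasso(alpha=alpha)),
        SVC(kernel="rbf"),
    )
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # Feature selection is refit inside every training fold, so the
    # held-out fold never influences which features are kept.
    return cross_val_score(pipeline, X, y, cv=folds, scoring="roc_auc")


# Synthetic stand-in for a binary outcome such as "won a medal"
# (assumption: placeholder data, unrelated to real race results).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)
scores = cross_validated_auc(X, y)
print("Per-fold AUC:", scores.round(3))
print("Mean AUC:", scores.mean().round(3))
```

Placing the selection step inside the pipeline is a design choice of this sketch: it keeps the cross-validated AUC from being inflated by features chosen with knowledge of the held-out fold.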
The results showed that the support vector machine was the most stable algorithm for both the 5,000 m race model and the medal model, with the highest AUC (0.86 and 0.81, respectively). Furthermore, 3,000 m points were the main feature determining whether an athlete could qualify for the final. The 11th lap of the 5,000 m, the second lap of the 500 m, and the fourth lap of the 1,500 m were the main features affecting an athlete’s ability to win a medal.
Compared with logistic regression, random forest, K-nearest neighbor, naive Bayes, and neural network models, the support vector machine is a more viable algorithm for establishing a performance prediction model for the women’s all-around speed skating event. Excellent performance in the 3,000 m event helps athletes advance to the final, and athletes with outstanding performance in the 500 m event are more likely to be competitive for medals.