Human motion prediction is a fundamental problem in computer vision. Deep learning methods have achieved impressive performance on it in recent years. However, long-term prediction and human skeletal deformation remain challenging. To improve accuracy, this paper proposes a GCN-based two-stage prediction method. In the first stage, we train a prediction model that uses multiple cascaded spatial attention graph convolution layers (SAGCL) to extract features and generates an initial motion sequence of future actions from the observed pose. Because the initial poses generated in the first stage often deviate from natural human motion, for example a sequence in which the length of a bone changes, the second stage fine-tunes the predicted poses to bring them closer to natural motion. We present a fine-tuning model that includes multiple cascaded causally temporal-graph convolution layers (CT-GCL), and we train it with two loss terms: the spatial coordinate error of the joints and the bone length error. We validate our model on the Human3.6M and CMU-MoCap datasets. Extensive experiments show that the two-stage prediction method outperforms state-of-the-art methods. We also discuss the limitations of the proposed method to inform future work.
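The two loss terms named above can be illustrated with a minimal sketch. It is not the authors' implementation: the four-joint chain in `BONES`, the mean-distance form of each term, and the `bone_weight` parameter are all assumptions made for illustration, and only the Python standard library is used.

```python
import math

# Hypothetical 4-joint chain; each bone is a (parent, child) joint index pair.
BONES = [(0, 1), (1, 2), (2, 3)]


def bone_lengths(pose, bones=BONES):
    """Euclidean length of each bone in one pose (a list of (x, y, z) joints)."""
    return [math.dist(pose[p], pose[c]) for p, c in bones]


def joint_error(pred, target):
    """Mean Euclidean distance between predicted and target joint positions."""
    return sum(math.dist(a, b) for a, b in zip(pred, target)) / len(pred)


def bone_error(pred, target, bones=BONES):
    """Mean absolute difference between predicted and target bone lengths."""
    pred_len = bone_lengths(pred, bones)
    target_len = bone_lengths(target, bones)
    return sum(abs(a - b) for a, b in zip(pred_len, target_len)) / len(bones)


def fine_tune_loss(pred, target, bone_weight=1.0):
    """Joint coordinate error plus a weighted bone length error (assumed form)."""
    return joint_error(pred, target) + bone_weight * bone_error(pred, target)


target = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0), (0.0, 3.0, 0.0)]
# The last bone is stretched from length 1.0 to 1.5: both terms are non-zero.
stretched = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0), (0.0, 3.5, 0.0)]
# The last bone is rotated but keeps length 1.0: only the joint term is non-zero.
rotated = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0), (1.0, 2.0, 0.0)]
```

In this toy setup the bone term penalizes `stretched` but not `rotated`, which is the kind of skeletal deformation the second stage is meant to correct.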
Deep learning has achieved enormous success in various computing tasks. Its excellent performance depends heavily on adequate training datasets; however, abundant samples are difficult to obtain in practical applications. Few-shot learning was proposed to address this data limitation in the training process: it learns rapidly from a few samples by exploiting prior knowledge. In this paper, we survey recent methods for few-shot classification. First, we define the few-shot classification problem. Then we propose a newly organized taxonomy, discuss the application scenarios in which each method is effective, and compare the pros and cons of different methods. We classify few-shot image classification methods from four perspectives: (i) data augmentation, which covers sample-level and task-level augmentation; (ii) metric-based methods, which we analyze in terms of both feature embedding and metric function; (iii) optimization methods, which we compare in terms of self-learning and mutual learning; and (iv) model-based methods, which we discuss from the perspectives of memory-based learning, rapid adaptation, and multi-task learning. Finally, we conclude the paper and outline prospects for future work.
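To make the metric-based category concrete, the sketch below classifies a query by its distance to per-class mean embeddings. This is one common metric-based scheme chosen for illustration, not a method taken from the survey: the identity `embed` function stands in for a trained feature extractor, and the Euclidean distance stands in for the metric function.

```python
import math
from collections import defaultdict


def embed(x):
    """Placeholder feature embedding (identity); a trained network would go here."""
    return tuple(x)


def prototypes(support):
    """Average the embedded support samples of each class into one prototype."""
    groups = defaultdict(list)
    for features, label in support:
        groups[label].append(embed(features))
    return {
        label: tuple(sum(column) / len(vectors) for column in zip(*vectors))
        for label, vectors in groups.items()
    }


def classify(query, protos):
    """Return the label whose prototype is nearest under the Euclidean metric."""
    q = embed(query)
    return min(protos, key=lambda label: math.dist(q, protos[label]))


# Toy task: two classes with two labeled samples each.
support = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"),
    ((4.0, 4.0), "b"), ((5.0, 4.0), "b"),
]
protos = prototypes(support)
```

Swapping `embed` or the distance function changes the two components that the taxonomy analyzes separately, the feature embedding and the metric function.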
Recent RNN models treat the dimensions of a multivariate time series (MTS) as independent channels, which may lose the dependencies between different dimensions or the information that relates each dimension to the series as a whole. To process MTS holistically without losing the inter-relationships among dimensions, this paper proposes a novel Long- and Short-term Time-series network based on geometric algebra (GA), dubbed GA-LSTNet. Specifically, the multi-dimensional data at each time point of the MTS is represented as GA multi-vectors, which capture the inherent structure and preserve the correlation among the dimensions. In particular, the traditional real-valued RNN, the real-valued LSTM, and back-propagation through time are extended to the GA domain. We evaluate the proposed GA-LSTNet model on prediction tasks using four well-known MTS datasets and compare its prediction performance with six other methods. The experimental results indicate that GA-LSTNet achieves higher prediction accuracy than the traditional real-valued LSTNet, addressing this shortcoming of existing MTS prediction models.
Human motion prediction based on 3D skeleton data is an active research topic in computer vision and multimedia analysis that involves many disciplines, such as image processing, pattern recognition, and artificial intelligence. As an effective representation of human motion, 3D human skeleton data is favored by researchers because it is robust to lighting effects, scene changes, and similar factors. Earlier studies on human motion prediction focused mainly on RGB data-based techniques. In recent years, researchers have combined human skeleton data with deep learning methods for human motion prediction and achieved good results. In this survey, we first introduce the research background and significance of human motion prediction. We then summarize recent deep learning-based techniques for predicting human motion. Finally, we provide a detailed review of the literature and discuss future developments.