Introduction: Facial expression recognition has long been a hot topic in computer vision and artificial intelligence. In recent years, deep learning models have achieved good results in accurately recognizing facial expressions, and the BILSTM network is one such model. However, the performance of the BILSTM network depends largely on its hyperparameters, which makes their optimization a challenge.
Methods: In this paper, a Northern Goshawk optimization (NGO) algorithm is proposed to optimize the hyperparameters of the BILSTM network for facial expression recognition. The proposed method was evaluated and compared with other methods on the FER2013, FERPlus, and RAF-DB datasets, which cover factors such as cultural background, race, and gender.
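For illustration only, the sketch below shows how a population-based optimizer of this kind can tune BILSTM hyperparameters. It is not the paper's implementation: the exploration/exploitation updates are a simplified rendering in the spirit of NGO, the hyperparameter names (hidden units, learning rate, dropout) are assumptions, and a toy quadratic surrogate stands in for the BILSTM validation error to keep the example self-contained.

```python
# Minimal, assumed sketch of NGO-style hyperparameter search (not the paper's code).
import numpy as np

rng = np.random.default_rng(0)

# Assumed search space: [hidden_units, learning_rate, dropout]
lower = np.array([32.0, 1e-4, 0.0])
upper = np.array([512.0, 1e-1, 0.5])

def objective(x):
    """Toy surrogate for the BILSTM validation error at hyperparameters x."""
    target = np.array([256.0, 1e-2, 0.3])  # pretend optimum for the demo
    return float(np.sum(((x - target) / (upper - lower)) ** 2))

pop_size, iters, dim = 20, 50, 3
pop = lower + rng.random((pop_size, dim)) * (upper - lower)
fit = np.array([objective(x) for x in pop])

for t in range(iters):
    for i in range(pop_size):
        # Phase 1 (exploration): move relative to a randomly chosen "prey".
        j = rng.integers(pop_size)
        r, I = rng.random(dim), rng.integers(1, 3)
        if fit[j] < fit[i]:
            cand = pop[i] + r * (pop[j] - I * pop[i])
        else:
            cand = pop[i] + r * (pop[i] - pop[j])
        cand = np.clip(cand, lower, upper)
        f = objective(cand)
        if f < fit[i]:
            pop[i], fit[i] = cand, f

        # Phase 2 (exploitation): local search with a shrinking radius.
        R = 0.02 * (1 - t / iters)
        cand = np.clip(pop[i] + R * (2 * rng.random(dim) - 1) * pop[i], lower, upper)
        f = objective(cand)
        if f < fit[i]:
            pop[i], fit[i] = cand, f

best = pop[np.argmin(fit)]
print("best hyperparameters (hidden, lr, dropout):", best)
```

In practice the surrogate objective would be replaced by training the BILSTM with the candidate hyperparameters and returning its validation error.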
Results: The results show that the recognition accuracy of the model on the FER2013 and FERPlus datasets is substantially higher than that of the traditional VGG16 network. On the RAF-DB dataset, the recognition accuracy reaches 89.72%, which is 5.45%, 9.63%, 7.36%, and 3.18% higher than that of DLP-CNN, gACNN, pACNN, and LDL-ALSG, respectively, four facial expression recognition algorithms proposed in the past two years.
Discussion: In conclusion, the NGO algorithm effectively optimizes the hyperparameters of the BILSTM network, improves the performance of facial expression recognition, and provides a new approach to hyperparameter optimization of BILSTM networks for facial expression recognition.
Introduction: Boxing is growing as a sport on Chinese campuses, which has led to a shortage of coaches. Human pose estimation technology can be employed to estimate boxing poses and help teach trainees, relieving this shortage. 3D cameras can provide depth information that 2D cameras lack, which can potentially improve the estimation. However, the input channels of 2D and 3D images are inconsistent, and there is a lack of detailed analysis of key point localization to guide the network design for improving human pose estimation.
Method: Therefore, model transfer with channel patching was implemented to solve the problem of channel inconsistency, and the differences between the key points were analyzed. Three popular and highly structured 2D models, the OpenPose (OP), stacked Hourglass (HG), and High-Resolution (HR) networks, were employed. Ways of reusing the RGB channels to fill the missing depth channel were investigated, as sketched below. The performance of each model was then examined to identify the limitations of each network structure.
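One plausible reading of this channel patching (a sketch under our own assumptions, not necessarily the authors' exact procedure) is to widen the first convolution of a pretrained 2D backbone from three input channels (RGB) to four (RGB-D) and initialize the new depth-channel weights with the mean of the pretrained RGB weights, so that the remaining 2D weights transfer unchanged.

```python
# Assumed sketch of channel patching for 2D-to-3D model transfer:
# expand a pretrained 3-channel (RGB) stem conv to 4 channels (RGB-D),
# filling the new depth-channel weights with the mean of the RGB weights.
import torch
import torch.nn as nn

def patch_first_conv(conv3: nn.Conv2d) -> nn.Conv2d:
    conv4 = nn.Conv2d(4, conv3.out_channels,
                      kernel_size=conv3.kernel_size, stride=conv3.stride,
                      padding=conv3.padding, bias=conv3.bias is not None)
    with torch.no_grad():
        w = conv3.weight                                    # (out, 3, kH, kW)
        conv4.weight[:, :3] = w                             # copy pretrained RGB weights
        conv4.weight[:, 3:] = w.mean(dim=1, keepdim=True)   # depth = mean of RGB weights
        if conv3.bias is not None:
            conv4.bias.copy_(conv3.bias)
    return conv4

# Usage: replace the stem of a 2D pose backbone (e.g., OP/HG/HR) whose stem is a Conv2d.
pretrained_stem = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
rgbd_stem = patch_first_conv(pretrained_stem)
x = torch.randn(1, 4, 256, 256)   # RGB-D input
print(rgbd_stem(x).shape)
```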
Results and discussion: The results show that model transfer learning, with the missing channel patched by the mean of the RGB channels, improves the average accuracy of the pose key points by 1 to 20% compared with training without transfer. The 3D accuracies are 0.3 to 0.5% higher than the 2D baselines. The stacked network structure performs better on the hip and knee points than the parallel structure, whereas the parallel design performs much better on the remaining points. As a result, model transfer can practically extend boxing pose estimation from 2D to 3D.
Filter pruning is widely used for inference acceleration and compatibility with off-the-shelf hardware devices. Some filter pruning methods propose various criteria to approximate the importance of filters and then sort the filters globally or locally to prune the redundant parameters. However, current criterion-based methods have two problems: (1) parameters with smaller criterion values that extract edge features are easily ignored, and (2) there is a strong correlation between different criteria, resulting in similar pruning structures. In this article, we propose a novel, simple but effective pruning method based on filter similarity, which evaluates the similarity between filters instead of the importance of a single filter. The proposed method first calculates the pairwise similarity of the filters in one convolutional layer and then obtains the similarity distribution. Finally, the filters with high similarity to others are identified from the distribution and deleted or set to zero. In addition, the proposed algorithm does not need a pruning rate for each layer; only the desired FLOPs or parameter reduction needs to be set to obtain the final compressed model. We also provide iterative pruning strategies for hard pruning and soft pruning to satisfy the trade-off between accuracy and memory in different scenarios. Extensive experiments on representative benchmark datasets across different network architectures demonstrate the effectiveness of the proposed method. For example, on CIFAR-10, the proposed algorithm achieves a 61.1% FLOPs reduction by removing 58.3% of the parameters, with no loss in Top-1 accuracy on ResNet-56, and it reduces FLOPs by 53.05% on ResNet-50 with only 0.29% Top-1 accuracy degradation on ILSVRC-2012.
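To make the core idea concrete, the snippet below is an illustrative sketch under our own assumptions (cosine similarity, a fixed threshold, and a single layer), not the authors' released code. It flattens the filters of one convolutional layer, computes their pairwise cosine similarity, and zeroes the filters whose highest similarity to a still-kept filter exceeds the threshold, which corresponds to a soft-pruning step.

```python
# Assumed sketch of similarity-based filter pruning for one conv layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

def prune_similar_filters(conv: nn.Conv2d, threshold: float = 0.7):
    """Zero out filters that are highly similar to another kept filter (soft pruning)."""
    with torch.no_grad():
        w = conv.weight                        # (out, in, kH, kW)
        flat = F.normalize(w.flatten(1), dim=1)
        sim = flat @ flat.t()                  # pairwise cosine similarity
        sim.fill_diagonal_(-1.0)               # ignore self-similarity

        pruned = []
        # Visit filters from most to least similar to their nearest neighbor.
        for i in torch.argsort(sim.max(dim=1).values, descending=True):
            i = int(i)
            j = int(sim[i].argmax())
            # Keep at least one filter of each highly similar pair.
            if sim[i, j] > threshold and j not in pruned:
                w[i].zero_()
                pruned.append(i)
        return pruned

conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)
removed = prune_similar_filters(conv, threshold=0.7)
print(f"zeroed {len(removed)} of {conv.out_channels} filters")
```

In a full pipeline this step would be applied layer by layer and repeated until the target FLOPs or parameter budget is reached; hard pruning would additionally remove the zeroed filters and their dependent channels.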
Motivation: Image dehazing, as a key prerequisite for high-level computer vision tasks, has gained extensive attention in recent years. Traditional model-based methods recover dehazed images via the atmospheric scattering model; they dehaze favorably but often cause artifacts due to errors in parameter estimation. By contrast, recent model-free methods directly restore dehazed images with an end-to-end network, which achieves better color fidelity. To improve the dehazing effect, we combine the complementary merits of these two categories and propose a physical-model-guided self-distillation network for single image dehazing, named PMGSDN.
Proposed method: First, we propose a novel attention-guided feature extraction block (AGFEB) and build a deep feature extraction network from it. Second, we add three early-exit branches and embed dark channel prior information into the network to merge the merits of model-based and model-free methods, and we then adopt self-distillation to transfer features from the deeper layers (acting as the teacher) to the shallow early-exit branches (acting as the students) to improve the dehazing effect.
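For reference, the dark channel prior used to guide the branches can be computed as below. This is a generic sketch of the prior itself (per-pixel minimum over R, G, B followed by a local minimum filter), not the paper's AGFEB or branch architecture, and the patch size of 15 is a common choice assumed here rather than taken from the paper.

```python
# Sketch: dark channel prior of a batch of RGB images in PyTorch (assumed parameters).
import torch
import torch.nn.functional as F

def dark_channel(img: torch.Tensor, patch: int = 15) -> torch.Tensor:
    """img: (B, 3, H, W) in [0, 1] -> dark channel of shape (B, 1, H, W)."""
    per_pixel_min = img.min(dim=1, keepdim=True).values   # min over the RGB channels
    pad = patch // 2
    # Local minimum filter implemented as a max-pool of the negated map.
    return -F.max_pool2d(-per_pixel_min, kernel_size=patch, stride=1, padding=pad)

hazy = torch.rand(2, 3, 256, 256)
print(dark_channel(hazy).shape)   # torch.Size([2, 1, 256, 256])
```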
Results: On the I-HAZE and O-HAZE datasets, the proposed method outperforms the other methods, achieving the best PSNR and SSIM values of 17.41 dB and 0.813, and 18.48 dB and 0.802, respectively. Moreover, for real-world images, the proposed method also obtains high-quality dehazed results.
Conclusion: Experimental results on both synthetic and real-world images demonstrate that the proposed PMGSDN effectively dehazes images, producing results with clear textures and good color fidelity.