AUTHOR=Zhou Shichao, Li Haoyan, Wang Zhuowei, Zhang Zekai TITLE=In defense of local descriptor-based few-shot object detection JOURNAL=Frontiers in Neuroscience VOLUME=18 YEAR=2024 URL=https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2024.1349204 DOI=10.3389/fnins.2024.1349204 ISSN=1662-453X ABSTRACT=

State-of-the-art object detection models require an intensive parameter fine-tuning stage (using deep convolutional networks, etc.) with tens or hundreds of training examples. In contrast, human intelligence can robustly learn a new concept from just a few instances (i.e., few-shot detection). The distinct perception mechanisms of these two families of systems motivate us to revisit classical handcrafted local descriptors (e.g., SIFT, HOG) together with non-parametric visual models, which innately require no learning/training phase. Herein, we argue that the inferior performance of these local descriptors mainly results from a lack of global structural awareness. To address this issue, we refine local descriptors with spatial contextual attention over neighbor affinities and then embed them into a discriminative subspace guided by a Kernel-InfoNCE loss. Unlike conventional quantization of local descriptors in high-dimensional feature space or isometric dimension reduction, we seek a brain-inspired few-shot feature representation of the object manifold that combines data-independent primitive representation with semantic context learning and thus aids generalization. The resulting embeddings, as pattern vectors/tensors, permit an accelerated yet non-parametric visual similarity computation as the decision rule for final detection. Our approach to few-shot object detection is nearly learning-free, and experiments on remote sensing imagery (an approximately 2-D affine space) confirm the efficacy of our model.
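The core idea the abstract describes, a handcrafted local descriptor combined with a non-parametric (nearest-neighbor) similarity rule that needs no training phase, can be sketched in a few lines. This is an illustrative toy, not the paper's actual pipeline: it uses a simplified HOG-style orientation histogram computed with NumPy and plain cosine similarity, and omits the contextual-attention refinement and Kernel-InfoNCE embedding the authors propose. All function names here are hypothetical.

```python
import numpy as np

def hog_descriptor(patch, n_bins=9):
    """Simplified HOG-style descriptor: a histogram of unsigned gradient
    orientations weighted by gradient magnitude (illustrative only)."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0.0, np.pi), weights=mag)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def few_shot_classify(query_patch, support_patches, support_labels):
    """Non-parametric decision rule: label of the nearest support
    descriptor under cosine similarity -- no training phase involved."""
    q = hog_descriptor(query_patch)
    sims = [cosine_similarity(q, hog_descriptor(p)) for p in support_patches]
    return support_labels[int(np.argmax(sims))]
```

For example, with one vertically striped and one horizontally striped support patch, a noisy vertically striped query is matched to the vertical exemplar, since its gradient orientations concentrate in the same histogram bins.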