Event Abstract

Quantifying the difficulty of object recognition tasks via scaling of accuracy vs. training set size

  • 1 Los Alamos National Laboratory, United States
  • 2 Santa Fe Institute, United States
  • 3 New Mexico Consortium, United States
  • 4 University of North Texas, United States

Hierarchical models of primate visual cortex (e.g., neocognitron/HMAX) have been shown to perform as well as or better than other computer vision approaches on object identification tasks. However, to date only small system sizes (in numbers of neurons and synapses) have been used, commensurate with the scale of visual training sets, which typically contain hundreds of images or a few minutes of video (<1 gigapixel). A rough estimate translates the size of the human visual cortex, in terms of numbers of neurons and synapses, to ~1 petaflop of computation, while the scale of human visual experience greatly exceeds standard computer vision datasets: the retina delivers ~1 petapixel/year to the brain (a few terapixels/day), driving learning at many levels of the cortical system. This disparity of scales raises the question of how system performance may increase with larger amounts of unsupervised and supervised learning, and whether it can approach human accuracy given sufficient training. Here, we describe work at Los Alamos National Laboratory (LANL) to develop large-scale functional models of visual cortex on LANL's supercomputers. We present quantitative criteria for assessing when a set of learned local representations is complete, based on its statistical evolution with the size of the unsupervised learning set. We compare representations resulting from several learning rules to well-known prototypes in primary visual cortex and to known constraints in higher layers. We then quantify the difficulty of different object recognition tasks via the improvement in classification performance with the size of the supervised training set. We show that classification performance rises in a general quantitative way with the number of examples in the training set, and that the scaling coefficients quantify the difficulty of the task. Specifically, we find a universal form, accuracy = a + b log(N), where a and b are constants that depend on the details of the system architecture and layer representations, and N is the number of images in the training set. Testing is performed on a fixed set of images (not seen during training); a fit of accuracy versus N gives a and b, and hence the critical set size N* = exp[(100-a)/b], the training-set size at which the fit extrapolates to 100% accuracy. Thus, more difficult tasks correspond to larger necessary training sets N*. We compute N* for different standard datasets and in a variety of circumstances, varying the learning rules and the number of layers in the model. For example, comparing the behavior of a classifier (SVM) trained using the standard fixed Gabor V1 and imprinted V2 versus a Hebbian-learned V1 and V2, we see that the fully learned model starts at a lower accuracy but eventually matches the performance of the standard model as the training set grows, predicting N*(imprinted) = 3996 > N*(learned) = 3130 on an animal/no-animal object identification task. Finally, we discuss how the scaling of performance accuracy with N provides a path to systematic improvements in system performance, regardless of initial benchmarks on small datasets, and supplies a quantitative criterion for the inclusion of new ingredients in the model, for example feedback and lateral connectivity.
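To make the fitting procedure behind N* concrete, the following is a minimal Python sketch, not taken from the abstract: the (N, accuracy) data points are hypothetical placeholders, and the use of a simple least-squares fit via numpy is an assumption about how such a fit might be done. It fits accuracy = a + b log(N) to measured accuracies (in percent) and solves for the critical set size N* = exp[(100-a)/b].

```python
# Sketch: fit the scaling law accuracy = a + b*log(N) and compute
# N* = exp[(100 - a)/b]. Data values below are illustrative placeholders,
# not the authors' measurements.
import numpy as np

# Hypothetical training-set sizes and test accuracies (percent).
N = np.array([50, 100, 200, 400, 800, 1600])
accuracy = np.array([62.0, 66.5, 71.2, 75.8, 80.1, 84.9])

# Linear least-squares fit of accuracy against log(N):
# polyfit with deg=1 returns [slope, intercept], i.e. [b, a].
b, a = np.polyfit(np.log(N), accuracy, deg=1)

# Critical set size where the fit extrapolates to 100% accuracy.
N_star = np.exp((100.0 - a) / b)

print(f"a = {a:.2f}, b = {b:.2f}, N* = {N_star:.0f}")
```

Under this scheme, comparing two architectures (e.g., imprinted versus Hebbian-learned V2) reduces to running the same fit on each architecture's accuracy curve and comparing the resulting N* values.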

Conference: Computational and Systems Neuroscience 2010, Salt Lake City, UT, United States, 25 Feb - 2 Mar, 2010.

Presentation Type: Poster Presentation

Topic: Poster session II

Citation: Brumby S, Bettencourt LM, Rasmussen C, Bennett R, Ham M and Kenyon G (2010). Quantifying the difficulty of object recognition tasks via scaling of accuracy vs. training set size. Front. Neurosci. Conference Abstract: Computational and Systems Neuroscience 2010. doi: 10.3389/conf.fnins.2010.03.00325

Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers. They are made available through the Frontiers publishing platform as a service to conference organizers and presenters.

The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated.

Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed.

For Frontiers’ terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.

Received: 08 Mar 2010; Published Online: 08 Mar 2010.

* Correspondence: Steven Brumby, Los Alamos National Laboratory, Los Alamos, United States, brumby@lanl.gov