AUTHOR=Pan Xingyu, Tougne Laure TITLE=A New Database of Digits Extracted from Coins with Hard-to-Segment Foreground for Optical Character Recognition Evaluation JOURNAL=Frontiers in ICT VOLUME=4 YEAR=2017 URL=https://www.frontiersin.org/journals/ict/articles/10.3389/fict.2017.00009 DOI=10.3389/fict.2017.00009 ISSN=2297-198X ABSTRACT=
Since the release date struck on a coin is an important indicator of its monetary type, recognizing the extracted digits may assist in identifying that type. However, digit images extracted from coins are challenging for conventional optical character recognition methods because the foreground of such digits very often has the same color as the background. In addition, other sources of noise, including wear of the coin metal, make it more difficult to obtain a correct segmentation of the character shape. To address these challenges, this article presents the CoinNUMS database for automatic digit recognition. CoinNUMS contains 3,006 digit images divided into three subsets. The first subset, CoinNUMS_geni, consists of 606 digit images manually cropped from high-resolution photographs of well-conserved coins from GENI coin photographs. The second subset, CoinNUMS_pcgs_a, consists of 1,200 digit images automatically extracted from a subset of the USA_Grading numismatic database, which contains coins of varying quality. The last subset, CoinNUMS_pcgs_m, consists of 1,200 digit images manually extracted from the same coin photographs as CoinNUMS_pcgs_a. In CoinNUMS_pcgs_a and CoinNUMS_pcgs_m, the digit images are extracted from the release date. In CoinNUMS_geni, the digit images can come from the cropped date, the face value, or any other legend on the coin that contains digits. To demonstrate the difficulty of these subsets, we tested recognition algorithms from the literature on them. The database and the results of the tested algorithms will be freely available on a dedicated website.