Commentary on “Challenges and Prospects in Vision and Language Research”

About this Research Topic

Submission closed

Background

We invite open peer commentaries on the questions raised by the target article in this Research Topic, and on the perspective the article brings to the research area. Commentary is to be construed broadly, to include (without being limited to):

- Substantive criticism directly responding to the target article,
- Different perspectives on the questions and issues raised by the target article, and
- Alternative solutions to the problems raised by the target article.

The authors of the target paper will have an opportunity to respond to the commentaries as well.

The goal of these commentaries is to stimulate discussion in the research community and so advance the state of thinking in the field. We solicit a broad range of such responses, in line with this goal. The journal offers multiple article types; typically, commentaries will be Perspective, Conceptual Analysis, General Commentary, or Opinion type articles, but other article types may be considered if relevant.

Target Article:
"Challenges and Prospects in Vision and Language Research", by Kushal Kafle, Robik Shrestha and Christopher Kanan

Abstract:
Language-grounded image understanding tasks have often been proposed as a method for evaluating progress in artificial intelligence. Ideally, these tasks should test a plethora of capabilities that integrate computer vision, reasoning, and natural language understanding. However, the datasets and evaluation procedures used in these tasks are replete with flaws that allow vision and language (V&L) algorithms to achieve good performance without a robust understanding of vision and language. We argue for this position based on several recent studies in the V&L literature and our own observations of dataset bias, robustness, and spurious correlations. Finally, we propose that several of these challenges can be mitigated by the creation of carefully designed benchmarks.

Keywords: vision and language, deep learning, natural language processing, computer vision, image classification, object detection, entity recognition, sentiment analysis, question-answering, dialog systems, language and computation

Important note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.

Topic editors