About this Research Topic
Possible solutions may lie in a three-pronged approach involving improvements in the datasets, algorithms, and evaluation metrics used for V&L research. Recently, advances have been made in each of these areas, including the creation of synthetic datasets (NLVR, CLEVR), new variations on existing datasets (VQA-CP, TDIUC, nocaps), and altogether new tasks (GQA, FOIL, Social-IQ). In parallel, a plethora of new algorithms, evaluation metrics, and critical analyses have emerged regarding dataset bias, spurious correlations, interpretability, out-of-distribution performance, and related issues.
By bringing together researchers from the machine learning, computer vision, and natural language processing communities, as well as experts from a variety of application domains, this Research Topic aims to represent the state of the art in V&L research and to foster new foundational research towards robust, fair, and interpretable AI for V&L.
We therefore seek a broad range of original contributions from researchers and practitioners across the disciplines within the V&L domain. We welcome submissions of novel algorithms, datasets, analyses, and other innovations that highlight and address challenges in vision and language research, particularly those demonstrating improved algorithmic fairness, interpretability, and robustness to bias, spurious correlations, and long-tailed or out-of-distribution data.
Submissions may address, but are not limited to, the following topics:
- Novel algorithms and techniques that help improve the state-of-the-art in existing V&L tasks;
- Novel V&L algorithms that are less prone to dataset bias and spurious correlations, enforce demographic fairness and/or are more interpretable and explainable;
- Novel datasets, sub-tasks, and challenges that help test for new capabilities and/or highlight shortcomings with existing datasets and algorithms in V&L;
- Controlled test sets designed to evaluate specific abilities involved in language-grounded visual understanding;
- Probing tasks that evaluate the quality of multimodal V&L representations;
- Novel evaluation metrics that enable accurate and fair evaluation of V&L algorithms with respect to dataset bias, label imbalance, lack of compositionality, and other issues affecting V&L tasks;
- New analyses, key observations, discussion, and insights regarding bias and related issues in existing V&L datasets and algorithms;
- Negative or critical results regarding practices currently used in mainstream V&L research;
- Successes or challenges of integrating vision and language in a novel application domain.
We also welcome well-formulated survey articles, opinion pieces, position papers, and commentaries on the current state and future prospects of V&L research, provided they fit within the theme of the Research Topic.
Keywords: vision and language, bias and fairness, explainable visual grounding, probing tasks for vision and language, evaluation of vision and language models
Important Note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.