Publication Date:
2019
Abstract:
Relational reasoning in computer vision has recently shown impressive results on visual question answering tasks. On the challenging CLEVR dataset, the recently proposed Relation Network (RN), a simple plug-and-play module and one of the state-of-the-art approaches, has achieved 95.5% accuracy in answering relational questions. In this paper, we define a sub-field of Content-Based Image Retrieval (CBIR) called Relational-CBIR (R-CBIR), in which the goal is to retrieve images that contain given relationships among objects. To this end, we employ the RN architecture to extract relation-aware features from CLEVR images. To evaluate the effectiveness of these features, we extend both the CLEVR and Sort-of-CLEVR datasets, generating a ground truth for R-CBIR by exploiting the relational data embedded in their scene graphs. Furthermore, we propose a modification of the RN module, a two-stage Relation Network (2S-RN), that extracts relation-aware features through a preprocessing stage that focuses on the image content alone, leaving the question aside. Experiments show that our RN features, especially the 2S-RN ones, outperform the state-of-the-art RMAC features on this new challenging task.
Iris type:
04.01 Contribution in conference proceedings
Keywords:
deep learning; relational learning; content-based image retrieval; multimedia information retrieval; computer vision
List of contributors:
Carrara, Fabio; Messina, Nicola; Amato, Giuseppe; Gennaro, Claudio; Falchi, Fabrizio
Book title:
ECCV 2018: Computer Vision - ECCV 2018 Workshops