Weakly Supervised Referring Image Segmentation with Intra-Chunk and In…
Views: 22 | Posted: 26-03-19 15:27
Abstract
Referring image segmentation aims to localize the object in an image referred to by a natural language expression. Most previous studies rely on large-scale datasets with segmentation labels, which are costly to obtain. In this work, we present a weakly supervised learning method that utilizes only readily available image-text pairs.

We first train a vision-language model for image-text matching and extract visual saliency maps using Grad-CAM to identify regions corresponding to each word. However, Grad-CAM presents two major limitations. First, it does not sufficiently capture semantic relationships between words. To address this, we model these relationships through intra-chunk and inter-chunk consistency. Second, it tends to highlight only small regions of the target object, resulting in low recall. To overcome this, we refine localization maps using Transformer-based self-attention and unsupervised object shape priors.

Experiments on benchmark datasets (RefCOCO, RefCOCO+, G-Ref) demonstrate that our method significantly outperforms existing approaches. Furthermore, the proposed method is applicable across various levels of supervision and consistently achieves superior performance.
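
As a rough illustration of the Grad-CAM step mentioned in the abstract, the sketch below applies the standard Grad-CAM formula: each channel is weighted by its spatially averaged gradient, the weighted channels are summed, and a ReLU keeps only positive evidence. It is not the authors' implementation. The function name `grad_cam`, the nested-list layout, and the toy values are assumptions made for this example, and the abstract does not say which layer or which matching score the gradients come from.

```python
def grad_cam(activations, gradients):
    """Grad-CAM map from one layer's activations and gradients.

    activations, gradients: nested lists shaped [channels][height][width];
    gradients holds d(score)/d(activation) for the score being explained
    (assumed here to be an image-text matching score for one word).
    Returns a [height][width] map scaled to [0, 1].
    """
    channels = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])

    # Channel weight = spatial average of that channel's gradient.
    weights = [
        sum(sum(row) for row in gradients[c]) / (height * width)
        for c in range(channels)
    ]

    # Weighted sum over channels, then ReLU to keep positive evidence only.
    cam = [
        [
            max(0.0, sum(weights[c] * activations[c][y][x] for c in range(channels)))
            for x in range(width)
        ]
        for y in range(height)
    ]

    # Scale to [0, 1]; an all-zero map is returned unchanged.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Toy example: 2 channels on a 2x2 feature map (values are made up).
acts = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
grads = [
    [[1.0, 1.0], [1.0, 1.0]],      # spatial mean = 1.0
    [[-1.0, -1.0], [-1.0, -1.0]],  # spatial mean = -1.0
]
saliency = grad_cam(acts, grads)
print(saliency)  # [[0.5, 0.0], [0.0, 1.0]]
```

In the toy output only two of the four cells are non-zero, which gives a small-scale picture of the low-recall behaviour the abstract describes: the map concentrates on the most discriminative locations rather than covering a whole region.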
Attachments
Weakly Supervised Referring Image Segmentation with Intra-Chunk and Inter-Chunk Consistency.pdf (1.6M)
4 downloads | DATE: 26-03-19 15:27