TY - JOUR
T1 - HiLo
T2 - 2023 IEEE/CVF International Conference on Computer Vision (ICCV)
AU - Zhou, Zijian
AU - Shi, Miaojing
AU - Caesar, Holger
N1 - Funding Information:
In this work, we proposed the HiLo framework to tackle the long-tail problem with relational semantic overlap in Panoptic Scene Graph generation. The HiLo framework simultaneously learns the high- and low-frequency relations in different network branches and unifies their strengths by aligning their predictions. We also constructed a HiLo baseline that allows high-quality panoptic segmentation to improve PSG performance. Experimental results demonstrate that our method achieves state-of-the-art performance on the PSG dataset, confirming its effectiveness. In future work, we will investigate how knowledge distillation [28, 23] can be used to fuse the high and low branches in our method, and how the method can be applied to downstream tasks such as visual question answering and image captioning. Acknowledgment: The authors would like to thank Prof. Tomasz Radzik for helpful discussions. Computing resources were provided by King’s Computational Research, Engineering and Technology Environment (CREATE). This work was supported by the European Union’s Horizon 2020 FET Proactive Program under Agreement 101017857 and by the Fundamental Research Funds for the Central Universities.
Publisher Copyright:
© 2023 IEEE.
PY - 2024/1/15
Y1 - 2024/1/15
AB - Panoptic Scene Graph generation (PSG) is a recently proposed task in image scene understanding that aims to segment the image and extract triplets of subjects, objects and their relations to build a scene graph. This task is particularly challenging for two reasons. First, it suffers from a long-tail problem in its relation categories, making naive biased methods more inclined to high-frequency relations. Existing unbiased methods tackle the long-tail problem by data/loss rebalancing to favor low-frequency relations. Second, a subject-object pair can have two or more semantically overlapping relations. While existing methods favor one over the other, our proposed HiLo framework lets different network branches specialize in low- and high-frequency relations, enforces their consistency, and fuses the results. To the best of our knowledge, we are the first to propose an explicitly unbiased PSG method. In extensive experiments, we show that our HiLo framework achieves state-of-the-art results on the PSG task. We also apply our method to the Scene Graph Generation task, which predicts boxes instead of masks, and observe improvements over all baseline methods. Code is available at https://github.com/franciszzj/HiLo.
UR - http://www.scopus.com/inward/record.url?scp=85179262958&partnerID=8YFLogxK
U2 - 10.1109/ICCV51070.2023.01978
DO - 10.1109/ICCV51070.2023.01978
M3 - Conference paper
SN - 1550-5499
SP - 21580
EP - 21591
JO - 2023 IEEE/CVF International Conference on Computer Vision (ICCV)
JF - 2023 IEEE/CVF International Conference on Computer Vision (ICCV)
Y2 - 1 October 2023 through 6 October 2023
ER -