HiLo: Exploiting High Low Frequency Relations for Unbiased Panoptic Scene Graph Generation

Zijian Zhou; Miaojing Shi; Holger Caesar

doi:10.1109/ICCV51070.2023.01978

Corresponding author: Miaojing Shi

Informatics

Delft University of Technology

Research output: Contribution to journal › Conference paper › peer-review

Abstract

Panoptic Scene Graph generation (PSG) is a recently proposed task in image scene understanding that aims to segment the image and extract triplets of subjects, objects and their relations to build a scene graph. This task is particularly challenging for two reasons. First, it suffers from a long-tail problem in its relation categories, making naive biased methods more inclined to high-frequency relations. Existing unbiased methods tackle the long-tail problem by data/loss rebalancing to favor low-frequency relations. Second, a subject-object pair can have two or more semantically overlapping relations. While existing methods favor one over the other, our proposed HiLo framework lets different network branches specialize on low and high frequency relations, enforce their consistency and fuse the results. To the best of our knowledge we are the first to propose an explicitly unbiased PSG method. In extensive experiments we show that our HiLo framework achieves state-of-the-art results on the PSG task. We also apply our method to the Scene Graph Generation task that predicts boxes instead of masks and see improvements over all baseline methods. Code is available at https://github.com/franciszzj/HiLo.

Original language	English
Pages (from-to)	21580-21591
Number of pages	12
Journal	2023 IEEE/CVF International Conference on Computer Vision (ICCV)
DOIs	https://doi.org/10.1109/ICCV51070.2023.01978
Publication status	Published - 15 Jan 2024
Event	2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France
Duration	1 Oct 2023 → 6 Oct 2023

Access to Document

10.1109/ICCV51070.2023.01978

HiLo Exploiting High Low_ZHOU_Accepted14July2023_GREEN AAM (Accepted author manuscript, 1.48 MB)

https://openaccess.thecvf.com/content/ICCV2023/papers/Zhou_HiLo_Exploiting_High_Low_Frequency_Relations_for_Unbiased_Panoptic_Scene_ICCV_2023_paper.pdf

Cite this

@article{73e0b3622fa84a0baa8a016644730143,

title = "HiLo: Exploiting High Low Frequency Relations for Unbiased Panoptic Scene Graph Generation",

abstract = "Panoptic Scene Graph generation (PSG) is a recently proposed task in image scene understanding that aims to segment the image and extract triplets of subjects, objects and their relations to build a scene graph. This task is particularly challenging for two reasons. First, it suffers from a long-tail problem in its relation categories, making naive biased methods more inclined to high-frequency relations. Existing unbiased methods tackle the long-tail problem by data/loss rebalancing to favor low-frequency relations. Second, a subject-object pair can have two or more semantically overlapping relations. While existing methods favor one over the other, our proposed HiLo framework lets different network branches specialize on low and high frequency relations, enforce their consistency and fuse the results. To the best of our knowledge we are the first to propose an explicitly unbiased PSG method. In extensive experiments we show that our HiLo framework achieves state-of-the-art results on the PSG task. We also apply our method to the Scene Graph Generation task that predicts boxes instead of masks and see improvements over all baseline methods. Code is available at https://github.com/franciszzj/HiLo.",

author = "Zijian Zhou and Miaojing Shi and Holger Caesar",

note = "Funding Information: In this work we proposed the HiLo framework to tackle the long-tail problem with relational semantic overlap in Panoptic Scene Graph generation. The HiLo framework simultaneously learns the high and low frequency relations in different network branches and unifies their strengths by aligning their predictions. We also constructed a HiLo baseline to allow high-quality panoptic segmentation to improve PSG performance. Experimental results demonstrate that our method achieves state-of-the-art performance on the PSG dataset, confirming its effectiveness. In future work, we will investigate how knowledge distillation [28, 23] can be used to fuse the high and low branches in our method, as well as its application to downstream tasks such as visual question answering and image captioning. Acknowledgment: The authors would like to thank Prof. Tomasz Radzik for helpful discussions. Computing resources were provided by King{\textquoteright}s Computational Research, Engineering and Technology Environment (CREATE). This work was supported by the European Union{\textquoteright}s Horizon 2020 FET Proactive Program under Agreement 101017857 and Fundamental Research Funds for the Central Universities. Publisher Copyright: {\textcopyright} 2023 IEEE; 2023 IEEE/CVF International Conference on Computer Vision (ICCV); Conference date: 01-10-2023 through 06-10-2023",

year = "2024",

month = jan,

day = "15",

doi = "10.1109/ICCV51070.2023.01978",

language = "English",

pages = "21580--21591",

journal = "2023 IEEE/CVF International Conference on Computer Vision (ICCV)",

issn = "1550-5499",

publisher = "IEEE",

}

TY - JOUR

T1 - HiLo: Exploiting High Low Frequency Relations for Unbiased Panoptic Scene Graph Generation

T2 - 2023 IEEE/CVF International Conference on Computer Vision (ICCV)

AU - Zhou, Zijian

AU - Shi, Miaojing

AU - Caesar, Holger

N1 - Funding Information: In this work we proposed the HiLo framework to tackle the long-tail problem with relational semantic overlap in Panoptic Scene Graph generation. The HiLo framework simultaneously learns the high and low frequency relations in different network branches and unifies their strengths by aligning their predictions. We also constructed a HiLo baseline to allow high-quality panoptic segmentation to improve PSG performance. Experimental results demonstrate that our method achieves state-of-the-art performance on the PSG dataset, confirming its effectiveness. In future work, we will investigate how knowledge distillation [28, 23] can be used to fuse the high and low branches in our method, as well as its application to downstream tasks such as visual question answering and image captioning. Acknowledgment: The authors would like to thank Prof. Tomasz Radzik for helpful discussions. Computing resources were provided by King’s Computational Research, Engineering and Technology Environment (CREATE). This work was supported by the European Union’s Horizon 2020 FET Proactive Program under Agreement 101017857 and Fundamental Research Funds for the Central Universities. Publisher Copyright: © 2023 IEEE.

PY - 2024/1/15

Y1 - 2024/1/15

N2 - Panoptic Scene Graph generation (PSG) is a recently proposed task in image scene understanding that aims to segment the image and extract triplets of subjects, objects and their relations to build a scene graph. This task is particularly challenging for two reasons. First, it suffers from a long-tail problem in its relation categories, making naive biased methods more inclined to high-frequency relations. Existing unbiased methods tackle the long-tail problem by data/loss rebalancing to favor low-frequency relations. Second, a subject-object pair can have two or more semantically overlapping relations. While existing methods favor one over the other, our proposed HiLo framework lets different network branches specialize on low and high frequency relations, enforce their consistency and fuse the results. To the best of our knowledge we are the first to propose an explicitly unbiased PSG method. In extensive experiments we show that our HiLo framework achieves state-of-the-art results on the PSG task. We also apply our method to the Scene Graph Generation task that predicts boxes instead of masks and see improvements over all baseline methods. Code is available at https://github.com/franciszzj/HiLo.

UR - http://www.scopus.com/inward/record.url?scp=85179262958&partnerID=8YFLogxK

U2 - 10.1109/ICCV51070.2023.01978

DO - 10.1109/ICCV51070.2023.01978

M3 - Conference paper

SN - 1550-5499

SP - 21580

EP - 21591

JO - 2023 IEEE/CVF International Conference on Computer Vision (ICCV)

JF - 2023 IEEE/CVF International Conference on Computer Vision (ICCV)

Y2 - 1 October 2023 through 6 October 2023

ER -
