TY - CHAP
T1 - Multimodal Relation Extraction with Efficient Graph Alignment
AU - Zheng, Changmeng
AU - Feng, Junhao
AU - Fu, Ze
AU - Cai, Yi
AU - Li, Qing
AU - Wang, Tao
N1 - Funding Information:
The work presented in this paper has been supported by the Hong Kong Research Grants Council through a General Research Fund (project no. PolyU 11204919), the National Natural Science Foundation of China (62076100), the Fundamental Research Funds for the Central Universities, SCUT (D2210010, D2200150, and D2201300), and the Science and Technology Planning Project of Guangdong Province (2020B0101100002).
Publisher Copyright:
© 2021 ACM.
PY - 2021/10/17
Y1 - 2021/10/17
N2 - Relation extraction (RE) is a fundamental process in constructing knowledge graphs. However, previous relation extraction methods suffer a sharp performance decline on short and noisy social media texts due to a lack of context. Fortunately, the related visual contents (objects and their relations) in social media posts can supplement the missing semantics and help to extract relations precisely. We introduce multimodal relation extraction (MRE), a task that identifies textual relations with visual clues. To tackle this problem, we present a large-scale dataset containing 15,000+ sentences with 23 pre-defined relation categories. Considering that the visual relations among objects correspond to textual relations, we develop a dual graph alignment method to capture this correlation for better performance. Experimental results demonstrate that visual contents help to identify relations more precisely than text-only baselines. Moreover, our alignment method can find the correlations between vision and language, resulting in better performance. Our dataset and code are available at https://github.com/thecharm/Mega.
AB - Relation extraction (RE) is a fundamental process in constructing knowledge graphs. However, previous relation extraction methods suffer a sharp performance decline on short and noisy social media texts due to a lack of context. Fortunately, the related visual contents (objects and their relations) in social media posts can supplement the missing semantics and help to extract relations precisely. We introduce multimodal relation extraction (MRE), a task that identifies textual relations with visual clues. To tackle this problem, we present a large-scale dataset containing 15,000+ sentences with 23 pre-defined relation categories. Considering that the visual relations among objects correspond to textual relations, we develop a dual graph alignment method to capture this correlation for better performance. Experimental results demonstrate that visual contents help to identify relations more precisely than text-only baselines. Moreover, our alignment method can find the correlations between vision and language, resulting in better performance. Our dataset and code are available at https://github.com/thecharm/Mega.
KW - graph alignment
KW - multimodal dataset
KW - multimodal relation extraction
UR - http://www.scopus.com/inward/record.url?scp=85119353679&partnerID=8YFLogxK
U2 - 10.1145/3474085.3476968
DO - 10.1145/3474085.3476968
M3 - Conference paper
AN - SCOPUS:85119353679
T3 - MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia
SP - 5298
EP - 5306
BT - MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia
PB - Association for Computing Machinery, Inc
T2 - 29th ACM International Conference on Multimedia, MM 2021
Y2 - 20 October 2021 through 24 October 2021
ER -