TY - CHAP
T1 - Multimodal Relation Extraction with Efficient Graph Alignment
AU - Zheng, Changmeng
AU - Feng, Junhao
AU - Fu, Ze
AU - Cai, Yi
AU - Li, Qing
AU - Wang, Tao
N1 - Funding Information:
The work presented in this paper has been supported by the Hong Kong Research Grants Council through a General Research Fund (project no. PolyU 11204919), the National Natural Science Foundation of China (62076100), the Fundamental Research Funds for the Central Universities, SCUT (D2210010, D2200150, and D2201300), and the Science and Technology Planning Project of Guangdong Province (2020B0101100002).
Publisher Copyright:
© 2021 ACM.
PY - 2021/10/17
Y1 - 2021/10/17
N2 - Relation extraction (RE) is a fundamental process in constructing knowledge graphs. However, previous relation extraction methods suffer a sharp performance decline on short and noisy social media texts due to a lack of context. Fortunately, the related visual contents (objects and their relations) in social media posts can supplement the missing semantics and help to extract relations precisely. We introduce multimodal relation extraction (MRE), a task that identifies textual relations with visual clues. To tackle this problem, we present a large-scale dataset containing 15,000+ sentences with 23 pre-defined relation categories. Considering that the visual relations among objects correspond to textual relations, we develop a dual graph alignment method to capture this correlation for better performance. Experimental results demonstrate that visual contents help to identify relations more precisely than text-only baselines. Moreover, our alignment method can find the correlations between vision and language, resulting in better performance. Our dataset and code are available at https://github.com/thecharm/Mega.
AB - Relation extraction (RE) is a fundamental process in constructing knowledge graphs. However, previous relation extraction methods suffer a sharp performance decline on short and noisy social media texts due to a lack of context. Fortunately, the related visual contents (objects and their relations) in social media posts can supplement the missing semantics and help to extract relations precisely. We introduce multimodal relation extraction (MRE), a task that identifies textual relations with visual clues. To tackle this problem, we present a large-scale dataset containing 15,000+ sentences with 23 pre-defined relation categories. Considering that the visual relations among objects correspond to textual relations, we develop a dual graph alignment method to capture this correlation for better performance. Experimental results demonstrate that visual contents help to identify relations more precisely than text-only baselines. Moreover, our alignment method can find the correlations between vision and language, resulting in better performance. Our dataset and code are available at https://github.com/thecharm/Mega.
KW - graph alignment
KW - multimodal dataset
KW - multimodal relation extraction
UR - http://www.scopus.com/inward/record.url?scp=85119353679&partnerID=8YFLogxK
U2 - 10.1145/3474085.3476968
DO - 10.1145/3474085.3476968
M3 - Conference paper
AN - SCOPUS:85119353679
T3 - MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia
SP - 5298
EP - 5306
BT - MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia
PB - Association for Computing Machinery, Inc
T2 - 29th ACM International Conference on Multimedia, MM 2021
Y2 - 20 October 2021 through 24 October 2021
ER -