TY - CHAP
T1 - PubHealthTab
T2 - 2022 Findings of the Association for Computational Linguistics: NAACL 2022
AU - Akhtar, Mubashara
AU - Cocarascu, Oana
AU - Simperl, Elena
N1 - Funding Information:
The authors acknowledge support from the Distributed AI (DAI) research group at King’s College London for creating the dataset.
Publisher Copyright:
© Findings of the Association for Computational Linguistics: NAACL 2022 - Findings.
PY - 2022
Y1 - 2022
N2 - Inspired by human fact checkers, who use different types of evidence (e.g. tables, images, audio) in addition to text, several datasets with tabular evidence data have been released in recent years. Whilst the datasets encourage research on table fact-checking, they rely on information from restricted data sources, such as Wikipedia for creating claims and extracting evidence data, making the fact-checking process different from the real-world process used by fact checkers. In this paper, we introduce PubHealthTab, a table fact-checking dataset based on real-world public health claims and noisy evidence tables from sources similar to those used by real fact checkers. We outline our approach for collecting evidence data from various websites and present an in-depth analysis of our dataset. Finally, we evaluate state-of-the-art table representation and pre-trained models fine-tuned on our dataset, achieving an overall F1 score of 0.73.
UR - http://www.scopus.com/inward/record.url?scp=85137370433&partnerID=8YFLogxK
M3 - Conference paper
AN - SCOPUS:85137370433
T3 - Findings of the Association for Computational Linguistics: NAACL 2022 - Findings
SP - 1
EP - 16
BT - Findings of the Association for Computational Linguistics
PB - Association for Computational Linguistics (ACL)
Y2 - 10 July 2022 through 15 July 2022
ER -