MAEC: A Multimodal Aligned Earnings Conference Call Dataset for Financial Risk Prediction

Jiazheng Li, Linyi Yang, Barry Smyth, Ruihai Dong

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

24 Citations (Scopus)

Abstract

In the area of natural language processing, a variety of financial datasets have informed recent research, including financial news, financial reports, social media, and audio data from earnings calls. We introduce a new, large-scale, multimodal, text-audio paired earnings-call dataset named MAEC, based on S&P 1500 companies. We describe the main features of MAEC and how it was collected and assembled, paying particular attention to the text-audio alignment process used. We present the approach used in this work as a suitable framework for processing similar forms of data in the future. The resulting dataset is more than six times larger than those currently available to the research community, and we discuss its potential in terms of current and future research challenges and opportunities. All resources of this work are available at https://github.com/Earnings-Call-Dataset/
Original language: English
Title of host publication: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
Place of Publication: New York, NY, USA
Publisher: Association for Computing Machinery
Pages: 3063–3070
ISBN (Print): 9781450368599
DOIs
Publication status: Published - 2020

Publication series

Name: CIKM '20
Publisher: Association for Computing Machinery

Keywords

  • earnings conference calls
  • multimodal aligned datasets
  • financial risk prediction
