Is Explanation All You Need? An Expert Survey on LLM-generated Explanations for Abusive Language Detection

Chiara Di Bonaventura*, Lucia Siciliani, Pierpaolo Basile, Albert Merono Penuela, Barbara McGillivray

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference paper; peer-reviewed


Abstract

Explainable abusive language detection has proven to help both users and content moderators, and recent research has focused on prompting LLMs to generate explanations for why a specific text is hateful. Yet, understanding the alignment of these generated explanations with human expectations and judgements is far from being solved. In this paper, we design a before-and-after study recruiting AI experts to evaluate the usefulness and trustworthiness of LLM-generated explanations for abusive language detection tasks, investigating multiple LLMs and learning strategies. Our experiments show that expectations in terms of usefulness and trustworthiness of LLM-generated explanations are not met, as their ratings decrease by 47.78% and 64.32%, respectively, after treatment. Further, our results suggest caution in using LLMs for explanation generation of abusive language detection due to (i) their cultural bias, and (ii) difficulty in reliably evaluating them with empirical metrics. In light of our results, we provide three recommendations to use LLMs responsibly for explainable abusive language detection.
Original language: English
Title of host publication: Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)
Place of publication: Pisa, Italy
Publication status: Accepted/In press - 4 Dec 2024

Keywords

  • Large Language Models
  • Explanation
  • hate speech
  • evaluation
  • human evaluation
