Poster: RPAL-Recovering Malware Classifiers from Data Poisoning using Active Learning

Research output: Chapter in Book/Report/Conference proceeding › Poster abstract › peer-review


Abstract

Intuitively, poisoned machine learning (ML) models may forget their adversarial manipulation via retraining. However, can we quantify the time required for model recovery? From an adversarial perspective, is a small amount of poisoning sufficient to force the defender to retrain significantly more over time? This poster paper proposes RPAL, a new framework to answer these questions in the context of malware detection. To quantify recovery, we propose two new metrics: intercept, i.e., the first point at which the poisoned model's performance intercepts the vanilla model's performance; and recovery rate, i.e., the percentage of time after the intercept during which the poisoned model's performance stays within a tolerance margin of the vanilla model's performance. We conduct experiments on an Android malware dataset (2014-2016), with two feature abstractions based on Drebin and MaMaDroid, using uncertainty-sampling active learning (retraining) and label flipping (poisoning). We utilize the introduced parameters and metrics to demonstrate (i) how the active learning and poisoning rates impact recovery and (ii) that the feature representation impacts recovery.
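
A minimal sketch of how the two metrics could be computed, assuming per-retraining-round performance scores (e.g., F1) for a vanilla and a poisoned model; the function names, the tolerance value, and the exact handling of the intercept point are illustrative assumptions rather than the paper's implementation.

```python
# Sketch of the "intercept" and "recovery rate" metrics described in the abstract.
# Assumes two aligned arrays of per-round performance (e.g., F1) for the vanilla
# and poisoned models; all names and defaults here are illustrative assumptions.
from typing import Optional

import numpy as np


def intercept_round(poisoned: np.ndarray, vanilla: np.ndarray) -> Optional[int]:
    """First retraining round at which the poisoned model's performance
    meets or exceeds the vanilla model's performance; None if it never does."""
    hits = np.flatnonzero(poisoned >= vanilla)
    return int(hits[0]) if hits.size else None


def recovery_rate(poisoned: np.ndarray, vanilla: np.ndarray,
                  tolerance: float = 0.02) -> float:
    """Fraction of rounds after the intercept in which the poisoned model
    stays within `tolerance` of the vanilla model's performance."""
    t = intercept_round(poisoned, vanilla)
    if t is None or t + 1 >= len(poisoned):
        return 0.0
    within = np.abs(poisoned[t + 1:] - vanilla[t + 1:]) <= tolerance
    return float(within.mean())


if __name__ == "__main__":
    # Toy example: the poisoned model starts well below the vanilla model
    # and gradually recovers as active-learning retraining proceeds.
    vanilla = np.array([0.90, 0.91, 0.90, 0.92, 0.91, 0.92])
    poisoned = np.array([0.70, 0.80, 0.88, 0.91, 0.90, 0.92])
    print("intercept round:", intercept_round(poisoned, vanilla))
    print("recovery rate:", recovery_rate(poisoned, vanilla))
```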
Original language: English
Title of host publication: ACM SIGSAC Conference on Computer and Communications Security (CCS)
Publisher: ACM
Pages: 3561-3563
Number of pages: 3
Edition: 2023
ISBN (Electronic): 979-8-4007-0050-7
DOIs
Publication status: Published - 15 Nov 2023
