TY - CHAP
T1 - The Impact of Active Learning on Availability Data Poisoning for Android Malware Classifiers
AU - McFadden, Shae
AU - Kan, Mark
AU - Cavallaro, Lorenzo
AU - Pierazzi, Fabio
PY - 2024/10/20
Y1 - 2024/10/20
N2 - Can a poisoned machine learning (ML) model passively recover from adversarial manipulation by retraining with new samples, and regain its non-poisoned performance? If passive recovery is possible, how can it be quantified? And from an adversarial perspective, is a small amount of poisoning sufficient to force the defender to retrain more over time? This paper proposes the evaluation of passive recovery from "availability data poisoning" using active learning in the context of Android malware detection. To quantify passive recovery, we propose two metrics: intercept, which assesses the speed of recovery, and recovery rate, which quantifies the stability of recovery. To investigate passive recovery, we conduct experiments at different rates of active learning, in conjunction with varying strengths of availability data poisoning. We perform our evaluation on 259,230 applications from AndroZoo, using the Drebin feature representation, with linear SVM, DNN, and Random Forest classifiers. Our findings show the convergence of the poisoned models to their respective hypothetical non-poisoned models, demonstrating that, with active learning as a concept drift mitigation strategy, passive recovery is feasible across all three classifiers evaluated.
KW - Supervised Learning
KW - Malware Classification
KW - Poisoning
KW - Active Learning
KW - Passive Recovery
M3 - Conference paper
BT - Proceedings of the Annual Computer Security Applications Conference Workshops (ACSAC Workshops)
PB - IEEE
CY - Honolulu, Hawaii, USA
ER -