Assessing and Measuring the Privacy Practices of Voice Assistant Applications

Jide Edu

Informatics

Student thesis: Doctoral Thesis (Doctor of Philosophy)

Abstract

Smart Personal Voice Assistants (SPA) are fast becoming popular with the widespread introduction of desktop, phone and home assistants. Over a hundred million users now use SPA such as Alexa, Siri, Google Assistant, Bixby and Cortana every day, and SPA devices have been sold in large numbers. However, recent security and privacy incidents, such as Alexa recording a private conversation and sending it to a random contact, have heightened users' concerns about the security and privacy of these assistants. This thesis studies the security and privacy issues of SPA, in particular the risks associated with the skills (voice applications) they use to extend and expand their functionality. First, we present a classification of SPA security and privacy issues and use it to systematically map current attacks and countermeasures to different architectural elements. We show that these elements expose SPA to various risks arising from factors such as the complexity of their architecture, their AI features, the wide range of underlying technologies, and the open nature of the voice channel they use.

We then conduct a systematic study of SPA third-party skills, as they are one of the architectural elements offering a large attack surface. In particular, we study the permission model that SPA providers offer to developers and investigate how third-party skills use it to collect personal data. We further design a methodology that systematically identifies potential privacy issues in third-party skills by analysing the traceability between the permissions a skill requests and the data practices its developer states in the skill's privacy policy. In addition, we propose a highly accurate system that automates this traceability analysis at scale. Furthermore, we perform a three-year longitudinal measurement study of Amazon Alexa skills across marketplaces to demystify developers' data practices and present an overview of the third-party skill ecosystem. Finally, we present an open tool that enables proactive audits of data collection practices in emerging technologies such as SPA. The overall study resulted in two new datasets for evaluating privacy assessments of smart assistants: the traceability-by-policy dataset (TBPD) and the permission-by-sentence dataset (PBSD). Together, these contributions aim to support the collective effort towards secure, privacy-aware assistants.
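The idea of permission-to-policy traceability can be illustrated with a small sketch. This is not the system proposed in the thesis: the permission labels, the keyword lexicon, the `Skill` record and the four outcome labels (complete, partial, broken, no policy) are hypothetical choices made here for illustration, and naive substring matching stands in for the automated analysis the thesis describes. It only shows the core check: for each permission a skill requests, does its privacy policy disclose collecting that data type?

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Traceability(Enum):
    """Hypothetical outcome labels for a skill's permission-to-policy traceability."""
    COMPLETE = "complete"    # every requested permission is disclosed in the policy
    PARTIAL = "partial"      # some, but not all, requested permissions are disclosed
    BROKEN = "broken"        # none of the requested permissions are disclosed
    NO_POLICY = "no_policy"  # the skill requests permissions but provides no policy


@dataclass
class Skill:
    name: str
    permissions: set[str]      # simplified, hypothetical permission labels
    policy_text: str | None    # privacy policy text, or None if the skill has none


# Hypothetical lexicon: phrases that, if present in a policy, are taken to
# disclose collection of the data type behind each permission.
PERMISSION_KEYWORDS: dict[str, list[str]] = {
    "device_address": ["address", "location"],
    "postal_code": ["postal code", "zip code"],
    "email_address": ["email", "e-mail"],
    "mobile_number": ["phone number", "mobile number"],
    "name": ["your name", "full name"],
}


def disclosed_permissions(permissions: set[str], policy_text: str) -> set[str]:
    """Return the requested permissions whose data type the policy mentions."""
    text = policy_text.lower()
    return {
        p for p in permissions
        if any(k in text for k in PERMISSION_KEYWORDS.get(p, []))
    }


def classify(skill: Skill) -> Traceability:
    """Classify how well a skill's policy traces to its requested permissions."""
    if not skill.permissions:
        raise ValueError("traceability is only defined for skills that request permissions")
    if skill.policy_text is None:
        return Traceability.NO_POLICY
    disclosed = disclosed_permissions(skill.permissions, skill.policy_text)
    if disclosed == skill.permissions:
        return Traceability.COMPLETE
    if disclosed:
        return Traceability.PARTIAL
    return Traceability.BROKEN
```

For example, a skill requesting `postal_code` and `email_address` whose policy only says "We use your zip code to provide local forecasts." is classified as `Traceability.PARTIAL`. Keyword matching cannot handle negation or paraphrase ("we never collect your email" would count as a disclosure), which is one reason a more accurate, learned approach is needed to run such an analysis reliably at scale.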

Date of Award	1 Oct 2022
Original language	English
Awarding Institution	King's College London
Supervisor	Jose Such (Supervisor) & Guillermo Suarez de Tangil Rotaeche (Supervisor)


Documents

2022_Edu_Jide_1787928_ethesis
File: application/pdf, 3.68 MB
Type: Thesis