Speaker Anonymization -- Representation, Evaluation and Formal Guarantees

Abstract

Large-scale centralized storage of speech data poses severe privacy threats to the speakers. Indeed, the emergence and widespread usage of voice interfaces starting from telephone to mobile applications, and now digital assistants have enabled easier communication between the customers and the service providers. Massive speech data collection allows its users, for instance researchers, to develop tools for human convenience, like voice passwords for banking, personalized smart speakers, etc. However, centralized storage is vulnerable to cybersecurity threats which, when combined with advanced speech technologies like voice cloning, speaker recognition, and spoofing, may endow a malicious entity with the capability to re-identify speakers and breach their privacy by gaining access to their sensitive biometric characteristics, emotional states, personality attributes, pathological conditions, etc. Individuals and the members of civil society worldwide, and especially in Europe, are getting aware of this threat. With firm backing by the GDPR, several initiatives are being launched, including the publication of white papers and guidelines, to spread mass awareness and to regulate voice data so that the citizens’ privacy is protected.

This thesis is a timely effort to bolster such initiatives and propose solutions to remove the biometric identity of speakers from speech signals, thereby rendering them useless for re-identifying the speakers who spoke them. Besides the goal of protecting the speaker’s identity from malicious access, this thesis aims to explore the solutions which do so without degrading the usefulness of speech. We present several anonymization schemes based on voice conversion methods to achieve this two-fold objective. The output of such schemes is a high-quality speech signal that is usable for publication and a variety of downstream tasks. All the schemes are subjected to a rigorous evaluation protocol which is one of the major contributions of this thesis. This protocol led to the finding that the previous approaches do not effectively protect the privacy and thereby directly inspired the VoicePrivacy initiative which is an effort to gather individuals, industry, and the scientific community to participate in building a robust anonymization scheme. We introduce a range of anonymization schemes under the purview of the VoicePrivacy initiative and empirically prove their superiority in terms of privacy protection and utility. Finally, we endeavor to remove the residual speaker identity from the anonymized speech signal using the techniques inspired by differential privacy. Such techniques provide provable analytical guarantees to the proposed anonymization schemes and open up promising perspectives for future research.

In practice, the tools developed in this thesis are an essential component to build trust in any software ecosystem where voice data is stored, transmitted, processed, or published. They aim to help the organizations to comply with the rules mandated by civil governments and give a choice to individuals who wish to exercise their right to privacy.

Date
Dec 2, 2021 10:30 AM — 1:00 PM
Event
PhD thesis defense
Location
Inria Lille Nord Europe (Amphi Bat. B)
Avatar
Brij Mohan Lal Srivastava
Co-founder and CEO of Nijta

I am building a privacy-enabled voice analytics platform.