Evaluating Voice Conversion based Privacy Protection against Informed Attackers

Context

Distinguishable and repeatable biometric references can be extracted from speech
Speech processing raises privacy threats
Anonymization ensures that the original speaker cannot be linked to the published anonymized dataset
The anonymized data must be fit for use in downstream tasks

The speaker publishes the speech data after anonymization

The user tries to use the anonymized public data for downstream tasks

The attacker tries to discover the identity of the speaker in the public data

Logical speaker anonymization framework
- Convert the source speaker’s voice to that of a target speaker
- Conversion may not be perfect; residual source speaker information might remain
- Choice of target speaker is critical for strength of anonymization
Allows data publication
First proposed in 2009 [2]; many techniques have been proposed since

VC methods are selected based on the following criteria:

VoiceMask: simple frequency warping based on the composition of two functions $B$ and $Q$: $f' = B(Q(f, \alpha), \beta)$
Vocal Tract Length Normalization: learn transformation parameters between source and target class spectra
Disentangled Speech Representation: separate encoders for content (instance normalization) and speaker (average pooling) information
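The VoiceMask composition $f' = B(Q(f, \alpha), \beta)$ can be illustrated with a minimal sketch. The functional forms below are assumptions, since the text only names the two functions: $Q$ is taken to be a quadratic warping and $B$ a bilinear (all-pass) warping of a normalized frequency $f \in [0, \pi]$. The parameter ranges and all function names are hypothetical.

```python
import cmath
import math
import random


def quadratic_warp(f, alpha):
    """Q(f, alpha): assumed quadratic warping of a normalized frequency in [0, pi]."""
    x = f / math.pi
    return f + alpha * (x - x * x)


def bilinear_warp(f, beta):
    """B(f, beta): assumed bilinear (all-pass) warping; requires |beta| < 1."""
    z = cmath.exp(1j * f)
    return abs(cmath.phase((z - beta) / (1.0 - beta * z)))


def voicemask_warp(f, alpha, beta):
    """Composition f' = B(Q(f, alpha), beta)."""
    return bilinear_warp(quadratic_warp(f, alpha), beta)


# Hypothetical parameter ranges; the predefined range is not given in the text.
rng = random.Random(0)
alpha = rng.uniform(-0.2, 0.2)
beta = rng.uniform(-0.1, 0.1)

# Warp a grid of normalized frequencies covering [0, pi].
freqs = [math.pi * k / 256 for k in range(257)]
warped = [voicemask_warp(f, alpha, beta) for f in freqs]
```

Under these assumed forms the warping keeps the endpoints 0 and $\pi$ fixed and is strictly increasing for small parameter values, so the frequency axis is stretched and compressed but never folded.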

The target can be a single speaker from the pool or an average of many speakers.
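The two target options can be sketched as follows. The sketch assumes that each pool speaker is summarized by a fixed-length vector (for example a speaker embedding); this representation, the toy values, and the function names are illustrative assumptions, not details given in the text.

```python
import random


def single_target(pool, rng):
    """Pick one speaker from the pool as the conversion target."""
    speaker_id = rng.choice(sorted(pool))
    return speaker_id, pool[speaker_id]


def average_target(pool):
    """Average the representations of all pool speakers into one target."""
    vectors = list(pool.values())
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Toy pool of hypothetical speaker vectors.
pool = {
    "spk_a": [1.0, 0.0],
    "spk_b": [0.0, 1.0],
    "spk_c": [2.0, 2.0],
}

rng = random.Random(0)
chosen_id, chosen_vec = single_target(pool, rng)
avg_vec = average_target(pool)
```

An averaged target does not correspond to any real speaker in the pool, whereas a single target maps every converted utterance onto one existing voice.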

All the data subsets are derived from the LibriSpeech corpus.

Component	Dataset	Training
VoiceMask	None	No training required ($\alpha$ and $\beta$ are selected randomly from a predefined range)
Disentangled VC	LibriTTS 100h	(Content, Speaker) encoders are trained end-to-end to reconstruct the speech waveform
VTLN	LibriSpeech 460h	K-means clusters (8 centroids) and transformation parameters are learnt
Attackers		Trained on anonymized data to induce "knowledge" of the anonymization
ASR for evaluation		End-to-End (CTC + Attention) ASR is trained
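The clustering step in the VTLN row can be illustrated with a minimal k-means sketch using 8 centroids. The input here is random toy data standing in for spectral frames. The feature dimension, iteration count, and initialization are assumptions, and the learning of the transformation parameters is not shown.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(frames, k, iters=20, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(frames, k)]
    labels = [0] * len(frames)
    for _ in range(iters):
        # Assignment step: attach each frame to its nearest centroid.
        for i, frame in enumerate(frames):
            labels[i] = min(range(k), key=lambda c: squared_distance(frame, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Toy stand-in for spectral frames: 200 frames of 4 dimensions each.
data_rng = random.Random(1)
frames = [[data_rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
centroids, labels = kmeans(frames, k=8)
```

Each frame ends up with a class label between 0 and 7, which is the kind of class assignment over which per-class transformation parameters between source and target spectra could then be estimated.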

Reasonable privacy protection can be provided in the Semi-Informed case

The perm strategy outperforms the other strategies

The Informed attacker shows that speaker information can be discovered, but this attack scenario is not realistic

Investigated VC algorithms for speaker anonymization
Target selection and attacker’s knowledge are critical for strength of anonymization
Simple methods can provide reasonable protection