Paper: | SLP-P7.8 |
Session: | Audio-visual and Multimodal Processing |
Time: | Wednesday, May 17, 10:00 - 12:00 |
Presentation: | Poster |
Topic: |
Speech and Spoken Language Processing: Multi-modal/multimedia processing (such as audio/visual, etc) |
Title: |
Detecting Replay Attacks in Audiovisual Identity Verification |
Authors: |
Hervé Bredin, ENST / TSI, France; Antonio Miguel, University of Zaragoza, Spain; Ian Witten, University of Waikato, New Zealand; Gérard Chollet, ENST / TSI, France |
Abstract: |
We describe an algorithm that detects a lack of correspondence between speech and lip motion by monitoring the degree of synchrony between live audio and visual signals. It is simple, effective, and computationally inexpensive, and it provides a useful degree of robustness against basic replay attacks and against speech or image forgeries. The method is based on a cross-correlation analysis between two streams of features, one from the audio signal and the other from the image sequence. We argue that such an algorithm forms an effective first barrier against several kinds of replay attack that would defeat existing verification systems based on standard multimodal fusion techniques. To evaluate the new technique, we have augmented the protocols that accompany the BANCA multimedia corpus by defining new scenarios. We obtain a 0% equal-error rate (EER) on the simplest scenario and 35% on a more challenging one. |
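The cross-correlation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each stream has already been reduced to one scalar feature per video frame (for example, audio energy and mouth opening, both hypothetical choices) and scores synchrony as the peak absolute normalized correlation over a small range of lags. The function name `synchrony_score` and the parameter `max_lag` are illustrative.

```python
import math


def synchrony_score(audio_feat, visual_feat, max_lag=5):
    """Peak absolute normalized correlation over lags in [-max_lag, max_lag].

    Both inputs are equal-length sequences of floats, one value per frame.
    Returns a value in [0, 1]; higher suggests better audio-visual synchrony.
    """
    n = len(audio_feat)
    if n != len(visual_feat):
        raise ValueError("feature streams must have the same length")
    if n <= max_lag:
        raise ValueError("streams must be longer than max_lag")

    # Remove the mean of each stream so the score reflects co-variation.
    mean_a = sum(audio_feat) / n
    mean_v = sum(visual_feat) / n
    a = [x - mean_a for x in audio_feat]
    v = [x - mean_v for x in visual_feat]

    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        # Pair audio frame t + lag with visual frame t on the overlapping part.
        if lag >= 0:
            x, y = a[lag:], v[:n - lag]
        else:
            x, y = a[:lag], v[-lag:]
        dot = sum(p * q for p, q in zip(x, y))
        denom = math.sqrt(sum(p * p for p in x) * sum(q * q for q in y))
        if denom > 0:
            best = max(best, abs(dot) / denom)
    return best
```

In a verification setting, a score like this would be compared against a threshold tuned on development data: a low score suggests that the audio and the lip motion do not belong together, as in a replayed recording paired with a still image. The features, lag range, and decision rule used in the paper may differ.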