Order statistics for voice activity detection in VoIP

Abstract

Real-time voice communication over the Internet has rapidly gained popularity. Reducing total bandwidth consumption is essential for using the available bandwidth efficiently, both for subscribers with low-speed connectivity and in general. In this paper we introduce a novel technique for identifying the voice and silence regions of a speech stream that is well suited to VoIP calls. We use an entropy measure based on the spacings of the order statistics of speech frames to differentiate silence zones from speech zones. We developed an algorithm that uses adaptive thresholding to minimize misdetection. The performance of our approach is compared with the built-in VAD of the AMR codec. Our approach yields comparatively better bandwidth savings while maintaining good speech quality. Further, the proposed approach improves voice detection compared with the AMR schemes under noisy conditions. The ideas presented in this paper were identified as novel during the WIPO international patent search.
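
The abstract does not give the exact estimator or threshold rule, so the following Python sketch is only an illustration of the general idea, not the authors' algorithm. It assumes a Vasicek-style m-spacing entropy estimate computed from the sorted samples (order statistics) of each frame, and a simple adaptive threshold that tracks the entropy of frames judged to be silence. The function names, the window choice `m`, and the parameters `margin`, `alpha`, and `warmup` are all hypothetical.

```python
import math


def spacing_entropy(frame, m=None, eps=1e-12):
    """Vasicek-style m-spacing estimate of differential entropy for one frame.

    Assumption: the paper's measure may differ; this is one well-known
    estimator built on spacings of order statistics.
    """
    n = len(frame)
    if n < 2:
        raise ValueError("frame needs at least two samples")
    if m is None:
        m = max(1, round(math.sqrt(n)))  # common heuristic, assumed here
    x = sorted(frame)  # order statistics x_(1) <= ... <= x_(n)
    total = 0.0
    for i in range(n):
        hi = x[min(i + m, n - 1)]  # clamp to x_(n) at the upper edge
        lo = x[max(i - m, 0)]      # clamp to x_(1) at the lower edge
        # eps guards against log(0) when samples repeat (e.g. digital silence)
        total += math.log(n / (2 * m) * (hi - lo) + eps)
    return total / n


def detect_voice(frames, margin=0.5, alpha=0.95, warmup=5):
    """Return one boolean per frame: True for speech, False for silence.

    Assumptions: the first `warmup` frames contain no speech, and the
    silence entropy floor is tracked with an exponential moving average
    that is updated only on frames labelled as silence.
    """
    flags = []
    floor = None
    for k, frame in enumerate(frames):
        h = spacing_entropy(frame)
        if k < warmup:
            floor = h if floor is None else alpha * floor + (1 - alpha) * h
            flags.append(False)
            continue
        is_speech = h > floor + margin
        if not is_speech:
            # adapt the threshold to slowly changing background noise
            floor = alpha * floor + (1 - alpha) * h
        flags.append(is_speech)
    return flags
```

Because differential entropy shifts by log(a) when the signal is scaled by a, louder speech frames score higher than low-level background noise, and the floor-plus-margin rule lets the decision threshold follow gradual changes in the noise level. In a VoIP sender, frames flagged False would be the candidates for suppression to save bandwidth.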