Optimized Speaker Diarization System using Discrete Wavelet Transform and Pyknogram

Sukhvinder Kaur, J. S. Sohal

Abstract

The aim of this paper is to present an optimized speaker diarization system that efficiently detects speaker change points in multispeaker speech data. Speaker diarization is the process of detecting speaker turns and grouping together the segments uttered by the same speaker. It can be used in speaker recognition, audio information retrieval, audio transcription, audio clustering, and the indexing and captioning of TV shows and movies. In the proposed technique, the Daubechies 40 wavelet transform is used to compress the audio stream at a ratio of 1:4, and features are extracted from the compressed stream using an enhanced spectrogram called a pyknogram, which is based on the Teager-Kaiser Energy Operator (TKEO). This method relies on the resonances (formants) and harmonic structure of speech, which are enhanced by decomposing the spectral subbands into amplitude and frequency components. The weighted average of the instantaneous frequency components is used to derive a short-time estimate of the dominant frequency in each subband over a fixed period of 0.12 msec. Sudden changes in the dominant frequency correspond to speaker change points and are detected using the traditional delta Bayesian Information Criterion (ΔBIC). The technique does not use a voice activity detection (VAD) process. For re-segmentation, the Information Change Rate (ICR) is used. Finally, a hierarchical clustering algorithm groups the homogeneous segments, and the grouping is plotted with the dendrogram function in MATLAB. The results are evaluated by F-measure and diarization error rate. They show that the proposed method is faster and gives better results than the traditional method based on Mel frequency cepstral coefficients (MFCC) and the Bayesian Information Criterion (BIC).
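
Two of the building blocks named in the abstract, the TKEO and the ΔBIC change-point test, can be illustrated with a minimal sketch. It is written in Python with NumPy, which is an assumption made for illustration (the abstract mentions MATLAB), and the function names tkeo and delta_bic are hypothetical. The ΔBIC part assumes the common formulation with one full-covariance Gaussian per segment and an adjustable penalty weight; the abstract does not state which variant or which parameter values the authors used, and the sketch does not reproduce the pyknogram feature extraction.

```python
import numpy as np


def tkeo(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The two boundary samples have no neighbour on one side and are left at 0.
    """
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi


def _logdet_cov(frames):
    """Log-determinant of the maximum-likelihood covariance of the frames."""
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    # Small ridge (assumption) keeps the determinant finite for short segments.
    cov = cov + 1e-8 * np.eye(cov.shape[0])
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(frames, i, penalty_weight=1.0):
    """Delta-BIC for a candidate change point at frame index i.

    frames has shape (n_frames, n_dims). A positive value favours modelling
    frames[:i] and frames[i:] with two Gaussians, i.e. a change point at i.
    penalty_weight is a tuning parameter (assumed, not taken from the paper).
    """
    frames = np.asarray(frames, dtype=float)
    n, d = frames.shape
    # Number of extra parameters of the two-Gaussian model, times log(n) / 2.
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * _logdet_cov(frames)
            - 0.5 * i * _logdet_cov(frames[:i])
            - 0.5 * (n - i) * _logdet_cov(frames[i:])
            - penalty_weight * penalty)
```

In a segmentation loop, delta_bic would typically be evaluated at each candidate index inside a sliding window, and the index with the largest positive value would be kept as a speaker change point. For a pure sinusoid A cos(ωn), tkeo returns the constant A² sin²(ω) on interior samples, which is why the operator tracks both amplitude and frequency.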

Article Details

How to Cite
Kaur, S., & Sohal, J. S. (2017). Optimized Speaker Diarization System using Discrete Wavelet Transform and Pyknogram. International Journal on Future Revolution in Computer Science & Communication Engineering, 3(9), 52–58. Retrieved from http://www.ijfrcsce.org/index.php/ijfrcsce/article/view/221