Two-Tier Feature Extraction with Metaheuristics-Based Automated Forensic Speaker Verification Model
Abstract
1. Introduction
2. Literature Review
3. The Proposed Model
3.1. Pre-Processing
Algorithm 1: Pseudocode of AMF
(1) Consider an input matrix “A” with M rows and N columns. (2) Create a matrix with M + 2 rows and N + 2 columns by zero-padding the border of the input matrix. (3) Take a mask of size 3 × 3. (4) Place the mask on the initial component, i.e., the first row and column of matrix “A”. (5) Select the elements covered by the mask and arrange them in ascending order. (6) Take the median value (center component) from the sorted array and substitute component A(1, 1) with it. (7) Slide the mask to the next component. (8) Repeat steps 4 to 7 until every element of matrix “A” has been substituted with its respective median value.
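The steps of Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal illustrative implementation of the 3 × 3 median filtering described above (a fixed-size mask; the adaptive variant used in the paper may additionally grow the window), not the authors' code:

```python
import numpy as np

def median_filter_3x3(a):
    """Median-filter a 2-D matrix following Algorithm 1.

    Zero-pads the border, slides a 3x3 mask over every element,
    and replaces each element with the median of its neighborhood.
    """
    a = np.asarray(a, dtype=float)
    m, n = a.shape
    # Step 2: matrix with M + 2 rows and N + 2 columns, zero-padded border
    padded = np.zeros((m + 2, n + 2))
    padded[1:m + 1, 1:n + 1] = a
    out = np.empty_like(a)
    # Steps 4-8: place the mask on each element in turn
    for i in range(m):
        for j in range(n):
            # Steps 5-6: sort the nine masked elements, take the center value
            window = np.sort(padded[i:i + 3, j:j + 3].ravel())
            out[i, j] = window[4]
    return out
```

Applied to a matrix containing a single impulse, the filter suppresses the outlier while leaving the homogeneous neighborhood intact, which is why it is a common pre-processing step for impulsive noise.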
3.2. MFCC and Spectrograms
3.3. Inception v3-Based Feature Extraction
3.4. ALO-Based Hyperparameter Tuning
3.5. LSTM-RNN-Based Verification Process
4. Results and Discussion
Evaluation Metrics
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Machado, T.J.; Vieira Filho, J.; de Oliveira, M.A. Forensic speaker verification using ordinary least squares. Sensors 2019, 19, 4385. [Google Scholar] [CrossRef]
- Wang, Z.; ** in the presence of noise and reverberation conditions. IEEE Access 2017, 5, 15400–15413. [Google Scholar] [CrossRef]
- Huang, S.; Dang, H.; Jiang, R.; Hao, Y.; Xue, C.; Gu, W. Multilayer Hybrid Fuzzy Classification Based on SVM and Improved PSO for Speech Emotion Recognition. Electronics 2021, 10, 2891. [Google Scholar] [CrossRef]
- Swain, M.; Maji, B.; Kabisatpathy, P.; Routray, A. A DCRNN-based ensemble classifier for speech emotion recognition in Odia language. Complex Intell. Syst. 2022, 8, 4237–4249. [Google Scholar] [CrossRef]
- Mardhotillah, R.; Dirgantoro, B.; Setianingsih, C. Speaker Recognition for Digital Forensic Audio Analysis using Support Vector Machine. In Proceedings of the 2020 3rd International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), Yogyakarta, Indonesia, 10–11 December 2020; pp. 514–519. [Google Scholar]
- Saleem, S.; Subhan, F.; Naseer, N.; Bais, A.; Imtiaz, A. Forensic speaker recognition: A new method based on extracting accent and language information from short utterances. Forensic Sci. Int. Digit. Investig. 2020, 34, 300982. [Google Scholar] [CrossRef]
- Khan, F.; Tarimer, I.; Alwageed, H.S.; Karadağ, B.C.; Fayaz, M.; Abdusalomov, A.B.; Cho, Y.-I. Effect of Feature Selection on the Accuracy of Music Popularity Classification Using Machine Learning Algorithms. Electronics 2022, 11, 3518. [Google Scholar] [CrossRef]
- Snyder, D.; Garcia-Romero, D.; Sell, G.; Povey, D.; Khudanpur, S. X-Vectors: Robust DNN Embeddings for Speaker Recognition. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 5329–5333. [Google Scholar] [CrossRef]
- NIST. Speaker Recognition Evaluation 2016. Available online: https://www.nist.gov/itl/iad/mig/speaker-recognition-evaluation-2016/ (accessed on 30 July 2020).
- Devi, K.J.; Singh, N.H.; Thongam, K. Automatic speaker recognition from speech signals using self-organizing feature map and hybrid neural network. Microprocess. Microsyst. 2020, 79, 103264. [Google Scholar] [CrossRef]
- Teixeira, F.; Abad, A.; Raj, B.; Trancoso, I. Towards End-to-End Private Automatic Speaker Recognition. arXiv 2022, arXiv:2206.11750. [Google Scholar]
- Gao, H.; Hu, M.; Gao, T.; Cheng, R. Robust detection of median filtering based on combined features of the difference image. Signal Process. Image Commun. 2019, 72, 126–133. [Google Scholar] [CrossRef]
- Ma, Z.; Fokoué, E. Accent Recognition for Noisy Audio Signals. Serdica J. Comput. 2014, 8, 169–182. [Google Scholar] [CrossRef]
- Wang, C.; Chen, D.; Hao, L.; Liu, X.; Zeng, Y.; Chen, J.; Zhang, G. Pulmonary image classification based on inception-v3 transfer learning model. IEEE Access 2019, 7, 146533–146541. [Google Scholar] [CrossRef]
- Dong, H.; Xu, Y.; Li, X.; Yang, Z.; Zou, C. An improved ant-lion optimizer with a dynamic random walk and dynamic opposite learning. Knowl.-Based Syst. 2021, 216, 106752. [Google Scholar] [CrossRef]
- Zhang, Y.; Xiong, R.; He, H.; Pecht, M.G. Long short-term memory recurrent neural network for remaining useful life prediction of lithium-ion batteries. IEEE Trans. Veh. Technol. 2018, 67, 5695–5705. [Google Scholar] [CrossRef]
Label | Speaker | No. of Samples
---|---|---
C-1 | Speaker-1 | 80
C-2 | Speaker-2 | 80
C-3 | Speaker-3 | 80
C-4 | Speaker-4 | 80
C-5 | Speaker-5 | 80
Total Number of Samples | | 400
Labels | Accuracy | Error Rate | Precision | Recall | F-Score | G-Measure
---|---|---|---|---|---|---
**Training Phase (80%)** | | | | | |
C-1 | 94.69 | 5.31 | 88.06 | 86.76 | 87.41 | 87.41
C-2 | 96.25 | 3.75 | 90.28 | 92.86 | 91.55 | 91.56
C-3 | 95.31 | 4.69 | 94.12 | 80.00 | 86.49 | 86.77
C-4 | 95.31 | 4.69 | 84.13 | 91.38 | 87.60 | 87.68
C-5 | 94.06 | 5.94 | 83.58 | 87.50 | 85.50 | 85.52
Average | 95.12 | 4.87 | 88.03 | 87.70 | 87.71 | 87.79
**Testing Phase (20%)** | | | | | |
C-1 | 98.75 | 1.25 | 92.31 | 100.00 | 96.00 | 96.08
C-2 | 100.00 | 0.00 | 100.00 | 100.00 | 100.00 | 100.00
C-3 | 100.00 | 0.00 | 100.00 | 100.00 | 100.00 | 100.00
C-4 | 100.00 | 0.00 | 100.00 | 100.00 | 100.00 | 100.00
C-5 | 98.75 | 1.25 | 100.00 | 93.75 | 96.77 | 96.82
Average | 99.50 | 0.50 | 98.46 | 98.75 | 98.55 | 98.58
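The per-class figures in the table above can be reproduced from one-vs-rest confusion counts. The sketch below is illustrative (not the authors' code) and assumes the common definitions: F-score as the harmonic mean of precision and recall, and G-measure as their geometric mean:

```python
import math

def per_class_metrics(y_true, y_pred, label):
    """One-vs-rest metrics for a single class label."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-score: harmonic mean of precision and recall
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    # G-measure: geometric mean of precision and recall
    g_measure = math.sqrt(precision * recall)
    return accuracy, 1 - accuracy, precision, recall, f_score, g_measure
```

Note that with one-vs-rest counting, per-class accuracy includes true negatives, which is why the accuracy column sits well above precision and recall for the same class.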
Labels | Accuracy | Error Rate | Precision | Recall | F-Score | G-Measure
---|---|---|---|---|---|---
**Training Phase (70%)** | | | | | |
C-1 | 94.64 | 5.36 | 85.19 | 86.79 | 85.98 | 85.99
C-2 | 93.57 | 6.43 | 86.27 | 80.00 | 83.02 | 83.08
C-3 | 96.43 | 3.57 | 93.48 | 86.00 | 89.58 | 89.66
C-4 | 96.79 | 3.21 | 88.73 | 98.44 | 93.33 | 93.46
C-5 | 95.71 | 4.29 | 89.66 | 89.66 | 89.66 | 89.66
Average | 95.43 | 4.57 | 88.67 | 88.18 | 88.31 | 88.37
**Testing Phase (30%)** | | | | | |
C-1 | 95.83 | 4.17 | 86.67 | 96.30 | 91.23 | 91.35
C-2 | 95.00 | 5.00 | 88.00 | 88.00 | 88.00 | 88.00
C-3 | 95.83 | 4.17 | 93.10 | 90.00 | 91.53 | 91.54
C-4 | 98.33 | 1.67 | 88.89 | 100.00 | 94.12 | 94.28
C-5 | 95.00 | 5.00 | 94.44 | 77.27 | 85.00 | 85.43
Average | 96.00 | 4.00 | 90.22 | 90.31 | 89.97 | 90.12
Methods | Accuracy | Error Rate |
---|---|---|
TTFEM-AFSV | 99.50 | 0.50 |
MFCC-SOFM-MLP-GD | 96.92 | 3.08 |
MFCC-SOFM-MLP-GDM | 97.05 | 2.95 |
MFCC-SOFM-MLP-BR | 97.62 | 2.38 |
MFCC-FW | 97.32 | 2.68 |
DWT-MFCC | 98.87 | 1.13 |
FUSION | 97.81 | 2.19 |
Methods | Noise 10% | Noise 30% | Noise 50% | Noise 70% | Noise 90% |
---|---|---|---|---|---|
MFCC-SOFM-MLP-GD | 81.342 | 78.543 | 75.536 | 73.625 | 71.8727 |
MFCC-SOFM-MLP-GDM | 58.425 | 56.837 | 53.625 | 54.938 | 52.633 |
MFCC-SOFM-MLP-BR | 66.526 | 67.425 | 65.827 | 63.672 | 61.938 |
MFCC-FW | 51.324 | 48.928 | 47.423 | 48.533 | 45.746 |
TTFEM-AFSV | 31.543 | 27.983 | 29.326 | 26.543 | 23.655 |
Speaker | Without AMF Filtering | With AMF Filtering
---|---|---
Speaker 1 | 76.24 | 93.42 |
Speaker 2 | 76.32 | 93.61 |
Speaker 3 | 76.19 | 93.59 |
Speaker 4 | 76.58 | 94.21 |
Speaker 5 | 77.02 | 94.36 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gaurav; Bhardwaj, S.; Agarwal, R. Two-Tier Feature Extraction with Metaheuristics-Based Automated Forensic Speaker Verification Model. Electronics 2023, 12, 2342. https://doi.org/10.3390/electronics12102342