webApp_DoVoiceInteraction

The model_training folder contains the main code for training the embedding model. Most of the code is inherited from the publicly available git repository at https://github.com/RF5/simple-speaker-embedding. The model is a GRU network preceded by a convolutional encoder. The GRU has three layers with 786 hidden units each, and the model operates on the raw waveform. The loss function is the GE2E (generalized end-to-end) loss introduced by Li Wan et al., available at https://arxiv.org/abs/1710.10467. Training uses the VoxCeleb1 dataset, which we split into train, validation and test sets in an 8:1:1 ratio. The trained model is stored in the ‘convgru_ckpt_forvoxceleb1_strip.pt’ file for local reference and may be updated in the future.

The file ‘show_the_tag.py’ uses the model to generate embeddings and computes the similarity between speaker embeddings from their cosine distance. To make the clustering/identification work online, for each newly recorded utterance we compare the new speaker’s embedding against the embeddings of all existing speakers. If the highest similarity is above the threshold, the utterance receives the label of that existing speaker; otherwise it is treated as a new speaker and given a new label.
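The online labeling step can be sketched as below. This is a minimal illustration, not the contents of ‘show_the_tag.py’: the class name OnlineSpeakerLabeler, the threshold value 0.75 and the toy 3-dimensional embeddings are all assumptions made for the example. It also assumes that an utterance whose best cosine similarity reaches the threshold reuses the matching label, that anything else opens a new label, and that the first embedding seen for a speaker is kept as that speaker's reference.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class OnlineSpeakerLabeler:
    """Hypothetical helper: assigns speaker labels one utterance at a time."""

    def __init__(self, threshold=0.75):
        # Illustrative threshold; a real value would be tuned on validation data.
        self.threshold = threshold
        # One reference embedding per known speaker; the list index is the label.
        self.references = []

    def assign(self, embedding):
        # Find the known speaker whose reference is most similar.
        best_label, best_sim = None, -1.0
        for label, ref in enumerate(self.references):
            sim = cosine_similarity(embedding, ref)
            if sim > best_sim:
                best_label, best_sim = label, sim

        # Similar enough to a known speaker: reuse that label.
        if best_label is not None and best_sim >= self.threshold:
            return best_label

        # Otherwise register a new speaker with this embedding as reference.
        self.references.append(list(embedding))
        return len(self.references) - 1


# Toy embeddings standing in for model outputs.
labeler = OnlineSpeakerLabeler(threshold=0.75)
utterances = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
labels = [labeler.assign(e) for e in utterances]
print(labels)  # [0, 0, 1]
```

The second utterance is close to the first and shares label 0, while the third is orthogonal to it and opens label 1. Keeping only the first embedding per speaker is the simplest choice; averaging all embeddings assigned to a label would be a possible alternative.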

____