Yochai Yemini
I completed my Ph.D. in Electrical Engineering at Bar-Ilan University, advised by Prof. Sharon Gannot and Dr. Ethan Fetaya.
My research focuses on deep learning for audio-visual speech processing, spanning speech separation, enhancement, dereverberation, and generative synthesis. A central theme is using visual cues such as lip movements to guide and improve acoustic models.
I have interned at OriginAI, working on audio-visual speech separation, and at Amazon Alexa, improving wake-word detection for non-English accents.
Publications
SSNAPS: Audio-Visual Separation of Speech and Background Noise with Diffusion Inverse Sampling
Diffusion-based Unsupervised Audio-Visual Speech Separation in Noisy Environments with Noise Prior
LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading
Scene-Agnostic Multi-Microphone Speech Dereverberation
GP-Tree: A Gaussian Process Classifier for Few-Shot Incremental Learning
A Composite DNN Architecture for Speech Enhancement
Single Microphone Speech Separation by Diffusion-Based HMM Estimation
Speech Enhancement Using a Multidimensional Mixture-Maximum Model