Yochai Yemini

I completed my Ph.D. in Electrical Engineering at Bar-Ilan University, advised by Prof. Sharon Gannot and Dr. Ethan Fetaya.

My research focuses on deep learning for audio-visual speech processing, spanning speech separation, enhancement, dereverberation, and generative synthesis. A central theme is using visual cues, such as lip movements, to guide and improve acoustic models.

I have interned at OriginAI, working on audio-visual speech separation, and at Amazon Alexa, improving wake-word detection for non-English accents.

Publications

SSNAPS: Audio-Visual Separation of Speech and Background Noise with Diffusion Inverse Sampling

Preprint · Paper

Diffusion-based Unsupervised Audio-Visual Speech Separation in Noisy Environments with Noise Prior

ICASSP 2026 · Paper

LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading

ICLR 2024 · Paper

Scene-Agnostic Multi-Microphone Speech Dereverberation

Interspeech 2021 · Paper

GP-Tree: A Gaussian Process Classifier for Few-Shot Incremental Learning

ICML 2021 · Paper

A Composite DNN Architecture for Speech Enhancement

ICASSP 2020 · Paper

Single Microphone Speech Separation by Diffusion-Based HMM Estimation

EURASIP Journal on Audio, Speech, and Music Processing, 2016 · Paper

Speech Enhancement Using a Multidimensional Mixture-Maximum Model

IWAENC 2010 · Paper