MAGE: A Coarse-to-Fine Speech Enhancer with Masked Audio Generator

The Hieu Pham^†*, Tan Dat Nguyen^‡*, Phuong Thanh Tran Nguyen^†, Joon Son Chung^‡, Duc Dung Nguyen^†

^† AITech Lab, Ho Chi Minh City University of Technology, VNUHCM
^‡ Korea Advanced Institute of Science and Technology (KAIST)
* These authors contributed equally to this work

Existing speech enhancement (SE) models face a trade-off: discriminative methods are efficient but generalize poorly, while powerful generative approaches are computationally expensive and difficult to train. To address this, we introduce MAGE, a lightweight mask-based generative audio enhancer. Fine-tuned from the Qwen2.5-0.5B language model and built upon BigCodec, MAGE has a small footprint of approximately 200 million parameters. It employs a novel coarse-to-fine, scarcity-aware masking strategy to improve learning efficiency, and an auxiliary corrector to smooth the generation process. Extensive experiments on various enhancement tasks demonstrate that MAGE sets a new benchmark in enhancement performance while remaining significantly more efficient than previous methods.

Live Demo

Test MAGE speech enhancement with your own audio. Record your voice or upload an audio file to see the enhancement in action.

Important: For optimal results, audio should be sampled at 16 kHz.

Note: This demo runs on shared infrastructure and may take a few moments to process. Please be patient.

Supported formats: WAV, MP3, FLAC, M4A (files are converted to 16 kHz WAV)
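
To check whether a local WAV file already meets the 16 kHz requirement before uploading, its header can be inspected. The following is a minimal sketch, assuming an uncompressed PCM WAV input and using only Python's standard `wave` module. The helper names `wav_sample_rate` and `is_demo_ready` are hypothetical illustrations and are not part of MAGE or of the demo.

```python
import wave

TARGET_RATE_HZ = 16_000  # sample rate the demo asks for


def wav_sample_rate(path):
    """Return the sample rate (in Hz) stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_demo_ready(path, target_rate=TARGET_RATE_HZ):
    """Return True if the WAV file is already at the target sample rate."""
    return wav_sample_rate(path) == target_rate
```

This sketch only reads the header and does not resample anything. MP3, FLAC, and M4A files cannot be read by the `wave` module and would need a separate decoder.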

Audio Samples Demo