Existing speech enhancement (SE) models face a trade-off: discriminative methods are efficient but generalize poorly, while powerful generative approaches are computationally expensive and difficult to train. To address this, we introduce MAGE, a lightweight mask-based generative audio enhancer. Fine-tuned from the Qwen2.5-0.5B language model and built on the BigCodec audio codec, MAGE has a small footprint of approximately 200 million parameters. MAGE combines a novel coarse-to-fine, scarcity-aware masking strategy, which improves learning efficiency, with an auxiliary corrector, which smooths the generation process. Extensive experiments on various enhancement tasks demonstrate that MAGE sets a new benchmark in enhancement performance while remaining significantly more efficient than previous methods.
Test MAGE speech enhancement with your own audio. Record your voice or upload an audio file to see the enhancement in action.
Supports: WAV, MP3, FLAC, M4A (will be converted to 16 kHz WAV)
Compare audio quality across different models. Click on any audio player to show/hide its spectrogram visualization.
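The upload note above says that inputs are converted to 16 kHz WAV before enhancement. The page does not describe how that conversion is done, so the following is only a minimal sketch of one way to prepare a file yourself, assuming 16-bit PCM WAV input. The function names `resample_linear` and `to_16k_mono_wav` are hypothetical, and the mono downmix is an assumption the page does not state. Linear interpolation is used purely for illustration; it applies no anti-aliasing filter, so a dedicated resampler would be a better choice for real use.

```python
import array
import io
import sys
import wave


def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono sequence of numbers by linear interpolation."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = min(i * step, last)
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def to_16k_mono_wav(wav_bytes, target_rate=16000):
    """Convert 16-bit PCM WAV bytes to mono 16-bit WAV bytes at target_rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        if src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM input")
        channels = src.getnchannels()
        rate = src.getframerate()
        pcm = array.array("h")
        pcm.frombytes(src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV samples are little-endian

    # Downmix interleaved channels to mono by averaging each frame.
    mono = [
        sum(pcm[i:i + channels]) / channels
        for i in range(0, len(pcm), channels)
    ]
    resampled = resample_linear(mono, rate, target_rate)

    # Round and clip back to the 16-bit integer range.
    clipped = array.array(
        "h", (max(-32768, min(32767, round(s))) for s in resampled)
    )
    if sys.byteorder == "big":
        clipped.byteswap()

    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(target_rate)
        dst.writeframes(clipped.tobytes())
    return out.getvalue()
```

To try it, read a WAV file as bytes, pass it to `to_16k_mono_wav`, and write the returned bytes to a new file. Compressed formats such as MP3, FLAC, and M4A would need to be decoded to PCM first, which this sketch does not cover.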