DPDFNet

DPDFNet is a family of causal, single‑channel speech enhancement models for real‑time noise suppression.
It builds on DeepFilterNet2 by adding Dual‑Path RNN (DPRNN) blocks in the encoder for stronger long‑range modeling while staying streaming‑friendly.

What’s in this repo

  • TFLite: *.tflite (root)
  • ONNX: onnx/*.onnx
  • PyTorch checkpoints: checkpoints/*.pth

Model variants

16 kHz models

Model     DPRNN blocks  Params (M)  MACs (G)
baseline  0             2.31        0.36
dpdfnet2  2             2.49        1.35
dpdfnet4  4             2.84        2.36
dpdfnet8  8             3.54        4.37

48 kHz fullband models

Model              DPRNN blocks  Params (M)  MACs (G)
dpdfnet2_48khz_hr  2             2.58        2.42
dpdfnet8_48khz_hr  8             3.63        7.17
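
The tables above describe a compute/quality trade-off: more DPRNN blocks cost more MACs. As a minimal sketch (the numbers are copied from the 16 kHz table; the helper function and its name are hypothetical, not part of the dpdfnet API), one could pick the heaviest variant that fits a compute budget:

```python
# MACs (G) per variant, copied from the 16 kHz table above.
MACS_16K = {
    "baseline": 0.36,
    "dpdfnet2": 1.35,
    "dpdfnet4": 2.36,
    "dpdfnet8": 4.37,
}

def pick_model(budget_gmacs: float) -> str:
    """Return the heaviest 16 kHz variant whose MACs fit the budget."""
    fitting = [name for name, gmacs in MACS_16K.items() if gmacs <= budget_gmacs]
    # Dict insertion order follows the table (lightest to heaviest),
    # so the last fitting entry is the heaviest affordable model.
    return fitting[-1] if fitting else "baseline"

print(pick_model(3.0))  # dpdfnet4
```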

Recommended inference (CPU-only, ONNX)

pip install dpdfnet

CLI

# Enhance one file
dpdfnet enhance noisy.wav enhanced.wav --model dpdfnet4

# Enhance a directory (uses all CPU cores by default)
dpdfnet enhance-dir ./noisy_wavs ./enhanced_wavs --model dpdfnet2

# Enhance a directory with a fixed worker count
dpdfnet enhance-dir ./noisy_wavs ./enhanced_wavs --model dpdfnet2 --workers 4

# Download models
dpdfnet download
dpdfnet download dpdfnet8
dpdfnet download dpdfnet4 --force

Python API

import soundfile as sf
import dpdfnet

# In-memory enhancement:
audio, sr = sf.read("noisy.wav")
enhanced = dpdfnet.enhance(audio, sample_rate=sr, model="dpdfnet4")
sf.write("enhanced.wav", enhanced, sr)

# Enhance one file:
out_path = dpdfnet.enhance_file("noisy.wav", model="dpdfnet2")
print(out_path)

# Model listing:
for row in dpdfnet.available_models():
    print(row["name"], row["ready"], row["cached"])

# Download models:
dpdfnet.download()				# All models
dpdfnet.download("dpdfnet4")	# Specific model

Real-time Microphone Enhancement

Install sounddevice (not included in dpdfnet dependencies):

pip install sounddevice

StreamEnhancer processes audio chunk-by-chunk, preserving RNN state across calls. Any chunk size works; enhanced samples are returned as soon as enough data has accumulated for the first model frame (20 ms).

import numpy as np
import sounddevice as sd
import dpdfnet

INPUT_SR   = 48000
# Use one model hop (10 ms) as the block size so process() returns
# exactly one hop's worth of enhanced audio on every callback.
BLOCK_SIZE = int(INPUT_SR * 0.010)   # 480 samples at 48 kHz

enhancer = dpdfnet.StreamEnhancer(model="dpdfnet2_48khz_hr")

def callback(indata, outdata, frames, time, status):
    mono_in = indata[:, 0] if indata.ndim > 1 else indata.ravel()
    enhanced = enhancer.process(mono_in, sample_rate=INPUT_SR)
    n = min(len(enhanced), frames)
    outdata[:n, 0] = enhanced[:n]
    if n < frames:
        outdata[n:] = 0.0   # silence while the first window accumulates

with sd.Stream(
    samplerate=INPUT_SR,
    blocksize=BLOCK_SIZE,
    channels=1,
    dtype="float32",
    callback=callback,
):
    print("Enhancing microphone input - press Ctrl+C to stop")
    try:
        while True:
            sd.sleep(100)
    except KeyboardInterrupt:
        pass

# Optional: drain the final partial window at the end of a recording
tail = enhancer.flush()

Latency
The first enhanced output arrives after one full model window (~20 ms) has been buffered; after that, enhanced audio is produced one hop (~10 ms) at a time.
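
In samples, the window/hop arithmetic above works out as follows (a quick check, assuming the 20 ms window and 10 ms hop stated in this section):

```python
SR = 48000                       # device sample rate used in the example above
WINDOW_S, HOP_S = 0.020, 0.010   # 20 ms model window, 10 ms hop

window = int(SR * WINDOW_S)      # samples buffered before the first output
hop = int(SR * HOP_S)            # samples per subsequent enhanced block

print(window, hop)  # 960 480
```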

Sample rate
StreamEnhancer resamples internally. Pass your device's native rate as sample_rate; the enhanced audio is returned at that same rate.

Block size
Using BLOCK_SIZE = int(SR * 0.010) (one model hop) gives one enhanced block per callback. Other sizes also work but may produce empty returns while the buffer fills.
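
The buffering behavior described above can be illustrated with a plain-Python sketch. This is only an illustration of hop-aligned chunking, not dpdfnet internals; the HopBuffer class is hypothetical:

```python
import numpy as np

HOP = 480  # one 10 ms hop at 48 kHz

class HopBuffer:
    """Accumulate arbitrary-sized chunks and release hop-aligned frames."""
    def __init__(self):
        self._buf = np.empty(0, dtype=np.float32)

    def process(self, chunk: np.ndarray) -> np.ndarray:
        self._buf = np.concatenate([self._buf, chunk.astype(np.float32)])
        n_full = len(self._buf) // HOP * HOP      # whole hops available now
        out, self._buf = self._buf[:n_full], self._buf[n_full:]
        return out

buf = HopBuffer()
print(len(buf.process(np.zeros(300))))  # 0   -> still filling the first hop
print(len(buf.process(np.zeros(300))))  # 480 -> one hop released, 120 kept
```

With BLOCK_SIZE equal to one hop, every call releases exactly one block; odd sizes alternate between empty and multi-hop returns as the remainder accumulates.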

Multiple streams
Create a separate StreamEnhancer per stream. Call enhancer.reset() between independent audio segments to clear RNN state.


Citation

@article{rika2025dpdfnet,
  title  = {DPDFNet: Boosting DeepFilterNet2 via Dual-Path RNN},
  author = {Rika, Daniel and Sapir, Nino and Gus, Ido},
  year   = {2025}
}

License

Apache-2.0
