AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Paper: arXiv:2308.05734
This checkpoint is the result of finetuning AudioLDM 2 Music (https://huggingface.co/cvssp/audioldm2-music) on the challenge dataset and MusicCaps (https://www.kaggle.com/datasets/googleai/musiccaps).
First, install the required packages:

```bash
pip install --upgrade diffusers transformers accelerate
```
Then load the pipeline and generate audio from a text prompt:

```python
import torch
from diffusers import AudioLDM2Pipeline

repo_id = "vtrungnhan9/audioldm2-music-zac2023"
pipe = AudioLDM2Pipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "This music is instrumental. The tempo is medium with synthesiser arrangements, digital drums and electronic music. The music is upbeat, pulsating, youthful, buoyant, exciting, punchy, psychedelic and has propulsive beats with a dance groove. This music is Techno Pop/EDM."
neg_prompt = "bad quality"

audio = pipe(
    prompt,
    negative_prompt=neg_prompt,
    num_inference_steps=200,
    audio_length_in_s=10.0,
    guidance_scale=10,
).audios[0]
```
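AudioLDM 2 outputs audio at 16 kHz, so the returned NumPy array has roughly `16000 × audio_length_in_s` samples; in practice the length can deviate slightly from the requested duration. As an illustrative sketch (the helper name and constants below are my own, not part of diffusers), you can trim or zero-pad a clip to an exact duration before concatenating or batching clips:

```python
import numpy as np

SAMPLE_RATE = 16000  # AudioLDM 2 generates 16 kHz mono audio


def fix_duration(audio: np.ndarray, seconds: float) -> np.ndarray:
    """Trim or zero-pad a mono clip to exactly `seconds` long."""
    target = int(round(SAMPLE_RATE * seconds))
    if len(audio) >= target:
        return audio[:target]  # trim the tail
    return np.pad(audio, (0, target - len(audio)))  # pad with silence


# Example: a clip slightly shorter than 10 s gets padded to exactly 160000 samples
clip = np.zeros(155648, dtype=np.float32)
assert len(fix_duration(clip, 10.0)) == SAMPLE_RATE * 10
```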
The resulting audio output can be saved as a `.wav` file:

```python
import scipy.io.wavfile

scipy.io.wavfile.write("techno.wav", rate=16000, data=audio)
```
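The pipeline returns floating-point samples in [-1, 1], which scipy writes as a float WAV. Some players expect 16-bit PCM instead; a minimal conversion sketch (the helper name is my own, and the scale factor is the standard int16 PCM convention, not anything specific to AudioLDM 2):

```python
import numpy as np


def to_int16(audio: np.ndarray) -> np.ndarray:
    """Convert float samples in [-1, 1] to 16-bit PCM, clipping out-of-range values."""
    clipped = np.clip(audio, -1.0, 1.0)
    return (clipped * 32767.0).astype(np.int16)


# Full-scale samples map to the int16 extremes
assert to_int16(np.array([1.0]))[0] == 32767
assert to_int16(np.array([-1.0]))[0] == -32767
```

The converted array can then be passed to the same call as above, e.g. `scipy.io.wavfile.write("techno.wav", rate=16000, data=to_int16(audio))`.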
Or played back directly in a Jupyter Notebook / Google Colab:

```python
from IPython.display import Audio

Audio(audio, rate=16000)
```
Please refer to https://github.com/declare-lab/tango/blob/master/train.py for the training procedure.
BibTeX:

```bibtex
@article{liu2023audioldm2,
  title={AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining},
  author={Haohe Liu and Qiao Tian and Yi Yuan and Xubo Liu and Xinhao Mei and Qiuqiang Kong and Yuping Wang and Wenwu Wang and Yuxuan Wang and Mark D. Plumbley},
  journal={arXiv preprint arXiv:2308.05734},
  year={2023}
}
```