Text-to-Audio Models Efficiency Leaderboard

Model Name        MelRTF    E2E_RTF
AudioGen          -         2.1924
AudioLDM          1.5393    1.5441
AudioLDM 2        2.9864    2.9883
Auffusion         1.4323    1.4452
MAGNeT            -         0.2517
Make-An-Audio     0.4499    0.4568
Make-An-Audio2    0.2098    0.2163
Stable-Audio      -         1.1652
AudioLDM 2        1.7732    1.7794
Tango             1.7725    1.7787

For diffusion-based models, 200 sampling steps are used, which is a reasonable trade-off between generation speed and output quality.
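
The leaderboard does not spell out how the RTF columns are computed. A minimal sketch, assuming the common definition of real-time factor (wall-clock generation time divided by the duration of the generated audio, so values below 1 mean faster than real time), might look like the following. The function names `rtf_from_timing` and `real_time_factor` and the `generate` callable are hypothetical and are not taken from the leaderboard's code.

```python
import time
from typing import Callable


def rtf_from_timing(seconds_elapsed: float, audio_seconds: float) -> float:
    """Real-time factor under the assumed definition: processing time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return seconds_elapsed / audio_seconds


def real_time_factor(generate: Callable[[], object],
                     audio_seconds: float,
                     runs: int = 3) -> float:
    """Time `generate` over several runs and return the average RTF.

    `generate` is a hypothetical zero-argument callable that produces one
    clip of length `audio_seconds` (for example, a wrapped model call).
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        generate()
        total += time.perf_counter() - start
    return rtf_from_timing(total / runs, audio_seconds)


if __name__ == "__main__":
    # Stand-in workload: sleep instead of running a real model.
    rtf = real_time_factor(lambda: time.sleep(0.05), audio_seconds=10.0)
    print(f"RTF: {rtf:.4f}")
```

Under this assumed definition, MelRTF would time only the stage that produces the mel-spectrogram and E2E_RTF the full pipeline through to the waveform; that reading is inferred from the column names rather than stated by the leaderboard.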