Text-to-Audio Models Efficiency Leaderboard
Model Name | MelRTF | E2E_RTF |
---|---|---|
AudioGen | - | 2.1924 |
AudioLDM | 1.5393 | 1.5441 |
AudioLDM 2 | 2.9864 | 2.9883 |
Auffusion | 1.4323 | 1.4452 |
MAGNeT | - | 0.2517 |
Make-An-Audio | 0.4499 | 0.4568 |
Make-An-Audio2 | 0.2098 | 0.2163 |
Stable-Audio | - | 1.1652 |
AudioLDM 2 | 1.7732 | 1.7794 |
Tango | 1.7725 | 1.7787 |
For diffusion-based models, 200 sampling steps are used, a reasonable trade-off between generation speed and output quality.
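
The table does not spell out how the two columns are computed. As a rough illustration only, the sketch below assumes the common convention that a real-time factor (RTF) is wall-clock generation time divided by the duration of the generated audio, so values below 1 mean faster than real time. Under that assumption, MelRTF would time the pipeline up to the mel-spectrogram and E2E_RTF would time it through to the final waveform. The names `real_time_factor`, `measure_rtf`, and `generate` are hypothetical and do not come from the leaderboard's own tooling.

```python
import time
from typing import Callable


def real_time_factor(elapsed_seconds: float, audio_seconds: float) -> float:
    """RTF under the assumed convention: processing time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return elapsed_seconds / audio_seconds


def measure_rtf(generate: Callable[[], float], runs: int = 3) -> float:
    """Time a hypothetical `generate` callable and return its aggregate RTF.

    `generate` runs one generation and returns the duration, in seconds,
    of the audio it produced. Elapsed time and audio duration are summed
    over `runs` calls before dividing.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    total_elapsed = 0.0
    total_audio = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        audio_seconds = generate()
        total_elapsed += time.perf_counter() - start
        total_audio += audio_seconds
    return real_time_factor(total_elapsed, total_audio)


def fake_generate() -> float:
    """Stand-in for a model: sleeps briefly and claims 10 s of audio."""
    time.sleep(0.01)
    return 10.0


if __name__ == "__main__":
    print(f"RTF: {measure_rtf(fake_generate):.4f}")
```

When timing a real model on a GPU, the device should be synchronized before each clock read, since kernel launches are asynchronous and the wall-clock interval would otherwise under-report the actual compute time.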