Text-to-Audio Model Efficiency Leaderboard
| Model Name | Mel RTF | End-to-End RTF |
|---|---|---|
| AudioGen | - | 2.1924 |
| AudioLDM | 1.5393 | 1.5441 |
| AudioLDM 2 | 2.9864 | 2.9883 |
| Auffusion | 1.4323 | 1.4452 |
| MAGNeT | - | 0.2517 |
| Make-An-Audio | 0.4499 | 0.4568 |
| Make-An-Audio2 | 0.2098 | 0.2163 |
| Stable-Audio | - | 1.1652 |
| AudioLDM 2 | 1.7732 | 1.7794 |
| Tango | 1.7725 | 1.7787 |
All diffusion-based models are sampled with 200 steps, a reasonable trade-off between speed and generation quality.
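The table does not spell out how its two RTF columns are measured. Below is a minimal sketch, assuming the usual definition of real-time factor (wall-clock generation time divided by the duration of the generated audio, so lower is faster). It also assumes that the mel column times only the stage up to the mel-spectrogram, while the end-to-end column adds the vocoder. Under that reading, models that emit audio without a mel stage report only an end-to-end value, shown as "-". The names `measure_rtf`, `acoustic_model`, `vocoder`, and the injectable `clock` are hypothetical illustrations, not part of any model's API. For GPU inference, the device would also need synchronizing (for example with `torch.cuda.synchronize()`) before each clock read, since kernels run asynchronously. The sketch leaves that out.

```python
import time
from statistics import mean
from typing import Callable, Optional, Sequence, Tuple


def real_time_factor(elapsed_s: float, audio_duration_s: float) -> float:
    """Wall-clock generation time divided by the duration of the audio produced.

    An RTF below 1.0 means audio is generated faster than real time.
    """
    if audio_duration_s <= 0:
        raise ValueError("audio_duration_s must be positive")
    return elapsed_s / audio_duration_s


def measure_rtf(
    acoustic_model: Callable[[str], object],
    vocoder: Optional[Callable[[object], Sequence[float]]],
    prompt: str,
    sample_rate: int,
    runs: int = 3,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float]:
    """Return (mel_rtf, e2e_rtf), each averaged over `runs` generations.

    Hypothetical helper. If `vocoder` is None, `acoustic_model` is assumed to
    return a waveform directly, so there is no mel stage and mel_rtf is None.
    """
    mel_times, e2e_times = [], []
    waveform: Sequence[float] = []
    for _ in range(runs):
        start = clock()
        intermediate = acoustic_model(prompt)  # mel-spectrogram, or waveform if no vocoder
        after_mel = clock()
        waveform = vocoder(intermediate) if vocoder is not None else intermediate
        end = clock()
        mel_times.append(after_mel - start)  # text -> mel only
        e2e_times.append(end - start)  # text -> waveform, vocoder included
    duration_s = len(waveform) / sample_rate
    mel_rtf = real_time_factor(mean(mel_times), duration_s) if vocoder is not None else None
    return mel_rtf, real_time_factor(mean(e2e_times), duration_s)


if __name__ == "__main__":
    # Dummy stages standing in for a real model (hypothetical, for illustration only).
    def fake_text_to_mel(prompt: str) -> list:
        return [[0.0] * 80 for _ in range(100)]

    def fake_vocoder(mel: object) -> list:
        return [0.0] * 16000  # 1 s of silence at 16 kHz

    print(measure_rtf(fake_text_to_mel, fake_vocoder, "a dog barking", sample_rate=16000))
```

The `clock` parameter is injectable so the timing logic can be checked with a fake clock instead of real sleeps. Averaging over several runs smooths out one-off costs such as first-call warm-up, though a real benchmark would usually discard a warm-up run explicitly.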