Text-to-Audio Models Accuracy Leaderboard
Objective metrics: CE, CU, PC, PQ, CLAP. Subjective metrics (each reported as Crowd / Expert): MPC, MCE, MPQ, MAli, MCU.

System | CE | CU | PC | PQ | CLAP | MPC | MCE | MPQ | MAli | MCU |
---|---|---|---|---|---|---|---|---|---|---|
AudioGen | 2.89 | 4.54 | 3.18 | 5.33 | 0.39 | 3.54 / 2.88 | 3.18 / 1.93 | 4.82 / 4.35 | 5.08 / 5.40 | 3.64 / 3.20 |
AudioLDM | 3.27 | 5.10 | 3.23 | 5.82 | 0.44 | 3.11 / 2.88 | 3.34 / 1.77 | 5.25 / 3.44 | 5.52 / 4.51 | 3.94 / 3.14 |
AudioLDM 2 | 3.48 | 5.54 | 3.00 | 6.09 | 0.40 | 3.31 / 2.80 | 3.87 / 3.64 | 5.29 / 6.84 | 5.06 / 7.51 | 4.63 / 4.50 |
Auffusion | 3.32 | 5.11 | 3.23 | 5.72 | 0.45 | 3.62 / 2.90 | 4.25 / 3.71 | 5.56 / 6.76 | 5.61 / 7.59 | 4.94 / 4.57 |
MAGNeT | 2.89 | 4.26 | 3.61 | 5.13 | 0.39 | 3.03 / 2.89 | 2.86 / 2.20 | 4.06 / 4.30 | 4.37 / 5.70 | 2.85 / 3.22 |
Make-An-Audio | 3.28 | 5.33 | 3.08 | 5.78 | 0.38 | 3.55 / 3.05 | 4.28 / 2.51 | 5.47 / 5.77 | 5.27 / 6.83 | 4.46 / 3.89 |
Make-An-Audio 2 | 3.23 | 4.98 | 3.17 | 5.58 | 0.43 | 3.86 / 2.88 | 3.70 / 3.30 | 5.40 / 6.63 | 5.56 / 7.40 | 4.55 / 3.90 |
Stable Audio Open | 3.05 | 5.02 | 2.74 | 5.63 | 0.35 | 2.73 / 2.41 | 2.90 / 2.34 | 4.51 / 4.91 | 4.20 / 5.99 | 3.56 / 3.19 |
Tango | 3.27 | 5.15 | 3.39 | 5.96 | 0.44 | 4.20 / 3.24 | 4.72 / 3.35 | 6.00 / 6.49 | 5.81 / 6.81 | 5.20 / 4.45 |
Tango 2 | 3.47 | 5.20 | 3.84 | 5.89 | 0.46 | 4.14 / 3.15 | 4.73 / 3.35 | 6.01 / 6.63 | 5.94 / 7.59 | 5.21 / 4.77 |
This leaderboard is based on the TTA-Bench evaluation framework, which assesses text-to-audio models along seven dimensions: accuracy, generalization, efficiency, bias, fairness, toxicity, and robustness.