Text-to-Audio Models Information
Columns are grouped as Basic Info (Org., License), Model Configuration (Var., Params, Arch.), and Training Data (Source, Dur.).

| Model | Org. | License | Var. | Params | Arch. | Source | Dur. |
|---|---|---|---|---|---|---|---|
| AudioGen | Meta | CBN4 | med | 1.5B | AR | AS, AC + 8 oth. | 6824 |
| AudioLDM | Surrey | CBNS4 | full | 739M | LDM | AS, AC + 2 oth. | 9031 |
| AudioLDM 2 | Surrey | CBNS4 | large | 712M | LDM | AC, AS + 3 oth. | 29510 |
| Auffusion | BUPT | CBNS4 | full | 1.1B | LDM | AC, AS + 9 oth. | 1990 |
| MAGNeT | Meta | CBN4 | med | 1.5B | NAR | licensed data | 16000 |
| Make-An-Audio | ZJU | MIT | — | 453M | LDM | AS, AC + 13 oth. | ~3k |
| Make-An-Audio 2 | ZJU | MIT | — | 937M | LDM | AS, AC + 10 oth. | 3700 |
| Stable-Audio Open | Stability AI | Comm. | 1.0 | 1057M | DiT | FS, FMA | 7300 |
| Tango | DeClaRe | CBNS4 | full | 866M | LDM | AS, AC + 7 oth. | 1.2M |
| Tango 2 | DeClaRe | CBNS4 | full | 866M | LDM | AL | - |