Google AI model to create music from text

Google MusicLM is a combination of models that converts text to high-fidelity music

Google researchers have published a paper about an AI model that can generate high-fidelity music from text descriptions.

MusicLM's structure combines MuLan + AudioLM and MuLan + w2v-BERT + SoundStream, says AI scientist Keunwoo Choi.

Together, the models cast “the process of conditional music generation as a hierarchical sequence-to-sequence modeling task, and it generates music at 24 kHz that remains consistent over several minutes.”
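
To make that cascade concrete, here is a minimal, runnable Python sketch of the hierarchical pipeline. All function names, token vocabularies, and rates are hypothetical stand-ins, not Google's code: embed_text plays the role of MuLan, semantic_lm the w2v-BERT semantic stage, acoustic_lm the AudioLM-style acoustic stage, and decode_audio the SoundStream decoder.

```python
# A minimal, hypothetical sketch of MusicLM's hierarchical pipeline.
# These stubs are NOT Google's code: embed_text stands in for MuLan,
# semantic_lm for the w2v-BERT semantic stage, acoustic_lm for the
# AudioLM-style acoustic stage, and decode_audio for SoundStream.

def embed_text(prompt: str) -> list[float]:
    """Stand-in for MuLan: map a text prompt into a joint music-text embedding."""
    return [float(ord(c) % 7) for c in prompt][:8]

def semantic_lm(conditioning: list[float], steps: int) -> list[int]:
    """Stand-in for the semantic stage: produce coarse tokens capturing
    long-term structure (melody, rhythm), conditioned on the text embedding."""
    return [int(sum(conditioning) + i) % 1024 for i in range(steps)]

def acoustic_lm(semantic_tokens: list[int]) -> list[int]:
    """Stand-in for the acoustic stage: map semantic tokens to fine-grained
    codec tokens that a SoundStream-like decoder can turn into audio."""
    return [(t * 31) % 4096 for t in semantic_tokens]

def decode_audio(acoustic_tokens: list[int], sample_rate: int = 24_000) -> int:
    """Stand-in for the SoundStream decoder. Instead of synthesizing a real
    waveform, report how many 24 kHz samples the tokens would cover,
    assuming a made-up rate of 50 tokens per second."""
    samples_per_token = sample_rate // 50
    return len(acoustic_tokens) * samples_per_token

prompt = "a calming violin melody backed by a distorted guitar riff"
embedding = embed_text(prompt)
semantic = semantic_lm(embedding, steps=250)   # ~5 seconds at 50 tokens/s
acoustic = acoustic_lm(semantic)
print(f"{decode_audio(acoustic)} samples at 24 kHz")
```

The point of the hierarchy is that each stage works at a coarser or finer timescale: the semantic stage keeps the piece coherent over minutes, while the acoustic stage fills in the high-fidelity detail.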

According to the paper, MusicLM outperforms previous systems in both audio quality and adherence to the text description. To support future research, Google has also released MusicCaps, a dataset of 5.5k music-text pairs with rich text descriptions provided by human experts.
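
For readers who want to explore the dataset, the sketch below shows one hypothetical way to load it with the Hugging Face datasets library; the Hub path google/MusicCaps and the column names ytid and caption are assumptions for illustration, not details confirmed by the paper.

```python
# Hypothetical loader for MusicCaps. The Hub path "google/MusicCaps" and the
# column names "ytid" and "caption" are assumptions, not confirmed details.
from datasets import load_dataset

ds = load_dataset("google/MusicCaps", split="train")
print(len(ds))             # roughly 5.5k music-text pairs

example = ds[0]
print(example["ytid"])     # ID of the audio clip the caption describes
print(example["caption"])  # rich free-text description written by an expert
```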

Through MusicCaps, Google has made more than 5,000 music-text pairs available for people to experiment with. The company isn’t planning to release the model to the public yet, but users can browse and listen to its generated music here.