Stability AI has introduced a new iteration of Stable Audio, featuring an expanded set of functions for creating sound clips.
Our new model takes AI music generation to the next level by letting you generate high-quality audio up to 3 minutes. Everyone can generate audio up to 3 minutes in length, including our free plan users! https://t.co/0xqQcrHLbw https://t.co/MlV0jiULEF
— Stable Audio (@stableaudio) April 3, 2024
The first-generation model could generate audio files up to 90 seconds long. Stable Audio 2.0 creates tracks twice as long with more customization options.
The previous version accepted only text prompts, whereas the new model can also take sound clips as references. The AI matches the style of the generated audio to the reference, producing more accurate results.
Stability AI representatives claim the model can create “structured compositions, including introduction, development, and outro.” Another improvement over the previous generation is the ability to create sound effects.
Stable Audio is based on a diffusion model. Its training method sets it apart from other AI algorithms: during training, the model receives sound clips corrupted with noise and is tasked with restoring the original audio.
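This denoising objective can be illustrated with a toy sketch. The corruption step below follows the common diffusion recipe of mixing a clean signal with Gaussian noise; the names and the noise schedule are illustrative, not Stability AI's actual implementation, and a real model would train a neural network to invert this step.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(clip, noise_level):
    """Mix a clean clip with Gaussian noise, as in diffusion training.
    noise_level in (0, 1): 0 = clean signal, 1 = pure noise."""
    noise = rng.normal(size=clip.shape)
    noisy = np.sqrt(1 - noise_level) * clip + np.sqrt(noise_level) * noise
    return noisy, noise

# Toy "audio clip": one second of a 440 Hz sine at a 16 kHz sample rate.
t = np.arange(16000) / 16000
clip = np.sin(2 * np.pi * 440 * t)

noisy, noise = corrupt(clip, noise_level=0.5)
# During training, a network would be optimized to recover `clip`
# (or, equivalently, to predict `noise`) from `noisy`.
```

At generation time the process runs in reverse: the model starts from pure noise and denoises it step by step into audio.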
The new version uses a specialized implementation of a technique known as a latent diffusion model. Like other neural networks, such models are trained on a dataset similar to the files they will later generate. However, before training begins, the dataset is compressed into a compact mathematical representation, making the development process more efficient.
This compressed representation is called the latent space and retains only the most important details; less significant ones are discarded, shrinking the volume of information the model must process during training. This cuts hardware requirements and costs.
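The compression idea can be sketched with a stand-in encoder. Real latent encoders are learned neural networks; the average-pooling function below is only a hypothetical illustration of how a long waveform becomes a much shorter latent sequence.

```python
import numpy as np

def encode(clip, factor=64):
    """Toy 'encoder': average-pool the waveform into a latent that is
    `factor` times shorter. A real latent diffusion model learns this
    mapping (and its inverse decoder) with a neural network."""
    trimmed = clip[: len(clip) // factor * factor]  # drop any remainder
    return trimmed.reshape(-1, factor).mean(axis=1)

# One second of a 440 Hz sine at 16 kHz: 16,000 samples.
clip = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
latent = encode(clip)
print(len(clip), "->", len(latent))  # 16000 -> 250
```

The diffusion process then runs over the 250-element latent instead of the 16,000-sample waveform, which is why training and generation become cheaper.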
Stability AI engineers have also added a new neural network based on the Transformer architecture, developed by Google in 2017. It is primarily used for building language models. The Transformer considers a large amount of contextual information when interpreting data, allowing it to achieve highly accurate results.
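The core of the Transformer is scaled dot-product attention, in which every position in a sequence is weighted against every other, which is how the model takes broad context into account. A minimal NumPy sketch (illustrative shapes and names, not Stability AI's architecture):

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention: each output row is a weighted mix
    of all value rows, with weights from query-key similarity."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Numerically stable softmax over each row of scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))  # 8 sequence positions, 16 features each
out = attention(x, x, x)      # self-attention: every position attends to all 8
```

Because every position attends to the whole sequence, the model can relate a track's outro back to its introduction, which is what enables the long-range structure described above.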
“The combination of these two elements leads to the creation of a model capable of recognizing and reproducing large-scale structures necessary for creating high-quality musical compositions,” states a press release from Stability AI.
Stable Audio 2.0 is available to users free of charge, and an API will let other companies integrate the AI model into their applications.
Earlier, Adobe unveiled Project Music GenAI Control, which helps people create and edit music without professional experience.
Back in February, Stability AI announced the third generation of Stable Diffusion.
