- From: Deyan Ginev <deyan.ginev@gmail.com>
- Date: Tue, 9 Apr 2024 12:51:00 -0400
- To: Patrick Ion <pion@umich.edu>
- Cc: Stephen Watt <smwatt@gmail.com>, www-math@w3.org
- Message-ID: <CANjPgh8tf62BQVCZj9cFzm3vNosBz1kasF5MEGOYkwKj39_FeA@mail.gmail.com>
Hi Patrick, all,

Transformers are ubiquitous nowadays, and I expect Suno AI is using that architecture exclusively. The key is how they've organized their training data and how they have separated the aspects between models/inputs.

For example, when using the app it is clear that the musical genre is a separate input from the lyrics, and that you can auto-generate lyrics from a short English description using a model separate from the audio-generation model.

I see that Suno has an open source model, called Bark, accessible here:
https://github.com/suno-ai/bark

Quoting its readme:
"It follows a GPT style architecture similar to AudioLM and Vall-E and a quantized Audio representation from EnCodec. It is not a conventional TTS model, but instead a fully generative text-to-audio model capable of deviating in unexpected ways from any given script."

So it appears that they use a single unified model (I assume billions of parameters, based on the observed audio quality and coherence).

Some key steps for that model (which is likely weaker than their v3 production model):

- They use an LLM transformer to embed the input lyrics into latent space, hence getting all kinds of useful context (sentiment, long-term verse structure, etc.).
- Then they use an approach similar to AudioLM for "speech continuations" by mapping "the input audio to a sequence of discrete tokens" and again leveraging a transformer to learn in-context relationships.
- Here is a guess from me: they likely have a huge collection of pure raw audio tracks for each genre, which serves as a starting point before applying the lyrics. They also likely use those tracks as training data, so that the audio transformer can interpolate adjacent variations of any given style example.
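
As a rough illustration of what "a sequence of discrete tokens" can mean for audio, here is a minimal Python sketch. It is not Bark's or AudioLM's tokenizer: those rely on learned neural codecs such as EnCodec, whereas this toy uses classic 8-bit mu-law companding as a stand-in. The names `encode` and `decode`, and the short sine wave used as "raw audio", are assumptions made purely for illustration.

```python
import math

MU = 255  # companding constant for 8-bit mu-law (256 possible tokens)


def encode(samples):
    """Map float samples in [-1, 1] to integer tokens in [0, 255]."""
    tokens = []
    for x in samples:
        x = max(-1.0, min(1.0, x))  # clip out-of-range samples
        # Compress the amplitude logarithmically, keeping the sign.
        y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
        # Shift [-1, 1] to [0, MU] and round to the nearest integer token.
        tokens.append(int(round((y + 1) / 2 * MU)))
    return tokens


def decode(tokens):
    """Approximately invert encode(): tokens back to float samples."""
    samples = []
    for t in tokens:
        y = 2 * t / MU - 1  # back to [-1, 1]
        x = math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
        samples.append(x)
    return samples


# Toy "raw audio": 16 samples of a 440 Hz sine at an 8 kHz sample rate.
wave = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(16)]
tokens = encode(wave)      # integers that a sequence model could consume
restored = decode(tokens)  # lossy reconstruction of the waveform
```

Once audio is a list of integers like `tokens`, a transformer can be trained on it with next-token prediction, in the same way as on text. A learned codec plays the same role but is far more compact than one token per sample.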

You can find the AudioLM paper at:
https://arxiv.org/abs/2209.03143

A key sentence from the abstract is also about data scale:
"By training on large corpora of raw audio waveforms, AudioLM learns to generate natural and coherent continuations given short prompts."

I have no practical experience with the audio modality, so take this as an "educated guess" from a math NLP practitioner.

---

P.S. @Stephen: Eurovision is likely out of the question, but Suno AI has its own internal ranking, which has all kinds of strange curiosities. I can't link to it, however; it is the "Explore" tab in the app.

Greetings,
Deyan

On Tue, Apr 9, 2024 at 12:31 PM Patrick Ion <pion@umich.edu> wrote:

> Amazing! All sorts of things are being eclipsed.
>
> A question, Deyan, is whether you have any good idea how this
> is possible? In particular, can you recommend any discussions
> of the process going from input chat to starting up generation
> from a pre-trained LLM?
>
> I just found 3Blue1Brown's recent course on AI, in particular
> transformer technology, very interesting.
>
> Patrick
>
>
> On Tue, Apr 9, 2024 at 12:08 PM Stephen Watt <smwatt@gmail.com> wrote:
>
>> Wow! Fantastic! Is it too late to enter Eurovision?
>>
>> On Tue, Apr 9, 2024 at 10:41 AM Deyan Ginev <deyan.ginev@gmail.com>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> I wanted to share a musical curiosity with readers here, purely for
>>> entertainment.
>>>
>>> There is a new startup called "Suno AI" and based in Cambridge, MA,
>>> which is innovating on the text-to-music generation front. That is now
>>> encompassing all production aspects (lyrics, voice, instrumental).
>>>
>>> Impressively, they can work on any text as input, even spec text, and
>>> have most music styles available. So it's a fun toy...
>>>
>>> Without further delay, here is an AI-generated song, using the start of
>>> the MathML spec text as the input. I only rearranged the lyrics a little.
>>>
>>> To showcase the tool better, here is the same input in 3 different styles
>>> (they're about 1-2 minutes long, take 30 seconds to generate).
>>>
>>> style 1:
>>> https://app.suno.ai/song/e473ab5d-6656-4efa-8aa3-8a3be1981d3c/
>>>
>>> style 2:
>>> https://app.suno.ai/song/7da4ffc3-aa2b-4505-9990-a30b844594e9
>>>
>>> style 3:
>>> https://app.suno.ai/song/4a68178f-eed9-43a5-a849-7d35c55e2669/
>>>
>>> Enjoy,
>>> Deyan
>>>
>>
Received on Tuesday, 9 April 2024 16:51:32 UTC