| Official quote | (noun): riff + diffusion |
| License | Permissive by attribution |
| Description | Model trained on images of spectrograms; the spectrogram images it generates are then converted into audio files |
Riffusion is a Stable Diffusion model fine-tuned on images of spectrograms paired with text. The model itself only generates spectrogram images; converting them into audio happens downstream. One of its most prominent features is the ability to change composition over the course of a song. Riffusion is in the early stages of development.
Spectrograms
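Since the model only outputs images, the downstream step has to map each pixel's brightness back to a magnitude before any audio can be produced. Below is a minimal sketch of one invertible mapping, assuming a power-law (gamma) curve with exponent 0.25, 8-bit pixel values and a fixed `max_value` scale. These values and the function names are illustrative assumptions, not confirmed Riffusion settings.

```python
# Hypothetical mapping between spectrogram magnitudes and 8-bit pixel values.
# POWER and the max_value scale are illustrative assumptions, not Riffusion's settings.
POWER = 0.25  # gamma < 1 brightens quiet bins so they survive 8-bit quantization


def magnitudes_to_pixels(mags, max_value):
    """Scale magnitudes to [0, 1], apply the gamma curve, quantize to 0..255."""
    pixels = []
    for m in mags:
        norm = min(max(m / max_value, 0.0), 1.0)  # clip to the representable range
        pixels.append(round((norm ** POWER) * 255))
    return pixels


def pixels_to_magnitudes(pixels, max_value):
    """Invert the mapping above (exact up to quantization error)."""
    return [((p / 255) ** (1 / POWER)) * max_value for p in pixels]


mags = [0.0, 0.3, 3.0, 30.0]
px = magnitudes_to_pixels(mags, max_value=30.0)    # [0, 81, 143, 255]
back = pixels_to_magnitudes(px, max_value=30.0)    # close to the original magnitudes
```

Only magnitudes survive this round trip. The phase is not stored in the image, so a pipeline still needs a phase-reconstruction step, such as Griffin-Lim, before the result can be played as audio.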
Capabilities
- Img2img - can generate new spectrograms based on existing ones to get variations or other effects.
- Transitions and development - done by generating samples at points between the latent representations of two prompts.
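A common way to pick the in-between points is spherical linear interpolation (slerp) between the two starting latent vectors, which keeps the interpolated noise at a similar norm to the endpoints, whereas plain linear interpolation tends to shrink it. The sketch below assumes latents are flat lists of floats and that slerp is the interpolation used; it is an illustration of the technique, not code taken from Riffusion.

```python
import math


def slerp(t, v0, v1, dot_threshold=0.9995):
    """Spherical linear interpolation between vectors v0 and v1 at fraction t in [0, 1]."""
    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    # Nearly parallel vectors: fall back to linear interpolation to avoid dividing by ~0.
    if abs(cos_theta) > dot_threshold:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    theta = math.acos(cos_theta)  # angle between the two vectors
    sin_theta = math.sin(theta)
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


# Hypothetical usage: five evenly spaced points from prompt A's latent to prompt B's.
latent_a = [1.0, 0.0]
latent_b = [0.0, 1.0]
steps = [slerp(i / 4, latent_a, latent_b) for i in range(5)]
```

Each point in `steps` would then be decoded into its own spectrogram, giving a sequence of clips that drift from the first prompt to the second.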
Prompts
Lists of prompts that have been tried.
Instruments/Sounds
Good: trumpet, typing, toilet flush, sax, saxophone, church bells, opera singer, dog barking
Questionable: male opera singer (occasionally turns into a female soprano)
Bad: harmonica, Otamatone, triangle, hand clicks (turn into maracas), hand claps
Performers
Generally, it seems that the more popular an artist or group is, the more likely the model is to produce something similar. If the model doesn't know an artist, it defaults to something between Jingle Punks and Kevin MacLeod.
Good: Taylor Swift, Caravan Palace, Ariana Grande, Britney Spears
Questionable: Nicki Minaj
Bad: Ernst Buch, Mili, Mozart, Shostakovich
Genres
Good: gospel, jazz, epic, 90s house, UK garage (when denoising strength is set to 1.0), bubblegum eurodance, reggaeton, lo-fi, synthpop
Questionable: swing, ballad
Bad: military march, anthem
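The denoising value mentioned for UK garage is the img2img strength: how much of the diffusion process is re-run on top of the input spectrogram. The sketch below is modeled on how common Stable Diffusion img2img pipelines (for example, Hugging Face diffusers) turn strength into a starting step; the function name and the 50-step count are hypothetical. At 1.0 every step runs from heavy noise, so the input image is largely discarded; at low values most of the input survives.

```python
def img2img_schedule(num_inference_steps, strength):
    """Return (steps_skipped, steps_run) for an img2img run with the given strength.

    The input is noised to the level of the first step that will run, and only
    the remaining steps are denoised. The function name is hypothetical.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    init_steps = min(int(num_inference_steps * strength), num_inference_steps)
    start_index = max(num_inference_steps - init_steps, 0)
    return start_index, num_inference_steps - start_index


for s in (0.25, 0.5, 0.75, 1.0):
    skipped, run = img2img_schedule(50, s)
    print(f"strength={s}: skip {skipped} steps, denoise {run}")
```

With 50 steps this prints 12, 25, 37 and 50 denoised steps respectively, which is why a setting of 1.0 behaves much like generating from the text prompt alone.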
Other
Bad: most music theory terminology (legato, crescendo, etc.)
You can check the vocabulary file in the repository to get an idea of whether your prompt is likely to be understood (careful, it's a 1 MB text file). The lower a word's number, the more common the word is in the dataset, and the more likely it is to produce a good result.
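A minimal sketch of that check, assuming the vocabulary file is a JSON object mapping tokens to integer IDs (as CLIP's `vocab.json` is), where a complete word is stored with a `</w>` end-of-word suffix. The file path, the toy vocabulary and its IDs are hypothetical. A word missing from the vocabulary is not rejected outright; the tokenizer splits it into smaller pieces, which usually means the model knows it less well.

```python
import json


def load_vocab(path):
    """Load a token -> ID mapping from a JSON vocabulary file (path is hypothetical)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def word_ranks(prompt, vocab):
    """Map each prompt word to its vocabulary ID, or None if it is not a whole-word entry.

    Assumes CLIP-style BPE entries, where a complete word is stored as "word</w>".
    Lower IDs come from merges learned earlier, i.e. more frequent tokens.
    """
    words = [w.strip(".,;:!?") for w in prompt.lower().split()]
    return {w: vocab.get(w + "</w>") for w in words if w}


# Toy vocabulary with made-up IDs; in practice: vocab = load_vocab("vocab.json")
vocab = {"jazz</w>": 5000, "trumpet</w>": 20000, "otam": 40000}
print(word_ranks("Jazz, trumpet otamatone", vocab))
```

Here `otamatone` comes back as `None` because only a fragment of it appears in the toy vocabulary, matching the kind of word the lists above mark as bad.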