

From AI art wiki
Type: txt2audio
[Image: Rock and roll electric guitar solo]
Official quote: "(noun): riff + diffusion"
License: Permissive by attribution
Description: Model trained on images of spectrograms; the spectrograms it generates are then converted into audio files.

Riffusion is a Stable Diffusion model fine-tuned on images of spectrograms paired with text. The model itself only generates spectrogram images; converting them into audio happens downstream of the model. One of its most prominent features is the ability to change composition over the course of a song. Riffusion is in the early stages of development.
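A spectrogram image stores only how loud each frequency is over time, not its phase, so the downstream step has to estimate phase before it can produce a waveform. The Riffusion project reportedly uses mel-scaled spectrograms and the Griffin-Lim algorithm for this. The stdlib-only sketch below swaps in a much simpler technique, a random-phase inverse DFT with overlap-add on a linear frequency axis, just to show the overall shape of the conversion. The function names, the brightness-to-magnitude mapping, the frame size and the hop length are all hypothetical, not Riffusion's actual settings.

```python
import cmath
import math
import random

# Hypothetical sketch of spectrogram-to-audio conversion, not Riffusion's converter.


def pixels_to_magnitudes(pixel_column, max_magnitude=1.0):
    """Hypothetical mapping from one image column (0-255 brightness, top row
    first) to linear magnitudes ordered from low to high frequency."""
    # Spectrogram images usually put low frequencies at the bottom, so reverse.
    return [max_magnitude * p / 255 for p in reversed(pixel_column)]


def column_to_frame(magnitudes, rng):
    """Inverse real DFT of one magnitude column, using random phase.

    magnitudes[k] is the amplitude of frequency bin k for k = 0 .. n_fft // 2.
    """
    n_bins = len(magnitudes)
    n_fft = 2 * (n_bins - 1)
    spectrum = [0j] * n_fft
    for k, mag in enumerate(magnitudes):
        # DC and Nyquist bins must stay real; other bins get a random phase.
        is_edge = k == 0 or k == n_bins - 1
        phase = 0.0 if is_edge else rng.uniform(-math.pi, math.pi)
        spectrum[k] = mag * cmath.exp(1j * phase)
        if not is_edge:
            # Conjugate symmetry makes the inverse transform real-valued.
            spectrum[n_fft - k] = spectrum[k].conjugate()
    # Naive O(n^2) inverse DFT: slow, but dependency-free and easy to read.
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_fft)
            for k in range(n_fft)).real / n_fft
        for n in range(n_fft)
    ]


def spectrogram_to_audio(columns, hop, seed=0):
    """Overlap-add one windowed frame per spectrogram column (time order)."""
    rng = random.Random(seed)
    n_fft = 2 * (len(columns[0]) - 1)
    # Periodic Hann window; consecutive windows sum to 1 when hop == n_fft // 2.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    audio = [0.0] * (hop * (len(columns) - 1) + n_fft)
    for t, column in enumerate(columns):
        frame = column_to_frame(column, rng)
        for n in range(n_fft):
            audio[t * hop + n] += frame[n] * window[n]
    return audio
```

Random phase gives a smeared, noisy sound; Griffin-Lim improves on it by repeatedly re-estimating the phase so that overlapping frames agree with each other.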


Spectrograms

Capabilities

  • Img2img: generates new spectrograms based on existing ones to produce variations and other effects.
  • Looping
  • Transitions and development: done by generating samples at points between the latent representations of two prompts.
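The transition technique in the last item can be sketched as interpolation between two latent vectors, for example the initial noise latents drawn from two different seeds, each paired with its own prompt. Spherical linear interpolation (slerp) is a common choice for Gaussian noise latents because it keeps the interpolated vectors at a similar length, whereas straight linear interpolation shrinks them near the midpoint. This is a stdlib sketch on plain Python lists; the function names are hypothetical, and whether Riffusion uses exactly this scheme is an assumption.

```python
import math


def slerp(t, v0, v1, dot_threshold=0.9995):
    """Spherical linear interpolation between vectors v0 and v1, t in [0, 1]."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    if abs(cos_theta) > dot_threshold:
        # Nearly parallel vectors: slerp is numerically unstable, so use lerp.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    theta = math.acos(cos_theta)
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


def interpolation_schedule(v0, v1, steps):
    """Return `steps` latents evenly spaced from v0 to v1, endpoints included."""
    return [slerp(i / (steps - 1), v0, v1) for i in range(steps)]
```

For two orthogonal unit vectors, `slerp(0.5, ...)` returns a vector of length 1, while the linear midpoint has length about 0.707. In a full pipeline each interpolated latent would be denoised into its own spectrogram clip, and the clips played in sequence.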

Prompts

Lists of prompts that have been tried, grouped by how well they worked.

Instruments/Sounds

Good: trumpet, typing, toilet flush, sax, saxophone, church bells, opera singer, dog barking

Questionable: male opera singer (occasionally turns into a female soprano)

Bad: harmonica, Otamatone, triangle, hand clicks (turns into maracas), hand claps

Performers

Generally, it seems that the more popular a music group is, the more likely the network is to produce something similar to it. If the model doesn't know an artist, it defaults to something between Jingle Punks and Kevin MacLeod.

Good: Taylor Swift, Caravan Palace, Ariana Grande, Britney Spears

Questionable: Nicki Minaj

Bad: Ernst Buch, Mili, Mozart, Shostakovich

Genres

Good: gospel, jazz, epic, 90s house, UK garage (with denoising set to 1.0), bubblegum eurodance, reggaeton, lo-fi, synthpop

Questionable: swing, ballad

Bad: military march, anthem
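The "denoising set to 1.0" note for UK garage presumably refers to the img2img denoising strength. In diffusers-style Stable Diffusion img2img pipelines, the strength decides what fraction of the sampling steps is run on top of the noised input spectrogram: at 1.0 every step runs, starting from almost pure noise, so the input has little influence; at lower values more of the input survives. The sketch below reproduces only that step-count arithmetic, assuming diffusers-like rounding (`int(num_inference_steps * strength)`); the function name `img2img_steps` is hypothetical.

```python
def img2img_steps(num_inference_steps, strength):
    """Return (steps_skipped, steps_run) for an img2img denoising strength.

    Assumes diffusers-style behaviour: the input is noised to the level of the
    first step that is actually run, and only the last `steps_run` steps execute.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    steps_run = min(int(num_inference_steps * strength), num_inference_steps)
    steps_skipped = num_inference_steps - steps_run
    return steps_skipped, steps_run
```

With 50 steps, a strength of 0.75 skips 13 steps and runs 37, while 1.0 runs all 50.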

Other

Bad: most music theory terminology (legato, crescendo, etc.)

You can check the vocabulary file in the model repository to get an idea of whether your prompt is likely to be understood (careful, it is a 1 MB text file). The lower a word's number, the more common the word is in the dataset, and the more likely it should be to produce a good result.
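Stable Diffusion v1 models use the CLIP tokenizer, whose `vocab.json` is a JSON object mapping tokens to integer ids, with whole words stored under a `</w>` end-of-word suffix. Assuming the vocabulary file mentioned above has that format, a quick lookup of each prompt word could be sketched as follows. The function names are hypothetical, and the ids in the usage example are made-up toy values, not real entries.

```python
import json


def load_vocab(path):
    """Load a tokenizer vocabulary stored as a JSON object {token: id}."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def vocab_ranks(prompt, vocab):
    """Map each word of a prompt to its token id, or None if absent.

    Assumes CLIP-style BPE entries, where a whole word is stored with a
    '</w>' end-of-word suffix (for example 'jazz</w>').
    """
    ranks = {}
    # Lower-case and treat commas as separators, since prompts are often comma lists.
    for word in prompt.lower().replace(",", " ").split():
        ranks[word] = vocab.get(word + "</w>")
    return ranks
```

For example, with a toy vocabulary `{"jazz</w>": 5000, "trumpet</w>": 7000}`, `vocab_ranks("Jazz, trumpet otamatone", toy)` gives `{"jazz": 5000, "trumpet": 7000, "otamatone": None}`. A `None` does not mean the word is rejected: the tokenizer splits it into several sub-word pieces, which by the rule of thumb above is likely to work less reliably.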