Sample results, Tooling on top of big-sleep #13
Heart for a nose! Lol
Haha, it's creative indeed. I wanted to show the impact of seeds and iterations, for people who are puzzled by seeing completely different results. I've added some more samples. I'd like to figure out a list of nouns and modifiers/adjectives that work very well with big-sleep. For instance, "made of" is used in the DALL-E example and seems to work very well here too.
@enricoros at this rate, we may not need DALL-E!
Thinking of me? I don't quite recognise myself: I've worked as a visual artist with neural networks since 2015, mainly writing my own code. Now I'm learning about these new techniques, eventually to find ways to integrate them into my own work processes. Yes, there is something uncanny about this. From the prompt "a cityscape in the style of Lionel Feininger" I got this. Of course someone familiar with Feininger's work would see the differences, but that would miss the similarities. It is like someone influenced by Feininger... "A cityscape in the style of Paul Klee": again, not exactly Klee, but very much in the right direction; in fact, like someone influenced by Klee and Mark Rothko. I was never really interested in BigGAN, preferring to work on my own GAN code and my own image materials to keep to my own style. Now I am wondering (not perplexed) how we are suddenly getting so much more interesting results from BigGAN. Is it that these pictures were always there, but difficult to find (I experimented a little with latent space search in BigGAN but gave up)? Or is it that using different conditioning vectors on different layers (as I see the code does) increases the variety even further, even though BigGAN is still using the same weights as before?
@htoyryla Very nice pictures! I think the fabulous results we are seeing come from the unique combination of the multimodal network (CLIP) and the GAN. The GAN has been trained to generate textures and objects to the point of realism, so it has the capacity to paint anything it wants; all the knobs are there. CLIP helps guide it based on the immense amount of image-text pairs it has seen (400 million, I believe). CLIP is also built on attention, which in my mind is more powerful than the gradients you could get from convolutions. The other thing that makes this combination special is that BigGAN is class-conditioned, so each run begins from a random class as its starting point. I believe the different starting points lead to a high variety of end results, even when rerunning on the same text.
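In rough code, the loop looks something like the sketch below, using OpenAI's clip package and a tiny placeholder generator standing in for the bundled BigGAN; this is not big-sleep's actual implementation, just the general recipe:

```python
# Minimal sketch of the CLIP-guided optimization loop (not big-sleep's exact code).
# Assumes OpenAI's `clip` package; the tiny `generator` below is a stand-in for the
# bundled BigGAN so the sketch stays self-contained.
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float()  # keep everything in fp32 for simplicity

with torch.no_grad():
    tokens = clip.tokenize(["a cityscape in the style of Paul Klee"]).to(device)
    text_embed = clip_model.encode_text(tokens)

latent_dim = 128
generator = torch.nn.Sequential(                 # placeholder generator, NOT BigGAN
    torch.nn.Linear(latent_dim, 3 * 64 * 64), torch.nn.Sigmoid()
).to(device)
for p in generator.parameters():
    p.requires_grad_(False)                      # generator and CLIP stay frozen

# The only trainable parameters are the latents.
latents = torch.nn.Parameter(torch.randn(1, latent_dim, device=device))
optimizer = torch.optim.Adam([latents], lr=0.07)

for step in range(500):
    image = generator(latents).view(1, 3, 64, 64)
    image = F.interpolate(image, size=224, mode="bilinear")  # CLIP's input resolution
    image_embed = clip_model.encode_image(image)
    # Maximize cosine similarity between the image and text embeddings.
    loss = -torch.cosine_similarity(image_embed, text_embed, dim=-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```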
@lucidrains Thanks for the clear explanation. Feels so obvious now :) Extremely interesting. Your code projects here are an excellent resource for learning about these developments: compact and keeping to the essentials, yet still fully working.
I am still thinking about how the control from CLIP over BigGAN is implemented, where the knobs are, so to speak. It does not appear to be simply a latent fed into the first layer, but rather an injection into multiple layers (which makes sense to me, as different layers control different visual features). No need for a long explanation though; I can go on and investigate on my own.
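For intuition, a toy version of that kind of multi-layer injection, in the spirit of BigGAN's hierarchical conditioning, might look like this (shapes and layers are purely illustrative, not big-sleep's actual ones):

```python
# Toy illustration of hierarchical conditioning, BigGAN-style: the noise vector is
# split into chunks and each generator block gets its own chunk concatenated with
# the class embedding. Shapes and layers are illustrative only.
import torch
import torch.nn as nn

class ToyHierarchicalGenerator(nn.Module):
    def __init__(self, z_dim=120, class_dim=128, num_blocks=6, channels=64):
        super().__init__()
        self.chunk_size = z_dim // num_blocks
        # Each "block" here is just a linear layer producing a conditioning vector;
        # in BigGAN these are residual upsampling blocks with conditional BatchNorm.
        self.blocks = nn.ModuleList(
            nn.Linear(self.chunk_size + class_dim, channels) for _ in range(num_blocks)
        )

    def forward(self, z, class_embed):
        chunks = z.split(self.chunk_size, dim=-1)      # one "knob" per block
        per_layer_conditions = []
        for block, chunk in zip(self.blocks, chunks):
            cond = torch.cat([chunk, class_embed], dim=-1)
            per_layer_conditions.append(block(cond))   # would modulate that block's features
        return per_layer_conditions

g = ToyHierarchicalGenerator()
conds = g(torch.randn(1, 120), torch.randn(1, 128))
print([c.shape for c in conds])   # six per-layer conditioning vectors
```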
Love the artistic direction of this thread. My interest (other than reading the answer to @htoyryla's question) is in the tooling on top of this beautiful technology: tooling that allows for artistic control, save/restore/collaboration, and that doesn't waste computing resources (and precious time!) running a notebook for hours for a single image which is then discarded. I'm summarizing my ideas for the transition in the "usability" of generative technologies in this picture. What I'm not mentioning here is the plan for "the day after", which could use trained networks to replace the manual selection process and automatically weed out pictures that are straight-up garbage (we see many :). This would require API changes to make the library more controllable and executable in a step-by-step fashion, to make the latent space restorable/saveable (instead of starting from an initial seed and crossing fingers), not to mention then going into editing of the latent space from a UI (point, interpolate, etc). Am I going too far off the deep end? :)
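Purely as a strawman, a step-by-step API along those lines might look like the sketch below; every name in it is invented and nothing like this exists in the library today:

```python
# Hypothetical API sketch only: none of these classes or methods exist in big-sleep.
from dataclasses import dataclass
from typing import Optional

@dataclass
class DreamState:
    prompt: str
    latents: "torch.Tensor"   # the saveable/restorable part, instead of just a seed
    step: int = 0

class DreamSession:
    """Invented interface for a controllable, resumable generation."""
    def __init__(self, prompt: str, seed: Optional[int] = None): ...
    def step(self, n: int = 1) -> "PIL.Image.Image": ...  # run n iterations, return a preview
    def save(self, path: str) -> None: ...                # persist latents + optimizer state
    @classmethod
    def load(cls, path: str) -> "DreamSession": ...       # resume later, on any machine
```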
I already tried saving the latents together with each intermediate image, and then made a separate script for generating images by interpolating between two stored latents. No problems with that, it worked nicely.
Please share the code! Open a pull request, or fork the repo and add it there. @htoyryla: what are your creation flows using this tech?
My experiment was based on an earlier version, so I will make a new fork, make the necessary changes, test, and let you know. Nothing fancy, just how I did it. My workflow in art is based on my own GAN, with lots of options and my own image sets, usually quite small and focused to limit the visual world. In addition I use other tools, such as pix2pix and the like, to modify images. Here I am simply getting familiar with these new technological options.
See here: https://github.com/htoyryla/big-sleep . It will store the latents in a .pth file (named similarly to the image) when save_progress is used. The lines doing the storing are here: https://github.com/htoyryla/big-sleep/blob/472699165a4d792f0837239836e7e5a1f45dcd88/big_sleep/big_sleep.py#L243-L246 and bsmorph.py then shows that the latents can be loaded and that it is possible to interpolate between them. There is nothing yet for continuing training from stored latents, but it should be straightforward to initialise the latents from a stored one: use lats = torch.load(filename) to read the latents from a file and then initialise the latents with lats.normu and lats.cls (see big_sleep/big_sleep.py, lines 72 to 73 at commit a7ad18c).
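A minimal sketch of that load-and-interpolate idea, assuming each stored .pth holds an object exposing normu and cls as described above (the file names are hypothetical, and this is not bsmorph.py itself):

```python
# Sketch of interpolating between two stored latents; assumes each .pth file
# holds an object with .normu and .cls tensors, as in the fork described above.
import torch

lats_a = torch.load("dog_heart.400.pth")     # hypothetical file names
lats_b = torch.load("donut_clouds.390.pth")

def lerp(a, b, t):
    # Simple linear interpolation; spherical interpolation may behave better
    # for Gaussian noise vectors, but this keeps the idea clear.
    return a * (1.0 - t) + b * t

frames = []
for t in torch.linspace(0.0, 1.0, steps=30):
    normu = lerp(lats_a.normu.detach(), lats_b.normu.detach(), t)
    cls = lerp(lats_a.cls.detach(), lats_b.cls.detach(), t)
    frames.append((normu, cls))  # feed each pair to the generator to render a frame
```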
Here's a morph between two latents I stored: p2l.mp4
Beautiful. Can't wait to learn from your code.
Did you notice my comment about the code above?
This is a very important discussion and right at the heart of my research. There are of course many, many issues still to be solved, but over the last couple of months huge steps have been made towards generative design workflows. I try to stay a bit sober, however, because as opposed to many wonderful users of these new workflows I'm not an artist. I'm an engineer and a designer, so operationalizing these things under the constraints of real-world projects is a huge task. Thankfully, that is another area where I feel some very important work has come out in the last few months. Such an exciting few years we're entering!
I am both. I worked for decades in the development of specialised mobile networks, at times mediating between the customer and the actual development. Currently, my approach to coding is to proceed in small steps: experiments and enhancements that can be implemented in a single day. In the long run, it can still go far enough.
Perhaps I'm mistaken, but with
@indiv0 GOOD CATCH! Updating the post with .07
@lucidrains I'm experimenting with a UI for human-in-the-loop (@TheodoreGalanos). An example of a few hours of coding. It's not connected to a backend. I want to have the backend remote, so I can run it on a headless Linux box with a more powerful GPU while viewing the results from my less powerful machine.
@enricoros I'm all ears :) Just let me know how you would envision the API and I'll put in some time later this week!
If you guys need any help with this, just point me at an issue. I’m super new to ML but I’ve got some backend experience and I’d love to help out where I can, especially with @enricoros’ UI.
Nice job @enricoros ! This is a great start. I wonder, can we use generated images as seeds for another generation with deep sleep? Or is that too constrictive? Interactive (latent) evolution would preferably happen like that, although I can definitely see this as a sort of 1-loop run, and at the end of multiple runs you have a basket of candidates to work with.
This might sound silly, but is a Hugging Face-like API viable for these things?
@TheodoreGalanos That's one of the options I want to enable. You could ideally continue the generation from the same hyperparams + latents, steer a new generation towards a different prompt, or even cross-pollinate latents and such.
@TheodoreGalanos what would that API look like? @lucidrains Thanks for volunteering :D, I'll keep you posted. At the moment I've added socket.io (websockets) support to a different command-line util which uses Imagine(), and I'm fighting off long blocking calls vs. threaded execution of the websocket event loop.
@indiv0 If you have Python experience, some experimental code is at https://github.com/enricoros/big-sleep-creator/blob/main/creator.py . I need to send flask-socketio messages even while running long blocking operations (see line 126), so that the websocket doesn't disconnect from the UI. I can either block everything (including socket messages) until an operation is complete, or execute everything in parallel (which parallelizes image generation and crashes the server). I don't have any experience here, so let me know if you spot any mistakes.
This is the current progress at github.com/enricoros/big-sleep-creator: compared to the last update, the WebApp now connects to the big-sleep Python process on the same or a different machine (see the GPU info, top-right), and can sync status and run a generation operation, "imagine()". No results are retrieved yet, as I need PNG buffers to send back to the UI instead of files written to disk. I will have more progress towards the end of the week.
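A possible direction for the blocking-vs-parallel problem, sketched here on the assumption of Flask-SocketIO's documented background-task helper (this is not the actual creator.py code, and run_one_iteration is a placeholder):

```python
# Sketch: run the long imagine() loop in a background task so the websocket keeps
# receiving progress events. Assumes Flask-SocketIO; run_one_iteration is a
# placeholder standing in for one blocking big-sleep training step.
import time
from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app, async_mode="threading")

def run_one_iteration(prompt):
    time.sleep(5)  # placeholder for the blocking GPU work

def long_generation(prompt, iterations=500):
    for i in range(iterations):
        run_one_iteration(prompt)
        socketio.emit("progress", {"iteration": i})  # keep the UI updated
        socketio.sleep(0)                            # yield so socket traffic is serviced

@socketio.on("imagine")
def handle_imagine(data):
    # Return immediately; the heavy loop runs in a background task.
    socketio.start_background_task(long_generation, data["prompt"])

if __name__ == "__main__":
    socketio.run(app)
```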
@enricoros Looks awesome! The UI looks exactly like what I'd want, personally. I'm working on something similar myself here: I'll take a look at the websocket stuff.
Looks really amazing, have you shared this link with people yet? I love the quality of the generated results. My idea is to be able to see and edit the 'dreams' while they are happening, to select the best ones and suppress the weird ones :)
Yeah I've shared it a little bit. Almost all of the submissions are from users, not me. I absolutely agree with you. The human-in-the-loop functionality is critical. I plan to add an account system so that users can terminate/re-run their renders and get the results they want.
great job! this is all that I hoped would happen ;) a stop-gap measure before we all have an imagination machine in our living rooms :D when we finally replicate DALL-E, the internet is going to explode :D
@indiv0 some suggestions: (1) have Big Sleep generate up to N candidate images and have viewers vote on which one is the best, (2) comments, Disqus or home-built (would be hilarious)
@indiv0 are you doing anything special for the site? or is it mostly all just run with the default settings?
@lucidrains For sure. Giving users more control over selecting optimal images is an important feature and would greatly help people generate good results. Currently I'm running each query with 75 iterations for 7 epochs with a learning rate of 0.06.
I can't WAIT until we can replicate DALL-E. You're absolutely right, near real-time DALL-E will be an absolute game changer for creative expression online. In the meantime I'm going to work on adding extra models to the site (like deep-daze) and giving users more control over their renders. Right now the limiting factor is actually the speed of the model: at 8 minutes per render the queue just keeps growing (I can't process requests fast enough), and I don't have infinite money to spend on GPUs, so if we can think of any way to speed it up, that'd be a huge win.
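For reference, that configuration corresponds roughly to an invocation like the sketch below (parameter names are assumed from big-sleep's Python API; double-check them against the installed version):

```python
# Roughly the configuration described above; verify the exact parameter names
# against your installed big-sleep version before relying on this.
from big_sleep import Imagine

dream = Imagine(
    text="a colorful cartoon of a dog",  # example prompt, not the site's queue
    lr=0.06,
    iterations=75,
    epochs=7,
    image_size=256,
    save_progress=True,
)
dream()
```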
Does anyone have any ideas on saving / loading the latents with the current release (0.7.0)?
then I just get "raw" BigGAN images (mostly dogs), rather than the dream image? Update:
Hi, saving/restoring the latents does not seem to be enough. It seems that big_sleep is a lot more aggressive during the first iterations. Do you guys have an idea?
I think you might be right. I looked at the EMA class at https://github.com/lucidrains/big-sleep/blob/main/big_sleep/ema.py#L16 After each iteration the
I've set the initial EMA value to very low numbers and it doesn't affect the rate of change in the beginning. The only thing I've tweaked that seems to affect it is the learning rate, and the only consumer of that is the Adam optimizer. But setting the
yeah, changing accum does not work. I did a for loop that calls the update function 350 times (my latent is from epoch 0, iteration 350) in the constructor of EMA, but it changed the picture (darker, a little bit different), so there is something to do here
@wolfgangmeyers could you check #86? I was a bit aggressive: I dumped the whole EMA/Adam objects to disk and restored them
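For illustration (this is not the actual code in #86), a full checkpoint could bundle the latents with the EMA and optimizer state, assuming all three are standard PyTorch modules/optimizers with state_dict support:

```python
# Illustrative checkpointing of the full optimization state, not the code in #86.
# Assumes `latents` is an nn.Module holding the trainable vectors, `ema` wraps it
# as an nn.Module, and `optimizer` is the Adam instance driving the run.
import torch

def save_checkpoint(path, latents, ema, optimizer, iteration):
    torch.save({
        "latents": latents.state_dict(),
        "ema": ema.state_dict(),
        "optimizer": optimizer.state_dict(),
        "iteration": iteration,
    }, path)

def load_checkpoint(path, latents, ema, optimizer):
    ckpt = torch.load(path)
    latents.load_state_dict(ckpt["latents"])
    ema.load_state_dict(ckpt["ema"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["iteration"]
```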
I was able to get it working. I think this is perfect for generating a large number of images quickly and then picking which ones to finish. I left some feedback on your PR, but I don't have permission to approve it :)
Hello! Does anyone know what Seed stands for? Can I just use random numbers? I have the same doubt about iterations and learning rate. I keep using random numbers, but don't know what they mean. Can anyone help? :)
big-sleep is GORGEOUS. We need to explore what it can do, where it shines, and what to avoid.
Adding a few pics down below, but I'm still in early experimentation - will update the thread later.
Puppies
« a colorful cartoon of a dog »
seed=553905700049900, iteration=160, lr=.07, size=256
iteration=490
« a colorful cartoon of a dog with blue eyes and a heart »
seed=555169003382600, iteration=400, lr=.07, size=256
Clouds
« clouds in the shape of a donut »
seed=581307748222100, iteration=360, lr=.07, size=256
seed=583134047383400, iteration=390, lr=.07, size=256
This post will be edited to add new samples