Sample results, Tooling on top of big-sleep #13
Heart for a nose! Lol
Haha, it's creative indeed. I wanted to show the impact of seeds and iterations, for people who are puzzled by seeing completely different results. I've added some more samples. I'd like to figure out a list of nouns and modifiers/adjectives that work very well with big-sleep. For instance, "made of" is used in the DALL-E example and seems to work very well here too.
@enricoros at this rate, we may not need DALL-E!
Thinking of me? I don't quite recognise myself: I've worked as a visual artist with neural networks since 2015, mainly writing my own code. Now I'm learning about these new techniques, eventually to find ways to integrate them into my own work processes. Yes, there is something uncanny about this. From the prompt "a cityscape in the style of Lionel Feininger" I got this. Of course someone familiar with Feininger's work would see the differences, but that would miss the similarities. It is like someone influenced by Feininger... "A cityscape in the style of Paul Klee": again, not exactly Klee, but very much in the right direction; in fact, like someone influenced by Klee and Mark Rothko. I was never really interested in BigGAN, preferring to work on my own GAN code and my own image materials to keep to my own style. Now I am wondering (not perplexed) how we are suddenly getting so much more interesting results from BigGAN. Is it that these pictures were always there, but difficult to find (I experimented a little with latent space search in BigGAN but gave up)? Or is it that using different conditioning vectors on different layers (as I see the code does) increases the variety even further, even though BigGAN is still using the same weights as before?
@htoyryla Very nice pictures! I think the fabulous results we are seeing come from the unique combination of the multimodal network (CLIP) and the GAN. The GAN has been trained to generate textures and objects to the point of realism, so it has the capacity to paint anything it wants; all the knobs are there. CLIP helps guide it based on the immense amount of image-text pairs it has seen (400 million, I believe). CLIP is also built on attention, which in my mind is more powerful than the gradients you could get from convolutions. The other thing that makes this combination special is that BigGAN is class-conditioned, so each run begins from a random class as its starting point. I believe the different starting points lead to a high variety of end results, even when rerunning on the same text.
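In rough code, the loop looks something like the sketch below, using OpenAI's clip package and a tiny placeholder generator standing in for the bundled BigGAN; this is not big-sleep's actual implementation, just the general recipe:

```python
# Minimal sketch of the CLIP-guided optimization loop (not big-sleep's exact code).
# Assumes OpenAI's `clip` package; the tiny `generator` below is a stand-in for the
# bundled BigGAN so the sketch stays self-contained.
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float()  # keep everything in fp32 for simplicity

with torch.no_grad():
    tokens = clip.tokenize(["a cityscape in the style of Paul Klee"]).to(device)
    text_embed = clip_model.encode_text(tokens)

latent_dim = 128
generator = torch.nn.Sequential(                 # placeholder generator, NOT BigGAN
    torch.nn.Linear(latent_dim, 3 * 64 * 64), torch.nn.Sigmoid()
).to(device)
for p in generator.parameters():
    p.requires_grad_(False)                      # generator and CLIP stay frozen

# The only trainable parameters are the latents.
latents = torch.nn.Parameter(torch.randn(1, latent_dim, device=device))
optimizer = torch.optim.Adam([latents], lr=0.07)

for step in range(500):
    image = generator(latents).view(1, 3, 64, 64)
    image = F.interpolate(image, size=224, mode="bilinear")  # CLIP's input resolution
    image_embed = clip_model.encode_image(image)
    # Maximize cosine similarity between the image and text embeddings.
    loss = -torch.cosine_similarity(image_embed, text_embed, dim=-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```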
@lucidrains Thanks for the clear explanation. Feels so obvious now :) Extremely interesting. Your code projects here are an excellent resource for learning about these developments: compact and keeping to the essentials, yet still fully working.
I am still thinking about how the control from CLIP over BigGAN is implemented, where the knobs are, so to speak. It does not appear to be simply a latent fed into the first layer, but rather an injection into multiple layers (which makes sense to me, as different layers control different visual features). No need for a long explanation though; I can go on and investigate on my own.
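For intuition, a toy version of that kind of multi-layer injection, in the spirit of BigGAN's hierarchical conditioning, might look like this (shapes and layers are purely illustrative, not big-sleep's actual ones):

```python
# Toy illustration of hierarchical conditioning, BigGAN-style: the noise vector is
# split into chunks and each generator block gets its own chunk concatenated with
# the class embedding. Shapes and layers are illustrative only.
import torch
import torch.nn as nn

class ToyHierarchicalGenerator(nn.Module):
    def __init__(self, z_dim=120, class_dim=128, num_blocks=6, channels=64):
        super().__init__()
        self.chunk_size = z_dim // num_blocks
        # Each "block" here is just a linear layer producing a conditioning vector;
        # in BigGAN these are residual upsampling blocks with conditional BatchNorm.
        self.blocks = nn.ModuleList(
            nn.Linear(self.chunk_size + class_dim, channels) for _ in range(num_blocks)
        )

    def forward(self, z, class_embed):
        chunks = z.split(self.chunk_size, dim=-1)      # one "knob" per block
        per_layer_conditions = []
        for block, chunk in zip(self.blocks, chunks):
            cond = torch.cat([chunk, class_embed], dim=-1)
            per_layer_conditions.append(block(cond))   # would modulate that block's features
        return per_layer_conditions

g = ToyHierarchicalGenerator()
conds = g(torch.randn(1, 120), torch.randn(1, 128))
print([c.shape for c in conds])   # six per-layer conditioning vectors
```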
Love the artistic direction of this thread. My interest (other than reading the answer to @htoyryla's question) is in the tooling on top of this beautiful technology: tooling that allows for artistic control, save/restore/collaboration, and that doesn't waste computing resources (and precious time!) running a notebook for hours for a single image which is then discarded. I'm summarizing my ideas for the transition in the "usability" of generative technologies in this picture. What I'm not mentioning here is the plan for "the day after", which could use trained networks to replace the manual selection process and automatically weed out pictures that are straight-up garbage (we see many :). This would require API changes to make the library more controllable and executable in a step-by-step fashion, to make the latent space restorable/saveable (instead of starting from an initial seed and crossing fingers), not to mention then going into editing of the latent space from a UI (point, interpolate, etc). Am I going too far off the deep end? :)
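Purely as a strawman, a step-by-step API along those lines might look like the sketch below; every name in it is invented and nothing like this exists in the library today:

```python
# Hypothetical API sketch only: none of these classes or methods exist in big-sleep.
from dataclasses import dataclass
from typing import Optional

@dataclass
class DreamState:
    prompt: str
    latents: "torch.Tensor"   # the saveable/restorable part, instead of just a seed
    step: int = 0

class DreamSession:
    """Invented interface for a controllable, resumable generation."""
    def __init__(self, prompt: str, seed: Optional[int] = None): ...
    def step(self, n: int = 1) -> "PIL.Image.Image": ...  # run n iterations, return a preview
    def save(self, path: str) -> None: ...                # persist latents + optimizer state
    @classmethod
    def load(cls, path: str) -> "DreamSession": ...       # resume later, on any machine
```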
I already tried saving the latents together with each intermediate image, and then made a separate script for generating images by interpolating between two stored latents. No problems with that, it worked nicely.
Please share the code! Open a pull request, or fork the repo and add it there. @htoyryla: what are your creation flows using this tech?
My experiment was based on an earlier version, so I will make a new fork, make the necessary changes, test, and let you know. Nothing fancy, just how I did it. My workflow in art is based on my own GAN, with lots of options and my own image sets, usually quite small and focused to limit the visual world. In addition I use other tools, such as pix2pix and the like, to modify images. Here I am simply getting familiar with these new technological options.
See here: https://github.com/htoyryla/big-sleep . It will store the latents in a .pth file (named similarly to the image) when save_progress is used. The lines doing the storing are here: https://github.com/htoyryla/big-sleep/blob/472699165a4d792f0837239836e7e5a1f45dcd88/big_sleep/big_sleep.py#L243-L246 and bsmorph.py then shows that the latents can be loaded and that it is possible to interpolate between them. There is nothing yet for continuing training from stored latents, but it should be straightforward to initialise the latents from a stored one: use lats = torch.load(filename) to read the latents from a file and then initialise the latents with lats.normu and lats.cls (see big_sleep/big_sleep.py, lines 72 to 73 at commit a7ad18c).
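A minimal sketch of that load-and-interpolate idea, assuming each stored .pth holds an object exposing normu and cls as described above (the file names are hypothetical, and this is not bsmorph.py itself):

```python
# Sketch of interpolating between two stored latents; assumes each .pth file
# holds an object with .normu and .cls tensors, as in the fork described above.
import torch

lats_a = torch.load("dog_heart.400.pth")     # hypothetical file names
lats_b = torch.load("donut_clouds.390.pth")

def lerp(a, b, t):
    # Simple linear interpolation; spherical interpolation may behave better
    # for Gaussian noise vectors, but this keeps the idea clear.
    return a * (1.0 - t) + b * t

frames = []
for t in torch.linspace(0.0, 1.0, steps=30):
    normu = lerp(lats_a.normu.detach(), lats_b.normu.detach(), t)
    cls = lerp(lats_a.cls.detach(), lats_b.cls.detach(), t)
    frames.append((normu, cls))  # feed each pair to the generator to render a frame
```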
Here's a morph between two latents I stored: p2l.mp4
Beautiful. Can't wait to learn from your code.
Did you notice my comment about the code above?
This is a very important discussion and right at the heart of my research. There are of course many, many issues still to be solved, but over the last couple of months huge steps have been made towards generative design workflows. I try to stay a bit sober, however, because as opposed to many wonderful users of these new workflows I'm not an artist. I'm an engineer and a designer, so operationalizing these things under the constraints of real-world projects is a huge task. Thankfully, that is another area where I feel some very important work has come out in the last few months. Such an exciting few years we're entering!
I am both. I worked for decades in the development of specialised mobile networks, at times mediating between the customer and the actual development. Currently, my approach to coding is to proceed in small steps: experiments and enhancements that can be implemented in a single day. In the long run, it can still go far enough.
Perhaps I'm mistaken, but with
@indiv0 GOOD CATCH! Updating the post with .07
@lucidrains I'm experimenting with a UI for human-in-the-loop (@TheodoreGalanos). An example of a few hours of coding. It's not connected to a backend. I want to have the backend remote, so I can run it on a headless Linux box with a more powerful GPU while viewing the results from my less powerful machine.
@enricoros I'm all ears :) Just let me know how you would envision the API and I'll put in some time later this week!
If you guys need any help with this, just point me at an issue. I’m super new to ML but I’ve got some backend experience and I’d love to help out where I can, especially with @enricoros’ UI.
Nice job @enricoros ! This is a great start. I wonder, can we use generated images as seeds for another generation with deep sleep? Or is that too constrictive? Interactive (latent) evolution would preferably happen like that, although I can definitely see this as a sort of 1-loop run, and at the end of multiple runs you have a basket of candidates to work with.
This might sound silly, but is a Hugging Face-like API viable for these things?
@TheodoreGalanos That's one of the options I want to enable. You could ideally continue the generation from the same hyperparams + latents, steer a new generation towards a different prompt, or even cross-pollinate latents and such.
@TheodoreGalanos what would that API look like? @lucidrains Thanks for volunteering :D, I'll keep you posted. At the moment I've added socket.io (websockets) support to a different command-line util which uses Imagine(), and I'm fighting off long blocking calls vs. threaded execution of the websocket event loop.
@indiv0 If you have Python experience, some experimental code is at https://github.com/enricoros/big-sleep-creator/blob/main/creator.py . I need to send flask-socketio messages even while running long blocking operations (see line 126), so that the websocket doesn't disconnect from the UI. I can either block everything (including socket messages) until an operation is complete, or execute everything in parallel (which parallelizes image generation and crashes the server). I don't have any experience here, so let me know if you spot any mistakes.
This is the current progress at github.com/enricoros/big-sleep-creator: compared to the last update, the WebApp now connects to the big-sleep Python process on the same or a different machine (see the GPU info, top-right), and can sync status and run a generation operation, "imagine()". No results are retrieved yet, as I need PNG buffers to send back to the UI instead of files written to disk. I will have more progress towards the end of the week.
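A possible direction for the blocking-vs-parallel problem, sketched here on the assumption of Flask-SocketIO's documented background-task helper (this is not the actual creator.py code, and run_one_iteration is a placeholder):

```python
# Sketch: run the long imagine() loop in a background task so the websocket keeps
# receiving progress events. Assumes Flask-SocketIO; run_one_iteration is a
# placeholder standing in for one blocking big-sleep training step.
import time
from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app, async_mode="threading")

def run_one_iteration(prompt):
    time.sleep(5)  # placeholder for the blocking GPU work

def long_generation(prompt, iterations=500):
    for i in range(iterations):
        run_one_iteration(prompt)
        socketio.emit("progress", {"iteration": i})  # keep the UI updated
        socketio.sleep(0)                            # yield so socket traffic is serviced

@socketio.on("imagine")
def handle_imagine(data):
    # Return immediately; the heavy loop runs in a background task.
    socketio.start_background_task(long_generation, data["prompt"])

if __name__ == "__main__":
    socketio.run(app)
```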
@enricoros Looks awesome! The UI looks exactly like what I'd want, personally. I'm working on something similar myself here: I'll take a look at the websocket stuff.
Looks really amazing, have you shared this link with people yet? I love the quality of the generated results. My idea is to be able to see and edit the 'dreams' while they are happening, to select the best ones and suppress the weird ones :)
Yeah I've shared it a little bit. Almost all of the submissions are from users, not me. I absolutely agree with you. The human-in-the-loop functionality is critical. I plan to add an account system so that users can terminate/re-run their renders and get the results they want.
great job! this is all that I hoped would happen ;) a stop-gap measure before we all have an imagination machine in our living rooms :D when we finally replicate DALL-E, the internet is going to explode :D
@indiv0 some suggestions: (1) have Big Sleep generate up to N candidate images and have viewers vote on which one is the best, (2) comments, Disqus or home-built (would be hilarious)
@indiv0 are you doing anything special for the site? or is it mostly all just run with the default settings?
@lucidrains For sure. Giving users more control over selecting optimal images is an important feature and would greatly help people generate good results. Currently I'm running each query with 75 iterations for 7 epochs with a learning rate of 0.06.
I can't WAIT until we can replicate DALL-E. You're absolutely right, near real-time DALL-E will be an absolute game changer for creative expression online. In the meantime I'm going to work on adding extra models to the site (like deep-daze) and giving users more control over their renders. Right now the limiting factor is actually the speed of the model: at 8 minutes per render the queue just keeps growing (I can't process requests fast enough), and I don't have infinite money to spend on GPUs, so if we can think of any way to speed it up, that'd be a huge win.
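For reference, that configuration corresponds roughly to an invocation like the sketch below (parameter names are assumed from big-sleep's Python API; double-check them against the installed version):

```python
# Roughly the configuration described above; verify the exact parameter names
# against your installed big-sleep version before relying on this.
from big_sleep import Imagine

dream = Imagine(
    text="a colorful cartoon of a dog",  # example prompt, not the site's queue
    lr=0.06,
    iterations=75,
    epochs=7,
    image_size=256,
    save_progress=True,
)
dream()
```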
Does anyone have any ideas on saving / loading the latents with the current release (0.7.0)?
then I just get "raw" BigGAN images (mostly dogs), rather than the dream image? Update:
Hi, saving/restoring the latents does not seem to be enough. It seems that big_sleep is a lot more aggressive during the first iterations. Do you guys have an idea?
I think you might be right. I looked at the EMA class at https://github.com/lucidrains/big-sleep/blob/main/big_sleep/ema.py#L16 After each iteration the
I've set the initial EMA value to very low numbers and it doesn't affect the rate of change in the beginning. The only thing I've tweaked that seems to affect it is the learning rate, and the only consumer of that is the Adam optimizer. But setting the
yeah, changing accum does not work. I did a for loop that calls the update function 350 times (my latent is from epoch 0, iteration 350) in the constructor of EMA, but it changed the picture (darker, a little bit different), so there is something to do here
@wolfgangmeyers could you check #86? I was a bit aggressive: I dumped the whole EMA/Adam objects to disk and restored them
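For illustration (this is not the actual code in #86), a full checkpoint could bundle the latents with the EMA and optimizer state, assuming all three are standard PyTorch modules/optimizers with state_dict support:

```python
# Illustrative checkpointing of the full optimization state, not the code in #86.
# Assumes `latents` is an nn.Module holding the trainable vectors, `ema` wraps it
# as an nn.Module, and `optimizer` is the Adam instance driving the run.
import torch

def save_checkpoint(path, latents, ema, optimizer, iteration):
    torch.save({
        "latents": latents.state_dict(),
        "ema": ema.state_dict(),
        "optimizer": optimizer.state_dict(),
        "iteration": iteration,
    }, path)

def load_checkpoint(path, latents, ema, optimizer):
    ckpt = torch.load(path)
    latents.load_state_dict(ckpt["latents"])
    ema.load_state_dict(ckpt["ema"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["iteration"]
```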
I was able to get it working. I think this is perfect for generating a large number of images quickly and then picking which ones to finish. I left some feedback on your PR, but I don't have permission to approve it :)
Hello! Does anyone know what Seed stands for? Can I just use random numbers? I have the same doubt about iterations and learning rate. I keep using random numbers, but don't know what they mean. Can anyone help? :)
big-sleep is GORGEOUS. We need to explore what it can do, where it shines, and what to avoid.
Adding a few pics down below, but I'm still in early experimentation - will update the thread later.
Puppies
« a colorful cartoon of a dog »
seed=553905700049900, iteration=160, lr=.07, size=256
iteration=490
« a colorful cartoon of a dog with blue eyes and a heart »
seed=555169003382600, iteration=400, lr=.07, size=256
Clouds
« clouds in the shape of a donut »
seed=581307748222100, iteration=360, lr=.07, size=256
seed=583134047383400, iteration=390, lr=.07, size=256
This post will be edited to add new samples