-
Hi, I think a semantic segmentation ControlNet should be trained the same way as any other ControlNet. I don't really understand what you mean about the mask values: they are not relevant to the ControlNet itself, only to the semantic segmentation model. You just need to use the same classes in both.

A ControlNet is trained on triplets: the original image, the condition image, and a text that describes the original image. The model learns everything else by itself. For example, with an OpenPose ControlNet you don't tell it which part (color) of the skeleton is an arm, a leg, or the head. You just describe the image and the model learns the pose.

Sadly, I haven't trained a ControlNet yet (I really want to, so I can get more insight), but you can learn how to train one from the datasets that are available. When I do a training run myself I will have more information, but I think the best way is what I described: use something like the ADE20K dataset and describe each image, making sure the description contains all the classes that are mapped.

Just be advised that training a ControlNet needs a really big dataset. 5k or 10k images won't do it; the best ControlNet that works for me was trained on over 500k images.
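To make the triplet idea concrete, here is a minimal sketch of how the training records could be laid out. The field names `image`, `conditioning_image`, and `text`, the file paths, and the `build_caption` helper are all assumptions for illustration, not something taken from a specific training script, so adapt them to whatever your trainer expects.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ControlNetSample:
    # All field names here are hypothetical; rename them to match your trainer.
    image: str               # path to the original (target) image
    conditioning_image: str  # path to the color-coded segmentation map
    text: str                # caption describing the original image


def build_caption(class_names):
    """Build a simple caption that mentions every class present in the mask."""
    return "a photo containing " + ", ".join(class_names)


def to_jsonl(samples):
    """Serialize samples as one JSON object per line."""
    return "\n".join(json.dumps(asdict(s)) for s in samples)


# Hypothetical file paths, used only to show the shape of the data.
samples = [
    ControlNetSample("images/0001.jpg", "seg/0001.png",
                     build_caption(["sky", "airplane"])),
    ControlNetSample("images/0002.jpg", "seg/0002.png",
                     build_caption(["railway", "train"])),
]

metadata = to_jsonl(samples)
print(metadata)
```

Note that nothing in a record says which color means which class. That association is only implied by the caption and the pixels, which matches the point above that the model is left to learn it.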
-
I have a question. When training ControlNet, the semantic segmentation mask values (0, 1, 2) in the first image represent the background, the aircraft, and the train, respectively, and the corresponding text also describes these objects. In the next image, the same mask values represent other objects. Is it OK to do this? Or do I need different values to represent the new objects? It would be great if someone has done similar experiments.
Furthermore, this raises a question about how ControlNet understands the relationship between the layout map and the descriptive text. If the above is possible, then the layout map doesn't really need to carry any semantics, just the spatial layout. On the other hand, if it is not possible, then it is better for the layout map to provide semantic information as well.
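To illustrate the alternative to reusing values per image, here is a minimal sketch in which each image keeps its own local ids (0, 1, 2, ...) but every id is mapped through a single dataset-wide color table before the condition image is saved. The palette, the class names, and the `colorize` helper are hypothetical and exist only to show the idea. Whether this is actually required for training is exactly the open question.

```python
# Hypothetical dataset-wide palette: one fixed RGB color per class.
GLOBAL_PALETTE = {
    "background": (0, 0, 0),
    "airplane": (255, 0, 0),
    "train": (0, 255, 0),
    "car": (0, 0, 255),
}


def colorize(mask, local_classes):
    """Turn a mask of per-image ids into RGB pixels using the global palette.

    mask: 2D list of ints (per-image local ids).
    local_classes: list where index = local id and value = class name.
    """
    return [[GLOBAL_PALETTE[local_classes[v]] for v in row] for row in mask]


# Image A: 0 = background, 1 = airplane, 2 = train.
mask_a = [[0, 1], [2, 2]]
rgb_a = colorize(mask_a, ["background", "airplane", "train"])

# Image B reuses id 1 for a different object: 0 = background, 1 = car.
mask_b = [[0, 1], [1, 0]]
rgb_b = colorize(mask_b, ["background", "car"])

# The same local id (1) ends up as a different color in each image.
print(rgb_a[0][1], rgb_b[0][1])
```

With this mapping a given color always means the same class across the dataset, so the layout map carries semantics in addition to spatial layout.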