ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preservation #7955

Open
@clarencechen

Description

Model/Pipeline/Scheduler description

Existing methods for facial identity transfer in diffusion-based image generation models struggle to achieve high-fidelity, detailed identity (ID) consistency, primarily due to insufficient fine-grained control over facial regions and the lack of a comprehensive ID-preservation strategy that accounts for intricate facial details. To address these limitations, the authors introduce ConsistentID, a method for diverse identity-preserving portrait generation driven by fine-grained multimodal facial prompts, using only a single reference image.

ConsistentID comprises three key components:

  • A fine-tuned IP-Adapter-FaceID-Plus module to capture the overall facial context from the reference image.
  • Expanded textual descriptions generated from the reference face image using LLaVA-1.5 to further refine facial features.
  • An ID-preservation network injecting Perceiver-remapped CLIP embeddings of separated facial regions into the embeddings of the expanded text prompt, optimized through the facial attention localization strategy aimed at preserving ID consistency in facial regions.

Together, these components significantly enhance the accuracy of ID preservation by introducing fine-grained multimodal ID information from facial regions.
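To make the third component more concrete, here is a minimal, hypothetical sketch of the Perceiver-style remapping described above: learned latent queries cross-attend to CLIP embeddings of the separated facial regions, and the resulting ID tokens are concatenated with the expanded text-prompt embeddings before being fed to the denoiser's cross-attention. All module names and layer sizes are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class FacialRegionResampler(nn.Module):
    """Illustrative sketch (not the official implementation):
    remap CLIP embeddings of facial regions into a fixed number of
    ID tokens and inject them into the text-prompt embeddings."""

    def __init__(self, clip_dim=1024, text_dim=768, num_queries=4, num_heads=8):
        super().__init__()
        # Learned latent queries that attend to the facial-region features
        self.queries = nn.Parameter(torch.randn(num_queries, text_dim))
        self.proj_in = nn.Linear(clip_dim, text_dim)
        self.attn = nn.MultiheadAttention(text_dim, num_heads, batch_first=True)

    def forward(self, region_clip_embeds, text_embeds):
        # region_clip_embeds: (batch, regions * patches, clip_dim)
        # text_embeds:        (batch, seq_len, text_dim)
        kv = self.proj_in(region_clip_embeds)
        q = self.queries.unsqueeze(0).expand(text_embeds.size(0), -1, -1)
        id_tokens, _ = self.attn(q, kv, kv)  # (batch, num_queries, text_dim)
        # Inject remapped ID tokens alongside the expanded text prompt
        return torch.cat([text_embeds, id_tokens], dim=1)

resampler = FacialRegionResampler()
regions = torch.randn(2, 5 * 16, 1024)  # e.g. 5 facial regions, 16 patches each
text = torch.randn(2, 77, 768)          # standard CLIP text sequence length
fused = resampler(regions, text)
print(fused.shape)  # torch.Size([2, 81, 768])
```

In the actual method, the facial attention localization strategy would additionally constrain each region's ID tokens to attend only to its corresponding facial area during training.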

Open source status

  • The model implementation is available.
  • The model weights are available (Only relevant if addition is not a scheduler).

Provide useful links for the implementation

Arxiv: https://arxiv.org/pdf/2404.16771
Github: https://github.com/JackAILab/ConsistentID
Contact: @JackAILab
