Update 1.md
andysingal authored Nov 24, 2024
1 parent 368cb3a commit f9fc4a3

<p>Embark on a digital exploration into the fascinating universe of Multimodal Large Language Models (LLMs). As you step into the GitHub repository dedicated to these multimodal wonders, you'll find a treasure trove of innovation where language and vision converge in a cosmic dance.</p>

By definition, multimodal models are intended to process multiple input modalities, such as text, images, and video, and to generate output in multiple modalities. These specialized large multimodal models (LMMs) frequently use pre-trained large-scale vision or language models as a foundation.

In the architecture of these models, an image encoder extracts visual features, which are then passed to a standard language model that generates a text sequence.
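
The two-stage pipeline above can be sketched as follows. This is a minimal toy illustration, not a real model: `encode_image`, `ToyLanguageModel`, and all their internals are hypothetical stand-ins for an actual vision encoder and decoder.

```python
# Toy sketch of the architecture: an image encoder produces a
# fixed-size visual feature vector, and a language model then
# generates a text sequence conditioned on those features.
import random

def encode_image(pixels, dim=4):
    """Toy 'image encoder': pool an image (rows of pixel values)
    into a fixed-size feature vector."""
    flat = [p for row in pixels for p in row]
    chunk = max(1, len(flat) // dim)
    return [sum(flat[i:i + chunk]) / chunk
            for i in range(0, len(flat), chunk)][:dim]

class ToyLanguageModel:
    """Toy decoder: emits tokens, biased by the visual features."""
    vocab = ["a", "cat", "dog", "on", "mat", "<eos>"]

    def generate(self, visual_features, max_tokens=5):
        # Seed on the features so output is conditioned on the image.
        random.seed(int(sum(visual_features) * 1000))
        out = []
        for _ in range(max_tokens):
            tok = random.choice(self.vocab)
            if tok == "<eos>":
                break
            out.append(tok)
        return out

image = [[0.1, 0.2], [0.3, 0.4]]                 # a tiny 2x2 "image"
features = encode_image(image)                   # stage 1: visual features
caption = ToyLanguageModel().generate(features)  # stage 2: text sequence
print(caption)
```

In a real system, stage 1 would be a pre-trained vision transformer and stage 2 an autoregressive LLM, but the data flow is the same: pixels → feature vector → text tokens.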

The input may include a combination of images, videos, and text, and the model generates text responses in an open-ended format. This capability allows it to perform tasks such as image captioning and visual question answering effectively.

The extracted visual features are mapped into visual tokens that condition a frozen language model, a pre-trained language model whose weights are not updated during this process.
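
The freezing idea can be illustrated with a toy training loop. Everything here is hypothetical (the weights, the update rule, the learning rate): the point is only that gradient updates touch the small visual projection while the pre-trained language model's parameters stay fixed.

```python
# Toy sketch of conditioning a *frozen* language model: only the
# projection that maps visual features into the LM's space is
# trained; the LM's own (pretrained) weights never change.

lm_weights = [0.5, -0.2, 0.8]   # pretrained LM parameters (frozen)
projection = [0.0, 0.0, 0.0]    # trainable visual projection

def train_step(visual_feature, lr=0.1):
    """One toy update: nudge each projection weight toward the
    visual feature, leaving the frozen LM weights untouched."""
    global projection
    projection = [w + lr * (visual_feature - w) for w in projection]

frozen_before = list(lm_weights)
for _ in range(3):
    train_step(visual_feature=1.0)

print(lm_weights == frozen_before)  # LM unchanged: frozen
print(projection)                   # projection has adapted
```

This mirrors the common design choice in LMMs: keeping the language model frozen preserves its pre-trained linguistic ability while only a lightweight adapter learns to translate vision into something the LM can consume.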




## 🧠 Unveiling the Multimodal Symphony:
The repository houses the intricacies of Multimodal LLM architectures, showcasing their ability to comprehend both text and images simultaneously. These models redefine the boundaries of artificial intelligence, presenting a harmonious blend of linguistic understanding and visual interpretation.
