multimodal

RefineSumm: Self-Refining MLLM for Generating a Multimodal Summarization Dataset

Multimodal Large Language Models (MLLMs) excel at synthesizing key information from diverse sources. However, generating accurate and faithful multimodal summaries is challenging, primarily due to the lack of appropriate multimodal datasets for …

Unified Embeddings for Multimodal Retrieval via Frozen LLMs