MMFM @ CVPR 2026 | What is Next in Multimodal Foundation Models?

About the Workshop

Multimodal Foundation Models (MMFMs) have revolutionized AI, achieving remarkable success across vision, language, speech, and beyond.

The 5th edition of this workshop aims to explore what is next in this rapidly evolving field, addressing fundamental challenges and charting paths forward. We bring together diverse leaders from academia and industry to discuss critical aspects including model design, training paradigms, generalization, efficiency, ethics, fairness, and open availability.

Topics of Interest

Vision / Sound & Speech / Robotics / Language FMs
Data and model scaling properties
Self-, semi-, and weakly supervised training
Multimodal grounding in foundation models
Generative MMFMs (Text-to-Image/Video/3D)
Ethics, risks, and fairness

Invited Speakers

Workshop Schedule

Call for Papers

Archival Track

Full-length papers (8 pages, CVPR format), published in the workshop proceedings.

Non-Archival Track

Short papers or extended abstracts (4 pages), as well as papers already accepted at CVPR (8 pages, CVPR format).

Papers will be peer-reviewed under a single-blind process. Submissions need not be anonymized, but authors may choose to submit anonymously.

Important Deadlines (Tentative)

March 14, 2026, 23:59 AoE: Paper Submission Deadline
April 01, 2026: Notification to Authors
April 10, 2026: Camera-ready Deadline
April 18, 2026: Finalized Program

Submit via OpenReview

Organizers

Previous Editions

4th Edition (ICCV 2025)

Multimodal foundation models research and advances


3rd Edition (CVPR 2025)

Exploring multimodal foundation models
