Paper decisions are now available on OpenReview!
CVPR Workshop

What is Next in Multimodal Foundation Models?

The 5th Edition of the MMFM Workshop exploring the frontiers of vision, language, and beyond.

June 3, 2026 (Afternoon)
Colorado Convention Center (Rooms 3A-3D)

About the Workshop

Multimodal Foundation Models (MMFMs) have revolutionized AI, achieving remarkable success across vision, language, speech, and beyond.

The 5th edition of this workshop aims to explore what is next in this rapidly evolving field, addressing fundamental challenges and charting paths forward. We bring together diverse leaders from academia and industry to discuss critical aspects including model design, training paradigms, generalization, efficiency, ethics, fairness, and open availability.

Topics of Interest

  • Vision / Sound & Speech / Robotics / Language FMs
  • Data and model scaling properties
  • Self / Semi / Weakly supervised training
  • Multimodal grounding in foundation models
  • Generative MMFMs (Text-to-Image/Video/3D)
  • Ethics, risks, and fairness

Invited Speakers

Workshop Schedule

Call for Papers

Archival Track

Full-length papers with proceedings in CVPR format (8 pages).

Non-Archival Track

Short papers or extended abstracts (4 pages) and accepted CVPR papers (8 pages, CVPR format).

Papers will be peer-reviewed in a single-blind format. Submissions need not be anonymized, but authors may choose to submit anonymously.

Important Deadlines (Tentative)

  • March 14, 2026, 23:59 AoE: Paper Submission Deadline
  • April 01, 2026: Notification to Authors
  • April 10, 2026: Camera-ready Deadline
  • April 18, 2026: Finalized Program

Submit via OpenReview

Organizers

Previous Editions

4th Edition (ICCV 2025)

Multimodal foundation models research and advances

3rd Edition (CVPR 2025)

Exploring multimodal foundation models
