CVPR Workshop

What is Next in
Multimodal Foundation Models?

The 5th edition of the MMFM Workshop, exploring the frontiers of vision, language, and beyond.

June 2026
Denver, CO

About the Workshop

Multimodal Foundation Models (MMFMs) have revolutionized AI, achieving remarkable success across vision, language, speech, and beyond.

The 5th edition of this workshop aims to explore what is next in this rapidly evolving field, addressing fundamental challenges and charting paths forward. We bring together diverse leaders from academia and industry to discuss critical aspects including model design, training paradigms, generalization, efficiency, ethics, fairness, and open availability.

Topics of Interest

  • Vision / Sound & Speech / Robotics / Language FMs
  • Data and model scaling properties
  • Self-, semi-, and weakly-supervised training
  • Multimodal grounding in foundation models
  • Generative MMFMs (Text-to-Image/Video/3D)
  • Ethics, risks, and fairness

Invited Speakers

Workshop Schedule

Call for Papers

Archival Track

Full-length papers (8 pages) in CVPR format, published in the workshop proceedings.

Non-Archival Track

Short papers or extended abstracts (4 pages). Papers already accepted to CVPR are also welcome.

Important Deadlines (Tentative)

  • March 14, 2026: Paper Submission Deadline
  • April 01, 2026: Notification to Authors
  • April 11, 2026: Camera-ready Deadline
  • April 18, 2026: Finalized Program

Submit via OpenReview

Organizers

Previous Editions

4th Edition (ICCV 2025)

Multimodal foundation models research and advances

View Workshop

3rd Edition (CVPR 2025)

Exploring multimodal foundation models

View Workshop