About the Workshop
Multimodal Foundation Models (MMFMs) have revolutionized AI, achieving remarkable success across vision, language, speech, and beyond.
The 5th edition of this workshop aims to explore what is next in this rapidly evolving field, addressing fundamental challenges and charting paths forward. We bring together diverse leaders from academia and industry to discuss critical aspects including model design, training paradigms, generalization, efficiency, ethics, fairness, and open availability.
Topics of Interest
- Vision / Sound & Speech / Robotics / Language FMs
- Data and model scaling properties
- Self / Semi / Weakly supervised training
- Multimodal grounding in foundation models
- Generative MMFMs (Text-to-Image/Video/3D)
- Ethics, risks, and fairness
Invited Speakers
Workshop Schedule
Call for Papers
Archival Track
Full-length papers (up to 8 pages, CVPR format), published in the workshop proceedings.
Non-Archival Track
Short papers or extended abstracts (up to 4 pages). Papers already accepted to CVPR are also welcome.
Important Deadlines (Tentative)
- March 14, 2026: Paper Submission Deadline
- April 01, 2026: Notification to Authors
- April 11, 2026: Camera-ready Deadline
- April 18, 2026: Finalized Program