On May 31, 2026, during GTC Taipei at COMPUTEX, NVIDIA unveiled Cosmos 3, a model that promises to change how robots and autonomous vehicles understand the world. This is not just another iteration of the Cosmos family: it is a complete architectural leap, and it comes with a label no one else can claim — the first fully open “omnimodel” for physical AI.
The previous version of Cosmos was an ecosystem of specialized models — Cosmos Predict, Transfer, Reason, Policy — each doing a different thing. Cosmos 3 unifies everything into a single model capable of processing and generating text, images, video, environmental sound, and numerical actions (joint angles, gripper positions, trajectories). Five modalities in one system, with open weights anyone can download.
The architecture is as interesting as the proposition. Cosmos 3 uses Mixture-of-Transformers (MoT), which pairs two transformers in each layer: a “reasoner” handling autoregressive understanding (next-token prediction) and a “generator” handling diffusion-based generation (iterative denoising). The two interact through joint attention within each layer, allowing the model to reason about a scene and then generate it — or vice versa — in an integrated fashion. It’s not a VLM glued to a video generator; it’s a single model that does both with specialized parameters but constant communication.
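The joint attention described above can be sketched in a few lines. This is a minimal illustration of the general Mixture-of-Transformers idea, not NVIDIA's implementation: each token is projected with the weights of the transformer that owns it, but attention runs over the combined sequence so the two streams see each other. The names `make_expert` and `joint_attention`, the toy dimensions, and the random weights are all assumptions made for illustration, and the sketch omits masking, feed-forward blocks, normalization, and the diffusion loop.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(rng, dim):
    """Hypothetical helper: one private set of Q/K/V projections per transformer."""
    def rand_matrix():
        return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
    return {"q": rand_matrix(), "k": rand_matrix(), "v": rand_matrix()}


def joint_attention(tokens, owners, experts):
    """Project each token with its owner's weights, then attend over ALL tokens."""
    dim = len(tokens[0])
    q = [matvec(experts[o]["q"], t) for t, o in zip(tokens, owners)]
    k = [matvec(experts[o]["k"], t) for t, o in zip(tokens, owners)]
    v = [matvec(experts[o]["v"], t) for t, o in zip(tokens, owners)]

    outputs, weights = [], []
    for qi in q:
        # Scores span both streams: this is where reasoner and generator interact.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(dim) for kj in k]
        w = softmax(scores)
        outputs.append([sum(w[j] * v[j][d] for j in range(len(v))) for d in range(dim)])
        weights.append(w)
    return outputs, weights


# Toy sequence: three "reasoner" tokens followed by two "generator" tokens.
rng = random.Random(0)
dim = 4
experts = {"reasoner": make_expert(rng, dim), "generator": make_expert(rng, dim)}
tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]
owners = ["reasoner"] * 3 + ["generator"] * 2
outputs, weights = joint_attention(tokens, owners, experts)
```

Each row of `weights` covers all five tokens, so a generator token places some attention on reasoner tokens and vice versa, while the projection parameters stay separate per stream. That is the "specialized parameters but constant communication" property in miniature.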
NVIDIA released two initial variants. Cosmos 3 Super has 64 billion parameters (32B reasoner + 32B generator) and is designed for massive synthetic data generation and deployment on Hopper or Blackwell GPUs. Cosmos 3 Nano has 16B (8B + 8B) and is optimized for workstations with RTX PRO 6000. A third variant, Cosmos 3 Edge, is announced as “coming soon” for real-time inference on edge devices.
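To put those parameter counts in hardware terms, a rough back-of-the-envelope calculation helps. The sketch below multiplies the parameter counts quoted above by bytes per parameter. The precisions listed are assumptions, not formats NVIDIA has confirmed for these checkpoints, and the result covers weights only: activations, caches, and framework overhead come on top.

```python
# Parameter counts as quoted in the article: (reasoner, generator).
VARIANTS = {
    "Cosmos 3 Super": (32_000_000_000, 32_000_000_000),
    "Cosmos 3 Nano": (8_000_000_000, 8_000_000_000),
}

# Assumed storage precisions, for illustration only.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def total_params(variant):
    """Sum reasoner and generator parameters for a named variant."""
    reasoner, generator = VARIANTS[variant]
    return reasoner + generator


def weight_memory_gb(num_params, precision):
    """Memory needed to hold the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for name in VARIANTS:
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(total_params(name), precision)
        print(f"{name} @ {precision}: {gb:.0f} GB of weights")
```

Under these assumptions the 64B model needs about 128 GB for weights at 16-bit precision, which is more than a single workstation GPU typically offers, while the 16B model needs about 32 GB.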
Preliminary benchmarks show Cosmos 3 in first place among open models on Artificial Analysis, Physics-IQ, PAI-Bench, R-Bench (world generation accuracy), RoboLab and RoboArena (action policies), and VANTAGE-Bench (visual understanding). It is too early to know how it will compare against closed models like Gemini, but the mere existence of an open alternative in this space is significant.
The license is OpenMDW 1.1, a Linux Foundation license written for machine learning models and the artifacts that accompany them (weights, data, documentation, code) rather than for world models in particular. It permits use, modification, redistribution, and commercial deployment of weights, architecture, documentation, and code. It is not the most permissive license available, since it carries conditions typical of model licenses, but it is a huge step forward compared to the closed models that dominate robotics.
NVIDIA also announced the Cosmos Coalition, a group of companies — Agile Robots, Black Forest Labs, Runway, Skild AI, among others — committed to collaboratively developing open world models. The message is clear: NVIDIA does not just want to be the GPU provider for physical AI; it wants to own the open model stack on which it is built.
Why does this matter? Because robotics and autonomous vehicles have been stuck in a classic problem: training a robot to understand the real world requires massive amounts of real-world data, which is expensive, slow, and difficult to scale. A world model like Cosmos 3 can generate synthetic environments, simulate trajectories, and evaluate policies in days rather than months. If the promise holds — and Jensen Huang said “the big bang of physical AI is just around the corner” — the impact on industries like manufacturing, logistics, construction, and transportation will be profound.
That said, context is needed. The term “fully open” is, in part, NVIDIA marketing. Training data is not fully disclosed, and although synthetic datasets are released, the processing pipeline is not entirely transparent. There is also no peer-reviewed technical paper — the deepest documentation is the Hugging Face blog, not an academic article. And the claim of reducing training cycles “from months to days” is directional, not a measured result.
But even with those caveats, Cosmos 3 represents a concrete step toward an AI that not only processes language but also understands how the physical world works. And the fact that anyone can download, modify, and use it to build robots changes the rules of the game.
Main source: How Cosmos 3 Helps Physical AI Think Before It Acts — NVIDIA Blog