San Francisco-based AI startup Deep Cogito is making waves with the release of a powerful new family of open-source large language models (LLMs). These models, ranging from 3B to 70B parameters, outperform current open models across key benchmarks—and aim to bring the field closer to general superintelligence.
Even more notable: the company's flagship 70B model outperforms Meta's Llama 4 109B Mixture-of-Experts (MoE) model, a significant milestone in the open LLM race.
What Sets Deep Cogito Apart? IDA Training
At the core of these models is a novel alignment strategy called Iterated Distillation and Amplification (IDA). Unlike traditional approaches like Reinforcement Learning from Human Feedback (RLHF), IDA is built around two repeatable steps:
- Amplification: Uses advanced reasoning and more compute to derive stronger outputs.
- Distillation: Feeds those improved results back into the model’s parameters.
By repeating these steps, IDA creates a feedback loop that scales model intelligence more efficiently with compute—not just with human supervision. Deep Cogito claims this enables faster development and deeper capabilities, even with smaller teams.
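The amplify-then-distill loop described above can be sketched in a few lines. This is a toy illustration, not Deep Cogito's actual training code: the model is stood in for by a single "capability" score, and `amplify`/`distill` are hypothetical placeholder functions that mimic the shape of the two steps.

```python
# Toy sketch of an Iterated Distillation and Amplification (IDA) loop.
# A real implementation would amplify by running the model with more
# inference-time compute, then distill by fine-tuning on those outputs.

def amplify(capability: float, extra_compute: float) -> float:
    """Step 1: spend extra compute (e.g. longer reasoning chains) to
    derive outputs stronger than the base model alone would produce."""
    return capability + 0.5 * extra_compute

def distill(capability: float, target: float, rate: float = 0.8) -> float:
    """Step 2: fold the amplified behavior back into the model's
    parameters, moving the base model toward the amplified target."""
    return capability + rate * (target - capability)

def ida_loop(capability: float, iterations: int, extra_compute: float) -> float:
    for _ in range(iterations):
        target = amplify(capability, extra_compute)   # amplification
        capability = distill(capability, target)      # distillation
    return capability

print(ida_loop(1.0, iterations=5, extra_compute=2.0))
```

The point of the loop structure is that each iteration's distilled model becomes the starting point for the next round of amplification, so capability compounds with compute rather than being capped by a fixed supervision signal.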
In fact, the current models were trained by a small group in just 75 days.
Beating the Benchmarks, Model by Model
Deep Cogito has released LLMs at 3B, 8B, 14B, 32B, and 70B sizes, optimized for coding, agent-based use cases, and function calling. Each model supports dual operation modes:
- Standard LLM output, and
- “Thinking mode” with self-reflection before answering, similar to Claude 3.5.
The models are not tuned for very long reasoning chains; the company says they favor faster responses instead, a preference it attributes to user feedback.
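As an illustration of the dual modes, here is a hypothetical sketch of how a caller might toggle them when building a chat request. The system-prompt flag used here is an assumption for illustration only, not Deep Cogito's documented interface; the actual model card should be consulted for the real mechanism.

```python
# Hypothetical sketch: selecting standard vs. "thinking" mode by varying
# the chat messages. The system-prompt toggle is assumed, not documented.

def build_messages(user_prompt: str, thinking: bool = False) -> list[dict]:
    messages = []
    if thinking:
        # In thinking mode the model self-reflects before answering.
        messages.append({
            "role": "system",
            "content": "Enable deep thinking before answering.",
        })
    messages.append({"role": "user", "content": user_prompt})
    return messages

standard = build_messages("Summarize this function.")
reasoning = build_messages("Summarize this function.", thinking=True)
```

The same model weights serve both requests; only the prompt scaffolding changes, which is what makes a dual-mode model cheaper to deploy than two separate models.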
Key performance highlights include:
- Cogito 70B:
- 91.73% on MMLU in standard mode (+6.4% vs Llama 3.3 70B)
- 91.00% on MMLU in reasoning mode (+4.4% vs DeepSeek R1 Distill 70B)
- Cogito 14B:
- Outperforms Qwen 2.5 and DeepSeek R1 across MMLU, GSM8K, and ARC
Deep Cogito models are fine-tuned on Llama and Qwen bases, but their training paradigm—IDA—drives the performance gains. The models are designed to be agent-friendly, supporting real-world uses in coding, automation, and general task-solving.
The company acknowledges that benchmarks aren’t the whole story. But based on early results, they believe the models deliver not just in tests—but in actual usage.
These releases are just the start. Deep Cogito plans to scale up fast, with larger MoE models on the way—including 109B, 400B, and 671B parameter models. All future models will be fully open-source, continuing their push for transparent and accessible AI development.
The team behind Deep Cogito emphasizes that we are still in the early stages of this scaling curve. But with the combination of fast training, strong benchmarks, and efficient self-improvement, the company believes it’s setting a new standard for open models—and laying the groundwork for superintelligent AI.