Technology

Inception raises $50 million to build diffusion models for code and text

With so much money flowing into AI startups, it’s a good time to be an AI researcher with an idea to test. And if the idea is novel enough, it may be easier to secure the resources you need as an independent company than inside a large lab.

That’s the story of Inception, a startup developing diffusion-based AI models, which just raised $50 million in seed funding led by Menlo Ventures, with participation from Mayfield, Innovation Endeavors, Nvidia’s NVentures, Microsoft’s M12 Fund, Snowflake Ventures, and Databricks Investment. Andrew Ng and Andrej Karpathy provided additional angel funding.

The project is led by Stanford University professor Stefano Ermon, whose research focuses on diffusion models, which generate outputs through iterative refinement rather than word by word. These models power image-generation systems like Stable Diffusion, Midjourney, and Sora. Having worked on these systems since before the AI boom made them exciting, Ermon is using Inception to apply the same approach to a broader range of tasks.

Alongside the funding, the company released a new version of its Mercury model, designed for software development. Mercury is already integrated into a number of development tools, including ProxyAI, Buildglare, and Kilo Code. Most importantly, Ermon says the diffusion approach will help Inception’s models keep down two of the most important metrics: latency (response time) and compute cost.

“Diffusion-based LLMs are much faster and more efficient than what anyone else is building today,” Ermon says. “It’s just a completely different approach where there is a lot of innovation that can still be brought to the table.”

Understanding the technical difference requires a little background. Diffusion models are structurally different from the autoregressive models that dominate text-based AI services. Autoregressive models such as GPT-5 and Gemini work sequentially, predicting each word, or fragment of a word, based on the material processed so far. Diffusion models, which rose to prominence in image generation, take a more holistic approach, gradually refining the overall structure of a response until it matches the desired output.
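The contrast between the two decoding styles can be sketched with a toy example. This is an illustrative simplification, not Inception’s actual method: the token choices here are random, where a real model would use learned probabilities. The key structural difference survives the simplification: the autoregressive loop commits to one token per pass, while the masked-diffusion loop starts from a fully masked sequence and fills positions over a fixed number of refinement passes.

```python
import random

VOCAB = ["the", "model", "writes", "code", "fast"]

def autoregressive_generate(length):
    """Toy autoregressive loop: one token per sequential pass, left to right.
    (A real model would condition each choice on the tokens emitted so far.)"""
    tokens = []
    for _ in range(length):
        tokens.append(random.choice(VOCAB))
    return tokens

def diffusion_generate(length, steps=4):
    """Toy masked-diffusion loop: start fully masked, then fill positions
    over a fixed number of refinement passes. All positions selected in a
    pass are filled together, not one at a time."""
    tokens = ["<mask>"] * length
    for step in range(steps):
        for i in range(length):
            # Unmask with rising probability so nothing stays masked after
            # the final pass (the probability reaches 1.0 on the last step).
            if tokens[i] == "<mask>" and random.random() < 1.0 / (steps - step):
                tokens[i] = random.choice(VOCAB)
    return tokens

print(autoregressive_generate(8))
print(diffusion_generate(8))
```

Note that the diffusion loop makes `steps` passes regardless of output length, which is what allows many positions to be resolved in the same pass.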

Conventional wisdom has been to use autoregressive models for text applications, and that approach has been hugely successful for recent generations of AI models. But a growing body of research suggests diffusion models may perform better when a model processes large quantities of text or operates under data constraints. Ermon says those qualities become a real advantage when operating on large codebases.


Diffusion models also have more flexibility in how they use hardware, a particularly important advantage as AI infrastructure demands become clear. Where autoregressive models must execute operations one after another, diffusion models can process many operations simultaneously, allowing much lower latency on complex tasks.

“We’ve been benchmarked at over 1,000 tokens per second, which is way higher than anything possible with existing autoregressive techniques, because our thing is built to be parallel. It’s built to be really, really fast,” Ermon says.


2025-11-06 13:00:00

