While AI tools have transformed creative and coding workflows, the inference optimization process—making trained AI models run efficiently—remains slow, expensive, and manual. It’s a major barrier for startups and enterprises building and deploying AI at scale. TheStage AI, a US-based deeptech startup, is tackling this bottleneck head-on with a bold mission: to automate inference optimization across devices and cloud platforms.
The company has just secured $4.5 million in seed funding to accelerate development of its proprietary system and expand access to high-performance AI deployment. The round brought together strategic backers, including Mehreen Malik, Dominic Williams (DFINITY), Atlantic Labs (SoundCloud), Nick Davidov (DVC), and AAL VC.
Why inference optimization matters
Today, AI engineers often spend months manually tuning models for specific hardware, a process that consumes significant GPU resources and inflates deployment costs. According to McKinsey, up to 70% of AI deployment expenses stem from GPU infrastructure alone. This makes inference—not training—the true cost center of operational AI.
TheStage AI’s answer is full-stack automation. Its platform removes the manual grind of adapting models for different environments, slashing time and cost without sacrificing quality or speed.
What does TheStage AI do?
The company’s flagship product, ANNA (Automatic NNs Analyzer), uses discrete math and AI to automatically optimize PyTorch models through quantization, pruning, and sparsification. ANNA can generate “Elastic models” that adjust in size and speed depending on device needs, from smartphones to GPUs and cloud servers. Users can pick a model like they would choose video quality on YouTube—faster when needed, higher quality when possible.
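TheStage AI has not published ANNA's internals, but the techniques named above are standard model-compression methods. As a rough, self-contained illustration only—not TheStage AI's implementation, and with all function names invented here—the sketch below applies symmetric int8 quantization and magnitude pruning to a toy list of weights:

```python
# Conceptual sketch of two compression techniques ANNA is said to
# automate: symmetric int8 quantization and magnitude pruning.
# This is illustrative only, not TheStage AI's code.

def quantize_int8(weights):
    """Map float weights to int8 values with one symmetric scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights."""
    k = int(len(weights) * sparsity)
    threshold = sorted(abs(w) for w in weights)[k - 1] if k else -1.0
    return [0.0 if abs(w) <= threshold else w for w in weights]

weights = [0.9, -0.05, 0.42, -1.27, 0.003, 0.31]
q, scale = quantize_int8(weights)      # ints in [-127, 127]
restored = dequantize(q, scale)        # close to the originals
sparse = prune_by_magnitude(weights, sparsity=0.5)  # half the weights zeroed
```

Real systems apply these ideas per-layer to full networks and then re-validate accuracy; the point of automation is searching that large configuration space without an engineer hand-tuning each layer.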
These optimized models are made available in a growing Model Library, featuring pre-optimized versions of open-source architectures like Stable Diffusion. TheStage AI also offers custom acceleration services for developers working on bespoke models.
Importantly, the platform supports a wide range of hardware—no vendor lock-in. Whether you’re using AWS, Google Cloud, Azure, or on-premise GPUs, TheStage AI adapts models to fit.
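The "video quality" analogy for Elastic models can be made concrete with a small selection routine. Everything below is hypothetical—the variant names, quality scores, and latencies are invented for illustration and do not come from TheStage AI's product:

```python
# Hypothetical sketch of the "Elastic models" idea: choose the
# highest-quality model variant that fits a device's latency budget,
# much like picking a video quality tier. All numbers are invented.
VARIANTS = [
    # (name, relative quality, est. latency in ms on a reference device)
    ("XL", 1.00, 220),
    ("L",  0.97, 140),
    ("M",  0.93, 80),
    ("S",  0.88, 35),
]

def pick_variant(latency_budget_ms):
    """Return the best-quality variant within the latency budget."""
    fitting = [v for v in VARIANTS if v[2] <= latency_budget_ms]
    if not fitting:
        return VARIANTS[-1]  # nothing fits: fall back to the fastest tier
    return max(fitting, key=lambda v: v[1])
```

A smartphone with a tight 50 ms budget would get the "S" tier, while a cloud GPU with headroom would get "XL"—the same model family, sized to the hardware at hand.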
How the funding will be used
With fresh funding in hand, TheStage AI plans to:
- Expand ANNA’s core optimization capabilities
- Grow its Model Library of pre-optimized solutions
- Scale infrastructure for faster global deployment
- Deepen integrations with cloud platforms like AWS and Azure
- Accelerate customer acquisition, especially among app developers and AI engineers
The team’s long-term goal? “To embed into every major AI development tool and platform in the next 3–5 years,” says CEO Kirill Solodskih. With ANNA, users will be able to build tailored neural networks in minutes—confident they’re achieving peak efficiency across any hardware.
A Huawei-hardened founding team
TheStage AI was founded by four PhD-holding friends—Kirill Solodskih, Azim Kurbanov, Ruslan Aydarkhanov, and Max Petriev—who previously worked together at Huawei, where they developed neural network acceleration tech for flagship phones like the P50 and P60.
Their Huawei experience exposed the challenges of manual optimization. At the time, the team built internal tools that cut optimization timelines from over a year to just a few months. One patented algorithm proved critical when Huawei had to pivot from Kirin to Qualcomm chipsets under sanctions, requiring fast neural network adaptation.
That insight—that automation can radically reduce development time without sacrificing performance—led to the founding of TheStage AI.
Proven results, distinct positioning
In collaboration with Recraft.ai, TheStage AI’s ANNA system doubled model performance and cut processing time by 20% compared to PyTorch’s own compiler. This illustrates its edge over existing tools while maintaining full hardware flexibility.
Unlike other platforms that push server-bound inference or tie users to proprietary acceleration tools, TheStage AI serves two distinct audiences:
- App developers looking for pre-optimized, drop-in models
- Model builders seeking granular control over customized AI deployments
This dual-focus strategy sets it apart from competing platforms and toolkits like Replicate, Fireworks, AWS SageMaker, NVIDIA TensorRT, and Intel Neural Compressor.
“We’re targeting a niche in the inference market where automation will not only boost performance but also support other platforms that rely on manual workflows,” the company explained.
A future-proof approach to scalable AI
The demand for smarter, faster, and more flexible inference is rising. With Meta committing $65 billion to AI infrastructure, and inference now accounting for most operational AI costs, the need for streamlined deployment is more urgent than ever.
TheStage AI’s approach—automating inference optimization while supporting open models and diverse hardware—lowers the barrier for widespread AI adoption. As CEO Solodskih puts it, “We’ve created a service that lets developers compress, package, and deploy models to any device as easily as copy and paste.”
Investor Mehreen Malik, who is supporting the team on business strategy, adds: “This is a rare opportunity. The right combination of hardware-agnostic software and execution can redefine how AI gets delivered.”
With its deeptech roots, elite engineering team, and strong investor confidence, TheStage AI is positioned to become a vital pillar in the AI infrastructure stack—automating what has long been a painful, costly process.