Nebius' Token Factory: A Real Threat to AWS or Just Another AI Cloud?
Nebius, spun out of Yandex, just rolled out Token Factory, a platform for deploying AI models. The claim? Enterprises can now deploy and optimize both open-source and custom models at scale. They're touting support for over 60 open-source models, including the big names: DeepSeek, OpenAI's GPT-OSS, Meta's Llama, Nvidia's Nemotron, and Qwen. And current Nebius AI users get an automatic upgrade.
The pitch is classic cloud provider: sub-second latency, autoscaling throughput, and a promised 99.9% uptime. Nebius is aiming for workloads scaling to hundreds of millions of requests per minute. Ambitious, to say the least.
The Cloud Battlefield: Who's Really Competing?
Nebius CEO Roman Chernin says they want to be a large enterprise, not just a utility. Okay, but who are they really up against? The press release names AWS, Azure, and GCP – the usual suspects. But it also throws in Fireworks and Baseten. This list tells you something: Nebius isn't just after the broad cloud market, they're specifically targeting the AI model deployment niche. Fireworks and Baseten are specialists, not generalists. This isn't a head-on assault on Amazon; it's a flanking maneuver.
The interesting piece is the open-source angle. While AWS, Azure, and GCP all offer ways to deploy open-source models, they're not necessarily optimized for it. Token Factory's focus on this area gives it a potential edge, especially if they can deliver on the performance claims. The option for customers to host their own models is also a smart move, catering to enterprises with strict data privacy requirements.

Here's the rub: can they actually compete on price and performance? Nebius sells AI cloud capacity from data centers in the US, Europe, and Israel. Infrastructure costs are infrastructure costs. Unless they've got some secret sauce in their data center design or energy procurement (details on this remain scarce, unfortunately), they're playing the same game as everyone else.
The Million-Dollar Question: Execution
Token Factory promises sub-second latency and 99.9% uptime, even at hundreds of millions of requests per minute. These are big claims, and the devil is always in the details. What's the average latency? What's the P95 latency (the value below which 95% of requests complete)? These are the metrics that matter to enterprises running real-world applications. Without this data, it's just marketing fluff. (I've seen too many providers hide behind averages that mask significant tail latency issues.)
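To see why averages can hide tail latency, here's a minimal sketch with made-up numbers (the samples and the nearest-rank percentile helper are illustrative, not Nebius data): a workload where 90% of requests are fast and 10% hit a slow path still reports a comfortable-looking mean.

```python
import statistics

def p95(samples):
    """95th percentile via the nearest-rank method:
    the value below which 95% of samples fall."""
    ordered = sorted(samples)
    index = max(0, int(round(0.95 * len(ordered))) - 1)
    return ordered[index]

# Hypothetical latency samples in milliseconds:
# 90 fast requests plus 10 slow outliers.
latencies = [100] * 90 + [2000] * 10

mean_ms = statistics.mean(latencies)   # 290 ms -- looks "sub-second"
p95_ms = p95(latencies)                # 2000 ms -- the tail users feel
print(f"mean: {mean_ms:.0f} ms, p95: {p95_ms:.0f} ms")
```

A mean of 290 ms technically satisfies a "sub-second" claim, while one request in ten takes two full seconds. That's exactly why enterprises should ask for percentile numbers, not averages.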
And this is the part of the report that I find genuinely puzzling: the lack of independent benchmarks. Nebius could easily publish a head-to-head comparison against AWS SageMaker or Azure Machine Learning, showing Token Factory's performance on a standard set of models and datasets. The absence of this data is telling. Are they afraid of what the numbers will reveal? Or is it simply a matter of not having the resources to conduct a proper benchmark?
The fact that current Nebius AI users will be automatically upgraded to Token Factory is a positive sign, suggesting a smooth transition. But it also raises a question: how many users are we talking about? Is this a large-scale migration, or a relatively small group of early adopters? The scale of the existing user base will be a key indicator of Token Factory's initial success. As a recent Livemint report ("Nebius takes on Microsoft and AWS with new open-model AI platform") frames it, Nebius is directly challenging the established cloud providers.
A Fight Worth Watching, But Don't Hold Your Breath
It’s easy to get caught up in the hype around new AI platforms, but it’s vital to maintain perspective. Nebius' Token Factory is an interesting development, but it's not a guaranteed AWS killer. The success of Token Factory will depend on execution, transparency, and a willingness to compete on more than just marketing promises. Show me the benchmarks, and then we'll talk.