
When Behnam Bastani began working on inference systems at Meta and Roblox, he noticed the same pattern emerge over and over. Whenever his teams brought compute closer to where data originated, results improved immediately. Costs fell, responsiveness increased, and entirely new product capabilities became possible. This insight would later become the foundation of OpenInfer, the company Bastani now leads as cofounder and CEO.
OpenInfer is building an inference platform designed for a world where most data will not live in the cloud. Robotics, drones, satellites, connected devices, and industrial systems continuously generate information at the physical edge. Bastani believes this shift demands a new approach to how AI runs at scale.
“We realized in 2022 and 2023 that physical AI is going to generate huge amounts of data on the edge,” he said during our conversation. “You cannot shove all that data to the cloud. Compute has to move to where the data is. That requires a new game.”
Why the cloud is not enough
Modern inference frameworks were created for large batches, large models, and uniform hardware inside data centers. Edge environments look nothing like that. Devices vary widely. Workloads are small and must run in real time. Connectivity is unreliable. Yet customers still expect the same accuracy and capability they get from cloud AI.
“People are not going to give up accuracy when they come to the edge,” Bastani said. “Today the world says, make the model smaller to fit the device. But you lose quality. Customers want the same level of capability, plus lower cost, greater control over data, or better responsiveness.”
According to Bastani, this gap between expectation and reality is one of the biggest misconceptions in the industry. Enterprises often assume that running AI locally means accepting degraded performance, when in fact the true limitations come from legacy architectures that were never designed for heterogeneous, distributed environments.
Full stack engineering for a new paradigm
OpenInfer believes that solving the edge inference problem requires rethinking the entire stack. Optimizing one layer at a time is not enough.
“It requires full system integration,” Bastani said. “Hardware, software, the operating system, the kernel, networking. Everything needs to be rethought. Other companies work at one layer. But to bring a new paradigm to the edge, the whole stack must be redesigned to work together.”
This philosophy explains why OpenInfer is often compared to an operating system rather than a deployment tool. The platform not only runs inference, but also schedules, moves, and rebalances operations in real time based on available compute and data flow. It treats edge environments as dynamic, multi-device systems rather than isolated endpoints.
“When we talk about hybrid inference, others talk about deployment. We talk about distribution,” Bastani said. “We run inference while constantly adjusting compute during the cycle itself. That is the difference.”
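To make that distinction concrete, here is a minimal, hypothetical sketch of what per-cycle placement can look like. None of the names below (Device, place, free_tflops) come from OpenInfer's stack, and the locality-then-headroom heuristic is an assumption for illustration; the point is only that the placement decision is re-evaluated on every inference cycle rather than fixed at deployment time.

```python
from dataclasses import dataclass

# Hypothetical illustration only: not OpenInfer's API, just the idea of
# re-deciding placement each inference cycle instead of pinning a model
# to one device at deployment time.

@dataclass
class Device:
    name: str
    free_tflops: float   # compute currently available on this device
    holds_input: bool    # whether the input data already lives here

def place(op_cost_tflops: float, devices: list[Device]) -> Device:
    """Pick a device for one inference step, preferring data locality,
    then headroom. Called every cycle, so the answer changes as load shifts."""
    candidates = [d for d in devices if d.free_tflops >= op_cost_tflops]
    if not candidates:
        # Nothing has headroom: fall back to the least-loaded device and accept queuing.
        candidates = devices
    return max(candidates, key=lambda d: (d.holds_input, d.free_tflops))

fleet = [
    Device("camera-node", free_tflops=0.5, holds_input=True),
    Device("edge-gateway", free_tflops=4.0, holds_input=False),
    Device("cloud-region", free_tflops=100.0, holds_input=False),
]

print(place(op_cost_tflops=0.3, devices=fleet).name)  # camera-node: data stays local
print(place(op_cost_tflops=2.0, devices=fleet).name)  # edge-gateway: nearest device with headroom
```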
Who needs OpenInfer today
Bastani breaks potential customers into three groups. The first requires convincing. The second thinks the technology would be nice to have. The third says something else entirely.
“Some customers say, if I do not have this, I die,” he explained. “Those are the customers we start with. They are losing market share or they cannot deliver their product because cloud dependence is blocking them. When they are restricted by cost, latency, or data control, that is an immediate signal they are ready.”
Examples include robotics companies dealing with real-time movement, industrial systems that cannot tolerate latency, and use cases where sensitive data cannot leave the physical environment.
The technical breakthrough ahead
Bastani believes that the biggest leaps in inference performance over the next few years will come from advances in memory design. Even in cloud environments, memory bandwidth and footprint remain the main bottlenecks for large models. Companies like Groq, Cerebras, and Nvidia are already addressing this. OpenInfer is preparing for the same shift at the edge.
“We are unblocking memory constraints with our own hardware reference design and software stack,” he said. “Over time more hardware vendors will adopt these capabilities.”
In other words, the next wave of intelligence at the edge will depend not only on better models, but also on better orchestration of how memory, compute, and data interact inside constrained environments.
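A rough back-of-envelope calculation shows why memory, not raw compute, is usually the ceiling. During autoregressive decoding, every generated token has to stream the full set of model weights from memory, so throughput is bounded by roughly bandwidth divided by model size. The numbers below are illustrative assumptions, not OpenInfer or vendor figures.

```python
# Back-of-envelope sketch: memory-bandwidth ceiling on decode throughput.
# Each generated token reads all weights once, so the upper bound is
# bandwidth / model size in bytes.

def decode_tokens_per_sec(params_billions: float, bytes_per_param: float,
                          mem_bandwidth_gb_s: float) -> float:
    model_size_gb = params_billions * bytes_per_param
    return mem_bandwidth_gb_s / model_size_gb

# Assumed numbers: a 7B-parameter model quantized to ~0.5 bytes per parameter,
# on an edge-class device with ~100 GB/s of memory bandwidth...
print(round(decode_tokens_per_sec(7, 0.5, 100), 1))   # ~28.6 tokens/s ceiling
# ...versus a data-center GPU with ~3000 GB/s.
print(round(decode_tokens_per_sec(7, 0.5, 3000), 1))  # ~857.1 tokens/s ceiling
```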
Building for enterprise and government scale
OpenInfer raised eight million dollars in early 2025. The company is investing in three priorities. The first is hiring. The second is deepening relationships with strategic partners and customers. The third is ensuring its tooling meets the highest expectations of enterprise and government deployments.
“We are going after high-end enterprise and government practices that require the next level of quality,” Bastani said. “Their bar is much higher than the consumer space.”
This focus reflects OpenInfer’s ambition to become the default platform for any environment where AI must run at scale without relying fully on cloud infrastructure.
OpenInfer is not trying to shrink models or replicate cloud tools in smaller form. It is rebuilding the infrastructure needed for AI to live where data originates. That vision requires deep systems engineering, meticulous communication, and careful category creation.
For Bastani, the opportunity is clear. As more intelligence moves into the physical world, the companies that bring high quality inference to the edge will shape how billions of devices think and act.
OpenInfer wants to lead that movement.






