Armen Fljyan_OrionVC_Oct ’25
_The Foundations of Defensibility
Defensibility is about building lasting advantages that shield a business from rivals. In the digital world, this typically comes from four sources: economies of scale (Amazon), brand loyalty (Apple), embedding into workflows (SAP), and network effects (LinkedIn). Network effects reshaped the internet era, with platforms like Facebook, Uber, and Airbnb thriving by creating self-reinforcing loops where each new user increased value for others.
_The Emergence of Data Network Effects in AI
AI introduced a new moat: data network effects. Unlike user-driven network effects, these rely on data scale to improve performance. Tesla illustrates this: every mile driven by its fleet generates data that strengthens its autonomous-driving algorithms, attracting more drivers and reinforcing its edge. Early AI companies leaned heavily on such loops, especially in natural language processing and recommendations.
_Diminishing Returns in Data-Driven Defensibility
But the rise of foundation models like GPT-3 and GPT-4 changed the game. Instead of building proprietary models, companies could fine-tune pretrained ones with far less data. A few hundred quality samples can boost performance dramatically, but beyond a few thousand the gains flatten out. At that point, the cost of extra data outweighs the benefit, and the moat of data scale erodes.
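A toy calculation makes the flattening concrete. This is my own sketch, assuming fine-tuning error falls off as a power law in dataset size; the constants are invented for illustration, not measured values:

```python
# Toy illustration of diminishing returns in fine-tuning.
# Assumption: error falls as a power law in dataset size, error(n) = A * n**(-B).
# A and B are invented constants, not measured values.

A, B = 1.0, 0.35

def error(n: int) -> float:
    """Hypothetical fine-tuning error after n labeled examples."""
    return A * n ** (-B)

for n in (100, 1_000, 10_000, 100_000):
    gain = error(n) - error(10 * n)  # improvement from 10x more data
    print(f"{n:>7,} -> {10 * n:>9,} examples: "
          f"error {error(n):.3f} -> {error(10 * n):.3f} (gain {gain:.3f})")
```

Under this assumption, each extra order of magnitude of data buys roughly half the previous improvement, while collection and labeling costs grow roughly linearly. Past a few thousand examples, the economics stop rewarding whoever hoards the most data.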
_Toward Agentic AI: A New Layer of Defensibility
This brings us to the next frontier: agents. Foundation models gave us a leap in generalization—but they are static. Once trained, they don’t improve. That makes them poorly suited for workflows requiring long-horizon reasoning, adaptation, and error recovery. Today’s agents still break easily. They can’t debug, replan, or persist. Prompting alone isn’t enough to teach real-world behaviors. What’s missing is memory, feedback, and the ability to learn from interaction—capabilities reinforcement learning provides.
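To make "learning from interaction" concrete, here is a deliberately minimal sketch: tabular Q-learning on an invented two-action task. The task, rewards, and hyperparameters are all illustrative assumptions, and production agents train language-model policies rather than lookup tables, but the loop is the same: act, observe feedback, update memory.

```python
import random
from collections import defaultdict

# Minimal sketch of learning from interaction: tabular Q-learning on a
# made-up task where "escalate" resolves a failing workflow. All names
# and numbers are illustrative assumptions.

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
ACTIONS = ["retry", "escalate"]
q = defaultdict(float)  # the agent's "memory": value of (state, action) pairs

def step(state: str, action: str) -> tuple[str, float]:
    """Stub environment: escalating a failing task resolves it."""
    if state == "failing" and action == "escalate":
        return "resolved", 1.0
    return "failing", -0.1

for episode in range(500):
    state = "failing"
    while state != "resolved":
        # Epsilon-greedy: mostly exploit learned values, sometimes explore.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state, reward = step(state, action)
        # Feedback loop: nudge memory toward observed reward + future value.
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = next_state

print({k: round(v, 2) for k, v in q.items()})
```

Prompting alone gives a model none of this: no persistent value estimates, no reward signal, no way for the five-hundredth episode to go better than the first.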
Right now, reinforcement learning in AI looks like the pre-GPT-3 era: narrow tasks, brittle fine-tuning, no scale. But that is changing quickly. Dozens of startups are emerging to serve internal teams at model providers, building the new substrate: simulated work environments, autograded evaluations, and scalable, economically meaningful task distributions. Think of them as training gyms for agents.
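A training gym can be pictured as an environment API plus an autograder. The sketch below is hypothetical, with made-up ticket data, queue labels, and grading rule, but it shows the minimal contract these startups are building at scale: serve a realistic task, accept an agent's action, return a programmatic score.

```python
import random
from dataclasses import dataclass

# Sketch of a "training gym": a simulated work task plus an autograder
# that scores the agent's output. Tasks, labels, and the grading rule
# are invented for illustration.

@dataclass
class Ticket:
    text: str
    true_queue: str  # gold label, visible only to the autograder

TASKS = [
    Ticket("Invoice 482 was charged twice", "billing"),
    Ticket("App crashes on login", "engineering"),
    Ticket("How do I export my data?", "support"),
]

class TriageEnv:
    """Gym-style loop: reset() serves a task, step() autogrades the answer."""

    def reset(self) -> str:
        self.ticket = random.choice(TASKS)
        return self.ticket.text

    def step(self, predicted_queue: str) -> float:
        # Autograded evaluation: reward 1.0 for correct routing, else 0.0.
        return 1.0 if predicted_queue == self.ticket.true_queue else 0.0

env = TriageEnv()
obs = env.reset()
reward = env.step("billing")  # an agent's (here hardcoded) answer
print(obs, "->", reward)
```

The hard part is not this interface; it is building thousands of such environments that are realistic, autogradable, and weighted toward economically meaningful work.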
Labs are paying close attention. Just as pretraining required billions of tokens, generalist agents will need millions of interaction traces across the workflows that dominate labor spend: email triage, customer support, form-filling, ticketing, enterprise coordination. These traces become the raw material for building adaptive, resilient systems.
For startups, the implications are clear: brittle, hardcoded flows won't hold. As generalist models trained through RL-as-a-Service improve, the next defensible layer will be reward design, eval engineering, and task simulation. The model matters, but the environment, the lived context where the agent learns, is what makes it useful.
Reward design defines what “good” looks like in a workflow, capturing outcomes like accuracy, efficiency, and user satisfaction. Eval engineering builds the continuous tests and benchmarks that keep agents reliable at scale. Task simulation creates rich, safe environments where agents can practice messy, real-world scenarios. Together, these layers shape behavior more than the model itself—the environment becomes the true source of defensibility.
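Of the three layers, reward design is the easiest to sketch. Below is a hypothetical example, with weights, fields, and budget logic that are my own assumptions rather than any standard, of how accuracy, efficiency, and user satisfaction might collapse into the single scalar an RL agent optimizes:

```python
from dataclasses import dataclass

# Sketch of reward design: collapsing several workflow outcomes into one
# scalar. The weights and fields are illustrative assumptions; choosing
# them well for a real workflow is the hard, defensible part.

@dataclass
class EpisodeOutcome:
    task_correct: bool   # did the agent produce the right result?
    steps_taken: int     # efficiency proxy
    step_budget: int     # steps a competent run should need
    user_rating: float   # 0.0-1.0 satisfaction signal, if collected

W_ACCURACY, W_EFFICIENCY, W_SATISFACTION = 0.6, 0.2, 0.2

def reward(o: EpisodeOutcome) -> float:
    accuracy = 1.0 if o.task_correct else 0.0
    # Efficiency decays once the agent exceeds its step budget.
    efficiency = min(1.0, o.step_budget / max(o.steps_taken, 1))
    return (W_ACCURACY * accuracy
            + W_EFFICIENCY * efficiency
            + W_SATISFACTION * o.user_rating)

print(reward(EpisodeOutcome(task_correct=True, steps_taken=12,
                            step_budget=10, user_rating=0.9)))
```

Tuning these weights and shaping terms for a specific workflow is iterative, empirical work, which is why the durable know-how accumulates in the reward and eval layer rather than in the model.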
And this is where the true defensibility of the agentic era emerges. In traditional B2B SaaS, switching costs are high but finite: retraining users, migrating data, and adjusting processes. Painful, yes, but ultimately solvable. By contrast, replacing an RL-trained agent that has been embedded in a company for years means discarding a living system that has absorbed thousands of edge cases, refined its behavior through continuous feedback, and accumulated a body of tacit organizational knowledge. A new agent would need to start from scratch, a far harder barrier to overcome.
Over time, RL-as-a-Service will shift from being a novel capability to a core utility—making RL agents as necessary to companies as electricity itself.
Should you have any questions, let's talk. You can get in touch or follow me on LinkedIn.