Introduction
For the last decade, cloud-native architecture has optimised for stateless execution: ephemeral compute, horizontal elasticity, and infrastructure abstraction. That model remains highly effective for APIs, event handlers, and short-lived workloads. But AI-native systems, especially agentic and long-running ones, introduce a different set of runtime demands: continuity, memory, supervision, and durable coordination.
Stateless systems worked extremely well
For the past decade, cloud-native architecture matured into a predictable, robust standard. While containerisation initially solved deployment headaches, it birthed a new challenge: orchestration. This friction sparked an evolution in hosting that moved us away from managing ‘servers’ and toward managing ‘intents.’
From the operational rigour of Kubernetes to the abstraction of Fargate, Container Apps, and Lambda, we entered the golden age of the stateless microservice. With request/response as the dominant interaction pattern, we offloaded the hardest parts of distributed systems (scaling, healing, and resource allocation) to the cloud provider.
Statelessness was our superpower; it reduced operations to a single question: can we spin up another instance fast enough to handle the next request? This simplicity allowed us to treat infrastructure as a commodity, but as we move into AI-native systems, that ‘fire and forget’ simplicity is hitting a new wall.
Operational Simplicity: The “Cattle, Not Pets” Standard
The brilliance of statelessness lay in its operational silence. By stripping away local state, we removed the need for complex data synchronisation and “sticky” sessions that used to plague legacy systems. If a container became unhealthy, Kubernetes or Fargate didn’t waste time trying to diagnose or “fix” it; the orchestrator simply terminated the instance and spun up a fresh one. This “cattle, not pets” philosophy meant that system recovery was fast and automated. Engineering teams could sleep through the night because the infrastructure was self-healing by design, governed by simple health checks rather than manual intervention.
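The self-healing loop can be sketched as a toy reconcile function in the spirit of Kubernetes controllers, which repeatedly compare desired state with actual state. The `Instance` and `Orchestrator` classes are hypothetical, and health is a boolean flag rather than a real probe:

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Instance:
    """A stateless replica: nothing on it is worth saving."""
    instance_id: int
    healthy: bool = True


@dataclass
class Orchestrator:
    desired_replicas: int
    instances: list = field(default_factory=list)
    _next_id: itertools.count = field(default_factory=itertools.count)

    def reconcile(self) -> list:
        """One pass of a control loop; returns the IDs of replicas it discarded."""
        discarded = [i.instance_id for i in self.instances if not i.healthy]
        # No diagnosis or repair: unhealthy cattle are simply removed...
        self.instances = [i for i in self.instances if i.healthy]
        # ...and fresh, identical replicas are started until the desired count is met.
        while len(self.instances) < self.desired_replicas:
            self.instances.append(Instance(next(self._next_id)))
        return discarded
```

Because no replica holds anything worth keeping, "recovery" is just deletion followed by creation; that is exactly the property the rest of this article argues AI-native workloads lose.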
Predictability: Deterministic Scaling
This architectural constraint provided a level of predictability that redefined the software lifecycle. Because every request was treated as an isolated, independent event, performance became deterministic. We could simulate high-traffic bursts in staging with confidence that the 1,000th instance would behave exactly like the first (although in practice this was not always easy to achieve). This consistency bridged the gap between development and production, allowing engineers to focus on shipping business logic rather than hunting down the idiosyncratic “ghosts in the machine” that typically haunt stateful environments.
Cost-Efficiency: Scaling to Zero
Financially, the shift to serverless and managed containers turned infrastructure into a true utility. The ability to “scale to zero” meant we finally stopped paying for idle CPUs waiting for work. Whether using Lambda’s per-millisecond billing or the event-driven scaling of Container Apps, cost became a direct, transparent reflection of actual demand. This efficiency allowed startups to run enterprise-grade architectures on a shoestring budget, radically optimising margins by ensuring that every pound spent was tied to a successful request/response cycle.
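The scale-to-zero argument is simple arithmetic. A sketch with made-up rates (none of these figures is a real provider's price, and real serverless bills also depend on memory size and per-request charges):

```python
def monthly_costs(
    requests: int,
    avg_ms: float,
    rate_per_ms: float,
    always_on_instances: int,
    hourly_rate: float,
    hours_in_month: float = 730,  # 365 * 24 / 12
) -> dict:
    """Compare provisioned capacity with pay-per-execution billing.

    All rates are caller-supplied assumptions, not any provider's price list.
    """
    return {
        # Provisioned instances cost the same whether busy or idle.
        "always_on": always_on_instances * hourly_rate * hours_in_month,
        # Execution-time billing drops to zero when there are no requests.
        "scale_to_zero": requests * avg_ms * rate_per_ms,
    }


print(monthly_costs(100_000, 200, 5e-7, always_on_instances=2, hourly_rate=0.05))
```

With these hypothetical rates, 100,000 requests of 200 ms cost about £10 a month against £73 for two always-on instances, and a month with no traffic costs nothing on the execution-billed side.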
AI-native systems introduce orchestration pressure
For years, infrastructure dominated architecture discussions, and stateless services defined our runtime thinking. However, the rise of AI-native systems has fundamentally shifted these requirements, introducing a new level of orchestration pressure, albeit an older problem in new clothes.
Of course, not every AI workload creates orchestration pressure. A prompt-in, response-out inference endpoint can still be treated much like any other stateless service. The pressure appears when systems become:
- long-running,
- tool-using,
- multi-step,
- human-in-the-loop,
- multi-agent,
- or dependent on persistent memory across sessions.
AI systems introduce six pressures that traditional request/response systems rarely had to solve:
- Reasoning is inherently stateful: Effective AI requires maintaining the context of a problem-solving process.
- Orchestration matters: Coordinating multiple models and tools requires precise oversight.
- Execution is probabilistic: Unlike deterministic code, AI outputs vary, requiring logic to handle uncertainty.
- Conversations are long-running: Interactions often span multiple exchanges, necessitating persistent session management.
- Memory and supervision matter: Systems must remember past interactions and allow for human-in-the-loop or automated oversight.
- Streaming matters: Real-time data flow is essential for the responsive, “live” feel of modern AI.
These shifts have made high-level coordination patterns architecturally vital once again. We are seeing a resurgence in the importance of actors, workflows, event systems, orchestration runtimes, and stateful execution as the backbone of the next generation of software.
This pressure manifests in several critical operational requirements. Maintaining conversational continuity and managing memory ensures the system doesn’t “forget” the user mid-task. Because AI is non-deterministic, robust retries and long-running execution are necessary to see complex goals through to completion, even when a model stumbles.
Furthermore, as systems move toward agency, tool coordination and runtime planning become the glue that connects reasoning to action. Finally, because these models can hallucinate or drift, supervision provides the essential guardrails for reliability.
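Retrying non-deterministic work differs from retrying a conventional API call: a response can arrive successfully and still be unusable. A minimal sketch, assuming the model or tool call is wrapped in a plain function and that "valid" is something checkable, such as parseable JSON; `call_with_retries` and its parameters are hypothetical, not from any particular framework:

```python
import random
import time
from typing import Callable, Optional


def call_with_retries(
    generate: Callable[[], str],
    is_valid: Callable[[str], bool],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry a non-deterministic call until its output passes validation.

    Both raised exceptions (e.g. a timed-out tool call) and invalid outputs
    (e.g. malformed JSON from a model) count as failed attempts.
    """
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            output = generate()
            if is_valid(output):
                return output
            last_error = ValueError(f"invalid output on attempt {attempt + 1}")
        except Exception as exc:  # a real system would catch narrower errors
            last_error = exc
        if attempt < max_attempts - 1:
            # Exponential backoff with jitter so retries do not arrive in lockstep.
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

Injecting `sleep` keeps the function testable; giving up with an exception is the point where supervision (a parent process or a human) would take over.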
Ultimately, these are as much orchestration concerns as AI concerns. They represent a shift from simply calling an API to managing a complex, living process.
This marks a definitive departure from the “stateless microservices” era. In that world, we treated every request as an isolated event, offloading state to a database and assuming the network was the only real point of failure.
But in an AI-native world, the “request” is no longer a discrete event; it is a persistent, evolving journey. We can no longer afford to treat the execution layer as a passive pipe. Instead, the runtime must become an active participant: one capable of holding context, recovering from probabilistic failures, and managing the lifecycle of a thought. We are moving from a world of static endpoints to a world of dynamic, stateful agents.
Orchestration is becoming the runtime
Traditional orchestration was largely concerned with infrastructure:
- start containers
- schedule workloads
- scale replicas
- recover failed nodes
AI-native orchestration operates at a different level:
- coordinate reasoning
- manage memory
- invoke tools
- supervise execution
- route work between agents
- recover long-running processes
The orchestration layer is moving up the stack. It is no longer simply deciding where code runs. It is increasingly deciding how work progresses.
The return of durable execution
A phrase I have heard a few times over the years: “failure is not an option.” OK, so how many nines: four or five? Only to be told it must never fail and never go down for maintenance. My fellow architects, I am sure you have had to carefully explain that nothing can be 100% available, and that every additional nine costs a lot more money.
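The gap between those targets is easier to argue with numbers. A small sketch of the downtime arithmetic, assuming a 365-day year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_budget_minutes(nines: int) -> float:
    """Allowed downtime per year at an availability with `nines` nines (4 -> 99.99%)."""
    unavailability = 10 ** -nines
    return MINUTES_PER_YEAR * unavailability


for n in (3, 4, 5):
    print(f"{n} nines: {downtime_budget_minutes(n):.1f} minutes per year")
```

Four nines allows roughly 52.6 minutes of downtime a year; five nines allows about 5.3 minutes, which rules out most planned maintenance windows.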
However, AI has changed the stakes. AI systems increasingly behave like long-running processes rather than traditional web requests. When a “request” is actually a twenty-minute chain of reasoning involving multiple tool calls and human approvals, a simple network glitch shouldn’t mean starting from scratch. This is where the market has responded with heavy hitters like Temporal, Dapr, and LangGraph.
- Temporal has become the gold standard for durable workflows (this is very cool stuff). It treats the entire execution as a “virtual thread” that can be paused, moved between servers, and resumed months later. If a worker dies mid-thought, Temporal recovers the execution by replaying the workflow’s recorded event history, so completed steps are not repeated and the LLM never even knows it “tripped.”
- Dapr (Distributed Application Runtime) offers a more modular approach. By using Dapr Workflows and its Actors building block, you get state persistence and resiliency baked into your sidecar. It’s particularly powerful for those looking to build “Durable Agents” that can survive across distributed environments without being locked into a single workflow engine.
- LangGraph shifts the focus to the cognitive loop. While Temporal and Dapr handle the “plumbing” of durability, LangGraph manages the state of the logic, using checkpointers to persist graph state so the agent’s memory and branching paths remain consistent even as the “conversation” evolves.
Durable execution is the ability to persist workflow progress and resume from the last known step rather than restarting from the beginning after failure, pause, or redeployment. AI workflows frequently pause for:
- human approval
- external systems
- asynchronous events
- scheduled execution
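The core mechanism can be sketched with a step journal. This is a hypothetical, single-process illustration (the `DurableRun` class and the JSON journal file are assumptions, not how Temporal, Dapr Workflows, or LangGraph actually store progress): each step's result is recorded as it completes, so a re-run after a crash, a pause for human approval, or a redeploy reuses recorded results and resumes at the first unfinished step.

```python
import json
from pathlib import Path
from typing import Any, Callable


class DurableRun:
    """Toy durable execution: every completed step is journalled to a JSON file.

    Re-creating a DurableRun over the same journal (after a crash, a pause for
    human approval, or a redeploy) replays recorded results instead of
    re-executing those steps.
    """

    def __init__(self, journal_path: Path):
        self.journal_path = journal_path
        self.journal: dict[str, Any] = (
            json.loads(journal_path.read_text()) if journal_path.exists() else {}
        )

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        if name in self.journal:
            # Completed in an earlier run: return the recorded result, skip the work.
            return self.journal[name]
        result = fn()  # results must be JSON-serialisable in this sketch
        self.journal[name] = result
        # Persist immediately so a later crash does not repeat this step.
        # A crash *during* fn() still re-runs it, so steps should be idempotent.
        self.journal_path.write_text(json.dumps(self.journal))
        return result
```

A workflow is then just a sequence of `run.step("research", ...)`, `run.step("plan", ...)` calls; step names act as resume points. Production engines add atomic writes, versioning, timers, and signals that this sketch deliberately omits.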
In this new architecture, we aren’t just calling APIs; we are managing the lifecycle of a thought. We’ve moved from stateless fire-and-forget to a world where the runtime substrate guarantees that if a process starts, it will finish.
Why actor systems feel very relevant
If you’ve ever worked with actors (for me, it was Akka.NET), you know it changes how you think about software. You stop seeing systems as a series of database rows and start seeing them as a collection of living, breathing entities. Three years ago, I built a solution using actors (Akka.NET) for a commercial project; at the time it felt niche, but it was the correct architectural choice for the problem. Today, it feels like a prerequisite for the agentic era.
AI agents are, by definition, autonomous entities. This brings several core actor concepts back to the forefront:
- State Isolation and Ownership: In an actor system, only the actor can touch its state. This is exactly what an AI agent needs: a dedicated sandbox where its specific memory, personality, and “thought history” are protected from the rest of the system.
- Distributed Identity: Actors allow us to treat an agent as a first-class citizen with a unique ID. Whether that agent is helping one user today or a million tomorrow, the system can route messages to that specific “brain” across a cluster effortlessly.
- Supervision and Resilience: My favourite part of Akka.NET was the supervision tree. If a child actor fails, the parent decides how to handle it. In AI, where model outputs are probabilistic and tool calls often fail, having a hierarchical “manager” that can supervise and recover a “worker” agent is essential for stability.
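These three ideas fit in a few lines. A toy, single-threaded sketch (class names are hypothetical; real actor runtimes such as Akka.NET process mailboxes concurrently and across machines): each actor owns private state and a mailbox, and a supervisor applies a restart strategy when its child throws.

```python
from collections import deque
from typing import Any, Callable


class Actor:
    """Owns private state and processes its mailbox one message at a time."""

    def __init__(self, name: str):
        self.name = name
        self.mailbox: deque = deque()
        self._state: dict[str, Any] = {}  # only this actor's handlers touch it

    def tell(self, message: Any) -> None:
        self.mailbox.append(message)

    def receive(self, message: Any) -> None:
        raise NotImplementedError


class WorkerAgent(Actor):
    """Hypothetical agent: remembers messages; a 'flaky-tool' message fails."""

    def receive(self, message: Any) -> None:
        if message == "flaky-tool":
            raise TimeoutError("tool call failed")
        self._state.setdefault("history", []).append(message)


class Supervisor:
    """Restart-on-failure strategy, loosely modelled on actor supervision trees."""

    def __init__(self, factory: Callable[[], Actor], max_restarts: int = 3):
        self.factory = factory
        self.max_restarts = max_restarts
        self.restarts = 0
        self.child = factory()

    def run(self) -> None:
        while self.child.mailbox:
            message = self.child.mailbox.popleft()
            try:
                self.child.receive(message)
            except Exception:
                if self.restarts >= self.max_restarts:
                    raise  # escalate: this supervisor cannot recover the child
                self.restarts += 1
                pending = self.child.mailbox
                self.child = self.factory()  # fresh instance, clean state
                self.child.mailbox = pending  # undelivered messages survive
```

In this sketch the failing message is dropped and the restarted worker starts with empty state, so its history only holds messages received after the failure. A real agent would typically rehydrate its memory from a durable store on restart, which is where supervision and durable execution meet.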
The Key Insight:
“An AI Agent is essentially an Actor with a Brain.”
While traditional actors use hard-coded logic to respond to messages, AI agents use LLMs to decide their next move. But the packaging remains the same. Frameworks like Dapr Actors or Microsoft Orleans are seeing a resurgence because they provide the perfect “host” for an agent: a permanent home, a mailbox for communication, and state that can be persisted so it survives a crash.
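“A permanent home” can be made concrete with a virtual-actor pattern, loosely inspired by Orleans grains: agents are addressed by ID, activated on their first message, and write their state through to a store, so losing the hosting process loses nothing. `StateStore`, `AgentGrain`, and `Runtime` are hypothetical names, and the in-memory dict stands in for a real database.

```python
import json


class StateStore:
    """Stand-in for a durable database; values are stored as JSON strings."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def save(self, key: str, state: dict) -> None:
        self._data[key] = json.dumps(state)

    def load(self, key: str) -> dict:
        return json.loads(self._data.get(key, "{}"))


class AgentGrain:
    """Hypothetical agent whose identity is its ID, not the process hosting it."""

    def __init__(self, agent_id: str, state: dict) -> None:
        self.agent_id = agent_id
        self.state = state

    def handle(self, message: str) -> int:
        self.state.setdefault("memory", []).append(message)
        return len(self.state["memory"])


class Runtime:
    """Activates agents on demand and writes their state through after each message."""

    def __init__(self, store: StateStore) -> None:
        self.store = store
        self.active: dict[str, AgentGrain] = {}

    def send(self, agent_id: str, message: str) -> int:
        if agent_id not in self.active:
            # First message for this ID on this host: rehydrate from the store.
            self.active[agent_id] = AgentGrain(agent_id, self.store.load(agent_id))
        grain = self.active[agent_id]
        result = grain.handle(message)
        self.store.save(agent_id, grain.state)  # persist before replying
        return result
```

Callers never create or destroy agents; they just send to an ID. Replacing the `Runtime` (simulating a crashed host) and sending again picks up the agent's memory where it left off.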
We are moving away from treating AI as a utility function and toward treating it as a distributed population of stateful entities.
Multi-agent systems
The rise of multi-agent systems amplifies these orchestration concerns. Once work is distributed across specialised agents (research agents, planning agents, coding agents, review agents), the platform must coordinate communication, state propagation, supervision, and recovery across the entire population. What appears to be an AI problem quickly becomes a distributed systems problem.
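A deliberately small sketch of that coordination problem: a coordinator routes a shared `TaskState` between hypothetical specialised agents (plain functions standing in for LLM-backed agents), lets the reviewer send work backwards, and enforces a hop budget so two disagreeing agents cannot loop forever. All names and the review rule are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TaskState:
    """State propagated between agents; each agent reads it and adds its output."""
    goal: str
    artifacts: dict = field(default_factory=dict)
    log: list = field(default_factory=list)


# Hypothetical agents: each returns the name of the agent that should act next.
def research(s: TaskState) -> str:
    s.artifacts["notes"] = f"notes on {s.goal}"
    return "plan"


def plan(s: TaskState) -> str:
    s.artifacts["plan"] = f"steps for {s.artifacts['notes']}"
    return "code"


def code(s: TaskState) -> str:
    attempt = s.artifacts.get("attempts", 0) + 1
    s.artifacts["attempts"] = attempt
    s.artifacts["code"] = f"draft {attempt}"
    return "review"


def review(s: TaskState) -> str:
    # Hypothetical rule: reject the first draft to show work flowing backwards.
    return "done" if s.artifacts["attempts"] >= 2 else "code"


AGENTS: dict[str, Callable[[TaskState], str]] = {
    "research": research, "plan": plan, "code": code, "review": review,
}


def coordinate(goal: str, start: str = "research", max_hops: int = 10) -> TaskState:
    """Route work between agents until one signals 'done' or the hop budget runs out."""
    state, current = TaskState(goal), start
    for _ in range(max_hops):
        state.log.append(current)
        current = AGENTS[current](state)
        if current == "done":
            return state
    raise RuntimeError("hop budget exhausted; escalate to a supervisor or a human")
```

Even at this size, the coordinator is doing distributed-systems work: propagating state, deciding routing, and bounding failure. The `log` doubles as a crude behavioural trace.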
Observability becomes behavioural
Traditional observability focused on:
- latency
- throughput
- error rates
AI-native observability introduces new concerns:
- reasoning traces
- tool execution chains
- memory usage
- agent interactions
- prompt lineage
- decision provenance
We are no longer observing requests; we are observing behaviour. This is another reason orchestration matters.
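One way to capture behaviour rather than requests is to record every reasoning step, tool call, and hand-off as a span with a parent link, so decision provenance becomes a walk back up the chain. The `Span` schema, the `BehaviourTrace` class, and the `lookup_order` tool name in the test are hypothetical; a production system would more likely emit these as OpenTelemetry spans.

```python
import itertools
import time
from dataclasses import dataclass, field
from typing import Optional

_ids = itertools.count(1)


@dataclass
class Span:
    """One recorded step of agent behaviour (hypothetical schema)."""
    kind: str  # e.g. "reasoning", "tool_call", "handoff"
    agent: str
    detail: str
    parent_id: Optional[int] = None
    span_id: int = field(default_factory=lambda: next(_ids))
    started_at: float = field(default_factory=time.time)


class BehaviourTrace:
    def __init__(self) -> None:
        self.spans: dict[int, Span] = {}

    def record(self, kind: str, agent: str, detail: str,
               parent: Optional[Span] = None) -> Span:
        span = Span(kind, agent, detail, parent.span_id if parent else None)
        self.spans[span.span_id] = span
        return span

    def provenance(self, span: Span) -> list[str]:
        """Walk parent links back to the root: why did this action happen?"""
        chain = []
        current: Optional[Span] = span
        while current is not None:
            chain.append(f"{current.agent}:{current.kind}:{current.detail}")
            current = self.spans[current.parent_id] if current.parent_id is not None else None
        return list(reversed(chain))
```

Latency and error rates still matter, but questions like “which reasoning step led to this tool call?” can only be answered if the causal links are recorded at the time.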
Kubernetes and the runtime platform
In the stateless era, if a pod crashed, K8s spun up a new one, and we didn’t care because the state was in the database. But as we move toward AI-native systems, we are seeing a fundamental tension: the “stateless” nature of Kubernetes is colliding with the “stateful” needs of AI.
- The Orchestration Gap: Traditional Kubernetes orchestration focuses on availability: is the container running? AI orchestration requires continuity: is the reasoning process still alive? This forces us to move beyond simple Deployments toward more sophisticated use of the platform, such as StatefulSets and stateful runtimes layered on top of Kubernetes.
- StatefulSets and the Persistence Tax: While StatefulSets provide stable identifiers and persistent storage, they come with a significant operational tax. In my experience with Akka.NET on EKS, managing scalability and handling pod rescheduling without losing the “hot” state of an active agent requires a level of configuration that traditional web-dev teams rarely encounter.
- Operational Implications: We are moving from “cattle” (disposable pods) back toward something more like “pets”, or perhaps “managed entities.” The operational burden shifts from scaling horizontally to managing locality. If an agent is mid-conversation, the orchestrator needs to keep the “brain” and its “memory” close together to avoid the latency of constant database round-trips.
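Locality can be illustrated with consistent hashing, one common technique (swapped in here, not prescribed by Kubernetes or Akka.NET) for pinning each agent ID to a stable host. StatefulSet pods get predictable names with an ordinal suffix, so the host names in the test (`agent-host-0` and so on) are hypothetical stand-ins for those pods. When a host leaves, only the agents it owned move.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping agent IDs to hosts, so an agent keeps its home."""

    def __init__(self, hosts: list[str], vnodes: int = 64):
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []
        for host in hosts:
            self.add(host)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add(self, host: str) -> None:
        for i in range(self.vnodes):  # virtual nodes smooth out the distribution
            bisect.insort(self._ring, (self._hash(f"{host}#{i}"), host))

    def remove(self, host: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != host]

    def host_for(self, agent_id: str) -> str:
        # First ring position at or after the agent's hash, wrapping around.
        idx = bisect.bisect(self._ring, (self._hash(agent_id), "")) % len(self._ring)
        return self._ring[idx][1]
```

Routing every message for an agent to `host_for(agent_id)` keeps its hot state in one process; the trade-off, as noted above, is that rescheduling a pod now means migrating or rehydrating the agents it owned.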
The Key Insight:
“Kubernetes is increasingly being used as the foundation upon which stateful execution engines are built.”
The “runtime substrate” must now do more than just provide CPU and RAM; it must provide a durable home for long-running logic. This is why tools like Dapr are becoming so popular; they abstract that “persistence tax” away from the developer while letting Kubernetes handle the heavy lifting of the infrastructure.
What this means for serverless-first architectures
Serverless is not going away. Fargate, Container Apps, Functions, and Lambda remain excellent solutions for:
- APIs
- event handlers
- integration endpoints
- document processing
- short-lived workloads
The architectural shift is that AI-native systems introduce a second runtime model alongside them. Increasingly we will see hybrid architectures:
- Stateless services remain serverless.
- Long-running reasoning moves to orchestration runtimes.
- Agent memory moves to specialised stores.
- Workflow engines manage durable execution.
- Event systems connect everything together.
The question is no longer: Serverless or stateful? It becomes: Which parts of the system require continuity?
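That question can be asked per workload rather than per system. A hypothetical routing heuristic, where the `Workload` fields, the runtime labels, and the 900-second default (chosen to match AWS Lambda's 15-minute maximum timeout) are all assumptions for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical workload profile; fields mirror the pressures described earlier."""
    name: str
    expected_seconds: float
    waits_for_humans_or_events: bool = False
    needs_memory_across_sessions: bool = False
    multi_step: bool = False


def choose_runtime(w: Workload, serverless_limit_s: float = 900) -> str:
    """Decide where a workload runs based on whether it needs continuity."""
    # Anything that pauses or outlives a function timeout needs durable execution.
    if w.waits_for_humans_or_events or w.expected_seconds > serverless_limit_s:
        return "durable workflow engine"
    # Short but stateful work needs a home for its memory.
    if w.multi_step or w.needs_memory_across_sessions:
        return "stateful runtime (actors/agents) + memory store"
    return "serverless"
```

The heuristic is crude on purpose: most of a system still lands in the last branch, which is the coexistence argument in miniature.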
The likely outcome is not the replacement of serverless, but its coexistence with a new generation of orchestration runtimes.
For many organisations, Kubernetes may remain invisible behind services such as Azure Container Apps or AWS Fargate, while specialised runtimes such as Temporal, Dapr Workflows, LangGraph, or managed agent platforms provide the durability and coordination layer above it.
The future is likely hybrid. Not everything needs durable execution. But increasingly, the most valuable AI workloads will.
Closing thoughts
The interesting architectural shift may not be AI models themselves. It may be the return of orchestration.
For the last decade we optimised for stateless execution, treating infrastructure as an interchangeable commodity and reducing software to independent request/response interactions.
AI-native systems are reintroducing concepts many architects thought had become niche: workflows, actors, supervision, durable execution, coordination, and long-lived state. The next generation of platforms will not simply host intelligence. They will orchestrate it.
The architectural mistake would be to treat AI-native workloads as if they were simply another stateless API tier. For many of the most valuable systems, continuity is becoming a runtime requirement, not an implementation detail.