What Do I Get as a Paying Customer of an AI API Service?

Small and mid-sized businesses I know are not investing in AI capabilities. They are stable businesses with clients, a steady pipeline of work, and clear signs they will grow that business. They also know they are under AI pressure — on their top line, on their bottom line, and on the question of what their clients will expect from them next. Yet they are not moving. Not even to evaluate. They discuss it. They research it at the edges. They defer the decision to the near future, month after month.

They know something from long experience. The question this article tries to answer is what, exactly, that is.

I am a paying customer of claude.ai (and its app), Anthropic's consumer product for its frontier LLMs.

I also pay for access to Anthropic's API. I build products on it. I have live clients using those products. And I subscribe to Anthropic's incident notifications because when the service has a problem, my products could have a problem.

In the sixteen days between April 28 and May 13, 2026, I received more than twenty-five incident notifications, several on some days. Elevated errors on specific models. On some days, elevated errors across multiple models simultaneously. On some days the API was unavailable; on others, claude.ai was. These are Anthropic's own subject lines, in my inbox, timestamped.

That is what prompted the question in the title of this article. It is a fair question for any paying customer of any service to ask.

What the Incidents Actually Mean for a Builder

Not all incidents are equal, and the difference matters. I will focus on the API in this article.

A full outage is the easier problem. If the API is down, detection is immediate: I can surface a clean error to my users and wait for recovery. Disruptive, but manageable. In the past four months, I have had only two instances in which potential customers testing my product before purchase were hit by a disruption that affected output generation towards the end of the job.

Partial degradation is the harder problem. Elevated errors on some requests, not others. A long sequential pipeline — the kind that generates a structured document screen by screen, stage by stage — may be mid-execution before the degradation becomes apparent. Cost has already been incurred. State is uncertain. Recovery is not a clean restart. This is the incident type that costs money directly, not just time.
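One way to limit that exposure is to checkpoint each completed stage, so that a rerun after a degradation resumes from the last paid-for result instead of starting over. The following is a minimal sketch under stated assumptions, not a description of my production code: `call_model`, `TransientModelError`, and the JSON checkpoint file are hypothetical names introduced for illustration, and `call_model` stands in for whatever function actually makes the API request.

```python
import json
import time
from pathlib import Path
from typing import Callable


class TransientModelError(Exception):
    """Hypothetical stand-in for a retryable API failure (for example an overloaded response)."""


def run_pipeline(
    stages: list[str],
    call_model: Callable[[str, dict], str],
    checkpoint_path: Path,
    max_attempts: int = 3,
    base_delay: float = 1.0,
) -> dict:
    """Run stages in order, saving each completed stage so a rerun skips finished work."""
    # Load results from an earlier, interrupted run if a checkpoint exists.
    done: dict = {}
    if checkpoint_path.exists():
        done = json.loads(checkpoint_path.read_text())

    for stage in stages:
        if stage in done:
            continue  # already generated and paid for
        for attempt in range(1, max_attempts + 1):
            try:
                # Each stage receives a copy of the results produced so far.
                done[stage] = call_model(stage, dict(done))
                break
            except TransientModelError:
                if attempt == max_attempts:
                    raise  # give up; the checkpoint still holds earlier stages
                time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
        # Persist after every completed stage so the pipeline state is known.
        checkpoint_path.write_text(json.dumps(done))
    return done
```

Calling `run_pipeline` a second time with the same `checkpoint_path` skips every stage already on disk. This does not remove the cost of the failed stage itself, but it bounds the loss to one stage rather than the whole job.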

The incident feed I am looking at contains both types. The partial degradations are the ones I now have to factor into how I build on this infrastructure.

The Model I Chose, and the Policy Behind It

I did not choose Anthropic and Sonnet 4.5 by chance. I evaluated OpenAI, Google Gemini, Anthropic, Grok, and others: first at the capability level for my purpose, then at the model level across all of them over time, then on organisational transparency and their own business models, and finally against the specific requirements of what I was planning to build. I chose Sonnet 4.5 for my products even though Sonnet 4.6 was already available with better reasoning capabilities and performance. That was also a deliberate decision. My intention was (and is) to migrate to the latest model at the earliest opportunity, since it represents the best capability within a model family.

This is where using claude.ai becomes very useful as a decision factor for which model I use on Claude API calls. Throughout the outages I could see how performance degradation affected my personal projects on claude.ai. There I had an advantage: I could mitigate, query and persevere until I got the quality of output required. It is not the same on my platforms, where multiple agents call the Claude API. Those systems are AI-native and meant to work independently to provide outputs that humans can then work on. That is why, on products with live clients, I run on latest-minus-one. The newest model is being stress-tested at scale in real production environments across thousands of applications simultaneously. The model one generation behind has had a cycle to demonstrate whether it stabilises. A client-facing disruption caused by model instability is not the same kind of problem as a delayed feature. One erodes trust in ways that are hard to recover.

That policy was built on an assumption: that a model would not be retired before its replacement had demonstrated production stability. It seemed like basic stewardship of a developer ecosystem.

The incident feed is asking me to revisit that assumption.

The Deprecation Calendar and the Stability Curve

Anthropic's model lifecycle process gives sixty days' notice before retirement for publicly released models. Claude Sonnet 4.5 has a retirement floor of September 29, 2026. That is the documented commitment. Details on model deprecations are at: platform.claude.com/docs/en/about-claude/model-deprecations
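For planning purposes, the two figures above translate into simple date arithmetic. The sketch below assumes the retirement floor and notice period exactly as quoted in this article; the 45-day testing buffer is a hypothetical value chosen only for illustration, and the function names are mine.

```python
from datetime import date, timedelta

RETIREMENT_FLOOR = date(2026, 9, 29)  # retirement floor quoted in this article
NOTICE_PERIOD = timedelta(days=60)    # notice period quoted in this article


def days_until(target: date, today: date) -> int:
    """Calendar days from `today` to `target`."""
    return (target - today).days


def latest_notice_date(retirement: date, notice: timedelta = NOTICE_PERIOD) -> date:
    """Last day a notice could arrive for a retirement on `retirement`."""
    return retirement - notice


def migration_start(retirement: date, testing_buffer_days: int) -> date:
    """Date to begin migration testing, given a buffer (an assumption, not a policy)."""
    return retirement - timedelta(days=testing_buffer_days)


# Measured from the last day of the incident window described above.
print(days_until(RETIREMENT_FLOOR, date(2026, 5, 13)))  # 139
print(latest_notice_date(RETIREMENT_FLOOR))              # 2026-07-31
print(migration_start(RETIREMENT_FLOOR, 45))             # 2026-08-15
```

The numbers say nothing about whether the migration target will be stable by then, which is the gap the rest of this section is about.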

What the documentation does not address is the relationship between when a model is retired and when its replacement is actually stable. These are treated as independent questions. The deprecation date is set on one timeline. The stability of the model you are being asked to migrate to is a separate matter.

From where I sit, watching the incident feed, the models generating the most notifications right now are Sonnet 4.6 and Opus 4.7 — the current generation, the migration targets. The model not appearing in subject lines is the one I am being asked to move away from.

The deprecation calendar and the stability curve are not aligned. I am being asked to migrate on a schedule set by a release calendar, not a readiness assessment. For a builder with live clients, that gap is a real planning risk, not a theoretical one.

[Image: a solo builder's desk at night, with code on the monitor, an invoice in the foreground, and steam rising from a cup. The infrastructure question made personal.]

What Enterprise Tolerance Does and Does Not Tell Us

Large enterprises are running production workloads on these models. That is sometimes offered as evidence that the infrastructure is production-ready. But is it?

It is worth being precise about what enterprise tolerance actually reflects.

A large enterprise has on-call engineering capacity — at the very least. When an incident fires, someone is paged. They investigate, implement workarounds, document. The cost is absorbed into headcount. It does not appear in the public case study or the transformation narrative, because acknowledging a standing function dedicated to managing upstream provider instability is not consistent with the story being told to boards and shareholders.

The independent builder — like me — on the standard API tier has no such buffer. The incident hits the product directly. The client sees it. The cost is immediate.

Enterprise tolerance for this incident frequency tells us that large organisations can absorb costs that smaller builders cannot. It does not tell us that the frequency is acceptable. Those are different things.

What I Am Left With

I chose a provider deliberately. I am paying for access. I have built on it carefully. And the question I cannot yet answer cleanly is whether the infrastructure I depend on is stabilising or whether this is the operating baseline I should be planning around.

I am not saying the models are not capable — they are. What I am saying is that the gap between what "production-grade API access" implies and what the incident feed reflects is wide enough to be a genuine planning risk for anyone building real products on this infrastructure.

The burden of managing that risk — tracking the incident feed, timing migrations around stability rather than deprecation pressure, absorbing the cost when the two don't align — sits with the builder. It arrived quietly, without negotiation, as a condition of using the service.

Whether that is the right distribution of responsibility in a paid API relationship is the question I am asking.

The question of responsibility has a practical consequence. It is the main reason I started developing deeper experience on AWS infrastructure and implementation models. One of my products now has an entire instance running on AWS infrastructure — I monitor it specifically to understand whether a different infrastructure layer changes the outage exposure, the cost profile, and the end-user experience.

A Note on Architecture

When I started building my products, I adopted what I call 'Constrained Multi-Agent Architecture'. The approach is designed for building on generative AI with the known issues of an evolving technology in mind, and it addresses part of the problem. By treating the model as a constrained execution engine (specific task, defined parameters, bounded output) rather than an open-ended reasoning layer, I build resilience to model quality variance directly into the system. When I migrate from one model to another, the contract the model operates under does not change. The migration risk narrows from a broad behaviour question to a specific execution question.
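To make the idea of a fixed contract concrete, here is a minimal sketch of what "specific task, defined parameters, bounded output" could look like in code. It illustrates the principle and is not the actual framework: `TaskContract`, `ContractViolation`, `run_constrained`, and the JSON output format are all hypothetical, and `call_model` stands in for any function that sends a prompt to a model and returns text.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TaskContract:
    """The fixed terms a model call must satisfy, independent of which model runs it."""
    task: str                       # the single, specific task
    required_keys: tuple[str, ...]  # defined output fields
    max_chars: int                  # bound on raw output size


class ContractViolation(Exception):
    """Raised when model output does not meet the contract."""


def run_constrained(
    contract: TaskContract, payload: str, call_model: Callable[[str], str]
) -> dict:
    """Call the model for one bounded task and validate the result against the contract."""
    prompt = (
        f"Task: {contract.task}\n"
        f"Return JSON with keys: {', '.join(contract.required_keys)}\n"
        f"Input: {payload}"
    )
    raw = call_model(prompt)
    if len(raw) > contract.max_chars:
        raise ContractViolation("output exceeds size bound")
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ContractViolation("output is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ContractViolation("output is not a JSON object")
    missing = [key for key in contract.required_keys if key not in data]
    if missing:
        raise ContractViolation(f"missing keys: {missing}")
    return data
```

Under this sketch, migrating to a new model means passing a different `call_model` and checking whether the same contract still holds. A failure surfaces as a `ContractViolation` on a specific task rather than as a vague change in behaviour.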

I built my products this way. It was a deliberate decision made early, and it has served the products well.

Constrained architecture addresses model quality risk. I also built the framework to address infrastructure risk to an extent, but I now have to extend its scope so that infrastructure is treated as a primary risk managed through the architecture. An outage or a partial degradation hits the API regardless of how the model is being used.

Good architecture can absorb a great deal. It should not have to absorb the consequences of a provider's lifecycle decisions being made on a calendar that does not account for the builder's exposure in the gap.

Constrained architecture also operated on the same trust assumption — that deprecation would follow demonstrated stability, not precede it. That assumption is now being tested. The response to it, from an architectural standpoint, is tighter migration planning and earlier testing cycles. That is additional overhead that a well-managed provider lifecycle would not require.

The question for me now is not the capability of the model, but whether the model will continue to be available in a stable manner, with its full capabilities, so that a product can generate high-quality output at all times.

And that is the real challenge of adopting AI into a business life cycle.

SMBs instinctively suspect this risk: the unknown and still-emerging time and cost escalations that come with AI technologies. They tend not to adopt AI as a business policy even though they know their business requires it. They keep pushing the adoption decision to the near future while continuing to use frontier LLMs and discussing it month after month. It is a conservative and stable approach in the face of constant disruption. Whether such businesses will be immune to their clients changing their requirements is a question they have to answer for themselves.