Five AI Infrastructure Trends -- Too Boring for Tech Blogs, Too Critical to Ignore in 2026


I spend way too much time reading papers, talking to people running AI in production, and watching what actually breaks versus what just generates conference buzz. Not because I’m working on all of this directly – no one could – but because pattern recognition is how you separate signal from noise in this industry.

And here’s what I’m seeing in late 2025: while everyone’s still breathlessly tweeting about the next frontier model or arguing about AGI timelines, there’s an entire infrastructure layer quietly emerging that will determine which companies actually succeed at AI scale. Not the most exciting stuff. It may not get you a keynote invite. But it’s absolutely critical.

These are the trends worth following if you’re trying to build real AI systems that need to, you know, actually work consistently and not bankrupt you.

The Unglamorous Truth About AI in Production

Let me start with something I keep hearing from colleagues: “Hey, we got an award from OpenAI for spending a zillion tokens!” They share this like it’s funny, but underneath there’s genuine frustration. It’s not something to be proud of – it’s a fortune they’d rather not be spending. Everyone talks about switching to self-hosted models, doing distillation and localization, and reducing dependency on the big players.

But here’s the reality check: For most companies, especially at the SME level, becoming truly independent from OpenAI, Anthropic, or Google is genuinely impossible. The expertise, infrastructure, and capital requirements are prohibitive. Which means managing AI-related finances isn’t some optional nice-to-have – it’s becoming as fundamental as managing your cloud bill, except the dynamics are completely different and the tools barely exist.

This tension – between what we’d like to do and what we can actually afford to do – runs through every trend I’m tracking. Let’s walk through them.

1. AI Observability: Everyone’s Building Their Own Wheel

The observability space for AI is fascinating because despite all the venture money flowing in (Observe Inc. just closed a $156M Series C), most companies I talk to are still building custom solutions. They’re experimenting, trying to keep it safe, making sure their models produce acceptable results for daily business operations.

And there’s no obvious leader yet. The reason? Current observability tools are genuinely lagging what the market demands. We’re seeing this explosion of AI observability startups – 49 in the latest Y Combinator batch alone – but here’s what people aren’t saying: we’re probably never going to see a universal platform that handles every modality and use case.

Think about it. The monitoring needs for a retail recommendation system are completely different from monitoring LLMs handling customer communications or analyzing support calls. The usage patterns, the data inputs and outputs, the metrics that actually matter, the consumption patterns – they’re all fundamentally different.

So what I expect to see over the next few years isn’t one winner-takes-all observability platform. It’s going to be a wide variety of tools and technologies emerging, each specialized for specific sub-sectors and sub-domains of machine learning. One for computer vision applications. Another for conversational AI. Different tools for recommendation engines versus forecasting models.
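
To make that concrete, here’s the shape of the home-grown instrumentation I keep seeing teams build while the tooling catches up. A minimal sketch, not any vendor’s API: `llm_fn` and the quality checks are hypothetical stand-ins, and the whole point is that `quality_flags` are use-case-specific – a recommender and a support bot shouldn’t share them.

```python
import time
from dataclasses import dataclass, field

@dataclass
class LLMCallRecord:
    """One observed LLM call: the fields most custom setups end up logging."""
    model: str
    use_case: str            # e.g. "support_triage" vs. "product_recs"
    latency_s: float
    prompt_tokens: int
    completion_tokens: int
    quality_flags: dict = field(default_factory=dict)

def observe_call(model, use_case, prompt, llm_fn, quality_checks=()):
    """Wrap an LLM call and capture the metrics that matter for THIS use case.

    llm_fn is a stand-in for whatever client you use (assumed here to return
    text plus token counts); quality_checks is a list of (name, fn) pairs
    that are deliberately use-case-specific.
    """
    start = time.monotonic()
    text, prompt_toks, completion_toks = llm_fn(prompt)
    record = LLMCallRecord(
        model=model,
        use_case=use_case,
        latency_s=time.monotonic() - start,
        prompt_tokens=prompt_toks,
        completion_tokens=completion_toks,
        quality_flags={name: check(text) for name, check in quality_checks},
    )
    return text, record
```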

This isn’t sexy. It’s not the kind of thing you sell to your board by saying “we’re investing in AI observability!” It’s backend infrastructure. But as SRE and MLOps practices have taken hold, it’s becoming more and more obvious that this is actually worth investing in – even if it doesn’t make for great demo material.

The companies that figure this out early, before their AI systems are sprawling and glitchy, will have a massive advantage. The ones that don’t? They’ll be debugging drift in production while their CFO asks why the AI bill just hit six figures.

2. Compound AI Systems: Beyond the Buzzword Bingo

Speaking of buzzwords, can we talk about the social media saturation around AI-first, AI-driven, AI-aware, AI-native, and whatever other prefix we’re slapping on things this week? People are debating these terms endlessly without coming to any real conclusions.

But here’s what’s actually emerging that matters: the need to orchestrate multi-component systems where AI agents work alongside traditional architecture. Not “agentic” as some binary property. Not multi-agent versus single-agent as some philosophical debate. But the practical challenge of making non-deterministic AI components cooperate with relational databases, message queues, APIs, and all the classical deterministic systems we’ve built for decades.

Berkeley coined “compound AI systems” in February 2024, and the term is still barely established. But look at what’s actually happening in production: 60% of LLM applications are already using compound patterns like RAG or multi-step chains. FactSet improved their accuracy from 55% to 85% by combining text-to-formula generation with validation models. This isn’t theory – it’s what’s working.
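
The FactSet-style pattern is worth sketching because of how simple it is. This is a hedged illustration of the general generate-then-validate loop, not their actual system; `generate_formula` and `validate_formula` are hypothetical callables standing in for whatever generator and validator you pair up.

```python
def generate_then_validate(question, generate_formula, validate_formula, max_attempts=3):
    """Compound pattern: pair a generator model with a separate validation step.

    generate_formula and validate_formula are hypothetical callables; the
    validator returns (ok, reason) and can be a second model or plain rules.
    """
    for _ in range(max_attempts):
        candidate = generate_formula(question)
        ok, reason = validate_formula(question, candidate)
        if ok:
            return candidate
        # Feed the rejection reason back so the next attempt can self-correct.
        question = f"{question}\n(previous attempt rejected: {reason})"
    raise ValueError("no candidate passed validation after retries")
```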

The challenge isn’t whether to use compound systems. It’s how to orchestrate them in a robust, predictable way when parts of your system are fundamentally unpredictable. How do you maintain SLAs when one component is a deterministic SQL query and another is an LLM that might return different results every time?
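
One answer that keeps showing up in practice: give the non-deterministic path a hard deadline and keep a deterministic fallback behind it. A minimal sketch, assuming `llm_answer` and `sql_answer` are your two hypothetical paths:

```python
import concurrent.futures

def answer_with_sla(query, llm_answer, sql_answer, timeout_s=2.0):
    """Hold an SLA over a non-deterministic component: the LLM path gets a
    hard deadline, the deterministic SQL path catches whatever misses it."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(llm_answer, query)
    try:
        return future.result(timeout=timeout_s), "llm"
    except concurrent.futures.TimeoutError:
        future.cancel()  # best effort; the thread may still finish in the background
        return sql_answer(query), "sql_fallback"  # predictable, always within SLA
    finally:
        pool.shutdown(wait=False)
```

The second element of the return value matters: you want to measure how often you fall back, which loops straight back to the observability trend above.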

This is where I see the real innovation opportunity: not in building bigger models, but in building better orchestration layers that let deterministic and non-deterministic components actually work together reliably. And the companies that figure out practical patterns for this – not academic papers, but battle-tested implementation guides – will provide massive value.

3. FinOps for AI: The Reckoning Is Coming

Remember when we thought cloud costs were unpredictable? AI spending makes traditional cloud FinOps look quaint. The average AI spend jumped from $63K/month in 2024 to a projected $85K/month in 2025 – roughly 35% growth in one year. CloudZero reports that 45% of organizations will spend over $100K monthly on AI in 2025, up from 20% in 2024.

But here’s the kicker: 51% of companies can’t even calculate their AI ROI. The costs are invisible until suddenly they’re not. One runaway training job, one accidentally deployed inefficient prompt, and you’ve blown through your quarterly budget.

What’s wild to me is that despite this obvious, urgent pain point, there are essentially no dedicated VC-funded startups purely focused on AI FinOps. The FinOps Foundation is launching their “FinOps for AI” certification in March 2026, which means we’re at that perfect moment where the category is being defined but not yet saturated.

And this connects back to that earlier point about token spending: for most companies, reducing dependency on the big AI providers isn’t realistic. Which means financial management of AI operations becomes absolutely critical. You need to know: Which features are driving token usage? Which teams are burning budget? What’s the actual cost per business outcome?
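
Answering those three questions doesn’t require a platform; it requires tagging. Here’s a minimal sketch of the kind of ledger I mean – the prices are illustrative placeholders, not any provider’s actual rates:

```python
from collections import defaultdict

# Illustrative per-1K-token prices -- placeholders, not any provider's real rates.
PRICE_PER_1K = {"prompt": 0.003, "completion": 0.015}

class TokenLedger:
    """Minimal cost attribution: tag every LLM call with the team, the
    feature, and the business outcome it served."""

    def __init__(self):
        self.entries = []

    def record(self, team, feature, prompt_tokens, completion_tokens, outcome=None):
        cost = (prompt_tokens / 1000) * PRICE_PER_1K["prompt"] \
             + (completion_tokens / 1000) * PRICE_PER_1K["completion"]
        self.entries.append(
            {"team": team, "feature": feature, "cost": cost, "outcome": outcome}
        )

    def cost_by(self, key):
        """cost_by("feature") -> which features drive token usage;
        cost_by("team") -> which teams are burning budget."""
        totals = defaultdict(float)
        for entry in self.entries:
            totals[entry[key]] += entry["cost"]
        return dict(totals)
```

Joining cost against `outcome` gets you the third answer: cost per business outcome.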

The danger I see – and this is where the over-engineering risk really kicks in – is companies building elaborate FinOps systems before they’ve figured out if their AI actually delivers value. There’s this temptation at innovative companies to be at the bleeding edge, to have the most sophisticated cost attribution system, the most granular tracking. But if you’re still figuring out product-market fit, you don’t need the Ferrari of FinOps systems. You need a bicycle that works.

4. Governance: Regulations Moving Faster Than Readiness

It’s clear that we’re not yet at the point where every startup and small agency is drowning in compliance requirements. But given the pace of regulation, especially in the EU, that’s coming. The EU AI Act provisions become applicable in August 2026. California’s AI Transparency Act kicks in January 2026. Multiple states have their own regulations emerging.

What I find interesting is how fragmented the market is right now. There are 9+ funded startups in the governance space – Credo AI with $41M, Darwin AI with $15M, others with smaller rounds – but the market is projected to explode from $551M in 2024 to $16.6B in 2034. That’s a 40% CAGR, which tells you this is becoming mandatory infrastructure, not optional tooling.

But here’s my observation: the tools and standards that emerge at the AI application layer are going to be heavily influenced by regulations, audit requirements, and accountability needs rather than by the data needs themselves.

The governance layer is going to get much more complex, driven primarily by compliance rather than technical necessity. Companies that get ahead of this – building governance into their architecture from the start rather than bolting it on later – will have a significant advantage when the regulatory hammer drops.
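
What does “building governance in from the start” look like at its smallest? Something like an append-only decision log. A sketch under my own assumptions about what auditors will eventually ask for – the field list is a guess informed by the transparency provisions, not legal advice:

```python
import hashlib
import json
import time

def audit_record(model_id, model_version, inputs, output, path="decisions.log"):
    """Append-only record of one AI-assisted decision.

    The field list is an assumption, not legal advice: which model,
    which version, on what input, said what, and when.
    """
    entry = {
        "ts": time.time(),
        "model_id": model_id,
        "model_version": model_version,
        # Hash the inputs so you can later prove what was decided on,
        # without keeping raw (possibly personal) data in the log itself.
        "input_hash": hashlib.sha256(
            json.dumps(inputs, sort_keys=True).encode()
        ).hexdigest(),
        "output": output,
    }
    with open(path, "a") as log:
        log.write(json.dumps(entry) + "\n")
    return entry
```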

5. Data Contracts: The Foundation Everyone Forgets

And this brings us to data contracts and data quality. If compound AI systems are the architecture and observability is the monitoring layer, data contracts are the foundation. You can’t have reliable AI without reliable data, and you can’t have reliable data at scale without formal agreements between producers and consumers.

The data mesh movement has been pushing this idea since maybe 2019, and it’s finally starting to stick. Not because it’s elegant – though it is – but because the pain of NOT having contracts is becoming unbearable. Monte Carlo’s surveys show data quality risks evolving faster than management capabilities. Unity Software famously lost $110M ingesting poor quality data. IBM estimates $3.1 trillion lost annually to poor data quality.

For AI specifically, garbage data doesn’t just create bad reports. It causes hallucinations, biased decisions, catastrophic model failures. And as we move to these multi-tenant compound systems I mentioned earlier, we’re going to need more and more internal interconnections. More APIs between components. More data flowing between deterministic and non-deterministic parts of the system.

All of that requires stability and traceability. We need to ensure those non-deterministic bits – the AI models – still operate within the ranges and quality standards we need. Data contracts provide that guarantee.

What I’m seeing is that this trend is still in early stages in 2025, but the templates are converging. Schemas, SLOs, SLAs, quality metrics, ownership, versioning. The Open Data Contract Standard is gaining traction. The tooling ecosystem is forming with Data Mesh Manager, dbt integrations, Monte Carlo, Great Expectations.
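
For a sense of how small a useful contract can be, here’s one reduced to that converging template, written as plain Python purely for illustration – in practice you’d express this in ODCS YAML or enforce it with dbt tests or Great Expectations rather than hand-rolling it:

```python
# A contract reduced to the converging template: schema, quality SLO,
# ownership, versioning. Names and thresholds here are illustrative.
CONTRACT = {
    "name": "support_calls",
    "version": "1.2.0",
    "owner": "data-platform-team",
    "schema": {"call_id": str, "transcript": str, "duration_s": float},
    "quality": {"max_null_fraction": 0.01},
}

def enforce_contract(batch, contract=CONTRACT):
    """Check a producer's batch at the boundary, before AI consumers see it."""
    nulls = 0
    for row in batch:
        for field_name, field_type in contract["schema"].items():
            value = row.get(field_name)
            if value is None:
                nulls += 1
            elif not isinstance(value, field_type):
                raise TypeError(f"{field_name}: expected {field_type.__name__}")
    checked = len(batch) * len(contract["schema"]) or 1
    if nulls / checked > contract["quality"]["max_null_fraction"]:
        raise ValueError("batch breaks the null-fraction SLO; refusing to publish")
    return batch
```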

The question isn’t whether your organization will eventually adopt data contracts. It’s whether you’ll do it proactively or after a major data quality incident forces your hand.

Why These Trends Actually Matter (Despite Being Unsexy)

Here’s the pattern I’m seeing across all five of these areas: institutional money and attention are flowing – VC funding, Gartner reports, conference tracks – but public awareness and quality content remain surprisingly low.

That gap between institutional validation and mainstream awareness is exactly where the opportunity sits. Not just for content positioning (though that’s real), but for companies building actual products and internal capabilities.

These trends all sit at what I’d call the “early practitioner adoption” phase. They’re beyond the bleeding edge where everything breaks, but before mainstream saturation where best practices are well-established. Berkeley coined “compound AI systems” in February 2024 – less than two years ago. The FinOps for AI certification launches in a few months.

This is the window where getting it right matters most.

And here’s the thing that connects all of them: they’re addressing urgent operational crises right now. Companies aren’t exploring these areas out of curiosity. They’re experiencing cost spirals, compliance deadlines, production failures, and governance gaps that threaten their entire AI strategy.

But – and this is critical – there’s a real risk of overdoing things here. Especially at innovative companies that pride themselves on being at the cutting edge. The temptation is to build the most sophisticated observability stack, the most complex compound-system architecture, the most detailed FinOps attribution model before you’ve proven that your AI actually delivers business value.

These infrastructure layers are crucial, but they’re in service of delivering value, not value in themselves.

What I’m Watching For

As I continue tracking these trends through 2026 and beyond, I’m particularly interested in a few specific developments:

Will we see consolidation in the observability space, or will it fragment into specialized tools by modality as I expect?

Will the compound AI systems orchestration layer become commoditized through open source, or will proprietary platforms capture most of the value?

When the FinOps for AI certification launches, will it actually influence how companies approach cost management, or will it remain mostly theoretical?

As governance regulations take effect, do we see a wave of compliance-focused startups, or do existing platforms add compliance as features?

And for data contracts, does the Open Data Contract Standard actually achieve adoption at scale, or do we end up with competing proprietary implementations?

These aren’t rhetorical questions. The answers will shape how AI infrastructure evolves over the next few years.

What I can say with confidence: the companies that invest in this unglamorous infrastructure layer now – even though it’s not what gets celebrated on social media – will be the ones actually running AI successfully at scale in 2027 and beyond.

The sexy stuff gets the headlines. The infrastructure layer gets the revenue.