Investment, infrastructure, inference, and big money players: The direction of AI in 2026

“Skate to where the puck is going, not where it has been” – Walter Gretzky (Wayne’s dad)

Ennui is the word that best describes the end of 2025. It was a trudge to the finish line as investors, weary of the ongoing bubble talk and high valuations, shunned AI developments. The ‘oh wow’ moments had passed and the mood captured something essential: investors had heard the artificial intelligence story so often that even genuine advances were eyed sceptically.

However, 2026 should be a year where the malaise evaporates. Although we are only a few days in, it feels like the train is accelerating down the track towards the points of change.

But first . . .

Some themes for 2026

Autonomous driving has been mostly in the laboratory but now moves further into the foreground as a transport service that quietly acquires market share. Waymo’s expansion into new territories and its deepening presence in cities it already serves underlines that robotaxis are transitioning from pilot programmes to infrastructure. This will not just be for taxis but also for trucks, an industry estimated to be worth in excess of US$1 trillion in the United States and US$4 trillion globally, and whose economics and driver shortages need a viable solution.

Robotics tells a similar story. Warehouses, distribution centres, and factories are ripe for automation and robots are increasingly responsible for moving boxes, assembling sub‑components, and navigating complex environments where humans and machines coexist. This is not glamorous, but it is economically significant. It ties AI to labour costs, safety regimes, and supply‑chain resilience.

Will software remain a place of discontent, with the role of software as a service (SaaS), and the foundations of the companies that supply it, under question? Agents are stuck in first gear!

Overall, there may be a shift in focus towards the hard realities of infrastructure. Data centres, power supply, cooling systems, and specialised chips are no longer abstract backdrops; they are the centre of the conversation. How these significant investments scale is where the future lies, and where the battles will be won or lost.

What we knew last year is rapidly changing, and the points in the track will switch. Perhaps not immediately, but it is coming.

From the first AI wave to the economics of reality

To understand what is changing, recall how the first wave of AI took shape. We discovered that graphics processors are remarkably effective at training large neural networks; this saw the inexorable rise of Nvidia. Jensen Huang had the vision and the will to bet the farm, and he invested not just in faster chips but in an entire ecosystem. Nvidia’s CUDA (Compute Unified Device Architecture) gives developers a way to express complex parallel computation in a manageable form and then surf the wave of successive hardware generations as they arrive. In other words, Nvidia sold a standard and created a moat – a very wide one.
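As a loose illustration of what that abstraction buys, here is a minimal sketch using CuPy, a Python array library built on CUDA; the array sizes are arbitrary, and the point is only that the same high-level code rides each new hardware generation unchanged.

```python
# A minimal sketch of the CUDA value proposition, using CuPy (a Python
# array library built on CUDA). The same high-level code runs unchanged
# on successive GPU generations; CUDA and the driver map it to whatever
# silicon is installed. Sizes here are arbitrary, for illustration only.
import cupy as cp

a = cp.random.rand(4096, 4096, dtype=cp.float32)
b = cp.random.rand(4096, 4096, dtype=cp.float32)

c = a @ b                           # dispatched to the GPU via CUDA kernels
cp.cuda.Stream.null.synchronize()   # wait for the device to finish

print(c.sum())                      # identical code, faster on each new generation
```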

Twenty twenty-five was about teraflops, process nodes, and memory bandwidth, but it is becoming clearer that this may have been just the bootstrapping phase, where the infrastructure was laid and the tools refined. What will become more obvious throughout 2026 is that AI is moving from grand experiment to practical, incessant use; just look at token use at Google and its Gemini app, which now has 650 million monthly active users.

Of course, training remains important; however, it is seeing diminishing returns in a world where all ‘general’ models will be created equal. Likely, and of far greater importance, will be the shift in focus toward inference, the day‑to‑day act of answering questions, generating content, and making decisions in real time. That shift has profound implications for GPUs: latency punishes inference, inconsistency wastes energy, and GPUs can introduce both. It is where the total cost of ownership battles will be fought and won.

Against this backdrop, two moves by Nvidia stand out: the launch of the Vera Rubin platform and the acquisition of Groq. They are linked by a common theme: lowering the total cost of ownership of AI while increasing the speed and predictability of inference, and doing so in a way that deepens Nvidia’s grip on the ecosystem.

Why GPUs are not the last word in inference

At the heart of this lies a technical but crucial point. GPUs have been extraordinary tools for training AI models. They excel at parallel workloads where the same operation is applied to large batches of data. But inference presents a very different set of demands: countless individual or small‑batch requests – a question from a user, a prompt from an application, a sensor reading in a vehicle. These requests arrive unpredictably. Some require strict latency guarantees, where even tens of milliseconds matter; others fluctuate with the time of day or spike suddenly at viral moments. In other words, the goal is no longer to finish a job by tomorrow; it is to respond, right now, at acceptable cost.
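To make that difference concrete, here is a toy simulation (all arrival rates and service times below are invented) of the classic trade-off: batching requests amortises fixed costs and lifts throughput, exactly as training prefers, but forces individual requests to wait, which is precisely what inference cannot afford.

```python
# Toy sketch: Poisson request arrivals served in fixed-size batches.
# Bigger batches raise throughput capacity but worsen tail latency.
import random

random.seed(1)
FIXED, PER_ITEM = 0.020, 0.001      # invented batch overhead / per-item cost, seconds

def p99_latency_ms(batch_size, n=20_000, arrival_rate=40.0):
    """Requests arrive ~Poisson at `arrival_rate`/s; the server waits to fill
    a batch, then serves it in FIXED + PER_ITEM * batch_size seconds, so
    batching amortises the fixed cost at the price of waiting for the fill."""
    t, arrivals = 0.0, []
    for _ in range(n):
        t += random.expovariate(arrival_rate)
        arrivals.append(t)
    latencies, server_free = [], 0.0
    for i in range(0, n - batch_size + 1, batch_size):
        batch = arrivals[i:i + batch_size]
        start = max(batch[-1], server_free)      # wait for the batch to fill
        server_free = start + FIXED + PER_ITEM * batch_size
        latencies.extend(server_free - a for a in batch)
    latencies.sort()
    return 1000 * latencies[int(0.99 * len(latencies))]

for b in (1, 8, 64):
    capacity = b / (FIXED + PER_ITEM * b)        # max sustainable items/s
    print(f"batch={b:3d}  capacity ≈ {capacity:5.0f}/s  p99 ≈ {p99_latency_ms(b):6.0f} ms")
```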

Latency matters, and it matters more every day. In practice, every movement of data through memory and every cache miss manifests as a slower response and a worse user experience. Companies seek to address these risks through their capital expenditure and third-party capacity, which may lead to overprovisioned systems and underutilised hardware at quiet times. Underutilised or not, this still uses power and, with electrons scarce and prices high, power is the bottleneck in what many see as the AI race to the top.

GPU‑based inference stacks are powerful, but they carry inherent inefficiencies, and the metrics that matter most for inference – tokens served per watt, per rack, and per unit of capital employed – may be better addressed differently. This is where the cost‑effective output of Groq’s architecture becomes interesting.
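As a back-of-envelope sketch of that metric, consider the electricity-only cost of serving a million tokens under two hypothetical hardware profiles; every number below is invented to show the shape of the arithmetic, not any vendor’s actual figures.

```python
# Back-of-envelope: electricity-only cost to serve 1M tokens.
# All figures are hypothetical, for illustration of the arithmetic only.
def cost_per_million_tokens(tokens_per_second, watts, utilisation,
                            price_per_kwh=0.10):
    """Power cost of serving 1,000,000 tokens on one accelerator."""
    effective_tps = tokens_per_second * utilisation
    seconds = 1_000_000 / effective_tps
    kwh = watts * seconds / 3600 / 1000
    return kwh * price_per_kwh

# Hypothetical profiles: a flexible GPU at modest utilisation versus a
# deterministic, inference-first part at high utilisation.
gpu = cost_per_million_tokens(tokens_per_second=3_000, watts=700, utilisation=0.4)
lpu = cost_per_million_tokens(tokens_per_second=5_000, watts=500, utilisation=0.9)

print(f"GPU-style profile: ${gpu:.4f} per 1M tokens (power only)")
print(f"LPU-style profile: ${lpu:.4f} per 1M tokens (power only)")
```

At scale – trillions of tokens a day – even fractions of a cent per million tokens compound into the total cost of ownership battle described above.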

Ever forward-thinking, Nvidia hackquires Groq

It is easy to gloss over Nvidia’s widely reported US$20 billion ‘hackquisition’ (a method of securing key intellectual property and talent without formally buying a company) of Groq as just another deal, but it may represent a pivotal moment. Groq is not just another chip company – it is a rethink of what computing might look like, re-imagining large-scale inference through a radical shift in where computational intelligence resides.

Groq’s secret sauce resides in its Language Processing Units (LPUs), which were designed to address inference and the question of latency head-on. Traditionally, compilers translate code, leaving the hardware to handle scheduling, memory access, and execution. Groq inverted that model: its compiler performs all scheduling, memory planning, and instruction ordering ahead of time. Every operation is predetermined. The result is silicon that functions as a deterministic executor, optimised for maximum speed and efficiency. With a pre‑computed plan there is no need for speculative execution or complex runtime scheduling, as the chip is never waiting for the next instruction or the next data fetch.
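A toy sketch may help convey compiler-driven determinism in general terms (this is emphatically not Groq’s actual toolchain, and the op names and cycle counts are invented): the ‘compiler’ fixes the complete schedule ahead of time, and the ‘chip’ simply replays it, so total latency is known before execution begins.

```python
# Toy illustration of ahead-of-time scheduling: all ordering decisions
# happen at "compile time", leaving a deterministic replay at "runtime".
# Op names and cycle counts are invented for illustration.

# A tiny dataflow graph: op -> (cycle cost, dependencies)
GRAPH = {
    "load_a": (2, []),
    "load_b": (2, []),
    "matmul": (6, ["load_a", "load_b"]),
    "bias":   (1, ["matmul"]),
    "store":  (2, ["bias"]),
}

def compile_schedule(graph):
    """Resolve all ordering at 'compile time': a topological order plus a
    fixed start cycle for every op. Nothing is left to the runtime."""
    schedule, done, cycle = [], set(), 0
    while len(done) < len(graph):
        for op, (cost, deps) in graph.items():
            if op not in done and all(d in done for d in deps):
                schedule.append((cycle, op))
                cycle += cost
                done.add(op)
    return schedule, cycle  # total cycles known before execution starts

def execute(schedule):
    """Deterministic executor: replay the precomputed plan verbatim."""
    for start, op in schedule:
        print(f"cycle {start:2d}: {op}")

plan, total = compile_schedule(GRAPH)
print(f"compile time: total latency fixed at {total} cycles")
execute(plan)
```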

When you combine on‑chip memory with compiler‑driven determinism, a different economic profile emerges. Wasted energy is reduced, lower thermal loads mean cooling can be simplified, and this translates into significantly more tokens per joule, more queries per rack, and a lower cost to serve each unit of work.

The success (or otherwise) of Groq’s technology within Nvidia remains over the horizon, but the dizzying pace of advancement at Nvidia continues with the announcement that Vera Rubin (its next generation of processing) delivers multiple times the inference and training performance of its predecessor, Blackwell, with an emphasis on lowering energy per token. Vera Rubin is not simply about speed; it is a redesign of the racks and the services that support them, aimed at reducing the overall cost of ownership. This is critical when access to power is the clear pinch point in industry growth and energy prices are rising.

CUDA as the glue – and the potential pivot

Crucially, CUDA remains the glue that binds all of this together. Developers have spent years writing to CUDA, building models, tools, and products on the assumption that the underlying hardware will continue to evolve but that their software interface will remain broadly stable. Each time Nvidia introduces a new generation – from early GPUs through Hopper, Blackwell, and now Rubin – the promise is straightforward: your code, but faster. Bringing Groq’s LPU concepts into this ecosystem may change things, not for the developer but for the operator, with a step change in the efficiency and economics of inference. To competitors, this would look like the moat getting wider – and one that may prove difficult to match.

Is this fiction?

At first blush this may look like a long bow to draw, but it may also be the dawn of a convergence of ideas where the sum is worth more than the parts. Despite Jensen Huang’s protests and smokescreens to the contrary, putting Groq founder Jonathan Ross at the intersection of Nvidia’s silicon and software roadmaps suggests that the company is at least exploring a harder pivot towards inference‑first design, one that may, in time, relegate GPU‑style architectures to being the training engines and high‑flexibility workhorses in a broader, more differentiated platform.

For investors, the uncomfortable but unavoidable question is crystallising. If inference really does favour architectures that bring most of the meaningful compute and memory onto the chip, what does that imply for the vast installed base of GPU‑heavy clusters? If Rubin‑class racks and future LPU‑inspired systems reset expectations for tokens per watt and per rack, how resilient are the depreciation schedules that have already been extended from three years to five or six? And if Nvidia succeeds in weaving Groq’s ideas into the CUDA universe, does that consolidate an almost unassailable lead in inference or does it raise questions from customers and regulators? Does antitrust become an issue?
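To see why the depreciation question bites, a simple straight-line sketch is enough; the cluster cost and useful lives below are invented for illustration.

```python
# Straight-line depreciation sketch with invented numbers: a hypothetical
# US$10bn GPU cluster under the useful lives discussed above.
capex = 10_000_000_000  # hypothetical cluster cost, US$

for years in (3, 5, 6):
    print(f"{years}-year life: annual depreciation ≈ ${capex / years / 1e9:.2f}bn")

# If hardware assumed to last 6 years is economically obsolete in 3,
# roughly half the asset value remains on the books when it is retired:
stranded = capex * (1 - 3 / 6)
print(f"book value stranded at year 3 ≈ ${stranded / 1e9:.1f}bn")
```

Extending useful lives flatters today’s earnings; a tokens-per-watt reset that shortens the economic life again would reverse that flattery all at once.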

These are some of the questions that occupy our minds, and the answers cannot be presented in a tidy box with a nice bow. As time passes and the consequential business of serving trillions of tokens a day marches on, the phase of talking about AI is giving way to the era of living with it: its infrastructure, its power bills, and its upgrade cycles.

If, at the end of 2025, it felt as though you more or less understood where the AI train was heading, are you still quite so sure now – and how long can the market’s AI ennui really survive in the face of inexorable change?

Pivotal? Probably, just not today.

Tim Chesterfield is CIO of the Perpetual Guardian Group and the founding CIO and Director of its investment management business, PG Investments. With $2.8 billion in funds under management and $8 billion in total assets under management, Perpetual Guardian Group is a leading financial services provider to New Zealanders.

Disclaimer

Information provided in this publication is not personalised and does not take into account the particular financial situation, needs or goals of any person. Professional investment advice should be taken before making an investment. The information provided in this article is not a recommendation to buy, sell, or hold any of the companies mentioned. PG Investments is not responsible for, and expressly disclaims all liability for, damages of any kind arising out of use, reference to, or reliance on any information contained within this article, and no guarantee is given that the information provided in this article is correct, complete, and up to date.

This article was originally published by the NBR.
