May 26, 2026

Why CPUs matter more in agentic AI systems

  • AMD says CPUs are central as AI moves toward agentic reasoning.
  • CPUs now manage coordination that keeps GPUs in use.

For much of the past two years, the AI conversation has revolved around GPUs. They are often treated as the main driver of progress, especially as models grow larger and more complex. But as enterprises begin to deploy agentic AI and advanced reasoning systems, another part of the stack is drawing closer attention: the CPU.

The shift is less about raw performance and more about coordination. As AI systems move from single-model inference to multi-step reasoning and agent-based workflows, the CPU increasingly determines how well those systems run in practice.

That point was echoed earlier this year at AMD Advancing AI 2025, where OpenAI CEO Sam Altman noted that advanced reasoning models “need tons of compute, tons of memory, and tons of CPUs as well.” The comment reflected a broader industry view that scaling AI now depends on how well different compute layers work together.

In an interview, Alexey Navolokin, General Manager, APAC, at AMD, described how the CPU’s role has expanded as agentic AI moves from theory to deployment.

The CPU as the control layer for agentic AI

“As agentic AI ramps up, millions of agents are accessing and doing productive work on compute resources at unprecedented speed with each agent interacting with data sources, tools, and other agents, generating constant streams of CPU-driven operations,” Navolokin said.

In these environments, the CPU does far more than host workloads. It manages coordination across the system. In modern AI clusters, CPUs handle scheduling, data movement, and synchronisation, ensuring GPUs remain busy rather than waiting on inputs.

“At the same time, in modern AI clusters, the CPU acts as the synchroniser and orchestrator, feeding data to GPUs, launching kernels, and managing inference schedules to keep accelerators fully utilised,” he said. “High IPC and high-frequency CPU cores directly raise cluster-level performance by ensuring every GPU cycle delivers value.”

The effect is visible at the system level. Poor CPU balance can leave expensive GPUs underused, while stronger host performance can lift throughput without adding more accelerators.
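To make the balance argument concrete, here is a minimal sketch of a two-stage pipeline in which the host CPU prepares batches and the GPU consumes them. It is a toy model, not something AMD or Navolokin described: the function names and the batch rates are hypothetical, and it assumes the slower stage sets the pace with no queuing or overlap effects.

```python
def pipeline_throughput(host_batches_per_s: float, gpu_batches_per_s: float) -> float:
    """Batches per second completed when the slower stage sets the pace (assumed model)."""
    if host_batches_per_s <= 0 or gpu_batches_per_s <= 0:
        raise ValueError("rates must be positive")
    return min(host_batches_per_s, gpu_batches_per_s)


def gpu_utilisation(host_batches_per_s: float, gpu_batches_per_s: float) -> float:
    """Fraction of time the GPU is busy rather than waiting on the host."""
    delivered = pipeline_throughput(host_batches_per_s, gpu_batches_per_s)
    return delivered / gpu_batches_per_s


# Hypothetical rates: the GPU could process 100 batches/s,
# but the host can only prepare 50 batches/s.
starved = gpu_utilisation(host_batches_per_s=50, gpu_batches_per_s=100)

# Same GPU, with a host that can prepare 120 batches/s.
balanced = gpu_utilisation(host_batches_per_s=120, gpu_batches_per_s=100)
```

Under these assumptions the first configuration leaves the GPU idle half the time, while the second keeps it fully busy without adding accelerators. Real clusters are more complicated, but the direction of the effect is the one the article describes.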

“In other words, the CPU is no longer a supporting actor, it’s the command layer that enables the full system to think, decide, and act at speed,” Navolokin said.

Cost pressure is changing how AI systems are designed

As AI deployments scale, cost has become harder to ignore. Many organisations still plan capacity around GPU counts, but this approach can lead to overspending if the rest of the system is not matched to the workload.

“Scaling AI isn’t about one chip type – it’s about an integrated, end-to-end system where each compute engine does what it does best,” Navolokin said.

He pointed to a change in workload patterns. AI systems increasingly rely on multi-step workflows, real-time decisions, and continuous data movement rather than isolated inference tasks. This creates sustained demand across CPUs, memory, and networking.

“The role of the CPU in this system is indispensable – it delivers accessibility, utilisation, locality, and memory bandwidth necessary to run workloads at scale,” he said.

For organisations preparing for multi-agent systems, Navolokin described “CPU readiness” as having server-class host processors capable of coordinating services, feeding data efficiently, and acting as control nodes in distributed environments.

“A high-performance host CPU ensures GPUs stay fully utilised, delivering lower inference latency, higher throughput, and better overall AI efficiency,” he said.

Memory, I/O, and synchronisation matter more than ever

As AI systems draw on larger datasets and faster pipelines, traditional bottlenecks are shifting. Memory capacity, bandwidth, and I/O performance now shape how quickly systems respond.

“Two CPU characteristics are essential for peak inference performance: high memory capacity to reduce bottlenecks and high core frequency to keep AI pipelines flowing,” Navolokin said.

This becomes more pronounced as enterprises deploy mixed workloads, where AI inference runs alongside traditional enterprise applications. In these cases, CPUs must handle both efficiently without forcing organisations to split infrastructure.

Navolokin highlighted AMD EPYC processors as an example of how CPU design is adapting to these needs, pointing to high core counts, memory bandwidth, and support for both AI and non-AI workloads. He said this allows organisations to run smaller AI deployments efficiently on CPUs alone, while also supporting large-scale environments where GPUs drive performance.

For enterprises, the takeaway is less about individual specifications and more about system balance. Underpowered host CPUs can limit the return on GPU investment, while well-matched architectures can extend the useful life of existing hardware.

Open platforms as a scaling strategy

Beyond hardware, Navolokin emphasised the role of software and standards in making AI systems easier to scale and maintain.

“The AMD open platform approach gives businesses and developers the freedom to build, scale, and deploy AI with few barriers,” he said.

He pointed to ROCm as a way to support common AI frameworks without locking organisations into proprietary stacks. Access to tuning, customisation, and distributed inference is becoming more important as AI workloads vary across teams and regions.

This openness also extends to networking and interconnects. Navolokin discussed AMD’s involvement in open standards such as UALink and the Ultra Ethernet Consortium, which aim to support larger, more flexible AI clusters.

By focusing on open ecosystems, he said, organisations gain more control over how systems evolve, rather than tying future growth to a single vendor roadmap.

Balancing cloud, edge, and on-prem AI

For CIOs managing AI across multiple environments, Navolokin offered two broad principles: openness and distribution.

An open platform gives teams room to adapt as workloads change, while distributed compute helps place inference closer to where data is generated. Not all AI workloads belong in central data centres, particularly when latency, energy use, or data privacy are concerns.

“While centralised infrastructure is ideal for training large models, real-time inference often runs best on AI PCs or edge devices, closer to the data source,” he said.

This approach can reduce costs and improve responsiveness, especially as AI becomes embedded in everyday business processes.

Preparing for the next phase of AI adoption

Looking ahead, Navolokin said enterprises should focus less on individual components and more on how systems fit together across environments.

“As AI inferencing becomes embedded across enterprise environments, the priority for IT leaders should not be on the performance of a single compute resource but rather how infrastructure strategies must evolve to support deployment across a diverse range of systems,” he said.

He pointed to AMD’s plans for an integrated rack-scale architecture, codenamed “Helios,” expected in 2026, as an example of how vendors are aligning CPUs, GPUs, networking, and software around this idea.

For enterprises, the broader message is clear. As AI systems grow more complex, the success of those deployments will depend less on any single accelerator and more on how well the full stack is designed to work as one.

AI News is powered by TechForge Media.