May 18, 2026

DeepSeek Proposes Workaround for AI Training Before V4 Release

  • New technique lets AI models look up basic information instead of computing it, reducing reliance on GPU memory.
  • Chinese AI companies ‘innovate around’ US chip export restrictions.

Chinese AI startup DeepSeek has unveiled a new approach to building larger, more capable AI models without needing the most advanced – and expensive – computer chips that US export controls have restricted.

The technique, detailed in a technical paper published Tuesday by DeepSeek founder Liang Wenfeng and researchers from Peking University, tackles a fundamental problem: AI models are getting so large that they’re bumping up against the memory limits of even the best graphics processing units (GPUs).

Think of it like trying to work with a massive spreadsheet on a computer with limited RAM. DeepSeek’s solution, called “Engram,” essentially creates a more efficient filing system that lets the AI store basic facts separately from complex calculations – freeing up precious computing power for the harder thinking tasks.

Why chip memory matters

The challenge isn’t just about raw computing power. Modern AI models need to access vast amounts of information quickly during training and when responding to queries. That requires high-bandwidth memory (HBM) – specialised, fast-access memory built into advanced GPUs.

This is where China faces a significant disadvantage. According to Ray Wang, a Seoul-based analyst at SemiAnalysis cited in the South China Morning Post, China’s leading memory chip manufacturer, ChangXin Memory Technologies, remains several years behind industry leaders like Samsung, SK Hynix, and Micron – despite making steady progress.
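To put the memory constraint in rough numbers, the short calculation below estimates how much memory the weights of a model occupy. It is a back-of-the-envelope sketch, not a figure from the paper: it assumes 16-bit weights (2 bytes per parameter) and ignores everything else that competes for GPU memory during training and inference. The function name is made up for illustration.

```python
def weight_memory_gb(num_params, bytes_per_param=2):
    """Approximate memory needed to hold model weights, in decimal gigabytes.

    Assumes 2 bytes per parameter (16-bit weights) unless told otherwise.
    """
    return num_params * bytes_per_param / 1e9


# 27 billion parameters, the model size the researchers tested
print(weight_memory_gb(27e9))

# One trillion parameters, the scale of the largest current models
print(weight_memory_gb(1e12))
```

Under these assumptions the weights alone grow in direct proportion to parameter count, which is why memory capacity, and not only raw compute, limits how large a model can get.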

How the breakthrough works

Traditional AI models handle everything through computation – even retrieving simple, basic information. The researchers argue this wastes processing power on “trivial operations” that could be better used for complex reasoning.

Engram changes this by letting models “look up” foundational facts more efficiently, similar to how humans might consult a reference book for basic information rather than recalculating it from scratch each time.
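The look-up idea can be illustrated with a toy sketch. This is not the paper’s implementation: the table size, vector width, hashing scheme and function names are all assumptions chosen for illustration. It shows only the general principle that fetching a stored entry by key costs one hash and one index operation, with no matrix arithmetic.

```python
import hashlib

TABLE_SIZE = 1024  # hypothetical number of memory slots
DIM = 4            # hypothetical width of each stored vector

# Hypothetical memory table: one small vector per slot, filled with
# deterministic placeholder values (a real system would learn these).
memory_table = [
    [((slot * 31 + j) % 7) / 7.0 for j in range(DIM)]
    for slot in range(TABLE_SIZE)
]


def slot_for(tokens):
    """Map a short token sequence to a table slot using a stable hash."""
    key = " ".join(tokens).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % TABLE_SIZE


def lookup(tokens):
    """Fetch the stored vector for a token sequence: one hash, one index."""
    return memory_table[slot_for(tokens)]


# The same phrase always lands on the same slot, so the stored entry
# is retrieved directly instead of being recomputed.
vector = lookup(["Great", "Wall"])
print(len(vector))
```

In this toy version different phrases can collide on the same slot; how a production system would handle that is outside what the article describes.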

The technique also helps AI handle longer inputs – what the industry calls “long context” – which remains a major obstacle for deploying AI chatbots as practical assistants in real-world applications.

Testing the approach on a 27 billion parameter model, the researchers reported performance improvements of several percentage points on major industry benchmarks, while crucially preserving more capacity for computationally demanding tasks.

The timing of the paper’s release is notable given widespread industry speculation about a major DeepSeek model launch ahead of the Lunar New Year. US tech publication The Information reported Friday that DeepSeek is expected to release a V4 model with enhanced coding capabilities in mid-February, coinciding with the first anniversary of its R1 model release.

Industry reception and technical validation

Elie Bakouch, a research engineer at open-source developer platform Hugging Face, praised the paper on social media for validating the technique “with hardware at inference and training” – a reference to the researchers’ practical implementation testing rather than purely theoretical modelling.

The paper lists 14 co-authors, including Huishuai Zhang, an assistant professor of computer science at Peking University and a former Microsoft Research Asia researcher. Lead author Cheng Xin, a Peking University student, previously contributed to DeepSeek’s V3 and R1 models.

In the paper, researchers compare Engram’s potential impact to DeepSeek’s variant of the Mixture-of-Experts technique, which enabled model scaling without proportional compute increases and has since been adopted by competing Chinese AI developers.

“We envision conditional memory as an indispensable modelling primitive for next-generation sparse models,” the authors wrote.

The technical paper is expected to receive scrutiny from AI researchers in both China and the US, as DeepSeek has emerged as a prominent example of Chinese AI innovation operating under US export restrictions on advanced semiconductors.

Industry leaders’ largest models currently operate with several trillion parameters, suggesting substantial room for scaling if techniques like Engram prove effective at production scale.
