
From computing power competition to algorithm innovation: a new paradigm for AI led by DeepSeek


Reprinted from chaincatcher

03/25/2025

Author: BadBot, IOBC Capital

Just last night, DeepSeek released DeepSeek-V3-0324, an update to its V3 model, on Hugging Face. The model has 685 billion parameters and brings significant improvements in coding ability, UI design, reasoning, and more.

At the just-concluded 2025 GTC conference, Jensen Huang spoke highly of DeepSeek and stressed that the market was wrong to think DeepSeek's efficient models would reduce demand for Nvidia chips; future computing demand, he argued, will only grow, not shrink.

Does DeepSeek, a star product built on algorithmic breakthroughs, really have nothing to do with Nvidia's computing power supply? To answer that, I would first like to discuss what computing power and algorithms each mean for the development of the industry.


The symbiotic evolution of computing power and algorithms

In the field of AI, gains in computing power provide the foundation for running more complex algorithms, allowing models to process larger amounts of data and learn more complex patterns; in turn, algorithmic optimization makes better use of that computing power and improves the efficiency of computing resources.

The symbiotic relationship between computing power and algorithms is reshaping the AI industry structure:

Technology route differentiation: companies such as OpenAI pursue super-large computing clusters, while DeepSeek focuses on algorithmic efficiency optimization, forming different technical schools.

Industrial chain reconstruction: Nvidia has become the leader in AI computing power through the CUDA ecosystem, while cloud service providers have lowered deployment thresholds with flexible computing services.

Resource allocation adjustment: enterprises now balance R&D spending between hardware infrastructure investment and efficient algorithm development.

The rise of open source communities: open source models such as DeepSeek and LLaMA allow algorithmic innovations and computing power optimizations to be shared, accelerating the iteration and diffusion of technology.

DeepSeek's technological innovation

DeepSeek's popularity is inseparable from its technological innovations. I will explain them in plain language so that most readers can follow.

Model architecture optimization

DeepSeek adopts a combined Transformer + MoE (Mixture of Experts) architecture and introduces Multi-Head Latent Attention (MLA). The architecture works like a super team: the Transformer handles routine tasks, while the MoE acts as a panel of experts within the team, each with its own specialty; when a specific problem arises, it is handled by the experts best suited to it, which greatly improves the model's efficiency and accuracy. The MLA mechanism lets the model attend more flexibly to the important details of the input, further improving performance.
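To make the routing idea concrete, here is a minimal NumPy sketch of top-k expert routing. It only illustrates the general MoE mechanism; the gating matrix, expert count, and dimensions below are all made up and are not DeepSeek's actual implementation.

```python
# Illustrative MoE routing: a gating network scores each expert for a token,
# and only the top-k experts are actually run, so most parameters stay idle.
import numpy as np

rng = np.random.default_rng(0)
D_MODEL, N_EXPERTS, TOP_K = 16, 8, 2

# Hypothetical parameters: one gating matrix plus one small expert matrix each.
gate_w = rng.normal(size=(D_MODEL, N_EXPERTS))
experts = [rng.normal(size=(D_MODEL, D_MODEL)) for _ in range(N_EXPERTS)]

def moe_layer(token: np.ndarray) -> np.ndarray:
    scores = token @ gate_w                       # gating logits, one per expert
    probs = np.exp(scores) / np.exp(scores).sum() # softmax over experts
    top = np.argsort(probs)[-TOP_K:]              # route to the k best experts
    out = np.zeros_like(token)
    for idx in top:                               # only k experts do any work
        out += probs[idx] * (token @ experts[idx])
    return out

print(moe_layer(rng.normal(size=D_MODEL)).shape)  # (16,)
```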

Innovation in training methods

DeepSeek proposes an FP8 mixed-precision training framework. The framework acts like an intelligent resource dispatcher that dynamically selects the appropriate numerical precision for different stages of training: where high-precision computation is required, it uses a higher-precision format to preserve model accuracy; where lower precision is acceptable, it drops down, saving computing resources, speeding up training, and reducing memory usage.
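The basic idea can be sketched in a few lines. NumPy has no FP8 type, so float16 stands in for the low-precision format below; the per-tensor scaling and other machinery of DeepSeek's real FP8 framework are omitted.

```python
# Conceptual mixed-precision sketch: compute in a cheap format, but keep and
# update a high-precision master copy of the weights so small updates survive.
import numpy as np

rng = np.random.default_rng(1)
w_master = rng.normal(size=(256, 256)).astype(np.float32)  # float32 master weights
x = rng.normal(size=(32, 256)).astype(np.float32)

# Forward pass runs in low precision to save memory and bandwidth...
w_low = w_master.astype(np.float16)
y = x.astype(np.float16) @ w_low
grad = rng.normal(size=w_master.shape).astype(np.float16)  # pretend gradient

# ...but the weight update is applied to the float32 master copy.
w_master -= 1e-3 * grad.astype(np.float32)

print(y.dtype, w_low.nbytes, w_master.nbytes)  # float16; half the bytes of float32
```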

Improved inference efficiency

During the inference stage, DeepSeek introduces multi-token prediction (MTP). Traditional decoding proceeds step by step, predicting only one token at a time; MTP predicts multiple tokens at once, which greatly speeds up inference and reduces its cost.
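The toy sketch below contrasts one-token-at-a-time decoding with a step that proposes several tokens per model call. The fake_model function is a placeholder, not DeepSeek's actual MTP heads; it only illustrates why fewer calls per generated token cuts inference cost.

```python
# Toy comparison: single-token decoding needs one model call per token, while
# a multi-token step yields up to k tokens per call.
from typing import List

def fake_model(prefix: List[int], n_heads: int) -> List[int]:
    # Placeholder "model": returns n_heads next-token guesses in one pass.
    return [(prefix[-1] + i + 1) % 50000 for i in range(n_heads)]

def decode_single(prompt: List[int], steps: int) -> List[int]:
    out = list(prompt)
    for _ in range(steps):                   # one model call per generated token
        out.append(fake_model(out, 1)[0])
    return out

def decode_mtp(prompt: List[int], steps: int, k: int = 4) -> List[int]:
    out = list(prompt)
    while len(out) - len(prompt) < steps:    # each call proposes up to k tokens
        out.extend(fake_model(out, k))
    return out[: len(prompt) + steps]

# Same output length, but decode_single makes 8 model calls vs. 2 for decode_mtp.
print(len(decode_single([1, 2, 3], 8)), len(decode_mtp([1, 2, 3], 8)))
```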

Breakthrough in reinforcement learning algorithms

DeepSeek's new reinforcement learning algorithm, GRPO (Group Relative Policy Optimization), optimizes the model training process. Reinforcement learning is like giving the model a coach who guides it toward better behavior through rewards and penalties. Traditional reinforcement learning algorithms can consume a great deal of computing resources in this process, while DeepSeek's new algorithm is more efficient: it cuts unnecessary computation while preserving model performance, striking a balance between performance and cost.
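The core of GRPO can be illustrated with the group-relative advantage it is named after: several answers are sampled for the same prompt, and each one is scored relative to its group, so no separate critic (value) model is needed. The rewards below are made up, and the sketch omits the policy update itself.

```python
# Group-relative advantage: normalize each sampled answer's reward against the
# other answers to the same prompt, instead of training a separate value model.
import numpy as np

group_rewards = np.array([0.2, 0.9, 0.4, 0.7])   # rewards for 4 sampled answers

def group_relative_advantage(rewards: np.ndarray) -> np.ndarray:
    # Advantage = (reward - group mean) / group std, computed within the group.
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

adv = group_relative_advantage(group_rewards)
print(adv)   # above-average answers get positive advantage, below-average negative
```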

These innovations are not isolated technical points; together they form a complete technical system that reduces computing power requirements all the way from training to inference. Ordinary consumer-grade graphics cards can now run powerful AI models, greatly lowering the barrier to AI applications and allowing more developers and enterprises to participate in AI innovation.


Impact on Nvidia

Many people believe that DeepSeek bypassed the CUDA layer and thereby freed itself from dependence on Nvidia. In fact, DeepSeek optimizes its algorithms directly at Nvidia's PTX (Parallel Thread Execution) layer. PTX is an intermediate representation that sits between high-level CUDA code and the actual GPU instructions; by working at this level, DeepSeek can achieve finer-grained performance tuning.

The impact on Nvidia cuts both ways. On the one hand, DeepSeek is in fact more tightly bound to Nvidia's hardware and CUDA ecosystem, and a lower barrier to AI applications may expand the overall market. On the other hand, DeepSeek's algorithmic optimizations may change the demand structure for high-end chips: some AI models that previously required GPUs such as the H100 may now run efficiently on the A100 or even on consumer graphics cards.

Significance to China's AI industry

DeepSeek's algorithm optimization offers China's AI industry a path to a technological breakthrough. Against the backdrop of restricted access to high-end chips, the idea of "making up for hardware with software" has reduced dependence on top-tier imported chips.

Upstream, efficient algorithms ease the pressure on computing power demand, allowing computing power providers to extend hardware life cycles through software optimization and improve their return on investment. Downstream, optimized open source models lower the barrier to AI application development: many small and medium-sized enterprises can build competitive applications on the DeepSeek models without massive computing resources, which will give rise to more vertical AI solutions.

The profound impact on Web3+AI

Decentralized AI Infra

DeepSeek's algorithm optimization provides new impetus for Web3 AI infrastructure. Its innovative architecture, efficient algorithms, and low computing power requirements make decentralized AI inference feasible. The MoE architecture is naturally suited to distributed deployment: different nodes can hold different expert networks without any single node needing to store the complete model, which significantly reduces per-node storage and computation requirements and improves the model's flexibility and efficiency.
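A minimal sketch of this placement idea, with hypothetical node names and a made-up routing table, is shown below; it is not a real protocol, only an illustration of why a token needs to touch only the few nodes that host its selected experts.

```python
# Illustrative expert placement: each expert lives on one hypothetical node,
# so a routed token only involves the nodes hosting its top-k experts.
from typing import List, Set

N_EXPERTS, N_NODES = 8, 4

# Hypothetical placement: experts spread round-robin across nodes.
expert_to_node = {e: f"node-{e % N_NODES}" for e in range(N_EXPERTS)}

def nodes_for_token(top_k_experts: List[int]) -> Set[str]:
    # Only the nodes hosting the selected experts participate in this token.
    return {expert_to_node[e] for e in top_k_experts}

print(nodes_for_token([1, 6]))   # e.g. {'node-1', 'node-2'}
```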

The FP8 training framework further reduces the demand for high-end computing resources, allowing more computing resources to be added to the node network. This not only lowers the threshold for participating in decentralized AI computing, but also improves the computing power and efficiency of the entire network.

Multi-Agent System

Intelligent trading strategy optimization: a real-time market data analysis agent, a short-term price prediction agent, an on-chain trade execution agent, and a trade result supervision agent work together to help users pursue higher returns (see the sketch after this list).

Automated execution of smart contracts: a smart contract monitoring agent, a smart contract execution agent, and an execution result supervision agent operate collaboratively to automate more complex business logic.

Personalized portfolio management: AI helps users find the best staking or liquidity opportunities in real time, based on their risk preferences, investment goals, and financial situation.
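As a rough illustration of such a pipeline, the sketch below wires together four entirely hypothetical agent classes with placeholder data; a real system would connect them to market feeds, models, and on-chain transaction APIs.

```python
# Bare-bones multi-agent pipeline: each agent handles one narrow task and
# passes its result to the next. All classes and data here are placeholders.
from dataclasses import dataclass

@dataclass
class MarketSnapshot:
    price: float
    trend: str

class MarketDataAgent:
    def observe(self) -> MarketSnapshot:
        return MarketSnapshot(price=100.0, trend="up")   # placeholder market data

class PredictionAgent:
    def predict(self, snap: MarketSnapshot) -> str:
        return "buy" if snap.trend == "up" else "hold"   # stand-in for a model

class ExecutionAgent:
    def execute(self, action: str) -> str:
        return f"submitted on-chain order: {action}"     # stand-in for a real tx

class SupervisorAgent:
    def review(self, result: str) -> bool:
        return "submitted" in result                     # trivial sanity check

snap = MarketDataAgent().observe()
action = PredictionAgent().predict(snap)
result = ExecutionAgent().execute(action)
print(result, "| approved:", SupervisorAgent().review(result))
```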

"We can only see a very short future, but it is enough to find that there is a lot of work to be done there." It is under the constraint of computing power that DeepSeek has found breakthroughs through algorithm innovation, opening up a differentiated development path for China's AI industry. Lowering application thresholds, promoting the integration of Web3 and AI, reducing dependence on high-end chips, and empowering financial innovation are reshaping the digital economy landscape. The future development of AI is no longer just a competition for computing power, but a competition for collaborative optimization of computing power and algorithms. On this new track, innovators such as DeepSeek are using Chinese wisdom to redefine the rules of the game.
