
DeepSeek tops the App Store: Chinese AI stirs up the overseas technology circle


Reprinted from panewslab

01/27/2025

Author: APPSO

In the past week, the DeepSeek R1 model from China stirred up the entire overseas AI circle.

On the one hand, it achieves performance comparable to OpenAI o1 with lower training costs, demonstrating China's advantages in engineering capabilities and scale innovation; on the other hand, it also upholds the open source spirit and is keen to share technical details.

Recently, a research team led by Jiayi Pan, a Ph.D. candidate at the University of California, Berkeley, successfully reproduced the key technique of DeepSeek R1-Zero, the "Aha Moment", at a very low cost (less than US$30).


So it's no wonder that Meta CEO Mark Zuckerberg, Turing Award winner Yann LeCun, and DeepMind CEO Demis Hassabis have all spoken highly of DeepSeek.

As DeepSeek R1's popularity continued to climb, the DeepSeek app's servers were overwhelmed by a surge of visitors this afternoon and even went down for a time.

OpenAI CEO Sam Altman, seemingly eager to reclaim the international headlines, just revealed o3-mini's usage quota: ChatGPT Plus members will get 100 queries per day.

However, what is less known is that before it became famous, DeepSeek's parent company, Huanfang Quantitative (High-Flyer), was already one of the leading firms in China's quantitative private equity field.

The DeepSeek model shocked Silicon Valley, and its credibility is still rising

On December 26, 2024, DeepSeek officially released the DeepSeek-V3 large model.

The model performed well across multiple benchmarks, surpassing the industry's top mainstream models, especially in knowledge Q&A, long-text processing, code generation, and mathematics. For example, on knowledge tasks such as MMLU and GPQA, DeepSeek-V3's performance approaches that of the top international model Claude-3.5-Sonnet.


In mathematics, it set new records on tests such as AIME 2024 and CNMO 2024, surpassing all known open-source and closed-source models. At the same time, its generation speed increased by 200% over the previous generation, reaching 60 TPS (tokens per second), greatly improving the user experience.

According to the analysis of the independent evaluation website Artificial Analysis, DeepSeek-V3 surpasses other open source models in many key indicators, and is on par with the world's top closed-source models GPT-4o and Claude-3.5-Sonnet in performance.

The core technical advantages of DeepSeek-V3 include:

  1. Mixture-of-Experts (MoE) architecture: DeepSeek-V3 has 671 billion parameters, but only 37 billion are activated for each input token. This selective activation greatly reduces compute cost while maintaining high performance (see the sketch after this list).
  2. Multi-Head Latent Attention (MLA): This architecture has been proven in DeepSeek-V2 and can achieve efficient training and inference.
  3. Auxiliary-loss-free load balancing: this strategy minimizes the performance penalty usually incurred by balancing load across experts.
  4. Multi-token prediction training objective: predicting several future tokens at once improves the model's overall performance.

  5. Efficient training framework: the HAI-LLM framework supports 16-way Pipeline Parallelism (PP), 64-way Expert Parallelism (EP), and ZeRO-1 Data Parallelism (DP), and cuts training costs through a variety of optimizations.
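To make the MoE idea concrete, here is a minimal sketch of top-k expert routing in PyTorch. It illustrates the general technique only, not DeepSeek-V3's actual implementation; the class name, layer sizes, and routing details are toy assumptions chosen for readability.

```python
# Toy Mixture-of-Experts layer: each token is routed to only top_k of
# n_experts expert MLPs, so only a fraction of parameters is active per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)  # router producing expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                # x: (n_tokens, d_model)
        scores = self.gate(x)                            # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # pick top_k experts per token
        weights = F.softmax(weights, dim=-1)             # normalize routing weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                   # run only the chosen experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

x = torch.randn(10, 64)
y = ToyMoELayer()(x)  # per token, only 2 of 8 expert MLPs are evaluated
```

Scaled up, the same principle is how a 671B-parameter model can activate only about 37B parameters per token.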

More importantly, DeepSeek-V3's training cost was only US$5.58 million, far below the roughly US$78 million training cost of GPT-4. And its API has remained strikingly affordable.


Input tokens cost only 0.5 yuan per million (on a cache hit) or 2 yuan per million (on a cache miss), and output tokens cost only 8 yuan per million.
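As a quick sanity check on what these prices mean in practice, here is a small cost calculator based on the figures quoted above; the function name and the example traffic mix are our own illustrative assumptions.

```python
# Quoted DeepSeek API prices, in yuan per million tokens (from the article).
PRICE_PER_M = {"input_hit": 0.5, "input_miss": 2.0, "output": 8.0}

def api_cost_yuan(input_tokens: int, output_tokens: int,
                  cache_hit_ratio: float = 0.0) -> float:
    """Estimate request cost given a cache hit ratio on input tokens."""
    hit = input_tokens * cache_hit_ratio
    miss = input_tokens - hit
    return (hit * PRICE_PER_M["input_hit"]
            + miss * PRICE_PER_M["input_miss"]
            + output_tokens * PRICE_PER_M["output"]) / 1_000_000

# 10M input tokens (half served from cache) plus 2M output tokens:
print(f"{api_cost_yuan(10_000_000, 2_000_000, cache_hit_ratio=0.5):.2f} yuan")  # 28.50
```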

The Financial Times described it as "a dark horse that shocked the international technology community" and noted that its performance is comparable to that of well-funded American rivals such as OpenAI's models. Maginative founder Chris McKay further argued that DeepSeek-V3's success may redefine the established approaches to AI model development.

In other words, DeepSeek-V3's success is also seen as a direct answer to U.S. export restrictions on computing power: the external pressure has instead stimulated China's innovation.

DeepSeek founder Liang Wenfeng: a low-key genius from Zhejiang University

The rise of DeepSeek has left Silicon Valley sleepless. Liang Wenfeng, the founder behind the model that stirred up the global AI industry, embodies the classic Chinese notion of genius: early success, sustained success.

A good AI company leader must understand both technology and business, be both visionary and pragmatic, and combine the courage to innovate with engineering discipline. Such well-rounded talent is itself a scarce resource.

At 17, he was admitted to Zhejiang University, majoring in information and electronic engineering. At 30, he founded Huanfang and began leading a team to explore fully automated quantitative trading. Liang Wenfeng's story shows a genius doing the right thing at the right time.


  • 2010: With the launch of CSI 300 stock index futures, quantitative investing saw its opening. The Huanfang team rode the momentum, and its proprietary funds grew rapidly.
  • 2015: Liang Wenfeng co-founded Huanfang Quantitative with fellow alumni. The following year it launched its first AI model, with trading positions generated by deep learning.
  • 2017: Huanfang Quantitative claimed to have made its investment strategies fully AI-driven.
  • 2018: Established AI as the company's main development direction.
  • 2019: Assets under management exceeded 10 billion yuan, making it one of the "four giants" of domestic quantitative private equity.
  • 2021: Huanfang Quantitative became the first domestic quantitative private equity firm to exceed 100 billion yuan under management.

It would be a mistake to see only the success and forget the years this company spent sitting on the sidelines. Yet, like other quantitative trading firms moving into AI, the transition may look surprising but is actually logical: both are data-driven, technology-intensive businesses.

Jensen Huang only wanted to sell gaming graphics cards to those of us who are bad at games, yet ended up running the world's largest AI arsenal. Huanfang's entry into AI is similar. This kind of organic evolution is more viable than the large AI models many industries now bolt on mechanically.

Huanfang accumulated deep experience in data processing and algorithm optimization through quantitative investing, and it holds a large stock of A100 chips. Since 2017, it has deployed AI computing power at scale, building high-performance clusters such as "Yinghuo-1" and "Yinghuo-2" (Firefly) to power AI model training.


In 2023, Huanfang Quantitative formally established DeepSeek to focus on developing large AI models. DeepSeek inherited Huanfang's accumulated technology, talent, and resources, and quickly emerged in the AI field.

In an in-depth interview with "Undercurrent", DeepSeek founder Liang Wenfeng also showed a unique strategic vision.

Unlike most Chinese companies that chose to copy the Llama architecture, DeepSeek started directly from the model structure, aiming squarely at the ambitious goal of AGI.

Liang Wenfeng makes no secret of the current gap: China's AI still lags significantly behind the international state of the art, and the combined shortfall in model structure, training dynamics, and data efficiency means it takes four times the computing power to achieve the same result.


▲ Image: screenshot from CCTV News

This attitude of facing challenges head-on stems from Liang Wenfeng’s years of experience in Huanfang.

He emphasized that open source is not just technology sharing but a cultural statement; the real moat is the team's capacity for continuous innovation. DeepSeek's organizational culture encourages bottom-up innovation, downplays hierarchy, and values the passion and creativity of its people.

The team consists mainly of young people from top universities, working under a natural division of labor that lets employees explore and collaborate independently. In recruiting, the company values passion and curiosity over conventional experience and credentials.

Regarding the industry prospects, Liang Wenfeng believes that AI is in an explosion period of technological innovation rather than an explosion period of application. He emphasized that China needs more original technological innovations and cannot remain in the imitation stage forever. It needs people to stand at the forefront of technology.

Even though companies like OpenAI are currently leading the way, opportunities for innovation still exist.


Upending Silicon Valley: DeepSeek makes overseas AI circles restless

Opinions on DeepSeek differ across the industry; here are some comments we collected from insiders.

Jim Fan, NVIDIA GEAR Lab project leader, spoke highly of DeepSeek-R1.

He pointed out that this shows a non-US company carrying forward OpenAI's original open mission and gaining influence by disclosing raw algorithms and learning curves. In passing, it also takes a jab at OpenAI.

DeepSeek-R1 not only open-sourced a series of models but also disclosed all its training secrets. It may be the first open-source project to demonstrate significant, sustained growth of the RL flywheel.

Influence can be achieved through legendary projects such as "ASI Internal Implementation" or "Strawberry Project", or simply by exposing the original algorithm and matplotlib learning curve.

Marc Andreessen, founder of the top venture capital firm a16z, called DeepSeek R1 one of the most amazing and impressive breakthroughs he has ever seen and, as open source, a profound gift to the world.


Lu Jing, a former senior researcher at Tencent and a postdoctoral fellow in artificial intelligence at Peking University, analyzed it from the perspective of technology accumulation: DeepSeek did not become popular overnight. It inherited many innovations from previous model versions, and the related architecture and algorithmic innovations have been iteratively validated, so shaking the industry was inevitable.

Yann LeCun, Turing Award winner and Meta’s chief AI scientist, put forward a new perspective:

"For those who think "China is surpassing the United States in AI" after seeing DeepSeek's performance, your interpretation is wrong. The correct interpretation should be, "The open source model is surpassing the proprietary model." "


The comments of DeepMind CEO Demis Hassabis revealed a hint of worry:

"What it (DeepSeek) has achieved is very impressive, and I think we need to think about how to maintain the leadership of the Western frontier models. I think the West is still ahead, but certainly China has extremely strong engineering and scaling capabilities. "

Microsoft CEO Satya Nadella said at the World Economic Forum in Davos that DeepSeek has genuinely built an open-source model that not only performs well at inference-time compute but is also extremely compute-efficient.

He emphasized that Microsoft must respond to these breakthrough developments in China with the highest priority.

Meta CEO Zuckerberg's assessment went further: he called the technical strength and performance DeepSeek has demonstrated impressive, noted that the AI gap between China and the United States is now minimal, and said China's all-out sprint has made the competition even fiercer.

The reaction from competitors is perhaps the best endorsement of DeepSeek. According to reports from Meta employees on the anonymous workplace community TeamBlind, the emergence of DeepSeek-V3 and R1 has put Meta's generative AI team into panic.

Meta engineers are racing to analyze DeepSeek's technology and replicate whatever techniques they can from it.

The reason: DeepSeek-V3's training cost was only US$5.58 million, less than the annual salary of some Meta executives. Such a lopsided input-output ratio puts Meta's management under great pressure when justifying its huge AI R&D budget.


International mainstream media have also paid great attention to the rise of DeepSeek.

The Financial Times noted that DeepSeek's success has subverted the conventional wisdom that "AI R&D must rely on huge investment", proving that a precise technical route can also produce excellent research. More importantly, the team's open sharing of its innovations has made this research-driven company an exceptionally formidable competitor.

The Economist argued that China's rapid gains in the cost-effectiveness of AI technology have begun to shake America's technological advantage, which could affect U.S. productivity growth and economic potential over the next decade.


The New York Times came at it from another angle: DeepSeek-V3 matches the performance of high-end chatbots from American companies at a drastically lower cost.

This shows that even in the face of chip export controls, Chinese companies can compete through innovation and efficient use of resources. Moreover, the U.S. government's chip restriction policy may be counterproductive, instead promoting China's innovative breakthroughs in the field of open source AI technology.

DeepSeek "reported the wrong door", claiming to be GPT-4

Amid the praise, DeepSeek has also faced some controversy.

Many outsiders believe that DeepSeek may have used the output data of models such as ChatGPT as training materials during the training process. Through model distillation technology, the "knowledge" in these data is migrated to DeepSeek's own model.

This practice is not uncommon in the AI field, but skeptics question whether DeepSeek used OpenAI model outputs without full disclosure. DeepSeek-V3's own "self-awareness" seemed to reflect this.

Earlier, users discovered that when asked about its identity, the model mistook itself for GPT-4.


High-quality data has always been a key factor in AI development. Even OpenAI cannot escape controversy over data acquisition: its large-scale scraping of the internet has drawn numerous copyright lawsuits, the OpenAI-New York Times case has yet to reach a first-instance verdict, and new suits keep piling on before the other shoe drops.

So DeepSeek also drew veiled public jabs from Sam Altman and John Schulman.

"It's (relatively) easy to copy something you know will work. It's very difficult to do something new, risky, and difficult when you don't know if it will work."


However, in the R1 technical report, the DeepSeek team stated clearly that it did not use OpenAI model outputs, attributing the high performance to reinforcement learning and a distinctive training strategy.

For example, it adopted a multi-stage training recipe: base model training, reinforcement learning (RL) training, fine-tuning, and so on. This cyclic, multi-stage approach helps the model absorb different knowledge and abilities at each stage (a schematic sketch follows below).
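As a schematic of what such a multi-stage recipe looks like, here is a minimal runnable sketch; the stage functions are placeholders standing in for real training code, and the loop structure is our reading of the description above rather than DeepSeek's published pipeline.

```python
# Placeholder stages: each just records that it ran, standing in for real training.
def pretrain(model):   return model + ["base_model_training"]
def rl_train(model):   return model + ["reinforcement_learning"]
def fine_tune(model):  return model + ["fine_tuning"]

def train_pipeline(rounds: int = 2):
    model = pretrain([])          # stage 1: train the base model
    for _ in range(rounds):       # cyclic multi-stage training
        model = rl_train(model)   # stage 2: RL to elicit reasoning behavior
        model = fine_tune(model)  # stage 3: fine-tuning to polish outputs
    return model

print(train_pipeline())  # shows the order in which the stages are applied
```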

Saving money is also a technical skill: the technology behind DeepSeek is the optimal answer

The DeepSeek-R1 technical report mentions a noteworthy discovery: the "aha moment" that occurred during R1-Zero's training. In the middle phase of training, DeepSeek-R1-Zero began actively re-evaluating its initial problem-solving approach and allocating more thinking time to optimize its strategy (for example, trying several different solutions).

In other words, under the RL framework, AI may spontaneously develop human-like reasoning and even exceed the limits of preset rules. This offers a promising direction for more autonomous, adaptive AI models, such as dynamically adjusting strategies in complex decision-making (medical diagnosis, algorithm design).


Meanwhile, many industry insiders are digging into DeepSeek's technical report. OpenAI founding member Andrej Karpathy said after the DeepSeek V3 release:

DeepSeek (the Chinese AI company) is making it look easy today: it publicly released a frontier-grade language model (LLM) trained on an extremely low budget (2,048 GPUs for 2 months, about $6 million).

For reference, this level of capability typically requires clusters of around 16K GPUs, and most of today's most advanced systems use around 100K GPUs. For example, Llama 3 (405B parameters) used 30.8 million GPU hours, while DeepSeek-V3, which appears to be a stronger model, used only 2.8 million GPU hours (about 1/11 of Llama 3's compute).

If the model also holds up in real-world tests (for example, the LLM Arena rankings are under way, and my quick tests went well), this will be a very impressive demonstration of what research and engineering can achieve under resource constraints.

So, does this mean we no longer need large GPU clusters to train cutting-edge LLMs? Not really, but it shows you must make sure the resources you use are not wasted; this case demonstrates that data and algorithm optimizations can still drive great progress. Also, the technical report is excellent and detailed, well worth reading.


Facing the controversy over DeepSeek V3's use of ChatGPT data, Karpathy said that large language models have no human-like self-awareness; whether a model answers questions about its own identity correctly depends entirely on whether the team built a dedicated self-identity training set. Without such training, the model answers from the closest information in its training data.

Moreover, a model identifying itself as ChatGPT is not a problem in itself. Given how ubiquitous ChatGPT-related data is on the internet, such an answer simply reflects a natural "neighbor knowledge emergence" phenomenon.

Jim Fan pointed out after reading the technical report of DeepSeek-R1:

The most important point of the paper is that it is driven entirely by reinforcement learning, with no supervised fine-tuning (SFT) involved. The approach resembles AlphaZero, which mastered Go, shogi, and chess from scratch via a "cold start", without imitating human players' moves.

– It uses real rewards computed from hard-coded rules, rather than learned reward models that reinforcement learning can easily "hack".

– The model's thinking time steadily increases as training progresses; this is not pre-programmed but emerges spontaneously.

– The phenomenon of self-reflection and exploratory behavior emerges.

– It uses GRPO instead of PPO: GRPO removes PPO's critic network and instead uses the average reward of several sampled responses as the baseline, a simple way to cut memory usage (see the sketch below). Notably, GRPO was introduced by the DeepSeek team in February 2024; this really is a very strong team.
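To make the GRPO idea concrete, here is a minimal sketch of group-relative advantages with a rule-based reward. It is our simplified reading of the description above, not DeepSeek's implementation; all names are illustrative.

```python
import statistics

def grpo_advantages(group_rewards):
    """Advantage of each sampled response relative to its group's mean reward."""
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards) or 1.0  # guard against zero spread
    return [(r - mean) / std for r in group_rewards]

# "Real rewards calculated from hard-coded rules": e.g. exact-match checking.
def reward(answer: str, reference: str) -> float:
    return 1.0 if answer.strip() == reference else 0.0

samples = ["42", "41", "42", "forty-two"]     # a group of responses to one prompt
rewards = [reward(s, "42") for s in samples]  # [1.0, 0.0, 1.0, 0.0]
print(grpo_advantages(rewards))               # [1.0, -1.0, 1.0, -1.0]
```

Because the baseline comes from the group itself, no separate critic network needs to be trained or stored, which is exactly the memory saving Jim Fan points to.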

When Kimi released similar research results the same day, Jim Fan noticed that the two companies had arrived at the same destination:

  • Both abandoned complex tree-search methods such as MCTS in favor of simpler linear reasoning trajectories with conventional autoregressive prediction.
  • Both avoid value functions that require extra model copies, reducing compute requirements and improving training efficiency.
  • Both forgo dense reward modeling, relying as much as possible on ground-truth outcomes to keep training stable.


But there are also significant differences between the two:

  • DeepSeek adopts an AlphaZero-style pure-RL cold start, while Kimi k1.5 chooses an AlphaGo-Master-style warm-up strategy with lightweight SFT.
  • DeepSeek is open-sourced under the MIT license, while Kimi k1.5 shines in multimodal benchmarks, and its paper offers richer system-design detail, covering RL infrastructure, hybrid clusters, code sandboxes, and parallelism strategies.

However, in this rapidly iterating AI market, a lead is often fleeting. Other model companies will quickly absorb DeepSeek's lessons and improve on them, and may soon catch up.

The initiator of the large model price war

Many people know DeepSeek by the nickname "the Pinduoduo of AI", but few know that the name traces back to the large-model price war that started last year.

On May 6, 2024, DeepSeek released the open-source MoE model DeepSeek-V2, achieving a double breakthrough in performance and cost through innovative architectures such as MLA (multi-head latent attention) and MoE (mixture of experts).

Inference cost dropped to just 1 yuan per million tokens, roughly one-seventh that of Llama 3 70B and one-seventieth that of GPT-4 Turbo at the time. The breakthrough let DeepSeek offer extremely cost-effective service without losing money, while piling competitive pressure onto other vendors.

The release of DeepSeek-V2 triggered a chain reaction. ByteDance, Baidu, Alibaba, Tencent, and Zhipu AI all followed suit and significantly reduced the prices of their large model products. The impact of this price war even spans the Pacific, causing great concern in Silicon Valley.

DeepSeek has therefore been dubbed the “Pinduoduo of AI”.


Faced with doubts from the outside world, DeepSeek founder Liang Wenfeng responded in an interview with Undercurrent:

"Grabbing users is not our main purpose. On the one hand, we lowered the price because we are exploring the structure of the next generation model, and the cost has come down first; on the other hand, we also feel that both API and AI should be inclusive. Something that everyone can afford.”

In fact, the significance of the price war goes far beyond the competition itself: lower entry barriers let more companies and developers access and apply cutting-edge AI, and forced the whole industry to rethink pricing. It was during this period that DeepSeek entered the public eye and rose to prominence.

Paying a fortune for horse bones: Lei Jun poaches an AI "genius girl"

A few weeks ago, DeepSeek also made a high-profile personnel change.

According to China Business News, Lei Jun poached Luo Fuli with an annual salary in the tens of millions of yuan, entrusting her with leading the large-model team at Xiaomi's AI Lab.

Luo Fuli joined Huanfang Quantitative's DeepSeek in 2022; her name appears on major technical reports including DeepSeek-V2 and the latest R1.


Later, DeepSeek, once focused on the B2B market, began courting consumers and launched a mobile app. As of press time, the app ranked second on the free chart of Apple's App Store, showing strong competitiveness.

A string of smaller peaks made DeepSeek famous, but a higher one was still to come: on the evening of January 20, the 660B-parameter DeepSeek R1 was officially released.

The model excels at mathematical tasks: it achieved a pass@1 score of 79.8% on AIME 2024, slightly above OpenAI-o1, and scored 97.3% on MATH-500, on par with OpenAI-o1.

On programming tasks, it earned a Codeforces Elo rating of 2029, surpassing 96.3% of human participants. On the knowledge benchmarks MMLU, MMLU-Pro, and GPQA Diamond, DeepSeek R1 scored 90.8%, 84.0%, and 71.5% respectively, slightly below OpenAI-o1 but ahead of other closed-source models.

In the latest comprehensive list of the large model arena LM Arena, DeepSeek R1 ranked third, tied with o1.

  • In the fields of "Hard Prompts" (difficult prompt words), "Coding" (coding ability) and "Math" (mathematical ability), DeepSeek R1 ranks first.
  • In terms of "Style Control", DeepSeek R1 and o1 tied for first place.
  • In the "Hard Prompt with Style Control" test, DeepSeek R1 also tied for first place with o1.


On open-source strategy, R1 adopts the MIT License, giving users maximum freedom and permitting model distillation: its reasoning ability can be distilled into smaller models, with the 32B and 70B variants matching o1-mini across multiple capabilities. This openness even outdoes Meta, which had previously been criticized on that front.

The arrival of DeepSeek R1 lets domestic users use an o1-level model for free for the first time, breaking a long-standing information barrier. The buzz it set off on social platforms such as Xiaohongshu rivals that of GPT-4 at its release.

Going overseas: taking the "involution" abroad

Looking back at DeepSeek's trajectory, its formula for success is plainly visible: strength is the foundation, but brand recognition is the moat.

In a conversation with "Later" (LatePost), MiniMax CEO Yan Junjie shared in depth his thinking on the AI industry and the company's strategic shift. He highlighted two key turning points: first, recognizing the importance of a technology brand; second, understanding the value of an open-source strategy.

Yan Junjie believes that in AI, the pace of technological evolution matters more than current achievements, and open source can accelerate that evolution through community feedback; moreover, a strong technology brand is crucial for attracting talent and resources.

Take OpenAI: despite later management turmoil, the innovative image and open spirit it established early on earned it a strong first impression. Even though Claude has since pulled technically even and gradually eaten into OpenAI's B2B customers, OpenAI still leads by a wide margin with consumers, thanks to users' path dependence.

In AI, the real competitive stage is always global. Going overseas and taking the domestic "involution" abroad is also a promising path.


This wave of going global has already sent ripples through the industry: the earlier Qwen and Mianbi Intelligence (ModelBest), and more recently DeepSeek R1, Kimi k1.5, and Doubao 1.5 Pro, have all caused quite a stir overseas.

Although 2025 has been dubbed the first year of AI agents and the first year of AI glasses, it will also be an important first year for Chinese AI companies embracing the global market, and going global will be an unavoidable keyword.

Moreover, the open-source strategy is a smart move, attracting throngs of tech bloggers and developers who voluntarily evangelize DeepSeek. Technology for good should be more than a slogan: from the rallying cry of "AI for All" to genuine technological inclusiveness, DeepSeek has taken a purer path than OpenAI.

If OpenAI lets us see the power of AI, then DeepSeek makes us believe:

This power will eventually benefit everyone.
