
Can DeepSeek Stay Hot?


Reprinted from panewslab

01/27/2025

Author: Yu Yan, reporter from The Paper

·A headhunter who sources high-end technical talent in the large model field told The Paper that DeepSeek's hiring logic is not much different from that of other large model companies. The core labels are "young and high potential": ideally born around 1998, with no more than five years of work experience — "smart, science and engineering major, young, and little experience."

·In the eyes of industry insiders, DeepSeek is lucky compared with other Chinese large model startups: it has no financing pressure, does not need to prove itself to investors, and does not have to juggle model iteration with product and application optimization. But as a commercial company that has invested huge sums, it will sooner or later face the same pressures and challenges that other model companies face today.

Which company was the hottest in China's large model circle in 2024? Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd. (hereinafter DeepSeek) is surely a strong contender. As the initiator of last year's mid-year large model price war, DeepSeek first entered the public eye then; with the successive releases of the open source model DeepSeek-V3 at the end of the year and the reasoning model DeepSeek-R1 at the beginning of this year, it has thoroughly set the large model community abuzz. On the one hand, people are surprised by its cost-effective training (DeepSeek-V3 is said to have cost only 5.576 million US dollars to train); on the other, they applaud its open source models and public technical reports. The release of DeepSeek-R1 has excited many scientists, developers and users, some of whom even regard DeepSeek as a strong competitor to OpenAI's o1 and other reasoning models.

How has this low-key company managed to build well-performing large models at extremely low training cost? What did it do right to earn its popularity today? And what challenges will it face if it wants to keep riding the wind and waves of the "model circle"?

Algorithm innovation has significantly reduced computing power costs

"DeepSeek invested early, accumulated a lot, and has its own characteristics in its algorithms," an executive at a star domestic large model startup said of DeepSeek. He believes the core advantage behind DeepSeek's popularity is algorithm innovation: "Chinese companies pay more attention than OpenAI to saving on computing costs, because they lack computing power."

According to the DeepSeek-R1 materials released by DeepSeek, the model applies reinforcement learning at scale in the post-training phase, greatly improving its reasoning ability with only very little labeled data. On tasks such as mathematics, coding and natural language reasoning, its performance is comparable to the official version of OpenAI o1.
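The idea of improving a model from rewards rather than labeled examples can be illustrated with a toy REINFORCE-style sketch. This is an illustrative assumption, not DeepSeek's actual algorithm: a tiny "policy" over candidate answers is reinforced by a programmatic correctness check, with no human-labeled reasoning traces at all. All names and numbers are invented for the example.

```python
import math
import random

random.seed(1)

answers = ["17", "21", "42"]   # candidate outputs for one fixed prompt
logits = [0.0, 0.0, 0.0]       # the "policy": preferences over answers

def sample():
    """Draw an answer index from the softmax over logits."""
    m = max(logits)
    probs = [math.exp(l - m) for l in logits]
    z = sum(probs)
    probs = [p / z for p in probs]
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i, probs
    return len(probs) - 1, probs

def reward(answer):
    # Rule-based check (e.g. "is the math answer right?"), no human labels.
    return 1.0 if answer == "42" else 0.0

lr = 0.5
for _ in range(300):
    i, probs = sample()
    r = reward(answers[i])
    for j in range(len(logits)):
        # Gradient of log-prob of the chosen answer, scaled by the reward.
        grad = (1.0 if j == i else 0.0) - probs[j]
        logits[j] += lr * r * grad

best = answers[max(range(3), key=lambda j: logits[j])]
print(best)  # the policy concentrates on the rewarded answer: 42
```

The point of the sketch is that the only supervision signal is a scalar reward computed automatically, which is why this style of post-training can work with "very little labeled data."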

[Figure: DeepSeek-R1 API price]

DeepSeek founder Liang Wenfeng has repeatedly emphasized that DeepSeek is committed to a differentiated technology route rather than copying OpenAI's model, which means it must find more effective ways to train its models.

"They used a series of engineering techniques to optimize the model architecture, such as the innovative use of model-mixing methods. The essential purpose is to cut costs through engineering so that the model can be run profitably," a veteran of the technology industry told The Paper.

From the information DeepSeek has disclosed, it has made significant progress on the MLA (Multi-head Latent Attention) mechanism and its self-developed DeepSeekMoE (Mixture-of-Experts) architecture. These two designs make the DeepSeek model more cost-effective, reducing the computing resources needed for training and improving training efficiency. According to data from research firm Epoch AI, DeepSeek's latest model is extremely efficient.
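The cost-saving intuition behind mixture-of-experts can be shown with a minimal top-k routing sketch. This is a generic toy MoE layer, not DeepSeekMoE's actual architecture: all sizes, the router, and the expert weights are illustrative assumptions. Each token runs through only K of the E experts, so most expert parameters stay idle per token.

```python
import math
import random

random.seed(0)

D, E, K = 4, 4, 2  # hidden size, number of experts, experts activated per token

def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

W_gate = rand_matrix(D, E)                          # router weights
W_experts = [rand_matrix(D, D) for _ in range(E)]   # one FFN matrix per expert

def matvec(M, v):
    """Compute v @ M for a rows x cols matrix M."""
    return [sum(M[i][j] * v[i] for i in range(len(v))) for j in range(len(M[0]))]

def moe_forward(x):
    """Route one token vector x to its top-K experts and mix their outputs."""
    logits = matvec(W_gate, x)                        # router score per expert
    topk = sorted(range(E), key=lambda e: logits[e])[-K:]
    m = max(logits[e] for e in topk)
    weights = {e: math.exp(logits[e] - m) for e in topk}
    z = sum(weights.values())                         # softmax over selected experts only
    out = [0.0] * D
    for e in topk:
        y = matvec(W_experts[e], x)                   # only K of E experts actually run
        for j in range(D):
            out[j] += (weights[e] / z) * y[j]
    return out

token = [0.5, -0.2, 0.1, 0.9]
print(len(moe_forward(token)))  # 4: output keeps the hidden size at half the expert compute
```

With K=2 of E=4 experts active, the layer keeps the parameter capacity of four experts while paying roughly half their forward-pass compute per token — the same lever, at vastly larger scale, that makes sparse-expert models cheaper to train.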

On the data side, unlike OpenAI's "mass data feeding" approach, DeepSeek uses algorithms to summarize and classify data, feeding it to the model only after selective processing, which improves training efficiency and reduces cost. DeepSeek-V3 thus strikes a balance between high performance and low cost, opening new possibilities for the development of large models.
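The article only describes this curation in broad strokes, so the sketch below is a generic illustration, not DeepSeek's pipeline: raw documents are scored by simple quality heuristics and deduplicated before anything is "fed" to training. The scoring rules, thresholds, and duplicate key are all illustrative assumptions.

```python
def quality_score(text):
    """Crude quality heuristic: mostly-alphabetic text of reasonable length."""
    if not text:
        return 0.0
    alpha = sum(ch.isalpha() or ch.isspace() for ch in text) / len(text)
    length_ok = 1.0 if 20 <= len(text) <= 500 else 0.0
    return alpha * length_ok

def curate(corpus, threshold=0.8):
    """Deduplicate, then keep only documents above the quality threshold."""
    seen, kept = set(), []
    for doc in corpus:
        key = " ".join(doc.lower().split())   # cheap near-duplicate key
        if key in seen:
            continue                          # drop whitespace/case duplicates
        seen.add(key)
        if quality_score(doc) >= threshold:
            kept.append(doc)
    return kept

corpus = [
    "The transformer architecture relies on attention rather than recurrence.",
    "The transformer architecture relies on attention rather than recurrence.",
    "buy now!!! $$$ 1234567890 $$$",
    "short",
    "Mixture-of-experts models activate only a few experts per token.",
]
print(len(curate(corpus)))  # 2: the duplicate, the spam line, and the fragment are dropped
```

Filtering of this kind trades a little preprocessing compute for fewer, better training tokens, which is the efficiency argument the paragraph above is making.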

"There may be no need for ultra-large-scale GPU clusters in the future." After the release of DeepSeek's cost-effective model, OpenAI founding member Andrej Karpathy said.

Liu Zhiyuan, a tenured associate professor in the Department of Computer Science at Tsinghua University, told The Paper that DeepSeek's rise proves China's competitive advantage: through extremely efficient use of limited resources, it is possible to win with less. The release of R1 shows that the gap in AI strength between China and the United States has narrowed significantly. The Economist likewise wrote in a recent report: "DeepSeek is simultaneously changing the technology industry with its low-cost training and innovation in model design."

Demis Hassabis, CEO and co-founder of Google DeepMind, said that while it is not entirely clear how much DeepSeek relied on Western training data and open source models, what the team has accomplished must be acknowledged as truly impressive. On the one hand, he recognized China's very strong engineering and scaling capabilities; on the other, he noted that the West is still ahead and must consider how to maintain the lead of its cutting-edge models.

The accumulation of many years of focus

DeepSeek's innovations were not achieved in a day; they are the result of several years of "incubation" and long-term planning. Liang Wenfeng is also the founder of the leading quantitative private equity firm Magic Square Quantitative, and DeepSeek is believed to have made full use of the funds, data and GPUs ("cards") accumulated by Magic Square.

Liang Wenfeng holds undergraduate and master's degrees in Information and Electronic Engineering from Zhejiang University. Since 2008, he has led teams exploring fully automated quantitative trading with machine learning and other techniques. Magic Square Quantitative was established in 2015; the following year it launched its first AI model, and its first trading position generated by deep learning went live. In 2018 it made AI its main development direction. In 2020, its AI supercomputer "Yinghuo-1", built with a cumulative investment of over 100 million yuan and occupying an area the size of a basketball court, went into operation, claimed to rival the computing power of 40,000 personal computers. In 2021, Magic Square invested 1 billion yuan to build "Yinghuo-2", "equipped with 10,000 A100 GPU chips." At the time, no more than five domestic companies held more than 10,000 GPUs, and apart from Magic Square the other four were all major Internet companies.

In July 2023, DeepSeek was officially established and entered the field of general artificial intelligence. It has never raised external financing so far.

"It has relatively abundant cards and no financing pressure. For the past few years it only built models, not products. Compared with other large domestic model companies, DeepSeek appears more simple and focused, and can make breakthroughs in engineering and algorithms," said the aforementioned executive of a large domestic model company.

In addition, as the large model industry grows increasingly closed and OpenAI earns the nickname "CloseAI", DeepSeek's open source models and public technical reports have won wide praise from developers, letting its technology brand quickly stand out in the large model market at home and abroad.

Some scientific researchers told The Paper that the openness of DeepSeek is remarkable, and the open source of models V3 and R1 has raised the benchmark level of open source models on the market.

Success proves the power of young people

"The success of DeepSeek also lets everyone see the power of young people. Essentially, the development of this generation of artificial intelligence requires young minds," a person from a large model company told The Paper.

Earlier, Jack Clark, former policy director at OpenAI and co-founder of Anthropic, said DeepSeek had hired "a group of inscrutable wizards." Liang Wenfeng responded in an interview with an independent media outlet that there are no mysterious wizards: the team consists of graduates of top domestic universities, fourth- and fifth-year PhD interns who have not yet graduated, and young people who graduated only a few years ago.

From current public media reports, the most notable characteristics of the DeepSeek team are prestigious schooling and youth: even at the team-leader level, most are under 35. In a team of fewer than 140 people, engineers and R&D staff come almost entirely from top domestic universities such as Tsinghua University, Peking University, Sun Yat-sen University and Beijing University of Posts and Telecommunications, and most have only short work histories.

A headhunter who sources high-end technical talent in the large model field told The Paper that DeepSeek's hiring logic is not much different from that of other large model companies. The core labels are "young and high potential": ideally born around 1998, with no more than five years of work experience — "smart, science and engineering major, young, with little experience."

However, the headhunter also said that large model startups are still startups by nature; it is not that they do not want to recruit top overseas AI talent, but in reality few such people are willing to come back.

An anonymous DeepSeek employee revealed to The Paper that the company’s management is very flat and the atmosphere of free communication is relatively good. Liang Wenfeng's whereabouts are unpredictable on weekdays, and most of the time everyone communicates with him online.

This employee previously worked on large model R&D at a major domestic tech company, but felt like a cog in the machine, unable to create value, and finally chose to join DeepSeek. In his view, DeepSeek is currently more focused on underlying model technology.

DeepSeek's working atmosphere is completely bottom-up, with a natural division of labor and no upper limit on anyone's ability to mobilize cards or people. "Everyone brings their own ideas, and no one needs to be pushed. When someone hits a problem during exploration, they pull in others for discussion themselves," Liang Wenfeng said in an interview.

“It’s too early to think that China’s AI has surpassed the United States”

Analysis by the American business outlet Business Insider holds that the newly released R1 shows China can compete with some of the industry's top artificial intelligence models and keep pace with Silicon Valley's cutting edge; moreover, open-sourcing such advanced AI may also pose a challenge to companies trying to reap huge profits from selling the technology.

However, it may be too early to proclaim that "China's AI has surpassed the United States." Liu Zhiyuan has publicly warned against public opinion swinging from extreme pessimism to extreme optimism: the feeling that China has fully surpassed and is far ahead is "far from the truth." In his view, the new technologies of AGI are still evolving at an accelerating pace and the future path remains unclear. China is still catching up — no longer hopelessly behind, but at best following closely. "Running fast on a path others have already explored is relatively easy; the bigger challenge is how to blaze a new trail through the fog."

"Everyone is too busy and too anxious now, and no one realized that DeepSeek would be the one to break out," a person close to DeepSeek lamented to The Paper, adding that the industry changes so fast that it is impossible to predict what comes next; all one can do is watch it shift quarter by quarter.


Liang Wenfeng previously said that DeepSeek only makes models, not products. But for a commercial company it is almost impossible to build only models and no products. On January 15, the official DeepSeek app was released, and people close to DeepSeek told The Paper that commercialization is now on DeepSeek's agenda.

According to industry insiders, DeepSeek is lucky compared with other Chinese large model startups: it has no financing pressure, does not need to prove itself to investors, and does not have to juggle model iteration with product optimization. But as a commercial company that has invested huge sums, it will sooner or later face the same pressures and challenges other model companies face today. "This breakout moment is successful marketing for DeepSeek on the eve of commercialization. But once it actually commercializes, it will be tested by the market, and whether it can keep riding the wave is still hard to say," said the aforementioned person from a large model company.

What is certain is that DeepSeek will face more pressure and challenges ahead. The race toward general-purpose models has only just begun, and who wins will depend on sustained funding and technological iteration. Still, industry insiders believe that "for the domestic model industry, having a company with real technical strength like DeepSeek join is a good thing."
