DeepSeek Revolutionizes AI with Open Large Language Models

You may have heard of DeepSeek: The Chinese company released a pair of open large language models (LLMs), DeepSeek-V3 and DeepSeek-R1, in December 2024, making them available to anyone for free use and modification. Then, in January, the company released a free chatbot app, which quickly gained popularity and rose to the top spot in the Apple App Store. The strong performance of DeepSeek’s models, which rival the best closed LLMs from OpenAI and Anthropic, triggered a stock-market rout on 27 January that wiped more than $600 billion off leading AI stocks.
Proponents of open AI models, however, have greeted DeepSeek’s releases with enthusiasm. More than 700 models based on DeepSeek-V3 and R1 are now available on the AI community platform Hugging Face. Collectively, they have received more than 5 million downloads.
Cameron R. Wolfe, a senior research scientist at Netflix, says the enthusiasm is warranted. “DeepSeek-V3 and R1 legitimately come close to matching closed models. Plus, the fact that DeepSeek was able to make such a model under strict hardware constraints caused by American export controls on Nvidia chips is impressive.”
DeepSeek-V3 Cost Less Than $6 Million to Train
It is that second point, the hardware constraints imposed by U.S. export restrictions in 2022, that highlights DeepSeek’s most surprising claim. The company says DeepSeek-V3 cost roughly $5.6 million to train using Nvidia’s H800 chips. The H800 is a less powerful version of Nvidia’s hardware, designed to comply with the standards set by the U.S. export ban, which was meant to keep Chinese companies from training top-tier LLMs. (The H800 chip itself was later banned, in October 2023.)
DeepSeek achieved its impressive results on less capable hardware with a “DualPipe” parallelism algorithm designed to get around the Nvidia H800’s limitations. It uses low-level programming to precisely control how training tasks are scheduled and batched. The model also uses a mixture-of-experts (MoE) architecture, which includes many neural networks, the “experts,” that can be activated independently. Because each expert is smaller and more specialized, less memory is needed to train the model, and compute costs are lower once the model is deployed.
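To make the mixture-of-experts idea concrete, here is a minimal sketch of an MoE layer in PyTorch. It illustrates the general technique, not DeepSeek’s architecture: the layer width, number of experts, and top-k routing value are arbitrary assumptions. A small router scores the experts for each token, and only the top-scoring experts run, so most of the layer’s parameters sit idle on any given input.

```python
# Minimal mixture-of-experts sketch (illustrative only, not DeepSeek's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, dim=512, num_experts=8, top_k=2):
        super().__init__()
        # Each "expert" is a small, independent feed-forward network.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        # The router decides which experts handle each token.
        self.router = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, dim)
        scores = F.softmax(self.router(x), dim=-1)
        weights, indices = scores.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        # Only the selected experts run for each token; the rest stay idle.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = MoELayer()
tokens = torch.randn(16, 512)
print(layer(tokens).shape)  # torch.Size([16, 512])
```

In a full model, layers like this replace the standard feed-forward blocks, which is what keeps memory and compute per token manageable even when the total parameter count is very large.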
The end result is DeepSeek-V3, a large language model with 671 billion parameters. While OpenAI does not disclose the parameter counts of its cutting-edge models, they are speculated to exceed 1 trillion. Nevertheless, DeepSeek-V3 has achieved benchmark scores that match or exceed those of OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet.
DeepSeek-V3 is not the company’s only star; it also released a reasoning model, DeepSeek-R1, with chain-of-thought reasoning like OpenAI’s o1. While R1 is not the first open reasoning model, it is more capable than prior ones, such as Alibaba’s QwQ. As with DeepSeek-V3, it achieved its results with an unconventional approach.
Most LLMs are trained with a process that includes supervised fine-tuning (SFT). This technique samples the model’s responses to prompts, which are then reviewed and labeled by humans. Their evaluations are fed back into training to improve the model’s responses. It works, but having humans review and label the responses is time-consuming and expensive.
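To show what that process looks like mechanically, here is a toy SFT loop using PyTorch and the Hugging Face Transformers library. The model name and the two prompt–response pairs are placeholders standing in for human-reviewed data; this is a sketch of the general technique, not DeepSeek’s training code.

```python
# Toy supervised fine-tuning (SFT) loop: illustrative only, not DeepSeek's pipeline.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model; any causal LM works the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Human-reviewed (prompt, approved response) pairs stand in for the SFT dataset.
sft_data = [
    ("How many r's are in 'strawberry'?", "There are three r's in 'strawberry'."),
    ("What is 2 + 2?", "2 + 2 equals 4."),
]

model.train()
for prompt, response in sft_data:
    # Concatenate prompt and approved response; the model learns to reproduce it.
    # (Real pipelines typically mask the prompt tokens out of the loss.)
    batch = tokenizer(prompt + "\n" + response, return_tensors="pt")
    outputs = model(**batch, labels=batch["input_ids"])  # cross-entropy loss
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"loss: {outputs.loss.item():.3f}")
```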
DeepSeek first tried ignoring SFT and instead relied on reinforcement learning (RL) to train DeepSeek-R1-Zero. A rules-based reward system, described in the model’s white paper, was designed to help DeepSeek-R1-Zero learn to reason. But this approach led to problems, such as language mixing (the use of many languages in a single response), that made its responses difficult to read. To get around that, DeepSeek-R1 used a “cold start” technique that begins with a small SFT dataset of just a few thousand examples. From there, RL is used to complete the training, an approach Wolfe calls “a huge, very non-trivial discovery.”
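DeepSeek’s exact reward rules are not reproduced here, but a rules-based reward generally replaces human raters with programmatic checks. The sketch below scores a response on whether it follows an assumed think/answer format, whether the extracted answer matches a known reference, and whether it avoids mixing in text from another language; the specific tags, checks, and weights are illustrative assumptions.

```python
# Illustrative rules-based reward function (not DeepSeek's actual rules).
import re

def rules_based_reward(response: str, reference_answer: str) -> float:
    """Score a model response with simple programmatic checks."""
    reward = 0.0

    # Format check: reasoning inside <think>...</think>, final answer inside
    # <answer>...</answer> (assumed output format for this sketch).
    has_think = re.search(r"<think>.*?</think>", response, re.DOTALL) is not None
    answer_match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if has_think and answer_match:
        reward += 0.5

    # Accuracy check: the extracted answer must match the known reference.
    if answer_match and answer_match.group(1).strip() == reference_answer.strip():
        reward += 1.0

    # Language-consistency check: penalize responses that mix in, say, CJK text
    # when the prompt was in English (a crude stand-in for language mixing).
    if re.search(r"[\u4e00-\u9fff]", response):
        reward -= 0.5

    return reward

sample = "<think>3 + 4 = 7</think><answer>7</answer>"
print(rules_based_reward(sample, "7"))  # 1.5
```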
DeepSeek Put Into Practice
For Rajkiran Panuganti, senior director of AI applications at the Indian company Krutrim, DeepSeek’s gains are not just academic. Krutrim provides AI services to clients and has used several open models, including Meta’s Llama family of models, to build its products and services. Panuganti says he would recommend using DeepSeek in future projects.
“The earlier Llama models were great open models, but they’re not fit for complex problems. Sometimes they can’t answer even simple questions, such as how many times the letter r appears in strawberry,” says Panuganti. He cautions that DeepSeek’s models do not beat the leading closed reasoning models, such as OpenAI’s o1, which may be better suited to the most challenging tasks. But he says DeepSeek-R1 is many times less expensive.
That is, if you are paying DeepSeek’s API fees. While the company has a commercial API that charges for access to its models, they are also free to download, use, and modify under an MIT license.
Better still, DeepSeek offers several smaller, more efficient versions of its main models, known as “distilled models.” These have fewer parameters, making them easier to run on less powerful devices. YouTuber Jeff Geerling has already demonstrated DeepSeek R1 running on a Raspberry Pi. Popular interfaces for running an LLM locally on one’s own computer, such as Ollama, already support DeepSeek R1. I had DeepSeek-R1-7B, the second-smallest distilled model, running on a Mac Mini M4 with 16 gigabytes of RAM in less than 10 minutes.
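For readers who want to try something similar locally, here is a minimal sketch using Ollama’s Python client. It assumes Ollama is installed and running and that a distilled model has been pulled under a tag such as deepseek-r1:7b (the exact tag is an assumption; check Ollama’s model library for the current identifier).

```python
# Minimal local chat with a distilled DeepSeek-R1 model via Ollama's Python client.
# Assumes Ollama is running and the model was pulled first, e.g. `ollama pull deepseek-r1:7b`.
import ollama

response = ollama.chat(
    model="deepseek-r1:7b",  # assumed tag; confirm in Ollama's model library
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'?"}],
)
print(response["message"]["content"])
```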
From Just “Open” to Open Source
While DeepSeek is “open,” some details are left behind the curtain. DeepSeek does not disclose the datasets or the training code used to train its models.
This is a point of contention in open source communities. Most “open” models provide only the model weights needed to run or fine-tune the model. The full training dataset, as well as the code used in training, remain hidden. Stefano Maffulli, executive director of the Open Source Initiative, has repeatedly called Meta out on social media, saying that its decision to label its Llama models as open source is a “heinous lie.”
DeepSeek’s models are similarly opaque, but Hugging Face is trying to unravel the mystery. On 28 January, it announced Open-R1, an effort to create a fully open source version of DeepSeek-R1.
“Reinforcement learning is notoriously tricky, and small implementation differences can lead to major performance gaps,” says Elie Bakouch, an AI research engineer at Hugging Face. The compute cost of recreating DeepSeek’s datasets, which are needed to reproduce the models, will also be significant. But Bakouch says Hugging Face has a “science cluster” that should be up to the task. Researchers and engineers can follow Open-R1’s progress on Hugging Face and GitHub.
Regardless of Open-R1’s success, however, Bakouch says DeepSeek’s impact goes well beyond the open AI community. “The excitement isn’t just in the open source community, it’s everywhere,” he says.