
Now it’s TikTok parent ByteDance’s turn for a reasoning AI: enter Seed-Thinking-v1.5!




The reasoning AI race started with the announcement of OpenAI's o1 model in September 2024, but it really took off with the release of DeepSeek R1 in January 2025.

Now, it seems that most of the major AI model providers and trainers are in a new race to deliver better, faster, and cheaper reasoning models: ones that may take a little longer to respond to a human user, but ideally excel with better, more comprehensive responses.

ByteDance, the Chinese media giant behind TikTok, is the latest to join the party, announcing and publishing the technical paper behind Seed-Thinking-v1.5, an upcoming large language model (LLM) designed to advance reasoning performance across both science, technology, engineering, and mathematics (STEM) fields and general-purpose domains.

The model is not yet available for download or use, and it is unclear what the licensing terms will be: whether it will be proprietary/closed source, open source for everyone to use and modify at will, or somewhere in between. However, the technical paper provides some notable details worth covering now, ahead of whenever the model is made available.

Like Meta's new Llama 4 and Mistral's Mixtral before it, Seed-Thinking-v1.5 is built using a mixture-of-experts (MoE) architecture.

This architecture is designed to make models more efficient. It essentially combines the capabilities of multiple models into one, each specializing in a different domain.

In this case, the MoE architecture means Seed-Thinking-v1.5 uses only 20 billion of its 200 billion parameters at a time.
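
ByteDance has not released Seed-Thinking-v1.5's routing code, but the top-k gating pattern behind most MoE layers can be sketched in a few lines of PyTorch. Everything below (the `TopKMoELayer` name, the expert sizes, and the choice of ten experts with one active per token, loosely mirroring the 20B-of-200B ratio) is illustrative, not from the paper:

```python
import torch
import torch.nn as nn

class TopKMoELayer(nn.Module):
    """Illustrative mixture-of-experts layer (not ByteDance's code):
    a router picks k experts per token, so only a fraction of the
    layer's parameters are active for any given input."""

    def __init__(self, dim: int, num_experts: int = 10, k: int = 1):
        super().__init__()
        # 1 of 10 experts active loosely mirrors the ~20B-of-200B ratio.
        self.router = nn.Linear(dim, num_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim) -> per-token expert weights, keep top-k.
        weights = torch.softmax(self.router(x), dim=-1)
        topk_w, topk_idx = weights.topk(self.k, dim=-1)
        topk_w = topk_w / topk_w.sum(dim=-1, keepdim=True)  # renormalize

        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():  # run each expert only on its routed tokens
                    out[mask] += topk_w[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```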

ByteDance says in the technical paper, published to GitHub, that Seed-Thinking-v1.5 prioritizes structured reasoning and thoughtful response generation.

The results nearly speak for themselves, with Seed-Thinking-v1.5 outperforming DeepSeek R1 and approaching the newly released Gemini 2.5 Pro and OpenAI's o3-mini on many third-party benchmark evaluations. It even exceeds those two on a benchmark that measures progress toward artificial general intelligence, seen as the goal or "holy grail" of AI: a model that outperforms humans at most economically valuable tasks, according to OpenAI's definition.

Positioned as a compact yet capable alternative to larger state-of-the-art models, Seed-Thinking-v1.5 achieves these competitive results through innovations in reinforcement learning (RL), training data curation, and AI infrastructure.

Performance benchmarks and model focus

Seed-Thinking-v1.5 shows strong performance on a suite of difficult tasks, scoring 86.7% on AIME 2024, 55.0% pass@8 on Codeforces, and 77.3% on the GPQA science benchmark. These results place it close to or matching models like OpenAI's o3-mini-high and Google's Gemini 2.5 Pro on specific reasoning benchmarks.
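
For readers unfamiliar with the pass@8 notation: it estimates the probability that at least one of 8 sampled solutions to a problem is correct. A standard unbiased estimator, popularized by OpenAI's HumanEval work, can be computed like this (the example numbers are made up):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples is
    correct, given c correct answers observed among n total samples."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 20 samples per problem, 5 of them correct.
print(round(pass_at_k(n=20, c=5, k=8), 3))  # 0.949
```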

On non-reasoning tasks, the model was evaluated through human preference comparisons and achieved an 8.0% higher win rate over DeepSeek R1, suggesting its strengths extend beyond logic and math challenges.

To address saturation in standard benchmarks like AIME, ByteDance introduced BeyondAIME, a new, harder math benchmark with curated problems designed to resist memorization and better discriminate model performance. Both this set and the Codeforces evaluation set are expected to be released publicly to support future research.

Data strategy

Training data played a major role in the model's development. For supervised fine-tuning (SFT), the team curated 400,000 samples, including 300,000 verifiable ones (STEM, logic, and coding tasks) and 100,000 non-verifiable problems such as creative writing and role-playing.

For RL training, the data was divided into:

  • Verifiable problems: 100,000 rigorously filtered STEM questions and logic puzzles with known answers, sourced from elite competitions and expert review.
  • Non-verifiable tasks: Human preference datasets focused on open-ended prompts, evaluated using pairwise reward models.

The STEM data leaned heavily on advanced mathematics, accounting for over 80% of the problem set. Additional logic data included tasks such as Sudoku and 24-point puzzles, with difficulty adjustable to match model progress.
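
The exact curriculum mechanism isn't spelled out here, but "adjustable difficulty to match model progress" is typically implemented by tracking the model's recent solve rate and moving the puzzle generator up or down a level when that rate drifts too high or too low. A minimal sketch, with the thresholds and the `generate_puzzle` hook as assumptions:

```python
from collections import deque

class DifficultyScheduler:
    """Toy curriculum controller: raise puzzle difficulty when the model
    solves most recent samples, lower it when it solves too few.
    Thresholds and window size are arbitrary choices for illustration."""

    def __init__(self, level: int = 1, window: int = 200,
                 raise_at: float = 0.8, lower_at: float = 0.3):
        self.level = level
        self.results = deque(maxlen=window)  # recent pass/fail outcomes
        self.raise_at, self.lower_at = raise_at, lower_at

    def record(self, solved: bool) -> None:
        self.results.append(solved)
        if len(self.results) < self.results.maxlen:
            return  # wait for a full window before adjusting
        rate = sum(self.results) / len(self.results)
        if rate > self.raise_at:
            self.level += 1
            self.results.clear()
        elif rate < self.lower_at and self.level > 1:
            self.level -= 1
            self.results.clear()

# Hypothetical usage with a puzzle generator that accepts a difficulty:
# puzzle = generate_puzzle(kind="24-point", difficulty=scheduler.level)
```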

Reinforcement learning approach

Reinforcement learning in Seed-Thinking-v1.5 is powered by the VAPO and DAPO frameworks, developed to address known instabilities in RL training. These techniques reduce variance in the reward signal and improve training stability, especially in long chain-of-thought (CoT) settings.
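
VAPO and DAPO are ByteDance's own frameworks, and their exact objectives are detailed in the paper. As a rough illustration of the family of clipped policy-gradient updates they build on (not the actual Seed-Thinking-v1.5 loss), here is a generic PPO-style objective:

```python
import torch

def clipped_pg_loss(logp_new: torch.Tensor,
                    logp_old: torch.Tensor,
                    advantages: torch.Tensor,
                    eps: float = 0.2) -> torch.Tensor:
    """Generic PPO-style clipped objective over response tokens.
    Clipping bounds how far a single update can move the policy, one
    standard lever against the RL instability the paper targets."""
    ratio = torch.exp(logp_new - logp_old)        # per-token importance weight
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    # Take the pessimistic branch, then negate: we minimize this loss.
    return -torch.min(unclipped, clipped).mean()
```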

Reward models play a critical role in supervising RL outputs. ByteDance introduced two key tools:

  • Seed-Verifier: A rule-based LLM that checks whether generated answers and reference answers are mathematically equivalent.
  • Seed-Thinking-Verifier: A judge based on step-by-step reasoning that improves judgment consistency and resists reward hacking.

This two-tier reward system enables nuanced evaluation of both straightforward and complex tasks.
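
The verifiers themselves are not public; as a sketch of the general two-tier pattern, a cheap rule-based equivalence check can score clear-cut answers, escalating to a step-by-step LLM judge only when it cannot decide. The `llm_judge` callable and the escalation logic here are assumptions:

```python
import sympy

def rule_based_equal(answer: str, reference: str) -> bool | None:
    """Tier 1: try to decide mathematical equivalence symbolically.
    Returns True/False when confident, None when it cannot decide."""
    try:
        diff = sympy.simplify(sympy.sympify(answer) - sympy.sympify(reference))
        return diff == 0
    except (sympy.SympifyError, TypeError):
        return None  # not parseable as math -> escalate to tier 2

def score(answer: str, reference: str, llm_judge) -> float:
    verdict = rule_based_equal(answer, reference)
    if verdict is not None:
        return 1.0 if verdict else 0.0
    # Tier 2: hypothetical step-by-step judge returning a score in [0, 1];
    # reasoning through the comparison makes it harder to reward-hack.
    return llm_judge(answer=answer, reference=reference)
```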

Infrastructure and scaling

To support efficient training at scale, ByteDance built the system on top of its HybridFlow framework. Execution is handled by Ray clusters, and training and inference jobs are co-located to reduce GPU idle time.

A notable innovation is the Streaming Rollout System (SRS), which decouples model evolution from runtime execution. It accelerates iteration speed by asynchronously managing partially completed generations across model versions. This architecture reportedly delivers up to 3x faster RL cycles.
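
The SRS implementation is internal to ByteDance, but the underlying pattern, decoupling generation workers from the trainer so rollouts started under one policy version can finish while a newer version trains, can be sketched with a versioned queue. All names and the staleness rule below are illustrative:

```python
import queue
import threading

rollout_queue: "queue.Queue[dict]" = queue.Queue(maxsize=1024)
current_version = 0  # bumped by the trainer after each policy update

def generation_worker(generate_fn):
    """Produces rollouts continuously, tagging each with the policy
    version it started under; it never blocks waiting on the trainer."""
    while True:
        started_under = current_version
        rollout = generate_fn()  # possibly a long chain-of-thought
        rollout_queue.put({"version": started_under, "data": rollout})

def trainer(train_step_fn, max_staleness: int = 1):
    """Consumes rollouts, tolerating bounded staleness so generation
    GPUs and training GPUs are never idle waiting on each other."""
    global current_version
    while True:
        item = rollout_queue.get()
        if current_version - item["version"] > max_staleness:
            continue  # rollout from a too-old policy -> drop it
        train_step_fn(item["data"])
        current_version += 1

# Hypothetical wiring: run workers in background threads (or Ray actors).
# threading.Thread(target=generation_worker, args=(sample_fn,), daemon=True).start()
```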

Additional infrastructure techniques include:

  • Mixed precision (FP8) for memory savings
  • Expert parallelism and kernel auto-tuning for MoE efficiency
  • ByteCheckpoint for resilient and flexible checkpointing
  • AutoTuner for optimizing parallelism and memory configurations (a generic sketch of this tuning pattern follows the list)
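
ByteDance's AutoTuner is not public; the generic pattern the name suggests, searching parallelism and memory configurations against measured throughput, looks roughly like this sketch, where the config space and the `benchmark_throughput` probe are assumptions:

```python
import itertools

def autotune(benchmark_throughput):
    """Brute-force search over a small parallelism/memory grid, keeping
    whichever configuration benchmarks fastest on a short probe run."""
    space = {
        "tensor_parallel": [1, 2, 4, 8],
        "micro_batch_size": [1, 2, 4],
        "activation_checkpointing": [False, True],
    }
    best_cfg, best_tps = None, float("-inf")
    for values in itertools.product(*space.values()):
        cfg = dict(zip(space.keys(), values))
        try:
            tps = benchmark_throughput(**cfg)  # hypothetical probe: tokens/sec
        except RuntimeError:
            continue  # e.g. config ran out of GPU memory -> skip it
        if tps > best_tps:
            best_cfg, best_tps = cfg, tps
    return best_cfg
```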

Human evaluation and real-world impact

To assess alignment with human preferences, ByteDance conducted human testing across a range of domains, including creative writing, humanities knowledge, and general conversation.

Seed-Thinking-v1.5 consistently outperformed DeepSeek R1 across sessions, reinforcing its applicability to real-world user needs.

The development team notes that reasoning models trained primarily on verifiable tasks showed strong generalization to creative domains, a result they attribute to the structure and rigor embedded in mathematical training workflows.

What it means for technical leaders, data engineers, and enterprise decision-makers

For technical leads managing the full lifecycle of large language models, from data curation to deployment, Seed-Thinking-v1.5 presents an opportunity to rethink how reasoning capabilities are integrated into enterprise AI stacks.

Its modular training process, which includes verifiable reasoning datasets and multi-stage reinforcement learning, is especially appealing to teams looking to scale LLM development while retaining fine-grained control.

ByteDance's moves to introduce the Seed-Verifier and Seed-Thinking-Verifier mechanisms for more trustworthy reward modeling could prove decisive when deploying models in customer-facing or regulated environments.

For teams operating under tight deadlines and limited bandwidth, the model's stability under reinforcement learning, enabled by innovations such as VAPO and dynamic sampling, could reduce iteration cycles and streamline fine-tuning for specific tasks.

From an orchestration and deployment perspective, the model's hybrid infrastructure, including the SRS and support for FP8 optimization, suggests major gains in training throughput and hardware utilization.

These capabilities would be valuable for engineers responsible for scaling LLM operations across cloud and on-premises systems. The fact that Seed-Thinking-v1.5 was trained with mechanisms for adapting reward feedback based on runtime dynamics speaks directly to the challenges of managing heterogeneous data pipelines and maintaining consistency across domains.

For teams tasked with ensuring reliability, reproducibility, and continuous integration of new tools, Seed-Thinking-v1.5's system-level design could serve as a blueprint for building robust orchestration systems.

For data engineering professionals, the structured approach to training data, including rigorous filtering, augmentation, and expert verification, reinforces the importance of data quality as a multiplier of model performance. This could inspire more deliberate approaches to dataset development and validation pipelines.

Future outlook

Seed-Thinking-v1.5 was produced within ByteDance's Seed LLM team, led by Yonghui Wu and publicly represented by Haibin Lin, a longtime AI contributor.

The project also builds on previous efforts, such as Doubao 1.5 Pro, and incorporates shared techniques in RLHF and data curation.

The team plans to continue refining reinforcement learning techniques, with a focus on training efficiency and reward modeling for non-verifiable tasks. The public release of internal benchmarks such as BeyondAIME is intended to foster broader progress in reasoning-focused AI.

