The State of LLM Reasoning Models
zhezhongyun 2025-03-28 23:21
Translated from:
https://magazine.sebastianraschka.com/p/state-of-llm-reasoning-and-inference-scaling?r=1csfkw&utm_campaign=post&utm_medium=web
Part 1: Inference-Time Compute Scaling Methods
Improving the reasoning abilities of large language models (LLMs) has become one of the hottest topics in 2025, and for good reason. Stronger reasoning skills allow LLMs to tackle more complex problems, making them more capable across a wide range of tasks users care about.
In the last few weeks, researchers have shared a large number of new strategies to improve reasoning, including scaling inference-time compute, reinforcement learning, supervised fine-tuning, and distillation. And many approaches combine these techniques for greater effect.
This article explores recent research advancements in reasoning-optimized LLMs, with a particular focus on the inference-time compute scaling methods that have emerged since the release of DeepSeek R1.
The four main categories of implementing reasoning models that I explained in Understanding Reasoning LLMs. This article focuses on inference-time scaling methods.
Implementing and improving reasoning in LLMs: The four main categories
Since most readers are likely already familiar with LLM reasoning models, I will keep the definition short: An LLM-based reasoning model is an LLM designed to solve multi-step problems by generating intermediate steps or structured "thought" processes. Unlike simple question-answering LLMs that just share the final answer, reasoning models either explicitly display their thought process or handle it internally, which helps them to perform better at complex tasks such as puzzles, coding challenges, and mathematical problems.
Side-by-side comparison of a basic LLM’s one-line answer and a reasoning LLM’s explanatory response.
In general, there are two main strategies to improve reasoning: (1) increasing training compute or (2) increasing inference compute, also known as inference-time scaling or test-time scaling. (Inference compute refers to the processing power required to generate model outputs in response to a user query after training.)
Accuracy improvements can be achieved through increased training or test-time compute, where test-time compute is synonymous with inference-time compute and inference-time scaling. Source: Annotated figure from
https://openai.com/index/learning-to-reason-with-llms/
Note that the plots shown above make it look like we improve reasoning either via train-time compute OR test-time compute. However, LLMs are usually designed to improve reasoning by combining heavy train-time compute (extensive training or fine-tuning, often with reinforcement learning or specialized data) and increased test-time compute (allowing the model to "think longer" or perform extra computation during inference).
The many terms that are used synonymously with inference-time scaling.
To understand how reasoning models are being developed and improved, I think it remains useful to look at the different techniques separately. In my previous article, Understanding Reasoning LLMs, I discussed a finer-grained categorization into the four categories summarized in the figure below.
Methods 2-4 in the figure above typically produce models that generate longer responses because they include intermediate steps and explanations in their outputs. Since inference costs scale with response length (e.g., a response twice as long requires twice the compute), these training approaches are inherently linked to inference scaling. However, in this section on inference-time compute scaling, I focus specifically on techniques that explicitly regulate the number of generated tokens, whether through additional sampling strategies, self-correction mechanisms, or other methods.
In this article, I focus on the interesting new research papers and model releases related to inference-time compute scaling that followed the DeepSeek R1 release on January 22nd, 2025. (Originally, I wanted to cover methods from all categories in this article, but due to the excessive length, I decided to release a separate article focused on train-time compute methods in the future.)
Development process of DeepSeek's reasoning models that I discussed in my previous article, Understanding Reasoning LLMs (
https://magazine.sebastianraschka.com/p/understanding-reasoning-llms).
Before we look at inference-time compute scaling methods and the recent progress on reasoning models in that category, let me at least provide a brief overview of all the different categories.
1. Inference-time compute scaling
This category includes methods that improve model reasoning capabilities at inference time without training or modifying the underlying model weights. The core idea is to trade off increased computational resources for improved performance, which helps with making even fixed models more capable through techniques such as chain-of-thought reasoning and various sampling procedures.
While I categorize inference-time compute scaling separately to focus on methods in this context, it is important to note that this technique can be applied to any LLM. For example, OpenAI developed its o1 model using reinforcement learning, and then additionally leveraged inference-time compute scaling. Interestingly, as I discussed in my previous article on reasoning models (Understanding Reasoning LLMs), the DeepSeek R1 paper explicitly mentioned that R1 did not use inference-time scaling. However, they acknowledged that it is something they could easily incorporate into the R1 deployment or application.
2. Pure reinforcement learning
This approach focuses solely on reinforcement learning (RL) to develop or improve reasoning capabilities. It typically involves training models with verifiable reward signals from math or coding domains. While RL allows models to develop more strategic thinking and self-improvement capabilities, it comes with challenges such as reward hacking, instability, and high computational costs.
3. Reinforcement learning and supervised fine-tuning
This hybrid approach combines RL with supervised fine-tuning (SFT) to achieve more stable and generalizable improvements than pure RL. Typically, a model is first trained with SFT on high-quality instruction data and then further refined using RL to optimize specific behaviors.
4. Supervised fine-tuning and model distillation
This method improves the reasoning capabilities of a model by instruction fine-tuning it on high-quality labeled datasets (SFT). If this high-quality dataset is generated by a larger LLM, then this methodology is also referred to as "knowledge distillation" or just "distillation" in LLM contexts. However, note that this differs slightly from traditional knowledge distillation in deep learning, which typically involves training a smaller model using not only the outputs (labels) but also the logits of a larger teacher model.
Inference-time compute scaling methods
The previous section already briefly summarized inference-time compute scaling. Before discussing the recent research in this category, let me describe inference-time scaling in a bit more detail.
Inference-time scaling improves an LLM's reasoning by increasing computational resources ("compute") during inference. The intuition for why this can improve reasoning can be conveyed with a simple analogy: humans give better responses when given more time to think, and similarly, LLMs can improve with techniques that encourage more "thought" during generation.
One approach here is prompt engineering, such as chain-of-thought (CoT) prompting, where phrases like "think step by step" guide the model to generate intermediate reasoning steps. This improves accuracy on complex problems but is unnecessary for simple factual queries. Since CoT prompts generate more tokens, they effectively make inference more expensive.
An example of classic CoT prompting from the 2022 paper Large Language Models are Zero-Shot Reasoners (
https://arxiv.org/abs/2205.11916).
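To make this concrete, here is a minimal sketch of a zero-shot CoT prompt next to a plain prompt. The question and wording are illustrative assumptions, and the actual model call is omitted since any LLM client would work.
question = "A train travels 60 km in 45 minutes. What is its average speed in km/h?"

# Plain prompt: the model is nudged to answer directly.
plain_prompt = f"Q: {question}\nA:"

# Zero-shot CoT prompt: the trigger phrase elicits intermediate reasoning steps,
# trading extra output tokens (inference compute) for better accuracy on multi-step problems.
cot_prompt = f"Q: {question}\nA: Let's think step by step."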
Another method involves voting and search strategies, such as majority voting or beam search, which refine responses by selecting the best output.
Different search-based methods rely on a process-reward-based model to select the best answer. Annotated figure from the LLM Test-Time Compute paper,
https://arxiv.org/abs/2408.03314
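As a minimal sketch of the parallel flavor of these strategies, the snippet below implements simple majority voting over sampled answers; `sample_answer` is a hypothetical stand-in for your own sampling function, not part of any specific library.
from collections import Counter

def majority_vote(answers):
    # Return the most frequent final answer among independently sampled completions.
    return Counter(answers).most_common(1)[0][0]

# Hypothetical usage: sample several completions at temperature > 0 and vote.
# answers = [sample_answer(prompt, temperature=0.8) for _ in range(16)]
# final_answer = majority_vote(answers)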
1. "s1: Simple test-time scaling"1. “s1:简单的测试时缩放”
The remainder of this article will be focused on the recent research advances in the inference-time scaling category for improving reasoning capabilities of LLMs. Let me start with a more detailed discussion of a paper that serves as an example of inference-time scaling.
本文的其余部分将重点介绍推理时间缩放类别中的最新研究进展,以提高 LLMs的推理能力。首先,我要更详细地讨论一篇论文,该论文是推理时间扩展的一个示例。
So, one of the interesting recent research papers in this category is s1: Simple Test-Time Scaling (31 Jan, 2025), which introduces so-called "wait" tokens, which can be considered as a more modern version of the aforementioned "think step by step" prompt modification.
因此,该类别中最近有趣的研究论文之一是 s1:Simple Test-Time Scaling(2025 年 1 月 31 日),它引入了所谓的 “wait” token,可以被认为是上述 “think step by step” 提示修改的更现代版本。
Note that this involves supervised finetuning (SFT) to generate the initial model, so it's not a pure inference-time scaling approach. However, the end goal is actively controlling the reasoning behavior through inference-time scaling; hence, I considered this paper for the "1. Inference-time compute scaling" category.
请注意,这涉及监督微调 (SFT) 来生成初始模型,因此它不是一种纯粹的推理时间缩放方法。但是,最终目标是通过推理时间缩放主动控制推理行为;因此,我将这篇论文视为“1.Inference-time compute scaling“类别。
In short, their approach is twofold:
简而言之,他们的方法有两个方面:
- Create a curated SFT dataset with 1k training examples that include reasoning traces.
- Control the length of responses by:
  - a) Appending "Wait" tokens to get the LLM to generate longer responses, self-verify, and correct itself, or
  - b) Stopping generation by adding an end-of-thinking token delimiter ("Final Answer:"). They call this length control "budget forcing."
Illustration of "wait" token insertion to control the length of the output. Annotated figure from
https://arxiv.org/abs/2501.19393.
用于控制输出长度的 “wait” 标记插入图示。来自
https://arxiv.org/abs/2501.19393 的注释图。
Budget forcing can be seen as a sequential inference scaling technique because it still generates one token at a time (but just more of it). In contrast, we have parallel techniques like majority voting, which aggregate multiple independent completions.
预算强制可以看作是一种顺序推理扩展技术,因为它仍然一次生成一个 Token(但只是更多)。相比之下,我们有并行技术,如多数投票,它聚合了多个独立的完成。
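Here is a minimal sketch of the budget-forcing idea under some assumptions: `generate(prompt, stop)` is a hypothetical completion helper that returns text up to a stop string, and the delimiters follow the paper's description rather than its exact implementation.
def budget_forced_generate(prompt, thinking_rounds=2):
    # `generate(prompt, stop)` is a hypothetical completion helper, not a real API.
    # First reasoning pass: let the model think until it tries to emit the end-of-thinking delimiter.
    trace = generate(prompt, stop="Final Answer:")
    for _ in range(thinking_rounds - 1):
        # Suppress the stop and append "Wait" so the model keeps thinking, self-verifies, and corrects itself.
        trace += "\nWait"
        trace += generate(prompt + trace, stop="Final Answer:")
    # Budget exhausted: append the delimiter to force the model to produce its final answer.
    return generate(prompt + trace + "\nFinal Answer:", stop=None)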
Correlation between response accuracy and length. Annotated figure from
https://arxiv.org/abs/2501.19393.
They found their budget-forcing method more effective than other inference-scaling techniques I've discussed, like majority voting. If there's something to criticize or improve, I would've liked to see results for more sophisticated parallel inference-scaling methods, like beam search, lookahead search, or the best compute-optimal search described in Google's Scaling LLM Test-Time Compute Optimally Can Be More Effective Than Scaling Model Parameters paper last year. Or even a simple comparison with a classic sequential method like chain-of-thought prompting ("Think step by step").
Anyway, it's a really interesting paper and approach!
PS: Why "Wait" tokens? My guess is the researchers were inspired by the "Aha moment" figure in the DeepSeek-R1 paper, where researchers saw LLMs coming up with something like "Wait, wait. Wait. That's an aha moment I can flag here." which showed that pure reinforcement learning can induce reasoning behavior in LLMs.
PS:为什么是 “Wait” 代币? 我猜研究人员受到了 DeepSeek-R1 论文中“顿悟时刻”图的启发,研究人员在论文中看到了LLMs类似“等等,等等。等。这是我可以在这里标记的顿悟时刻。“这表明纯强化学习可以诱导 中的LLMs推理行为。
Interestingly, they also tried other tokens like "Hmm" but found that "Wait" performed slightly better.
有趣的是,他们还尝试了 “Hmm” 等其他代币,但发现 “Wait” 的表现略好一些。
"Wait" vs "Hmm" tokens. Annotated figure from
https://arxiv.org/abs/2501.19393.
“Wait” 与 “Hmm” 代币。 来自
https://arxiv.org/abs/2501.19393 的注释图。
Other noteworthy research papers on inference-time compute scaling
Since it's been a very active month on the reasoning model research front, I need to keep the summaries of other papers relatively brief to manage a reasonable length for this article. Hence, below are brief summaries of other interesting research articles related to inference-time compute scaling, sorted in ascending order by publication date.
As mentioned earlier, not all of these articles fall neatly into the inference-time compute scaling category, as some of them also involve specific training. However, these papers have in common that controlling inference-time compute is a specific mechanism of action. (Many distilled or SFT methods that I will cover in upcoming articles will lead to longer responses, which can be seen as a form of inference-time compute scaling. However, they do not actively control the length during inference, which makes these methods different from those covered here.)
2. Test-Time Preference Optimization
22 Jan, Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback,
https://arxiv.org/abs/2501.12895
Test-time Preference Optimization (TPO) is an iterative process that aligns LLM outputs with human preferences during inference (without altering the underlying model weights). In each iteration, the model:
- Generates multiple responses for a given prompt.
- Scores the responses with a reward model to select the highest- and lowest-scoring ones as "chosen" and "rejected" responses.
- Prompts the model to compare and critique the "chosen" and "rejected" responses.
- Refines the output by converting the critiques into textual suggestions to update the original model responses.
By doing steps 1-4 iteratively, the model refines its original responses.
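A minimal sketch of one such iteration is shown below; `sample`, `reward`, `critique`, and `revise` are hypothetical stand-ins for the underlying model and reward-model calls, not the paper's actual code.
def tpo_iteration(prompt, n=4):
    # sample/reward/critique/revise are hypothetical helpers for illustration only.
    responses = sample(prompt, n)                                # 1) generate multiple candidate responses
    ranked = sorted(responses, key=lambda r: reward(prompt, r))  # 2) score with a reward model
    rejected, chosen = ranked[0], ranked[-1]
    suggestions = critique(chosen, rejected)                     # 3) model critiques the chosen/rejected pair in text
    return revise(chosen, suggestions)                           # 4) convert critiques into a refined response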
Annotated figure from "Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback",
https://arxiv.org/abs/2501.12895
注释图来自 “Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback”,
https://arxiv.org/abs/2501.12895
3. Thoughts Are All Over the Place
30 Jan, Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs,
https://arxiv.org/abs/2501.18585
The researchers explore a phenomenon called "underthinking", where reasoning models frequently switch between reasoning paths instead of fully focusing on exploring promising ones, which lowers the problem-solving accuracy.
To address this "underthinking" issue, they introduce a method called the Thought Switching Penalty (TIP), which modifies the logits of thought-switching tokens to discourage premature reasoning path transitions.
Their approach does not require model fine-tuning and empirically improves accuracy across multiple challenging test sets.
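A minimal sketch of what such a logit adjustment could look like is shown below; the choice of thought-switching tokens, the penalty strength, and the warm-up schedule are illustrative assumptions rather than the paper's exact settings.
import torch

def penalize_thought_switches(logits, step, switch_token_ids, alpha=3.0, penalty_steps=300):
    # Subtract a constant penalty from thought-switching tokens (e.g., "Alternatively")
    # during the early part of the generation, discouraging premature path switches.
    # switch_token_ids, alpha, and penalty_steps are illustrative assumptions.
    if step < penalty_steps:
        logits[..., switch_token_ids] = logits[..., switch_token_ids] - alpha
    return logits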
Annotated figure from "Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs",
https://arxiv.org/abs/2501.18585
注释人物出自《Thoughts Are All Over the Place: On the Underthinking of o1-LikeLLMs》,
https://arxiv.org/abs/2501.18585
4. Trading Inference-Time Compute for Adversarial Robustness
31 Jan, Trading Inference-Time Compute for Adversarial Robustness,
https://arxiv.org/abs/2501.18841
In many cases, increasing inference-time compute improves the adversarial robustness of reasoning LLMs by reducing the rate of successful attacks. Unlike adversarial training, this method does not need any special training or require prior knowledge of specific attack types.
However, there are some important exceptions. For example, the improvements in settings involving policy ambiguities or loophole exploitation are limited. Additionally, the robustness gains from more reasoning can be reduced by new attack strategies such as "Think Less" and "Nerd Sniping".
So, while these findings suggest that scaling inference-time compute can improve LLM safety, this alone is not a complete solution to adversarial robustness.
Annotated figure from "Trading Inference-Time Compute for Adversarial Robustness",
https://arxiv.org/abs/2501.18841
5. Chain-of-Associated-Thoughts
4 Feb, CoAT: Chain-of-Associated-Thoughts Framework for Enhancing Large Language Models Reasoning,
https://arxiv.org/abs/2502.02390
The researchers combine classic Monte Carlo Tree Search inference-time scaling with an "associative memory" that serves as the LLM's knowledge base during the exploration of reasoning pathways. Using this so-called associative memory, it's easier for the LLM to consider earlier reasoning pathways and use dynamically evolving information during response generation.
Annotated figure from "CoAT: Chain-of-Associated-Thoughts Framework for Enhancing Large Language Models Reasoning",
https://arxiv.org/abs/2502.02390
6. Step Back to Leap Forward
6 Feb, Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models,
https://arxiv.org/abs/2502.04404
This paper proposes a self-backtracking mechanism that allows LLMs to improve their reasoning by learning when and where to backtrack during training and inference. While training involves teaching the model to recognize and correct suboptimal reasoning paths using a dedicated backtracking token, the actual exploration of alternative reasoning paths happens at inference time.
What's unique is that this exploration does not rely on external reward models (unlike the search-based methods that use a process-reward-based model, which I mentioned at the beginning of the "1. Inference-time compute scaling methods" section in this article).
Annotated figure from "Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models",
https://arxiv.org/abs/2502.04404
注释图来自“Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models”,
https://arxiv.org/abs/2502.04404
I added this paper here as it's heavily focused on the proposed backtracking inference-time scaling method, which improves reasoning by dynamically adjusting search depth and breadth rather than fundamentally altering the training paradigm (although training with the dedicated backtracking token is still required).
7. Scaling up Test-Time Compute with Latent Reasoning
7 Feb, Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach,
https://arxiv.org/abs/2502.05171
Instead of improving reasoning by generating more tokens, the researchers propose a model that scales inference-time compute by iterating over a recurrent depth block in latent space. This block functions like a hidden state in RNNs, which allows the model to refine its reasoning without requiring longer token outputs.
However, a key drawback is the lack of explicit reasoning steps, which are (in my opinion) useful for human interpretability and a major advantage of chain-of-thought methods.
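To illustrate the general idea, here is a minimal sketch of a recurrent-depth block where "thinking longer" means more latent iterations instead of more output tokens; the layer sizes and module choice are illustrative assumptions, not the paper's actual architecture.
import torch.nn as nn

class RecurrentDepthBlock(nn.Module):
    # Illustrative sketch: a single block applied a variable number of times at inference.
    def __init__(self, d_model=512, nhead=8):
        super().__init__()
        self.core = nn.TransformerEncoderLayer(d_model, nhead=nhead, batch_first=True)

    def forward(self, hidden_states, num_iterations=4):
        # Scaling test-time compute = raising num_iterations; the token count stays the same.
        for _ in range(num_iterations):
            hidden_states = self.core(hidden_states)
        return hidden_states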
Annotated figure from "Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach",
https://arxiv.org/abs/2502.05171
注释图来自“Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach”,
https://arxiv.org/abs/2502.05171
8. Can a 1B LLM Surpass a 405B LLM?
10 Feb, Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling,
https://arxiv.org/abs/2502.06703
Many inference-time scaling techniques depend on sampling, which requires a Process Reward Model (PRM) to select the best solution. This paper systematically analyzes how inference-time compute scaling interacts with PRMs and problem difficulty.
The researchers develop a compute-optimal scaling strategy that adapts to the choice of PRM, policy model, and task complexity. Their results show that with the right inference-time scaling approach, a 1B parameter model can outperform a 405B Llama 3 model that lacks inference-time scaling.
Similarly, they show how a 7B model with inference-time scaling surpasses DeepSeek-R1 while maintaining higher inference efficiency.
These findings highlight how inference-time scaling can significantly improve LLM performance: with the right inference compute budget, small LLMs can outperform much larger models.
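As a minimal sketch of the PRM-based selection these sampling approaches rely on, the snippet below scores each sampled solution by the product of its per-step PRM scores and keeps the best one; `sample_solution` and `prm_step_scores` are hypothetical stand-ins, and the product aggregation is one common choice rather than the paper's exact setup.
import math

def prm_best_of_n(prompt, n=8):
    # sample_solution and prm_step_scores are hypothetical helpers for illustration only.
    candidates = [sample_solution(prompt) for _ in range(n)]

    def solution_score(solution):
        # One PRM score per intermediate reasoning step; aggregate via their product.
        step_scores = prm_step_scores(prompt, solution)
        return math.prod(step_scores)

    return max(candidates, key=solution_score)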
Annotated figure from "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling",
https://arxiv.org/abs/2502.06703
注释图来自“1B LLM 能否超越 405B LLM?重新思考计算最优测试时扩展“,
https://arxiv.org/abs/2502.06703
9. Inference-Time Computations for LLM Reasoning and Planning
18 Feb, Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights,
https://www.arxiv.org/abs/2502.12521
This paper benchmarks various inference-time compute scaling techniques for reasoning and planning tasks with a focus on analyzing their trade-offs between computational cost and performance.
The authors evaluate multiple techniques, such as Chain-of-Thought, Tree-of-Thought, and Reasoning as Planning, across eleven tasks spanning arithmetic, logical, commonsense, and algorithmic reasoning, as well as planning.
The main finding is that while scaling inference-time computation can improve reasoning, no single technique consistently outperforms others across all tasks.
Annotated figure from Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights,
https://www.arxiv.org/abs/2502.12521
10. Inner Thinking Transformer
19 Feb, Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal Thinking,
https://arxiv.org/abs/2502.13842
The Inner Thinking Transformer (ITT) dynamically allocates more compute during inference. Instead of using a fixed depth (= using the same number of layers) for all tokens as in standard transformer-based LLMs, ITT employs Adaptive Token Routing to allocate more compute to difficult tokens. These difficult tokens pass through the same layer multiple times to undergo additional processing, which increases the inference-compute budget for these difficult tokens.
Annotated figure from "Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal Thinking",
https://arxiv.org/abs/2502.13842
注释图来自“Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal Thinking”,
https://arxiv.org/abs/2502.13842
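Below is a minimal sketch of per-token adaptive depth in this spirit: a small router decides which tokens get extra passes through the same layer. The router, threshold, and shapes are illustrative assumptions, not ITT's actual routing mechanism.
import torch
import torch.nn as nn

class AdaptiveDepthLayer(nn.Module):
    # Illustrative sketch of adaptive token routing, not the paper's implementation.
    def __init__(self, d_model=512, nhead=8, max_extra_passes=2, threshold=0.5):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, nhead=nhead, batch_first=True)
        self.router = nn.Linear(d_model, 1)  # scores how much extra compute each token needs
        self.max_extra_passes = max_extra_passes
        self.threshold = threshold

    def forward(self, hidden_states):
        hidden_states = self.layer(hidden_states)  # every token gets one pass
        for _ in range(self.max_extra_passes):
            needs_more = torch.sigmoid(self.router(hidden_states)) > self.threshold  # (batch, seq, 1) mask
            if not needs_more.any():
                break
            refined = self.layer(hidden_states)  # reuse the same layer for additional "thinking"
            hidden_states = torch.where(needs_more, refined, hidden_states)  # update only difficult tokens
        return hidden_states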
11. Test Time Scaling for Code Generation
20 Feb, S*: Test Time Scaling for Code Generation,
https://arxiv.org/abs/2502.14382
Inference-time scaling can be achieved by parallel scaling (generating multiple answers), sequential scaling (iteratively refining answers), or both, as described in the Google paper from Summer 2024 (Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters).
S* is a test-time compute scaling method designed specifically for code generation that improves both parallel scaling (generating multiple solutions) and sequential scaling (iterative debugging).
Annotated figure from "S*: Test Time Scaling for Code Generation",
https://arxiv.org/abs/2502.14382
注释图来自 “S*: Test Time Scaling for Code Generation”
https://arxiv.org/abs/2502.14382
The approach operates in two stages:
Stage 1: Generation
The model generates multiple code solutions and iteratively refines them using execution results and test cases provided in the problem prompt.
Think of this like a coding competition where a model submits solutions, runs tests, and fixes mistakes:
1. The model generates multiple candidate solutions.
2. Each solution is executed on public test cases (predefined input-output pairs).
3. If a solution fails (incorrect output or crashes), the model analyzes the execution results (errors, outputs) and modifies the code to improve it.
4. This refinement process continues iteratively until the model finds solutions that pass the test cases.
For example, suppose the model is asked to implement a function is_even(n) that returns True for even numbers and False otherwise.
The model’s first attempt might be:
def is_even(n):
    return n % 2  # Incorrect: should be `== 0`
The model tests this implementation with public test cases:
Input         Expected   Model Output   Status
is_even(4)    True       False          Fail
is_even(3)    False      True           Fail
After reviewing the results, the model realizes that 4 % 2 returns 0, not True, so it modifies the function:
def is_even(n):
    return n % 2 == 0  # Corrected
Now the function passes all public tests, completing the debugging phase.
Stage 2: Selection
Once multiple solutions have passed public tests, the model must choose the best one (if possible). Here, S* introduces adaptive input synthesis to avoid random picking:
1. The model compares two solutions that both pass public tests.
2. It asks itself: "Can I generate an input that will reveal a difference between these solutions?"
3. It creates a new test input and runs both solutions on it.
4. If one solution produces the correct output while the other fails, the model selects the better one.
5. If both solutions behave identically, the model randomly picks one.
For example, consider two different implementations of is_perfect_square(n):
import math

def is_perfect_square_A(n):
    return math.isqrt(n) ** 2 == n

def is_perfect_square_B(n):
    return math.sqrt(n).is_integer()
Both pass the provided test cases for simple examples:
n = 25
print(is_perfect_square_A(n)) # True (Correct)
print(is_perfect_square_B(n)) # True (Correct)
But when the LLM generates edge cases, we can see one of them fail, so the model would select solution A in this case:
n = 10**16 + 1
print(is_perfect_square_A(n)) # False (Correct)
print(is_perfect_square_B(n)) # True (Incorrect)
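A compact sketch of this selection loop is shown below; `generate_distinguishing_input`, `run_solution`, and `predicted_correct_output` are hypothetical stand-ins for the LLM calls and sandboxed code execution, not S*'s actual implementation.
import random

def select_better(solution_a, solution_b):
    # generate_distinguishing_input / run_solution / predicted_correct_output are hypothetical helpers.
    # Ask the model for an input that might reveal a behavioral difference.
    test_input = generate_distinguishing_input(solution_a, solution_b)
    out_a = run_solution(solution_a, test_input)
    out_b = run_solution(solution_b, test_input)
    if out_a == out_b:
        return random.choice([solution_a, solution_b])  # behaviorally identical on this input
    expected = predicted_correct_output(test_input)     # the model judges which output is correct
    return solution_a if out_a == expected else solution_b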
12. Chain of Draft
25 Feb, Chain of Draft: Thinking Faster by Writing Less,
https://arxiv.org/abs/2502.18600
The researchers observe that while reasoning LLMs often generate verbose step-by-step explanations, humans typically rely on concise drafts that capture only essential information.
Inspired by this, they propose Chain of Draft (CoD), a prompting strategy that reduces verbosity by generating minimal but informative intermediate steps. So, in a sense, it's a method that improves the efficiency of inference-time scaling by generating fewer tokens.
Annotated figures from "Chain of Draft: Thinking Faster by Writing Less",
https://arxiv.org/abs/2502.18600
注释图来自“Chain of Draft: Thinking Faster by Writing Less”,
https://arxiv.org/abs/2502.18600
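To illustrate the difference, here is a minimal sketch of a CoT-style prompt next to a CoD-style prompt; the exact instruction wording is an assumption based on the paper's description, not its verbatim prompt.
question = "Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many did he give to Denny?"

# Standard chain-of-thought instruction: full step-by-step reasoning.
cot_prompt = (
    "Think step by step to answer the following question, then give the final answer.\n"
    f"Q: {question}\nA:"
)

# Chain-of-draft-style instruction (assumed wording): keep each step to a minimal draft.
cod_prompt = (
    "Think step by step, but keep only a minimal draft for each thinking step, "
    "with five words at most per step. Then give the final answer.\n"
    f"Q: {question}\nA:"
)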
Looking at the results, it seems that CoD is almost as brief as standard prompting, but as accurate as Chain of Thought (CoT) prompting. As I mentioned earlier, in my opinion, one of the advantages of reasoning models is that users can read the reasoning traces to learn and to better evaluate / trust the response. CoD somewhat diminishes that advantage. However, it might come in very handy where verbose intermediate steps are not needed, as it speeds up generation while maintaining the accuracy of CoT.
Conclusion
Inference-time compute scaling has become one of the hottest research topics this year to improve the reasoning abilities of large language models without requiring modification to model weights.
The many techniques I summarized above range from simple token-based interventions like “Wait” tokens to sophisticated search and optimization-based strategies such as Test-Time Preference Optimization and Chain-of-Associated-Thoughts.
On the big-picture level, one recurring theme is that increasing compute at inference allows even relatively small models to achieve substantial improvements (on reasoning benchmarks) compared to standard approaches.
This suggests that inference strategies can help narrow the performance gap between smaller, more cost-effective models and their larger counterparts.
The cost caveat
The caveat is that inference-time scaling increases inference costs, so whether to use a small model with substantial inference scaling or to train a larger model and use it with less or no inference scaling is a calculation that has to be worked out based on how much use the model gets.
As an example, the o1 model, which uses heavy inference-time scaling, is actually still slightly cheaper than the likely much larger GPT-4.5 model, which presumably doesn't use inference-time scaling.
(It will be interesting to see how well GPT-4.5 will perform with o1- or o3-style inference-time scaling.)
Which technique?
However, inference-time compute scaling is not a silver bullet. While methods like Monte Carlo Tree Search, self-backtracking, and dynamic-depth scaling can substantially improve reasoning performance, their effectiveness still depends on the task and its difficulty. As one of the earlier papers showed, there's no inference-time compute scaling technique that performs best across all tasks.
Additionally, many of these approaches trade off response latency for improved reasoning, and slow responses can be annoying to some users. For instance, I usually switch from o1 to GPT4o for simple tasks due to the faster response time.
What's next
Looking ahead, I think we will see many more papers this year centered around the two main branches of "reasoning via inference-time compute scaling" research:
1. Research that is purely centered on developing the best possible models to top the benchmarks.
2. Research that is concerned with balancing cost and performance trade-offs across different reasoning tasks.
Either way, what's nice about inference-time compute scaling is that it can be applied to any type of existing LLM to make it better for specific tasks.
Thinking on Demand
An interesting trend on the industry side is what I refer to as "thinking on demand". Following the release of DeepSeek R1, it feels like companies have been rushing to add reasoning capabilities to their offerings.
An interesting development here is that most LLM providers now allow users to enable or disable these "thinking" features. The mechanism is not publicly shared, but it's likely the same model with dialed-back inference-time compute scaling.
For instance, Claude 3.7 Sonnet and Grok 3 now have a "thinking" mode that users can enable for their model, whereas OpenAI requires users to switch between models, for example, between GPT4o/4.5 and o1/o3-mini, if they want to use explicit reasoning models. However, the OpenAI CEO mentioned that GPT-4.5 will likely be their last model that doesn't explicitly have a reasoning or "thinking" mode. On the open-source side, even IBM added an explicit "thinking" toggle to their Granite models.
Overall, the trend of adding reasoning capabilities, whether via inference-time or train-time compute scaling, is a major step forward for LLMs in 2025.
In time, I expect that reasoning will no longer be treated as an optional or special feature but will instead become the standard, much as instruction-finetuned or RLHF-tuned models are now the norm over raw pretrained models.
As mentioned earlier, this article focused solely on inference-time compute scaling methods due to its already considerable length, thanks to the very active reasoning research activity. In a future article, I plan to cover all the interesting train-time compute scaling methods for reasoning.
This magazine is a personal passion project. For those who wish to support me, please consider purchasing a copy of my Build a Large Language Model (From Scratch) book. (I am confident that you'll get lots out of this book, as it explains how LLMs work at a level of detail that is not found anywhere else.)
Build a Large Language Model (From Scratch) now available on Amazon