DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
research@Deepseek.com

Abstract
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerg...
Time: 2025-04-10 23:33 Category: Research Reports