DeepSeek Research Papers

DeepSeek Research Team

Advancing AI through Open Research and Innovation

Publications

1. DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Scaling open-source language models with a focus on longtermism.

2. DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Exploring expert specialization in Mixture-of-Experts language models.

3. DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

Investigating the intersection of large language models and programming.

4. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Advancing mathematical reasoning capabilities in open language models.

5. DeepSeek-VL: Towards Real-World Vision-Language Understanding

Focusing on real-world vision-language understanding.

6. DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Developing a strong, economical, and efficient Mixture-of-Experts language model.

7. DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

Using large-scale synthetic data to advance theorem proving in LLMs.

8. DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

Aiming to surpass closed-source models in code intelligence.

9. Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models

Expert-specialized fine-tuning for large language models with sparse (Mixture-of-Experts) architectures.

10. DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

Using proof assistant feedback for reinforcement learning and Monte-Carlo tree search.

11. Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

Decoupling visual encoding for unified multimodal understanding and generation.

12. JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation

Harmonizing autoregression and rectified flow for unified multimodal understanding and generation.

13. DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Mixture-of-Experts vision-language models for advanced multimodal understanding.

14. DeepSeek-V3 Technical Report

Technical report for DeepSeek-V3.

15. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Incentivizing reasoning capability in LLMs via reinforcement learning.

16. Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling

Unified multimodal understanding and generation with data and model scaling.

17. Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Hardware-aligned and natively trainable sparse attention.