DeepSeek Research Papers

1. DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Scaling open-source language models with a focus on longtermism.

Paper {Jan 6, 2024}

2. DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Exploring expert specialization in Mixture-of-Experts language models.

Paper {Jan 11, 2024}

3. DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

Investigating the intersection of large language models and programming.

Paper {Jan 25, 2024}

4. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Advancing mathematical reasoning capabilities in open language models.

Paper {Feb 6, 2024}

5. DeepSeek-VL: Towards Real-World Vision-Language Understanding

Focusing on real-world vision-language understanding.

Paper {Mar 9, 2024}

6. DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Developing a strong, economical, and efficient Mixture-of-Experts language model.

Paper {May 7, 2024}

7. DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

Using large-scale synthetic data to advance theorem proving in LLMs.

Paper {May 23, 2024}

8. DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

Aiming to surpass closed-source models in code intelligence.

Paper {Jun 17, 2024}

9. Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models

Expert-specialized fine-tuning for sparse-architecture large language models, tuning only the experts most relevant to a downstream task.

Paper {Jul 2, 2024}

10. DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

Using proof assistant feedback for reinforcement learning and Monte-Carlo tree search.

Paper {Aug 15, 2024}

11. Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

Decoupling visual encoding for unified multimodal understanding and generation.

Paper {Oct 17, 2024}

12. JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation

Harmonizing autoregression and rectified flow for unified multimodal understanding and generation.

Paper {Nov 12, 2024}

13. DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Mixture-of-Experts vision-language models for advanced multimodal understanding.

Paper {Dec 13, 2024}

14. DeepSeek-V3 Technical Report

Technical report for DeepSeek-V3.

Paper {Dec 27, 2024}

15. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Incentivizing reasoning capability in LLMs via reinforcement learning.

Paper {Jan 27, 2025}

16. Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling

Unified multimodal understanding and generation with data and model scaling.

Paper {Jan 31, 2025}

17. Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Hardware-aligned and natively trainable sparse attention.

Paper {Feb 16, 2025}