Running DeepSeek-R1 Model on Your Local Machine
DeepSeek's R1 model has garnered significant attention for its advanced capabilities and cost-effectiveness. Running this model locally can be a rewarding experience, offering insights into cutting-edge AI technology. This guide will walk you through the process of setting up and running the DeepSeek-R1 model on your local machine, covering hardware requirements, installation of necessary tools, and client usage.
1. DeepSeek-R1 Model Versions and Hardware Requirements
DeepSeek offers several versions of the R1 model, each with specific hardware requirements:
Full-Scale Models
Full-Scale Models are the original, comprehensive versions of neural networks that contain a vast number of parameters, offering high accuracy but requiring substantial computational resources.
| Model Version | Parameters | VRAM | Recommended NVIDIA GPU | Mac (Unified Memory) |
|---|---|---|---|---|
| DeepSeek-R1-Zero | 671B | ~1,342 GB | Multi-GPU setup (e.g., NVIDIA A100 80GB x16) | Not applicable |
| DeepSeek-R1 | 671B | ~1,342 GB | Multi-GPU setup (e.g., NVIDIA A100 80GB x16) | Not applicable |
Distilled Models
Distilled Models are created through knowledge distillation, where a smaller model learns to replicate the behavior of a larger model, maintaining performance while reducing size and resource demands.
| Model Version | Parameters | VRAM | Recommended NVIDIA GPU | Mac (Unified Memory) |
|---|---|---|---|---|
| DeepSeek-R1-Distill-Qwen-1.5B | 1.5B | ~3.5 GB | NVIDIA RTX 3060 12GB or higher | 16 GB or more |
| DeepSeek-R1-Distill-Qwen-7B | 7B | ~16 GB | NVIDIA RTX 4080 16GB or higher | 32 GB or more |
| DeepSeek-R1-Distill-Llama-8B | 8B | ~18 GB | NVIDIA RTX 4080 16GB or higher | 32 GB or more |
| DeepSeek-R1-Distill-Qwen-14B | 14B | ~32 GB | Multi-GPU setup (e.g., NVIDIA RTX 4090 x2) | 64 GB or more |
| DeepSeek-R1-Distill-Qwen-32B | 32B | ~74 GB | Multi-GPU setup (e.g., NVIDIA RTX 4090 x4) | 128 GB or more |
| DeepSeek-R1-Distill-Llama-70B | 70B | ~161 GB | Multi-GPU setup (e.g., NVIDIA A100 80GB x2) | Not applicable |
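If you plan to run a distilled model through Ollama (installation is covered below), the variants are usually referenced by parameter count rather than by full model name. As a sketch, the following tags are assumed to correspond to the distilled models above; check the Ollama model library page for the exact tags that are currently published:
ollama pull deepseek-r1:1.5b
ollama pull deepseek-r1:8b
ollama pull deepseek-r1:14b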
Quantized Models (4-bit)
Quantized Models (4-bit) reduce the precision of model parameters to 4-bit representations, significantly decreasing memory usage and computational load, facilitating deployment on hardware with limited resources.
| Model Version | Parameters | VRAM (4-bit) | Recommended NVIDIA GPU | Mac (Unified Memory) |
|---|---|---|---|---|
| DeepSeek-R1-Zero | 671B | ~336 GB | Multi-GPU setup (e.g., NVIDIA A100 80GB x6) | Not applicable |
| DeepSeek-R1 | 671B | ~336 GB | Multi-GPU setup (e.g., NVIDIA A100 80GB x6) | Not applicable |
| DeepSeek-R1-Distill-Qwen-1.5B | 1.5B | ~1 GB | NVIDIA RTX 3050 8GB or higher | 8 GB or more |
| DeepSeek-R1-Distill-Qwen-7B | 7B | ~4 GB | NVIDIA RTX 3060 12GB or higher | 16 GB or more |
| DeepSeek-R1-Distill-Llama-8B | 8B | ~4.5 GB | NVIDIA RTX 3060 12GB or higher | 16 GB or more |
| DeepSeek-R1-Distill-Qwen-14B | 14B | ~8 GB | NVIDIA RTX 4080 16GB or higher | 32 GB or more |
| DeepSeek-R1-Distill-Qwen-32B | 32B | ~18 GB | NVIDIA RTX 4090 24GB or higher | 64 GB or more |
| DeepSeek-R1-Distill-Llama-70B | 70B | ~40 GB | Multi-GPU setup (e.g., NVIDIA RTX 4090 24GB x2) | 128 GB or more |
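As a rough rule of thumb, 4-bit quantization stores about half a byte per parameter, so the VRAM figures above can be sanity-checked with simple arithmetic: for the 32B distilled model, 32 billion parameters × 0.5 bytes ≈ 16 GB of weights, and headroom for activations and context brings the estimate to the ~18 GB shown in the table. Actual usage will vary with the quantization format and the context length you choose.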
2. Installation Guide for Ollama
Ollama is a tool designed to run AI models locally. Here's how to install it on various platforms:
macOS
Download: Visit the Ollama download page and download the macOS version.
Install: Open the downloaded .dmg file and follow the on-screen instructions to install Ollama.
Windows
Download: Go to the Ollama download page and download the Windows installer.
Install: Run the installer and follow the prompts to complete the installation.
Linux
Install: Open a terminal and run the following command:
curl -fsSL https://ollama.com/install.sh | sh
This command downloads and installs Ollama.
Verify: After installation, verify that Ollama is running by executing:
ollama -v
This should display the version of Ollama installed.
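On most Linux distributions, the install script also registers Ollama as a systemd service, so you can additionally confirm that the server process is running (assuming a systemd-based system):
systemctl status ollama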
3. Running the Model and Chatting via Command Line
Once Ollama is installed, you can run the DeepSeek-R1 model and interact with it through the command line:
Start Ollama: Launch the Ollama server by running the command below. (If you installed the desktop app on macOS or Windows, the server typically already runs in the background and this step can be skipped.)
ollama serve
Interact with the Model: In a new terminal window, use the following command to start a chat session:
ollama run deepseek-r1:8b
This command downloads the 8B distilled model on first use (if it is not already present) and then opens an interactive chat session. Substitute a different tag from the tables in section 1 to match your hardware.
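Behind the scenes, ollama run talks to the local Ollama server, which also exposes a REST API on port 11434 by default. As a minimal sketch (assuming the default host and port, and that deepseek-r1:8b has already been pulled), you can send a single non-streaming chat request with curl:
curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:8b",
  "messages": [{ "role": "user", "content": "Why is the sky blue?" }],
  "stream": false
}'
The response is JSON containing the model's reply, which makes it easy to script interactions or connect Ollama to other tools.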
4. Client Usage
a) Enabling Ollama API Access Origins
To allow external applications (such as browser-based clients) to call the Ollama API, you need to configure which origins Ollama will accept requests from.
For a step-by-step walkthrough, see this post: How to Enable API Cross-Origin for Ollama.
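In short, the allowed origins are controlled by the OLLAMA_ORIGINS environment variable. A minimal sketch, assuming you are comfortable allowing all origins (in practice you may want to restrict this to a specific host):
Linux/macOS (starting the server manually):
OLLAMA_ORIGINS="*" ollama serve
macOS (when using the Ollama app):
launchctl setenv OLLAMA_ORIGINS "*"
Then restart the Ollama app. On Windows, set OLLAMA_ORIGINS as a user environment variable and restart Ollama.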
b) Interacting with Open Source Clients like LobeHub
LobeHub is an open-source client that works with Ollama. You can install it on a local machine or a server to chat with models such as DeepSeek-R1; follow its installation guide to get set up.
After setting up LobeHub, enable the Ollama provider in Lobe's AI Service Provider settings and start chatting with the DeepSeek-R1 model.
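When you add the Ollama provider, the endpoint to point LobeHub at is the local Ollama server, which defaults to http://127.0.0.1:11434. If you deploy LobeHub with Docker instead, a rough sketch, assuming the lobehub/lobe-chat image and its OLLAMA_PROXY_URL setting, looks like this (host.docker.internal lets the container reach Ollama running on the host):
docker run -d -p 3210:3210 -e OLLAMA_PROXY_URL=http://host.docker.internal:11434 lobehub/lobe-chat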
Happy chatting with DeepSeek-R1 on your local machine!