Running DeepSeek-R1 Model on Your Local Machine

Wed Jan 29 2025

DeepSeek's R1 model has garnered significant attention for its advanced capabilities and cost-effectiveness. Running this model locally can be a rewarding experience, offering insights into cutting-edge AI technology. This guide will walk you through the process of setting up and running the DeepSeek-R1 model on your local machine, covering hardware requirements, installation of necessary tools, and client usage.

1. DeepSeek-R1 Model Versions and Hardware Requirements

DeepSeek offers several versions of the R1 model, each with specific hardware requirements:

Full-Scale Models

Full-Scale Models are the original, comprehensive versions of neural networks that contain a vast number of parameters, offering high accuracy but requiring substantial computational resources.

| Model Version | Parameters | VRAM | Recommended NVIDIA GPU | Mac (Unified Memory) |
| --- | --- | --- | --- | --- |
| DeepSeek-R1-Zero | 671B | ~1,342 GB | Multi-GPU setup (e.g., NVIDIA A100 80GB x16) | Not applicable |
| DeepSeek-R1 | 671B | ~1,342 GB | Multi-GPU setup (e.g., NVIDIA A100 80GB x16) | Not applicable |

Distilled Models

Distilled Models are created through knowledge distillation, in which a smaller model is trained to replicate the behavior of a larger one, retaining much of its performance while greatly reducing size and resource demands.

| Model Version | Parameters | VRAM | Recommended NVIDIA GPU | Mac (Unified Memory) |
| --- | --- | --- | --- | --- |
| DeepSeek-R1-Distill-Qwen-1.5B | 1.5B | ~3.5 GB | NVIDIA RTX 3060 12GB or higher | 16 GB or more |
| DeepSeek-R1-Distill-Qwen-7B | 7B | ~16 GB | NVIDIA RTX 4080 16GB or higher | 32 GB or more |
| DeepSeek-R1-Distill-Llama-8B | 8B | ~18 GB | NVIDIA RTX 4080 16GB or higher | 32 GB or more |
| DeepSeek-R1-Distill-Qwen-14B | 14B | ~32 GB | Multi-GPU setup (e.g., NVIDIA RTX 4090 x2) | 64 GB or more |
| DeepSeek-R1-Distill-Qwen-32B | 32B | ~74 GB | Multi-GPU setup (e.g., NVIDIA RTX 4090 x4) | 128 GB or more |
| DeepSeek-R1-Distill-Llama-70B | 70B | ~161 GB | Multi-GPU setup (e.g., NVIDIA A100 80GB x2) | Not applicable |

Quantized Models (4-bit)

Quantized Models (4-bit) reduce the precision of model parameters to 4-bit representations, significantly decreasing memory usage and computational load and facilitating deployment on hardware with limited resources. For example, the 7B distilled model drops from roughly 16 GB of VRAM to roughly 4 GB when quantized to 4-bit.

| Model Version | Parameters | VRAM (4-bit) | Recommended NVIDIA GPU | Mac (Unified Memory) |
| --- | --- | --- | --- | --- |
| DeepSeek-R1-Zero | 671B | ~336 GB | Multi-GPU setup (e.g., NVIDIA A100 80GB x6) | Not applicable |
| DeepSeek-R1 | 671B | ~336 GB | Multi-GPU setup (e.g., NVIDIA A100 80GB x6) | Not applicable |
| DeepSeek-R1-Distill-Qwen-1.5B | 1.5B | ~1 GB | NVIDIA RTX 3050 8GB or higher | 8 GB or more |
| DeepSeek-R1-Distill-Qwen-7B | 7B | ~4 GB | NVIDIA RTX 3060 12GB or higher | 16 GB or more |
| DeepSeek-R1-Distill-Llama-8B | 8B | ~4.5 GB | NVIDIA RTX 3060 12GB or higher | 16 GB or more |
| DeepSeek-R1-Distill-Qwen-14B | 14B | ~8 GB | NVIDIA RTX 4080 16GB or higher | 32 GB or more |
| DeepSeek-R1-Distill-Qwen-32B | 32B | ~18 GB | NVIDIA RTX 4090 24GB or higher | 64 GB or more |
| DeepSeek-R1-Distill-Llama-70B | 70B | ~40 GB | Multi-GPU setup (e.g., NVIDIA RTX 4090 24GB x2) | 128 GB or more |

2. Installation Guide for Ollama

Ollama is an open-source tool for downloading and running large language models locally. Here's how to install it on each platform:

macOS

  1. Download: Visit the Ollama download page (https://ollama.com/download) and download the macOS version.

  2. Install: Open the downloaded .dmg file and follow the on-screen instructions to install Ollama.

Windows

  1. Download: Go to the Ollama download page and download the Windows installer.

  2. Install: Run the installer and follow the prompts to complete the installation.

Linux

  1. Install: Open a terminal and run the following command:

curl -fsSL https://ollama.com/install.sh | sh

This command downloads and installs Ollama.

  2. Verify: After installation, confirm that it succeeded by checking the installed version:

ollama -v

This should display the version of Ollama installed.
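
You can also confirm that the Ollama server itself is reachable over HTTP. This is a quick sketch assuming Ollama's default local port 11434 and its /api/version endpoint; the Linux installer normally registers Ollama as a background service, so the server should already be listening:

curl http://localhost:11434/api/version

If the server is up, this returns a small JSON object containing the installed version.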

3. Running the Model and Chatting via Command Line

Once Ollama is installed, you can run the DeepSeek-R1 model and interact with it through the command line:

Start Ollama: If the Ollama server is not already running (the desktop app and the Linux service start it automatically), launch it by running:

ollama serve

Interact with the Model: In a new terminal window, use the following command to start a chat session:

ollama run deepseek-r1:8b

This downloads the 8B distilled model on first run and then drops you into an interactive chat session. Other tags such as deepseek-r1:1.5b, deepseek-r1:14b, and deepseek-r1:32b are available if you want to match the model to your hardware using the tables above; a few more everyday commands are sketched below.
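
Beyond the interactive session, a few other ollama commands are useful day to day. This sketch assumes the deepseek-r1:8b tag used above; substitute whichever tag matches your hardware:

# download a model without starting a chat
ollama pull deepseek-r1:8b

# list the models already downloaded on this machine
ollama list

# ask a one-off question without entering the interactive session
ollama run deepseek-r1:8b "Summarize knowledge distillation in two sentences."

Inside an interactive session, type /bye to exit.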

4. Client Usage

a) Enabling Ollama API Access Origins

To allow external applications, especially browser-based clients, to call the Ollama API, you need to allow their origins (cross-origin requests).

For details, see this post: How to Enable API Cross-Origin for Ollama.
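
As a brief sketch (see the linked post for platform-specific details): Ollama reads its allowed origins from the OLLAMA_ORIGINS environment variable. The commands below assume a default install (service name ollama on Linux, port 11434), and "*" is a deliberately permissive placeholder that you should narrow to your client's actual origin:

# macOS: set the variable for your user session, then restart the Ollama app
launchctl setenv OLLAMA_ORIGINS "*"

# Linux (systemd service): add an Environment line under [Service], then restart
sudo systemctl edit ollama.service
# add: Environment="OLLAMA_ORIGINS=*"
sudo systemctl restart ollama

# any platform, when starting the server manually
OLLAMA_ORIGINS="*" ollama serve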

b) Interacting with Open Source Clients like LobeHub

LobeHub is an open-source client that can talk to Ollama. You can install it on a local machine or a server to chat with models like DeepSeek-R1; follow its installation guide to set it up.

After setting up LobeHub, enable the Ollama provider in its AI Service Provider settings and start chatting with the DeepSeek-R1 model.
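
If you'd rather script against Ollama than use a GUI, the same local HTTP API that clients such as LobeHub use is available directly. This is a minimal sketch using Ollama's /api/chat endpoint, assuming the default port 11434 and the deepseek-r1:8b tag pulled earlier:

curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:8b",
  "messages": [
    { "role": "user", "content": "Explain 4-bit quantization in one paragraph." }
  ],
  "stream": false
}'

With "stream": false the server replies with a single JSON object containing the model's answer instead of a stream of partial tokens.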

Happy chatting with DeepSeek-R1 on your local machine!
