Running DeepSeek-R1 Model on Your Local Machine

Wed Jan 29 2025

DeepSeek's R1 model has garnered significant attention for its advanced capabilities and cost-effectiveness. Running this model locally can be a rewarding experience, offering insights into cutting-edge AI technology. This guide will walk you through the process of setting up and running the DeepSeek-R1 model on your local machine, covering hardware requirements, installation of necessary tools, and client usage.

1. DeepSeek-R1 Model Versions and Hardware Requirements

DeepSeek offers several versions of the R1 model, each with specific hardware requirements:

Full-Scale Models

Full-Scale Models are the original, comprehensive versions of neural networks that contain a vast number of parameters, offering high accuracy but requiring substantial computational resources.

| Model Version | Parameters | VRAM | Recommended NVIDIA GPU | Mac (Unified Memory) |
| --- | --- | --- | --- | --- |
| DeepSeek-R1-Zero | 671B | ~1,342 GB | Multi-GPU setup (e.g., NVIDIA A100 80GB x16) | Not applicable |
| DeepSeek-R1 | 671B | ~1,342 GB | Multi-GPU setup (e.g., NVIDIA A100 80GB x16) | Not applicable |

Distilled Models

Distilled Models are created through knowledge distillation, in which a smaller model is trained to replicate the behavior of a larger one, retaining much of its performance while greatly reducing size and resource demands.

| Model Version | Parameters | VRAM | Recommended NVIDIA GPU | Mac (Unified Memory) |
| --- | --- | --- | --- | --- |
| DeepSeek-R1-Distill-Qwen-1.5B | 1.5B | ~3.5 GB | NVIDIA RTX 3060 12GB or higher | 16 GB or more |
| DeepSeek-R1-Distill-Qwen-7B | 7B | ~16 GB | NVIDIA RTX 4080 16GB or higher | 32 GB or more |
| DeepSeek-R1-Distill-Llama-8B | 8B | ~18 GB | NVIDIA RTX 4080 16GB or higher | 32 GB or more |
| DeepSeek-R1-Distill-Qwen-14B | 14B | ~32 GB | Multi-GPU setup (e.g., NVIDIA RTX 4090 x2) | 64 GB or more |
| DeepSeek-R1-Distill-Qwen-32B | 32B | ~74 GB | Multi-GPU setup (e.g., NVIDIA RTX 4090 x4) | 128 GB or more |
| DeepSeek-R1-Distill-Llama-70B | 70B | ~161 GB | Multi-GPU setup (e.g., NVIDIA A100 80GB x2) | Not applicable |

Quantized Models (4-bit)

Quantized Models (4-bit) reduce the precision of model parameters to 4-bit representations, significantly decreasing memory usage and computational load and facilitating deployment on hardware with limited resources. For example, the 7B distilled model drops from roughly 16 GB of VRAM to roughly 4 GB when quantized to 4-bit.

| Model Version | Parameters | VRAM (4-bit) | Recommended NVIDIA GPU | Mac (Unified Memory) |
| --- | --- | --- | --- | --- |
| DeepSeek-R1-Zero | 671B | ~336 GB | Multi-GPU setup (e.g., NVIDIA A100 80GB x6) | Not applicable |
| DeepSeek-R1 | 671B | ~336 GB | Multi-GPU setup (e.g., NVIDIA A100 80GB x6) | Not applicable |
| DeepSeek-R1-Distill-Qwen-1.5B | 1.5B | ~1 GB | NVIDIA RTX 3050 8GB or higher | 8 GB or more |
| DeepSeek-R1-Distill-Qwen-7B | 7B | ~4 GB | NVIDIA RTX 3060 12GB or higher | 16 GB or more |
| DeepSeek-R1-Distill-Llama-8B | 8B | ~4.5 GB | NVIDIA RTX 3060 12GB or higher | 16 GB or more |
| DeepSeek-R1-Distill-Qwen-14B | 14B | ~8 GB | NVIDIA RTX 4080 16GB or higher | 32 GB or more |
| DeepSeek-R1-Distill-Qwen-32B | 32B | ~18 GB | NVIDIA RTX 4090 24GB or higher | 64 GB or more |
| DeepSeek-R1-Distill-Llama-70B | 70B | ~40 GB | Multi-GPU setup (e.g., NVIDIA RTX 4090 24GB x2) | 128 GB or more |

2. Installation Guide for Ollama

Ollama is an open-source tool for downloading and running large language models locally. Here's how to install it on each platform:

macOS

  1. Download: Visit the Ollama download page (https://ollama.com/download) and download the macOS version.

  2. Install: Open the downloaded .dmg file and follow the on-screen instructions to install Ollama.

Windows

  1. Download: Go to the Ollama download page and download the Windows installer.

  2. Install: Run the installer and follow the prompts to complete the installation.

Linux

  1. Install: Open a terminal and run the following command:

curl -fsSL https://ollama.com/install.sh | sh

This command downloads and installs Ollama.

  2. Verify: After installation, confirm that it succeeded by checking the installed version:

ollama -v

This should display the version of Ollama installed.
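
You can also confirm that the Ollama server itself is reachable over HTTP. This is a quick sketch assuming Ollama's default local port 11434 and its /api/version endpoint; the Linux installer normally registers Ollama as a background service, so the server should already be listening:

curl http://localhost:11434/api/version

If the server is up, this returns a small JSON object containing the installed version.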

3. Running the Model and Chatting via Command Line

Once Ollama is installed, you can run the DeepSeek-R1 model and interact with it through the command line:

Start Ollama: If the Ollama server is not already running (the desktop app and the Linux service start it automatically), launch it by running:

ollama serve

Interact with the Model: In a new terminal window, use the following command to start a chat session:

ollama run deepseek-r1:8b

This downloads the 8B distilled model on first run and then drops you into an interactive chat session. Other tags such as deepseek-r1:1.5b, deepseek-r1:14b, and deepseek-r1:32b are available if you want to match the model to your hardware using the tables above; a few more everyday commands are sketched below.
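
Beyond the interactive session, a few other ollama commands are useful day to day. This sketch assumes the deepseek-r1:8b tag used above; substitute whichever tag matches your hardware:

# download a model without starting a chat
ollama pull deepseek-r1:8b

# list the models already downloaded on this machine
ollama list

# ask a one-off question without entering the interactive session
ollama run deepseek-r1:8b "Summarize knowledge distillation in two sentences."

Inside an interactive session, type /bye to exit.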

4. Client Usage

a) Enabling Ollama API Access Origins

To allow external applications, especially browser-based clients, to call the Ollama API, you need to allow their origins (cross-origin requests).

For details, see this post: How to Enable API Cross-Origin for Ollama.
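
As a brief sketch (see the linked post for platform-specific details): Ollama reads its allowed origins from the OLLAMA_ORIGINS environment variable. The commands below assume a default install (service name ollama on Linux, port 11434), and "*" is a deliberately permissive placeholder that you should narrow to your client's actual origin:

# macOS: set the variable for your user session, then restart the Ollama app
launchctl setenv OLLAMA_ORIGINS "*"

# Linux (systemd service): add an Environment line under [Service], then restart
sudo systemctl edit ollama.service
# add: Environment="OLLAMA_ORIGINS=*"
sudo systemctl restart ollama

# any platform, when starting the server manually
OLLAMA_ORIGINS="*" ollama serve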

b) Interacting with Open Source Clients like LobeHub

LobeHub is an open-source client that can talk to Ollama. You can install it on a local machine or a server to chat with models like DeepSeek-R1; follow its installation guide to set it up.

After setting up LobeHub, enable the Ollama provider in its AI Service Provider settings and start chatting with the DeepSeek-R1 model.
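
If you'd rather script against Ollama than use a GUI, the same local HTTP API that clients such as LobeHub use is available directly. This is a minimal sketch using Ollama's /api/chat endpoint, assuming the default port 11434 and the deepseek-r1:8b tag pulled earlier:

curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:8b",
  "messages": [
    { "role": "user", "content": "Explain 4-bit quantization in one paragraph." }
  ],
  "stream": false
}'

With "stream": false the server replies with a single JSON object containing the model's answer instead of a stream of partial tokens.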

Happy chatting with DeepSeek-R1 on your local machine!
