Configuring Your Ollama Server
Ollama is a powerful tool for running large language models (LLMs) locally, but to get the most out of it, you'll want to configure it to suit your specific needs and environment. This blog post will guide you through the process of configuring the Ollama server, based on the official Ollama FAQ.
Understanding Ollama Server Configuration
Ollama's server is configured primarily through environment variables. This allows for a flexible and powerful way to adjust settings without modifying any core code. These settings control aspects like network binding, model storage, proxy settings, CORS, and more.
Setting Environment Variables
The method for setting environment variables depends on your operating system. Here's a breakdown for each:
macOS
If you're running Ollama as a macOS application, you should use launchctl to manage environment variables:

1. Set each variable: For each environment variable you want to set, use the launchctl setenv command. For instance:

```bash
launchctl setenv OLLAMA_HOST "0.0.0.0"
launchctl setenv OLLAMA_MODELS "/path/to/your/models"
```

2. Restart Ollama: After setting the variables, restart the Ollama application for the changes to take effect.
Linux
On Linux, if Ollama is running as a systemd service, use systemctl to set the environment variables:

1. Edit the systemd service file: Run systemctl edit ollama.service. This will open the service file in a text editor.

2. Add environment variables: Under the [Service] section, add a line for each environment variable using the Environment= directive:

```ini
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_MODELS=/path/to/your/models"
```

3. Reload and restart: Save the file, exit the editor, and then run the following commands:

```bash
systemctl daemon-reload
systemctl restart ollama
```
Windows
On Windows, Ollama inherits your user and system environment variables:

1. Quit Ollama: First, ensure Ollama is not running by quitting it from the taskbar.
2. Access environment variables: Open the Settings app (Windows 11) or Control Panel (Windows 10) and search for "environment variables."
3. Edit user variables: Click on "Edit environment variables for your account."
4. Set variables: Edit existing variables or create new ones for OLLAMA_HOST, OLLAMA_MODELS, etc.
5. Apply changes: Click OK/Apply to save the changes.
6. Restart Ollama: Start the Ollama application from the Windows Start menu.
Common Configuration Options
Here are some of the most common configuration options you might want to set:
OLLAMA_HOST
By default, Ollama binds to 127.0.0.1, meaning it's only accessible from the local machine. To make it accessible from other machines on your network, set OLLAMA_HOST to 0.0.0.0.
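To confirm the change worked, you can query the API from another machine. A minimal check, assuming Ollama is listening on its default port 11434 and your firewall allows the connection:

```bash
# Replace <server-ip> with the address of the machine running Ollama.
# /api/tags lists the locally available models; any JSON response means the server is reachable.
curl http://<server-ip>:11434/api/tags
```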
OLLAMA_MODELS
This variable determines where Ollama stores its models. The default locations are:
- macOS: ~/.ollama/models
- Linux: /usr/share/ollama/.ollama/models
- Windows: C:\Users\%username%\.ollama\models

You can change this to any directory you prefer. Important Note for Linux: If you change the model directory on Linux, make sure the ollama user has read and write access to the new location. You can use sudo chown -R ollama:ollama <directory> to grant the necessary permissions.
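Putting this together, a sketch of moving model storage to a larger disk on Linux might look like the following (the /mnt/data path is just an example):

```bash
# Create the new model directory and hand it to the ollama user (path is illustrative)
sudo mkdir -p /mnt/data/ollama/models
sudo chown -R ollama:ollama /mnt/data/ollama/models

# Point the service at the new location via systemctl edit ollama.service, adding under [Service]:
#   Environment="OLLAMA_MODELS=/mnt/data/ollama/models"
# Then apply the change:
sudo systemctl daemon-reload
sudo systemctl restart ollama
```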
HTTPS_PROXY
If you're behind a proxy server, you'll need to set HTTPS_PROXY to the address of your proxy. This ensures that Ollama can download models from the internet. Important: Make sure your proxy's certificate is installed as a system certificate. Do not set HTTP_PROXY, as it can disrupt client connections.
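For example, when launching the server manually, you could pass the proxy for just that process (the proxy address below is illustrative):

```bash
# Route model downloads through the proxy for this server process only.
# Your proxy's CA certificate must already be trusted as a system certificate.
HTTPS_PROXY=https://proxy.example.com:3128 ollama serve
```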
OLLAMA_ORIGINS
This setting controls which web origins are allowed to access the Ollama API. By default, it allows requests from 127.0.0.1 and 0.0.0.0. You can add other origins as needed.
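For instance, if a browser-based app served from a different origin needs to call the API, you could allow it like this (the origins below are examples; set the variable using whichever per-platform method was described above):

```bash
# Allow a local dev server and a hosted web app to call the Ollama API
export OLLAMA_ORIGINS="http://localhost:3000,https://app.example.com"
ollama serve
```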
OLLAMA_KEEP_ALIVE
This controls how long a model stays loaded in memory after a request. By default, it's 5 minutes. You can:
- Set a specific duration (e.g., "10m", "24h")
- Use a number of seconds (e.g., 3600)
- Use a negative number to keep the model loaded indefinitely (e.g., -1, "-1m")
- Use 0 to unload the model immediately after a response.

The /api/generate and /api/chat endpoints also accept a keep_alive parameter, which takes precedence over the environment variable.
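As an example of the per-request override, a request like the following keeps the model in memory indefinitely after it completes (the model name is illustrative; use one you have pulled):

```bash
# keep_alive: -1 keeps this model loaded until the server stops or a later request changes it
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "keep_alive": -1
}'
```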
OLLAMA_MAX_QUEUE
Sets the maximum number of requests that Ollama can queue before responding with a 503 error, indicating that the server is overloaded. The default is 512.
OLLAMA_MAX_LOADED_MODELS
Sets the maximum number of models that can be concurrently loaded, assuming they fit within the available memory. The default is set to three times the number of GPUs or three for CPU inference.
OLLAMA_NUM_PARALLEL
Sets the maximum number of parallel requests each model can handle simultaneously. The default is chosen automatically: 4, or 1 when available memory is limited.
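The three queueing and concurrency settings above are often tuned together. A sketch for a single modest GPU (the numbers are illustrative, not recommendations):

```bash
# Keep at most two models resident, let each serve four requests in parallel,
# and return 503 once more than 256 requests are waiting in the queue.
export OLLAMA_MAX_LOADED_MODELS=2
export OLLAMA_NUM_PARALLEL=4
export OLLAMA_MAX_QUEUE=256
ollama serve
```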
OLLAMA_FLASH_ATTENTION
Enables Flash Attention, a performance optimization for modern models that can significantly reduce memory usage, especially as the context size grows. Set this environment variable to 1 to enable it.
OLLAMA_KV_CACHE_TYPE
This variable specifies the quantization type for the K/V cache. Using quantization can reduce memory usage, but might slightly impact precision. Options include f16 (the default), q8_0 (recommended if not using f16), and q4_0.
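These two options are often set together, since K/V cache quantization is intended to be used with Flash Attention enabled. A minimal sketch, assuming your hardware and model support Flash Attention:

```bash
# Enable Flash Attention and use an 8-bit quantized K/V cache to reduce memory use
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_KV_CACHE_TYPE=q8_0
ollama serve
```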
Advanced: Proxy Servers and Tunneling
If you need to expose your Ollama server through a proxy like Nginx or using tunneling tools like ngrok or Cloudflare Tunnel, refer to the original FAQ for detailed instructions on how to set that up.
Conclusion
Configuring the Ollama server might seem daunting at first, but by understanding how environment variables work and following the steps outlined above, you can easily customize Ollama to meet your needs. Whether you need to expose it on a network, change the model storage location, or optimize performance, the power is in your hands! Happy experimenting with your newly configured Ollama server!