Configuring Your Ollama Server
Ollama is a powerful tool for running large language models (LLMs) locally, but to get the most out of it, you'll want to configure it to suit your specific needs and environment. This blog post will guide you through the process of configuring the Ollama server, based on the official Ollama FAQ.
Understanding Ollama Server Configuration
Ollama's server is configured primarily through environment variables. This allows for a flexible and powerful way to adjust settings without modifying any core code. These settings control aspects like network binding, model storage, proxy settings, CORS, and more.
Setting Environment Variables
The method for setting environment variables depends on your operating system. Here's a breakdown for each:
macOS
If you're running Ollama as a macOS application, you should use launchctl to manage environment variables:

1. Set each variable: For each environment variable you want to set, use the launchctl setenv command. For instance:

```bash
launchctl setenv OLLAMA_HOST "0.0.0.0"
launchctl setenv OLLAMA_MODELS "/path/to/your/models"
```

2. Restart Ollama: After setting the variables, restart the Ollama application for the changes to take effect.
Linux
On Linux, if Ollama is running as a systemd service, use systemctl to set the environment variables:

1. Edit the systemd service file: Run systemctl edit ollama.service. This will open the service file in a text editor.

2. Add environment variables: Under the [Service] section, add a line for each environment variable using the Environment= directive:

```ini
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_MODELS=/path/to/your/models"
```

3. Reload and restart: Save the file, exit the editor, and then run the following commands:

```bash
systemctl daemon-reload
systemctl restart ollama
```
Windows
On Windows, Ollama inherits your user and system environment variables:

1. Quit Ollama: First, ensure Ollama is not running by quitting it from the taskbar.
2. Access environment variables: Open the Settings app (Windows 11) or Control Panel (Windows 10) and search for "environment variables."
3. Edit user variables: Click on "Edit environment variables for your account."
4. Set variables: Edit existing variables or create new ones for OLLAMA_HOST, OLLAMA_MODELS, etc.
5. Apply changes: Click OK/Apply to save the changes.
6. Restart Ollama: Start the Ollama application from the Windows Start menu.
Common Configuration Options
Here are some of the most common configuration options you might want to set:
OLLAMA_HOST
By default, Ollama binds to 127.0.0.1, meaning it's only accessible from the local machine. To make it accessible from other machines on your network, set OLLAMA_HOST to 0.0.0.0.
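To confirm the change worked, you can query the API from another machine. A minimal check, assuming Ollama is listening on its default port 11434 and your firewall allows the connection:

```bash
# Replace <server-ip> with the address of the machine running Ollama.
# /api/tags lists the locally available models; any JSON response means the server is reachable.
curl http://<server-ip>:11434/api/tags
```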
OLLAMA_MODELS
This variable determines where Ollama stores its models. The default locations are:
- macOS: ~/.ollama/models
- Linux: /usr/share/ollama/.ollama/models
- Windows: C:\Users\%username%\.ollama\models

You can change this to any directory you prefer. Important Note for Linux: If you change the model directory on Linux, make sure the ollama user has read and write access to the new location. You can use sudo chown -R ollama:ollama <directory> to grant the necessary permissions.
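Putting this together, a sketch of moving model storage to a larger disk on Linux might look like the following (the /mnt/data path is just an example):

```bash
# Create the new model directory and hand it to the ollama user (path is illustrative)
sudo mkdir -p /mnt/data/ollama/models
sudo chown -R ollama:ollama /mnt/data/ollama/models

# Point the service at the new location via systemctl edit ollama.service, adding under [Service]:
#   Environment="OLLAMA_MODELS=/mnt/data/ollama/models"
# Then apply the change:
sudo systemctl daemon-reload
sudo systemctl restart ollama
```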
HTTPS_PROXY
If you're behind a proxy server, you'll need to set HTTPS_PROXY to the address of your proxy. This ensures that Ollama can download models from the internet. Important: Make sure your proxy's certificate is installed as a system certificate. Do not set HTTP_PROXY, as it can disrupt client connections.
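For example, when launching the server manually, you could pass the proxy for just that process (the proxy address below is illustrative):

```bash
# Route model downloads through the proxy for this server process only.
# Your proxy's CA certificate must already be trusted as a system certificate.
HTTPS_PROXY=https://proxy.example.com:3128 ollama serve
```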
OLLAMA_ORIGINS
This setting controls which web origins are allowed to access the Ollama API. By default, it allows requests from 127.0.0.1 and 0.0.0.0. You can add other origins as needed.
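For instance, if a browser-based app served from a different origin needs to call the API, you could allow it like this (the origins below are examples; set the variable using whichever per-platform method was described above):

```bash
# Allow a local dev server and a hosted web app to call the Ollama API
export OLLAMA_ORIGINS="http://localhost:3000,https://app.example.com"
ollama serve
```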
OLLAMA_KEEP_ALIVE
This controls how long a model stays loaded in memory after a request. By default, it's 5 minutes. You can:
- Set a specific duration (e.g., "10m", "24h")
- Use a number of seconds (e.g., 3600)
- Use a negative number to keep the model loaded indefinitely (e.g., -1, "-1m")
- Use 0 to unload the model immediately after a response.

The /api/generate and /api/chat endpoints also accept a keep_alive parameter, which takes precedence over the environment variable.
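As an example of the per-request override, a request like the following keeps the model in memory indefinitely after it completes (the model name is illustrative; use one you have pulled):

```bash
# keep_alive: -1 keeps this model loaded until the server stops or a later request changes it
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "keep_alive": -1
}'
```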
OLLAMA_MAX_QUEUE
Sets the maximum number of requests that Ollama can queue before responding with a 503 error, indicating that the server is overloaded. The default is 512.
OLLAMA_MAX_LOADED_MODELS
Sets the maximum number of models that can be concurrently loaded, assuming they fit within the available memory. The default is set to three times the number of GPUs or three for CPU inference.
OLLAMA_NUM_PARALLEL
Sets the maximum number of parallel requests each model can handle simultaneously. The default is chosen automatically: 4, or 1 when available memory is limited.
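The three queueing and concurrency settings above are often tuned together. A sketch for a single modest GPU (the numbers are illustrative, not recommendations):

```bash
# Keep at most two models resident, let each serve four requests in parallel,
# and return 503 once more than 256 requests are waiting in the queue.
export OLLAMA_MAX_LOADED_MODELS=2
export OLLAMA_NUM_PARALLEL=4
export OLLAMA_MAX_QUEUE=256
ollama serve
```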
OLLAMA_FLASH_ATTENTION
Enables Flash Attention, a performance optimization for modern models that can significantly reduce memory usage, especially as the context size grows. Set this environment variable to 1 to enable it.
OLLAMA_KV_CACHE_TYPE
This variable specifies the quantization type for the K/V cache. Using quantization can reduce memory usage, but might slightly impact precision. Options include f16 (the default), q8_0 (recommended if not using f16), and q4_0.
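These two options are often set together, since K/V cache quantization is intended to be used with Flash Attention enabled. A minimal sketch, assuming your hardware and model support Flash Attention:

```bash
# Enable Flash Attention and use an 8-bit quantized K/V cache to reduce memory use
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_KV_CACHE_TYPE=q8_0
ollama serve
```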
Advanced: Proxy Servers and Tunneling
If you need to expose your Ollama server through a proxy like Nginx or using tunneling tools like ngrok or Cloudflare Tunnel, refer to the original FAQ for detailed instructions on how to set that up.
Conclusion
Configuring the Ollama server might seem daunting at first, but by understanding how environment variables work and following the steps outlined above, you can easily customize Ollama to meet your needs. Whether you need to expose it on a network, change the model storage location, or optimize performance, the power is in your hands! Happy experimenting with your newly configured Ollama server!