This is the first post in a series about running LLMs locally. The second part is about Connecting Stable Diffusion WebUI to your locally running Open WebUI.
We’ll accomplish this using
- A nix enabled machine
- An AMD GPU - a decent CPU will probably work as well
- Rootless docker
- Ollama/ollama
- Open-webui/open-webui
Rootless docker
You can enable rootless docker with the following nix configuration. It's a nice way of running docker without it needing root access, which reduces the blast radius should code manage to break out of your containers through an exploitable vulnerability.
It also means your user doesn't need to be in the docker group, and you don't need sudo to launch docker containers.
virtualisation.docker = {
  # must align with your filesystem
  storageDriver = "btrfs";
  rootless = {
    enable = true;
    # by default point DOCKER_HOST to the rootless Docker instance for users
    setSocketVariable = true;
    # where the data goes
    daemon.settings = { data-root = "/data"; };
  };
};
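Assuming this lives in your configuration.nix (or the equivalent module in your flake), rebuild to apply it:
# apply the new configuration
sudo nixos-rebuild switch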
Start the docker daemon from your user account
systemctl --user start docker
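To check that your shell is actually talking to the rootless daemon, here is a quick sanity check (the exact socket path depends on your user id):
# DOCKER_HOST should point at a socket under your user runtime directory,
# e.g. unix:///run/user/1000/docker.sock
echo $DOCKER_HOST

# run a throwaway container without root or the docker group
docker run --rm hello-world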
Ollama
Ollama is one of many ways to get up and running with large language models locally. It works on Windows, Mac, and Linux, and can also run in docker. It also works alright with a decent CPU if you don't have a GPU at hand (a GPU will, of course, make it much, much faster).
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
After this, you can install ollama from your favorite package manager, and you have an LLM directly available in your terminal by running ollama pull <model> and ollama run <model>. Try it with nix-shell -p ollama, followed by ollama run llama2.
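Here is a minimal terminal session, assuming the container above is running (llama2 is just an example from the Ollama library):
# temporary shell with the ollama CLI on the path
nix-shell -p ollama

# the CLI talks to the server on localhost:11434 by default
ollama pull llama2
ollama run llama2 "Why is the sky blue? Keep it brief."
ollama list   # show which models are available locally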
You can also script against it, e.g. by running this curl command, which prevents buffering in both curl and jq, for that incredibly annoying OG chatbot feel.
curl http://localhost:11434/api/generate \
  --no-buffer \
  --silent \
  -d '{ "model": "llama2", "prompt":"Why is the sky blue? Keep it brief." }' \
  | jq --unbuffered -jrc .response
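If you want to see which models the server currently has available, the tags endpoint lists them (assuming jq is installed):
# list locally available models
curl --silent http://localhost:11434/api/tags | jq -r '.models[].name'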
Already this is useful in and of itself, but by adding open-webui to the mix you can take it even further and create your own little chatgpt-like playground.
But I have a GPU
Check the Docker Hub image descriptions for the image variant you need to have your GPU detected and supported automatically.
For my AMD RX6800XT GPU, it was simply a matter of switching to the image ollama/ollama:rocm and forwarding the devices /dev/kfd and /dev/dri to make Ollama detect it.
This will speed your queries up by orders of magnitude.
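In other words, the docker run from earlier becomes something like this (with rootless docker, your user may also need read/write access to those device nodes, typically via the video/render groups):
# rocm image plus the AMD devices forwarded into the container
docker run -d \
  --device /dev/kfd \
  --device /dev/dri \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:rocm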
Open-webui
Open-webui describes itself as
(…) an extensible, feature-rich, and user-friendly self-hosted WebUI designed to operate entirely offline. It supports various LLM runners, including Ollama and OpenAI-compatible APIs.
It also runs with docker, and connects to your running ollama server. It provides a web-based, chat-like experience, much like chatgpt - in fact, pretty much exactly like chatgpt.
I'm not convinced chat like this is the way to interact with AI. Nonetheless, this is where it's at, and I have to say that the open source variant of the "chatgpt UI" does work pretty darn well.
The webui brings a lot of ollama's features to life. Among other things I like, it enables
- user registration with an admin interface, so you could share the instance with your friends and family at home
- pulling models automatically from the Ollama library
- uploading files for context that the models can query
- chat configurations, supporting predefined models and prompts to customize your assistants
- with a model like llava, it can look at and understand images (see the sketch below)
Well, almost.
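Under the hood this builds on Ollama's multimodal support, so you can also poke at it from the terminal. Here is a rough sketch against the API, assuming you have pulled llava and have an image lying around (photo.jpg is just a placeholder):
# base64-encode the image and ask llava to describe it
curl http://localhost:11434/api/generate \
  --no-buffer \
  --silent \
  -d "{ \"model\": \"llava\", \"prompt\": \"What is in this picture?\", \"images\": [\"$(base64 -w0 photo.jpg)\"] }" \
  | jq --unbuffered -jrc .response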
It can even connect to AUTOMATIC1111/stable-diffusion-webui to enable image generation, but sadly support for rocm and AMD GPUs is still lacking in the docker variant of stable-diffusion-webui, so that's for another post.
docker-compose.yml
If all this sounds interesting, here is the docker compose configuration needed to run ollama and open-webui on NixOS with rootless docker and an AMD graphics card.
# docker-compose.yml
services:
  ollama:
    container_name: ollama
    image: ollama/ollama:rocm
    restart: unless-stopped
    expose:
      - 11434
    devices:
      - /dev/kfd
      - /dev/dri
    volumes:
      - ollama:/root/.ollama
    networks:
      - internal

  open-webui:
    container_name: open-webui
    image: ghcr.io/open-webui/open-webui:main
    restart: unless-stopped
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama
    environment:
      - "OLLAMA_BASE_URL=http://ollama:11434"
    ports:
      - 8080:8080
    networks:
      - internal

volumes:
  ollama: {}
  open-webui: {}

networks:
  internal:
    driver: bridge
Put it in a folder and run docker compose up inside it.
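Once the containers are up, you can pull a first model directly into the ollama container and then open the UI in your browser (again, llama2 is just an example):
# pull a model inside the running ollama container
docker compose exec ollama ollama pull llama2

# the web UI is published on port 8080
xdg-open http://localhost:8080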
But I am using Emacs
Check out ellama on MELPA, or s-kostyaev/ellama, for instructions on how to integrate this into your workflow.
Have fun! 🤖