Wednesday, October 29, 2025

Ollama for Local LLMs

Continuing with the theme of running LLMs locally, it seemed ideal to go with Ollama, which has gained significant ground over the past year or so. While tools like GPT4All, Llamafile, etc. are all based on llama.cpp (as is Ollama under the hood), Ollama uses its own model storage and distribution format.

More importantly, Ollama supports older (non-AVX) CPUs, something the stock builds of llama.cpp and the other tools do not.

(I) Installation

This Ollama installation is on a box running Ubuntu 25.10 with the latest libraries installed.

    sudo snap install ollama

(II) Pull Ollama model

        ollama pull llama3.2:1b

  • To view the downloaded models, use the 'ollama list' command:

        ollama list  

            |__ 

NAME                          ID              SIZE      MODIFIED   
llama3.2:1b                   baf6a787fdff    1.3 GB    ... 

  • Models by default get downloaded to /var/snap/ollama/common/models, verified via:

        sudo snap get ollama 

        |__

Key                   Value
context-length        
cuda-visible-devices  
debug                 0
flash-attention       0
host                  127.0.0.1:11434
models                /var/snap/ollama/common/models
origins               *://localhost 

To change the default location, the 'models' config key needs to be updated: 

        sudo snap set ollama models=/UPDATED/MODELS/LOCATION/PATH
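The list of installed models is also available programmatically from the local server. A minimal sketch in Python against the default endpoint (the helper names here are mine, and the server must be running for list_models to work):

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # default host from 'snap get ollama'

def model_names(tags_response):
    """Pull just the model names out of a /api/tags response body."""
    return [m["name"] for m in tags_response.get("models", [])]

def list_models():
    """Query the local Ollama server for installed models (server must be running)."""
    with urllib.request.urlopen(OLLAMA_URL + "/api/tags") as resp:
        return model_names(json.load(resp))

# With llama3.2:1b pulled, list_models() would include 'llama3.2:1b'.
```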

(III) Run/ chat with the downloaded model:

    ollama run llama3.2:1b

        |__ 

 >>> bye
Bye.

>>> what is 80 + 10?
80 + 10 = 90.

(IV) Install/ Run any locally downloaded GGUF model

Ollama also provides the option to run any locally downloaded GGUF model. These are models obtained not via 'ollama pull' (ref. step (II)) but downloaded from Hugging Face, etc.

A simple Modelfile needs to be prepared with a one-line instruction:

        FROM </GGUF/FILE/DOWNLOAD/LOCATION> 

Next, the 'ollama create' command is run using that Modelfile:

        ollama create <CUSTOM_MODEL_NAME> -f </modelfile/LOCATION>

With that, the downloaded model (GGUF) file becomes available to run from Ollama and shows up in: 

        ollama list    

(Note: There's a known issue with the prompt template of models imported this way.)
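When the imported model's template misbehaves, the Modelfile can go beyond the single FROM line and pin down parameters and a prompt template explicitly. A minimal sketch (the path is the same placeholder as above, and the values shown are illustrative, not from this setup):

```
# Placeholder path - point this at your downloaded GGUF file
FROM /GGUF/FILE/DOWNLOAD/LOCATION

# Optional: a sampling parameter and an explicit prompt template
PARAMETER temperature 0.7
TEMPLATE """{{ .Prompt }}"""
```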

(V) Ollama API

The Ollama server by default listens on the endpoint http://127.0.0.1:11434.

Through this endpoint, various Ollama APIs are available to chat, generate completions, list models, show model details, list running models, get the server version, and push/pull models.
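As a sketch of using the completion API from Python (the function names are mine; the request/response shape follows the Ollama REST API, and a running server with the model from step (II) is assumed):

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # default Ollama endpoint

def build_generate_payload(model, prompt, stream=False):
    """JSON body for POST /api/generate; stream=False asks for one complete reply."""
    return {"model": model, "prompt": prompt, "stream": stream}

def generate(model, prompt):
    """Send a completion request to the local server and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL + "/api/generate",
        data=json.dumps(build_generate_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

# generate("llama3.2:1b", "what is 80 + 10?") would return the model's reply
# (requires the server and model from the earlier steps).
```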

(VI) Remove Models

To remove any downloaded model, run the 'ollama rm' command:

        ollama rm llama3.2:1b

(VII) Stop Ollama

  • Stopping/ unloading just the running model can be done via an Ollama API call with keep_alive=0, along with an empty messages list: 

        curl http://127.0.0.1:11434/api/chat -d '{"model": "llama3.2:1b","messages": [],"keep_alive": 0}'

  • Killing the Ollama process directly does not stop it for good:

        sudo pkill -9 ollama

     Snap will however restart the Ollama service the moment it is killed (recovery/ restart).

  • To completely shut down/ disable Ollama (it can be re-enabled later with 'sudo snap enable ollama'):

        sudo snap disable ollama 

 
