Continuing with the theme of running LLMs locally, it was ideal to go with Ollama, which has gained significant ground over the past year or so. While tools like GPT4All, Llamafile, etc. are all based on llama.cpp, Ollama has its own model format.
More importantly, Ollama supports older CPUs (non-AVX), something llama.cpp and the others have stopped supporting.
(I) Installation
Ollama is installed on a box running Ubuntu 25.10 with the latest libraries.
sudo snap install ollama
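Once installed, a quick sanity check confirms the CLI is available and the snap's background service is running (a minimal check; the exact service listing depends on the snap packaging):
ollama --version        # prints the installed Ollama version
snap services ollama    # shows the snap's service(s) and whether they are active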
(II) Pull Ollama model
ollama pull llama3.2:1b
- To view the downloaded models, use the 'ollama list' command:
ollama list
|__
NAME ID SIZE MODIFIED
llama3.2:1b baf6a787fdff 1.3 GB ...
- By default, models get downloaded to /var/snap/ollama/common/models, verified via:
sudo snap get ollama
|__
Key Value
context-length
cuda-visible-devices
debug 0
flash-attention 0
host 127.0.0.1:11434
models /var/snap/ollama/common/models
origins *://localhost
To change the default location, the 'models' config needs to be updated:
sudo snap set ollama models=/UPDATED/MODELS/LOCATION/PATH
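The change can be read back with a single-key query (a minimal check against the same snap config shown above):
sudo snap get ollama models    # prints the currently configured models directory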
(III) Run/ chat with downloaded model:
ollama run llama3.2:1b
|__
>>> bye
Bye.
>>> what is 80 + 10?
80 + 10 = 90.
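Besides the interactive prompt (which /bye exits, unlike the plain 'bye' above that the model simply answers), a one-shot prompt can also be passed on the command line; the prompt text below is just an example:
ollama run llama3.2:1b "What is 80 + 10?"    # prints the answer and exits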
(IV) Install/ Run any locally downloaded GGUF model
Ollama also provides the option to run any locally downloaded GGUF model. These are models not downloaded via ollama pull (ref step (II)) but models downloaded from Hugging Face, etc.
A simple Modelfile needs to be prepared with a one-line instruction:
FROM </GGUF/FILE/DOWNLOAD/LOCATION>
Next, the Ollama create command is to be run using the Modelfile:
ollama create <CUSTOM_MODEL_NAME> -f </Modelfile/LOCATION>
With that, your downloaded GGUF model file would be available to run from Ollama and will show up in:
ollama list
(Note: There's a known issue with the template of the downloaded model.)
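Putting step (IV) together, a hedged end-to-end sketch; the GGUF path and the custom model name below are purely illustrative:
# Modelfile: points at a GGUF file downloaded from Hugging Face (illustrative path)
FROM /home/user/downloads/mistral-7b-instruct.Q4_K_M.gguf
# Register it under a custom name, then run it
ollama create mistral-local -f ./Modelfile
ollama run mistral-local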
(V) Ollama API
The Ollama server by default listens on the endpoint: http://127.0.0.1:11434.
Through this endpoint, various Ollama APIs are available for the installed models: chatting, generating completions, listing models, showing model details, listing running models, checking the version, push, pull, etc.
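A few hedged curl sketches against that default endpoint (using the model pulled in step (II); "stream": false keeps each reply in a single JSON response):
# List locally installed models
curl http://127.0.0.1:11434/api/tags
# Generate a one-off completion
curl http://127.0.0.1:11434/api/generate -d '{"model": "llama3.2:1b", "prompt": "What is 80 + 10?", "stream": false}'
# Multi-turn chat
curl http://127.0.0.1:11434/api/chat -d '{"model": "llama3.2:1b", "messages": [{"role": "user", "content": "What is 80 + 10?"}], "stream": false}'
# Server version
curl http://127.0.0.1:11434/api/version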
(VI) Remove Models
To remove any downloaded model, run the 'ollama rm' command:
ollama rm llama3.2:1b
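A follow-up 'ollama list' should no longer show the removed model:
ollama list    # the removed model no longer appears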
(VII) Stop Ollama
- Stopping/ unloading of just the running model can be effected via an Ollama API call with keep_alive=0, along with an empty messages list (a quick way to verify the unload is shown after this list):
curl http://127.0.0.1:11434/api/chat -d '{"model": "llama3.2:1b","messages": [],"keep_alive": 0}'
- On the other hand, stopping the Ollama service itself is unimplemented, so a hard kill is the only option that works (this will also unload all running models):
sudo pkill -9 ollama
Snap will however restart the Ollama snap the moment it is killed (recovery/ restart).
- To completely shut down/ disable Ollama:
sudo snap disable ollama
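As a hedged follow-up to the bullets above, 'ollama ps' confirms whether any model is still loaded, and a disabled snap can be brought back later with 'snap enable':
ollama ps                  # lists running models; empty once the unload has taken effect
sudo snap enable ollama    # re-enables the Ollama snap so its service runs again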