Wednesday, October 29, 2025

Ollama for Local LLMs

Continuing with the theme of running LLMs locally, it seemed an ideal time to go with Ollama, which has gained significant ground over the past year or so. While tools like Gpt4All, Llamafile, etc. are all based on llama.cpp, Ollama has its own model format.

More importantly, Ollama supports older CPUs (non-AVX), something no longer supported by llama.cpp and the others.

(I) Installation

The installation here is on a box running Ubuntu 25.10 with the latest libraries installed.

    sudo snap install ollama

(II) Pull Ollama model

        ollama pull llama3.2:1b

  • To view the downloaded models, use the 'ollama list' command:

        ollama list  

            |__ 

NAME                          ID              SIZE      MODIFIED   
llama3.2:1b                   baf6a787fdff    1.3 GB    ... 

  • By default, models get downloaded to /var/snap/ollama/common/models, which can be verified via:

        sudo snap get ollama 

        |__

Key                   Value
context-length        
cuda-visible-devices  
debug                 0
flash-attention       0
host                  127.0.0.1:11434
models                /var/snap/ollama/common/models
origins               *://localhost 

To change the default location, the 'models' config needs to be updated:

        sudo snap set ollama models=/UPDATED/MODELS/LOCATION/PATH

 (III) Run/ chat with downloaded model:

    ollama run llama3.2:1b

        |__ 

>>> what is 80 + 10?
80 + 10 = 90.

>>> bye
Bye.

(IV) Install/ Run any locally downloaded GGUF model

Ollama also provides the option to run any locally downloaded GGUF model. These are models not downloaded via ollama pull (ref step (II)), but downloaded from Hugging Face, etc.

A simple Modelfile needs to be prepared with a one-line instruction:

        FROM </GGUF/FILE/DOWNLOAD/LOCATION> 

Next the Ollama create command is to be run using the modelfile:

        ollama create <CUSTOM_MODEL_NAME> -f </Modelfile/LOCATION>

With that, your downloaded GGUF model would be available to run from Ollama and will show up in:

        ollama list    

(Note: There's a known issue with the prompt template of models imported this way.)
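One possible workaround sketch, assuming the issue lies in the prompt template: besides FROM, the Modelfile format also accepts TEMPLATE and PARAMETER directives that override what comes with the GGUF. The template text and parameter below are purely illustrative, not specific to any model:

        # Hypothetical Modelfile for an imported GGUF
        FROM </GGUF/FILE/DOWNLOAD/LOCATION>

        # Override the prompt template if the imported one misbehaves (illustrative)
        TEMPLATE """{{ .System }}
        User: {{ .Prompt }}
        Assistant: """

        # Optional sampling parameter (illustrative)
        PARAMETER temperature 0.7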

(V) Ollama API

The Ollama server by default listens on the endpoint http://127.0.0.1:11434.

Through this endpoint, various Ollama APIs are available for chatting, generating completions, listing models, showing details, listing running models, version, push, pull, etc. with the installed models.
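For instance, a one-off completion can be requested via the /api/generate endpoint (the prompt below is just an illustration):

        curl http://127.0.0.1:11434/api/generate -d '{"model": "llama3.2:1b", "prompt": "What is 80 + 10?", "stream": false}'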

(VI) Remove Models

To remove any downloaded model, run the 'ollama rm' command:

        ollama rm llama3.2:1b

(VII) Stop Ollama

  • Stopping/ unloading of just the running model can be effected via an Ollama API call with keep_alive=0, along with an empty messages list:

        curl http://127.0.0.1:11434/api/chat -d '{"model": "llama3.2:1b","messages": [],"keep_alive": 0}'

  • The Ollama process itself can be killed directly:

        sudo pkill -9 ollama

     Snap will however restart the Ollama service the moment it is killed (automatic recovery/ restart).

  • To completely shut down/ disable Ollama:

        sudo snap disable ollama 

 

Tuesday, October 28, 2025

Gpt4All on Ubuntu-20

Notes from a rather tough, yet futile, attempt at getting Gpt4All to run locally on an old Ubuntu 20.04 box with Python 3.8.

* Pip Install: First up, the gpt4all package installed via pip (ver 2.8.2) has changes incompatible with current/ recent model files & GGUFs (llama3.2, etc.), causing type, value, keyword and attribute errors at different stages of installation & execution.

* Custom Build: The alternative is to download the latest Gpt4All source & build it.

This leads to issues with Ubuntu 20.04 libraries being outdated/ missing & the hardware being outdated:

  • GLIBC_2.32 not found 
  • GLIBCXX_3.4.29 not found
  • CMake 3.23 or higher is required.  You are running version 3.16.3
  • Vulkan not found, version incompatible with Gpu, etc
  • CUDA Toolkit not found. 
  • CPU does not support AVX 

Anyway, after a lot of false steps, the build did succeed with the following flags set:

    cmake -B build -DCMAKE_BUILD_TYPE=Release -DLLMODEL_CUDA=OFF -DKOMPUTE_OPT_DISABLE_VULKAN_VERSION_CHECK=ON

     Build files have been written to: .../gpt4all/gpt4all-backend/build 

 

Even after all that, there were issues popping up with getting LLMs to run from libraries like langchain, pygpt4all and so on, clearly indicating that it was time to bid adieu to Ubuntu 20.04 & upgrade to more recent and better supported versions.

References

  • https://python.langchain.com/docs/how_to/local_llms/
  • https://askubuntu.com/questions/1393285/how-to-install-glibcxx-3-4-29-on-ubuntu-20-04
  • https://stackoverflow.com/questions/71940179/error-lib-x86-64-linux-gnu-libc-so-6-version-glibc-2-34-not-found 

Sunday, October 26, 2025

Mlflow Java client

Mlflow is a leading open source framework for managing AI/ ML workflows. Mlflow allows tracking, monitoring and generally visualizing end-to-end ML project lifecycles. A handy ops-side tool that improves the interpretability of AI/ ML projects.

Key Mlflow concepts include ML Projects, Models and Experiments, within which several Runs are conducted, to name a few. Experiments can also be Tagged with meaningful, human-relevant labels.

While Mlflow is a Python-native library with integrations for all the leading Python AI/ ML frameworks such as OpenAI, Langchain, Llamaindex, etc., there are also Mlflow REST API endpoints for wider portability.

There is also a specific Mlflow Java API for use from the Java ecosystem. The corresponding Mlflow Java client (Maven artifact, etc.) works well with the API. To get started with mlflow from Java:

(I) Install mlflow (Getting started guide)

        $ pip install mlflow 

This installs mlflow to the user's .local folder:

        ~/.local/bin/mlflow 

(II) Start Local mlflow server (simple without authentication)

        $ mlflow server --host 127.0.0.1 --port 8080

The mlflow server should now be running on

        http://127.0.0.1:8080

(III) Download mlflower repo (sample Java client code)

Next, clone the mlflower repo, which has some sample code showing the working of the mlflow Java client.

  • The class Mlfclient shows a simple use case of Creating an Experiment:

            client.createExperiment(experimentName);

Followed by a few runs logging some Parameters, Metrics and Artifacts (a consolidated sketch follows after this list):

        run.logParam();
        run.logMetric();
        run.logArtifact();

 

  • Run Hierarchy: The class NestedMlfClient shows the nesting hierarchy of Mlflow runs:

        Parent Run -> Child Run -> Grand Child Run ->.... & so on
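Putting those calls together, a rough standalone sketch of such a client (not the actual Mlfclient class from the repo; it assumes the org.mlflow:mlflow-client Maven dependency and a running local server, and the experiment/ run names and file path are illustrative):

        import java.nio.file.Paths;

        import org.mlflow.tracking.MlflowClient;
        import org.mlflow.tracking.MlflowContext;

        public class MlfClientSketch {
            public static void main(String[] args) {
                String trackingUri = "http://127.0.0.1:8080";   // local server from step (II)
                String experimentName = "java-client-demo";     // illustrative name

                // Create the Experiment via the low-level client
                MlflowClient client = new MlflowClient(trackingUri);
                String experimentId = client.createExperiment(experimentName);

                // Run-scoped logging via MlflowContext/ ActiveRun
                MlflowContext mlflow = new MlflowContext(trackingUri);
                mlflow.setExperimentId(experimentId);

                mlflow.withActiveRun("demo-run", run -> {
                    run.logParam("learning_rate", "0.01");            // Parameter
                    run.logMetric("accuracy", 0.93);                  // Metric
                    run.logArtifact(Paths.get("results/output.txt")); // Artifact (file must exist)
                });
            }
        }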

(IV) Start Local mlflow server (with Basic Authentication)

While authentication is crucial for managing workflows, Mlflow only provided Basic Auth until very recently. Version 3.5 onwards has better support for various auth providers, SSO, etc. For now, only the mlflow Basic Auth integration is shown.

           # Start server with Basic Auth
            mlflow server --host 127.0.0.1 --port 8080 --app-name basic-auth

As before, the mlflow server should start running on

            http://127.0.0.1:8080

This time a login credential is required to access the page. The default admin credentials are mentioned in the mlflow basic-auth-http docs.

  • The class BasicAuthMlfclient shows the Java client using BasicMlflowHostCreds to connect to Mlflow with basic auth. 

            new MlflowClient(new BasicMlflowHostCreds(TRACKING_URI, USERNAME, PASSWORD));

(V) Deletes Soft/ Hard

  • Experiments, Runs, etc. created within mlflow can be deleted from the UI (& client). These deletes are however only Soft; the items get stored in a sort of Recycle Bin, not visible on the UI.
  • Hard/ permanent deletes can be effected from the mlflow CLI:

    # Set mlflow server tracking uri 

    export MLFLOW_TRACKING_URI=http://127.0.0.1:8080

    # Clear garbage

    mlflow gc

(VI) Issues

  • MlflowContext.withActiveRun() absorbs exceptions without any logs and simply sets the run status to RunStatus.FAILED.
    • So in case runs show up as failed on the mlflow UI, it's best to put an explicit try-catch in the client code to find the cause (see the sketch after this list).
  • Unable to upload artifacts since the CLI looks for python (& not python3) on the path.
    • Error message: Failed to exec 'python -m mlflow.store.artifact.cli', needed to access artifacts within the non-Java-native artifact store at 'mlflow-artifacts:
    • The dev box (Ubuntu 20.04) has python3 (& not python) installed.
    • Without changing the dev box, a simple fix is to set/ export the environment variable MLFLOW_PYTHON_EXECUTABLE (within the IDE, shell, etc.) to whichever python binary is installed on the box:
               MLFLOW_PYTHON_EXECUTABLE=/usr/bin/python3 
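For the first issue above, a minimal sketch of the explicit try-catch (the run name and logged parameter are illustrative):

        mlflow.withActiveRun("debug-run", run -> {
            try {
                run.logParam("learning_rate", "0.01");
                // ... rest of the training/ logging code that might throw ...
            } catch (Exception e) {
                // withActiveRun() swallows the exception and just marks the run FAILED,
                // so surface the cause explicitly before that happens
                e.printStackTrace();
                throw new RuntimeException(e);
            }
        });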
 
So with that, keep the AI/ ML projects flowing!

Wednesday, October 8, 2025

AI/ML '25

• GenAI
    - Text: Chat, Q&A, Compose, Summarize, Think, Search, Insights, Research
     - Image: Gen, Identify, Search (Image-Image, Text-Image, etc), Label, Multimodal
    - Code gen
    - Research: Projects, Science, Breakthroughs
    - MoE

• Agentic
    - Workflows: GenAI, DNN, Scripts, Tools, etc combined to fulfil Objectives
        -- Auto-Generated Plans & Objectives  
    - Standardization: MCP (API), Interoperability, Protocols
    - RAG
    - Tools: Websearch, DB, Invoke API/ Tools/ LLM, etc

• Context
    - Fresh/ Updated
    - Length: Cost vs Speed trade-off
    - RAG
    - VectorDB (Similarity/ Relevance)
    - Memory enhanced

• Fine Tune
    - Foundation models (generalists) -> Specialists
    - LoRA
    - Inference time scaling (compute, tuning, etc)
    - Prompts

• Multimodal: Text, Audio, Video, Image, Graph, Sensors

• Safety/ Security
    - Output Quality: Relevance, Accuracy, Correctness, Evaluation (Automated Rating, Ranking, JudgeLLM, etc)
        -- Hallucination
    - Privacy, Data Leak, Backdoor, Jailbreak
    - Guard Rails