Showing posts with label Artificial Intelligence. Show all posts
Showing posts with label Artificial Intelligence. Show all posts

Wednesday, November 5, 2025

Agent2Agent (A2A) with a2a-sdk and Http2

Continuing with A2A evaluation next up is a2a-sdk (unrelated to previously evaluated a2a-server). This evaluation is largely based on getting the hello world from the a2a-samples project working as per the instruction of a2a-protocol. With additional, integration with other Http2 based non Python clients.

(I) Installation

pip install a2a-sdk 

# uvicorn python-dotenv (packages existing) 

# For Http2 support 

pip install hypercorn 

pip install h2==4.2.0 (See Issue 1 at the end & the bug details

git clone https://github.com/a2aproject/a2a-samples.git -b main --depth 1

(II) Replace uvicorn server with hypercorn (support for Http2) 

The a2a-samples make use of the uvicorn python server. However, uvicorn is a Http1.x compliant server and doesn't support Http2. Keep seeing the following messages if client requests from Http2: 

"WARNING:  Unsupported upgrade request. "

In order to support a wider & more updated category of clients, uvicorn is replaced with a hypercorn which is Http2 compliant.

In order to switch to hypercorn, the following changes are done to _main_.py of helloworld python project

#import uvicorn
 

# Use Hypercorn for Http2
import asyncio
from hypercorn.config import Config
from hypercorn.asyncio import serve

 ....

    config = Config()
    config.bind="127.0.0.1:8080"  # Binds to all interfaces on port 8080

    asyncio.run(serve(server.build(), config))
   # uvicorn.run(server.build(), host='127.0.0.1', port=8080, log-level='debug') 

(III) Run helloworld

python a2a-samples/samples/python/agents/helloworld/__main__.py 

(IV) View AgentCard

Open in the browser or via curl:

curl http:///127.0.0.1:8080/.well-known/agent-card.json

Response: 

{"capabilities":{"streaming":true},"defaultInputModes":["text"],"defaultOutputModes":["text"],"description":"Just a hello world agent","name":"Hello World Agent","preferredTransport":"JSONRPC","protocolVersion":"0.3.0","skills":[{"description":"just returns hello world","examples":["hi","hello world"],"id":"hello_world","name":"Returns hello world","tags":["hello world"]}],"supportsAuthenticatedExtendedCard":true,"url":"http://127.0.0.1:8080/","version":"1.0.0"} 

For the Authorized Extended Agent Card:

curl -H "Authorization: Bearer dummy-token-for-extended-card" --http2 http://127.0.0.1:8080/agent/authenticatedExtendedCard 

Response: 

{"capabilities":{"streaming":true},"defaultInputModes":["text"],"defaultOutputModes":["text"],"description":"The full-featured hello world agent for authenticated users.","name":"Hello World Agent - Extended Edition","preferredTransport":"JSONRPC","protocolVersion":"0.3.0","skills":[{"description":"just returns hello world","examples":["hi","hello world"],"id":"hello_world","name":"Returns hello world","tags":["hello world"]},{"description":"A more enthusiastic greeting, only for authenticated users.","examples":["super hi","give me a super hello"],"id":"super_hello_world","name":"Returns a SUPER Hello World","tags":["hello world","super","extended"]}],"supportsAuthenticatedExtendedCard":true,"url":"http://127.0.0.1:8080/","version":"1.0.1"} 

(V) Send/ Receive message to Agent

curl -H "Content-Type: application/json"  http:///127.0.0.1:8080 -d '{"jsonrpc":"2.0","id":"ee22f765-0253-40a0-a29f-c786b090889d","method":"message/send","params":{"message":{"role":"user","parts":[{"text":"hello there!","kind":"text"}],"messageId":"ccaf4715-712e-40c6-82bc-634a7a7136f2","kind":"message"},"configuration":{"blocking":false}}}' 

Response: 

 {"id":"ee22f765-0253-40a0-a29f-c786b090889d","jsonrpc":"2.0","result":{"kind":"message","messageId":"d813fed8-58cd-4337-8295-6282930d4d4e","parts":[{"kind":"text","text":"Hello World"}],"role":"agent"}}

(VI) Send/ Receive via Http2

curl -iv --http2 http://127.0.0.1:8080/.well-known/agent-card.json

curl -iv --http2  -H "Content-Type: application/json"  http://127.0.0.1:8080 -d '{"jsonrpc":"2.0","id":"ee22f765-0253-40a0-a29f-c786b090889d","method":"message/send","params":{"message":{"role":"user","parts":[{"text":"dragons and wizards","kind":"text"}],"messageId":"ccaf4715-712e-40c6-82bc-634a7a7136f2","kind":"message"},"configuration":{"blocking":false}}}'

(The responses are the same as shown above)

(VII) Send/ Receive from Java client

TBD

(VIII) Issues 

Issue 1: Compatibility issue with hypercorn (ver=0.17.3) & latest h2 (ver=4.3.0)

Ran in to the issue in the mentioned here:

    |   File "/home/algo/Tools/venv/langvang/lib/python3.13/site-packages/hypercorn/protocol/h2.py", line 138, in initiate
    |     event = h2.events.RequestReceived()
    | TypeError: RequestReceived.__init__() missing 1 required keyword-only argument: 'stream_id' 

Issue was resolved by downgrading to h2 (ver=4.2.0).

 

Tuesday, November 4, 2025

Agent2Agent (A2A) with a2a-server

Agent2Agent (A2A) is a protocol for AI agents to communicate amongst themselves. These Agents though built by different vendors by subscribing to the common a2a protocol will have a standardized way of inter-operating.  

Getting going with A2A 

(I) As a starting point got the python a2a-server installed. 

pip install a2a-server

Issue 1: Compatibility issue between latest a2a-server & a2a-json-rpc:

a2a-server & a2a-server also brings in a2a-json-rpc:  but there were compatibility issues between the latest a2a-json-rpc (ver.0.4.0) & a2a-server (ver. 0.6.1)

        ImportError: cannot import name 'TaskSendParams' from 'a2a_json_rpc.spec' (.../python3.13/site-packages/a2a_json_rpc/spec.py) 

Downgrading  a2a-json-rpc to previous 0.3.0 fixed it:

pip install a2a-json-rpc==0.3.0 

(II) To get the a2a-server running a agent.yaml file needs to be built with the configs like host, port, handler, provider, model, etc:

server:
  host: 127.0.0.1
  port: 8080

handlers:
  use_discovery: false
  default_handler: chuk_pirate
  chuk_pirate:
    type: a2a_server.tasks.handlers.chuk.chuk_agent_handler.ChukAgentHandler
    agent: a2a_server.sample_agents.chuk_pirate.create_pirate_agent
    name: chuk_pirate
    enable_sessions: false
    enable_tools: false
    provider: "ollama"
    model: "llama3.2:1b"
    version: "1.0.1"

    agent_card:
      name: Pirate Agent
      description: "Captain Blackbeard's Ghost with conversation memory"
      capabilities:
        streaming: false
        sessions: false
        tools: false 

-- 

Next, start the server using:

a2a-server -c agent.yaml --log-level debug 

(III) Test a2a-server endpoint from browser

Open http://127.0.0.1:8080/ which will lists the different Agents. 

Agent Card(s): 

http://127.0.0.1:8080/chuk_pirate/.well-known/agent.json 

(IV) Issues a2a-server 

Issue 2: Agent Card endpoint url 

Firstly, the Agent Card end point is that this is no longer a valid end point. As per the latest Agent Card protocol the Agent Card needs to be served from the location: http://<base_url>/ .well-known/agent-card.json

  • agent-card.json (& not agent.json) 
  • Without the agent's name (i.e. without chuk_pirate) 

The valid one would looks like:

http://127.0.0.1:8080/chuk_pirate/.well-known/agent.json 

Issue 3: Error message/send not found

The other issue is that the seems to be a lack of support for the method "message/ send"  used to send messages and chat with the agent. The curl request fails with an error: 

curl -iv -H "Content-Type: application/json"  http://127.0.0.1:8080/chuk_pirate -d '{"jsonrpc":"2.0","id":"ee22f765-0253-40a0-a29f-c786b090889d","method":"message/send","params":{"message":{"role":"user","parts":[{"text":"hello  there!","kind":"text"}],"messageId":"ccaf4715-712e-40c6-82bc-634a7a7136f2","kind":"message"},"configuration":{"blocking":false}}}' 

{"jsonrpc":"2.0","id":"ee22f765-0253-40a0-a29f-c786b090889d","result":null,"error":{"code":-32601,"message":"message/send not found"}} 

Due to all these issues with a2a-server and its lack of documentation there's no clarity on the library. So it's a no-go for the moment atleast.

Sunday, November 2, 2025

DeepEval

DeepEval helps to test and verify the correctness of LLMs. DeepEval is a framework with a suite of Metrics, Synthetic Data generation having integrations across all leading AI/ ML libraries. 

DeepEval can be used to set-up one LLM to judge the output of another LLM. This JudgeLLM set-up can be used at both the training as well as live inference stage for MlOps scenarios.

Getting started with DeepEval is simple with Ollama

(I) Installation

    pip install deepeval

Ollama installation was covered previously with a llama3.2 base model. 

(II) Set-Ollama model in DeepEval

# Unset the openai model - default for DeepEval     

deepeval unset-openai

# Set ollama model for DeepEval 

deepeval set-ollama "llama3.2:1b" --base-url="http://localhost:11434"  

(III) Create a JudgeLLM.py code

# Set up ollama model

model = OllamaModel(
  model="llama3.2:1b",
  base_url="http://localhost:11434",
  temperature=0.0,  # Example: Setting a custom temperature

# Set up evaluation metrics 

correctness_metric = GEval(
    name="Correctness",
    criteria="Determine whether the actual output is factually correct based on the expected output.",
    # NOTE: you can only provide either criteria or evaluation_steps, and not both
    evaluation_steps=[
        "Check whether the facts are true"    ],
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.EXPECTED_OUTPUT],
   model=model, # ollama model
rubric=[
        Rubric(score_range=(0,2), expected_outcome="Factually incorrect."),
        Rubric(score_range=(3,6), expected_outcome="Mostly correct."),
        Rubric(score_range=(7,9), expected_outcome="Correct but missing minor details."),
        Rubric(score_range=(10,10), expected_outcome="100% correct."),
    ],
#    threshold=0.1   

# define the test case

test_case_maths = LLMTestCase(
    input="what is 80 in words? using only 1 word.",
    actual_output="eighty",
    expected_output="eighty"

 # Run the evaluation

evaluate(test_cases=[test_case_maths], metrics=[answer_relevancy]) 

(IV) Execute the JudgeLLM.py 

 deepeval test run JudgeLLM.py 

 

Friday, October 31, 2025

Codegen LLMs

One GenAI feature making headlines is coding. GenAI apps are getting better with reading and writing code (codegen) in various programming languages such as Python, Java, C++ and so on.

On the evaluation side there are all kinds of benchmarks and leaderboards that track progress on codegen. Additionally, aspects of usability, platform support, IDE integration, etc are all key factors for using codegen.

In terms of local evaluations, Ollama provides handy options. With Ollama it's easy to download and run LLMs from various providers (Llama, Gemma, etc). Most now support codegen and readily follow instructions in a chat to churn out basic level Python code:

  •  Llama3.2
  • Gemma3
  • Codegemma 
  • Falcon3
  • Starcoder 

 

Thursday, October 30, 2025

Langchain4j

LangChain is one of the leading python based AI/ ML, agentic modelling and integration frameworks. Langchain (and allied frameworks like LangGraph) allow integration with almost all LLMs, python libraries and tools out there. 

Langchain4j is its Java couterpart. Langchain4j allows LLM integrations and workflows to be built using pure Java constructs. It primarily operates as a Java client to the various Api's exposed by the different LLM provides such as OpenAi, Azure, Bedrock, Gemini and so on. 

Langchain4j has covered a lot of ground in terms of the supported modules from both the Python and the Java ecosystems. It's actively supported and should be one for the long run.. 

To get a feel for Langchain4j on a local LLM try out langchain4j-ollama

This will get: 

    Java langchain4j-ollama to talk to  

        -> Ollama (deployed locally) 

                -> Hosting the llama3.2:1b  model  

(I) Get a local Ollama up & running

Refer to the previous post regarding installing getting Ollama running locally. Once done, you should have a llama3.2:1b model running & ready to chat locally on:

    http://127.0.0.1:11434 

(II) Download & build langchain4j-ollama project

Clone langchain4j-ollama project & build:

    cd </download/folder/langchain4j-ollama> 

    mvn install 

(III) Run langchain4j-ollama tests

Run a couple of the langchain4j-ollama integration tests. Start with OllamaChatModelIT.java. Make sure to update the Model_Name value to llama3.2:1b downloaded in step (I) above:

         static final String MODEL_NAME = "llama3.2:1b";

That's about it for getting the three pieces integrated & chatting! 

Wednesday, October 29, 2025

Ollama for Local Llm

Continuing with the theme of running Llm's locally, it was time ideal to go with Ollama which has gained significant ground over the one odd year. While tools like Gpt4All, Llamafile, etc are all based on llama.cpp, Ollama has its own model format. 

More importantly Ollama supports older CPU's (non-AVX), something stopped by llama.cpp and the others. 

(I) Installation

Ollama installation is on a box with Ubuntu 25.10 with the latest libraries installed.

    sudo snap install ollama

(II) Pull Ollama model

        ollama pull llama3.2:1b

  • To view the downloaded models using 'ollama list' command:

        ollama list  

            |__ 

NAME                          ID              SIZE      MODIFIED   
llama3.2:1b                   baf6a787fdff    1.3 GB    ... 

  • Models by default get downloaded to:/var/snap/ollama/common/models, verified via:

        sudo snap get ollama 

        |__

Key                   Value
context-length        
cuda-visible-devices  
debug                 0
flash-attention       0
host                  127.0.0.1:11434
models                /var/snap/ollama/common/models
origins               *://localhost 

To change the default location the config models needs to be updated: 

        sudo snap set ollama models=/UPDATED/MODELS/LOCATION/PATH

 (III) Run/ chat with downloaded model:

    ollama run llama3.2:1b

        |__ 

 >>> bye
Bye.

>>> what is 80 + 10?
80 + 10 = 90.

(IV) Install/ Run any locally downloaded GGUF model

Ollama also provides the option to run any downloaded model GGUF locally. These are models not downloaded via ollama pull (ref step (II)) but models downloaded from Hugging face, etc.  

A simple modelfile needs to be prepared with one line instruction:

        FROM </GGUF/FILE/DOWNLOAD/LOCATION> 

Next the Ollama create command is to be run using the modelfile:

        ollama create <CUSTOM_MODEL_NAME> -f  </modefile/LOCATION>

 With that a your downloaded model (GGUF) file would be available for run from Ollama and show up in: 

        ollama list    

(Note: There's a known issue with the template of downloaded model.)

(V) Ollama API

Ollama server by default listens on the end point: http://127.0.0.1:11434.  

Through the endpoint various Ollama APIs are available for chatting, generating completions, list models, show details, running models, version, push, pull, etc with the installed models.

(VI) Remove Models

To remove any downloaded models run the 'ollama rm' command:

        ollama rm llama3.2:1b

(VII) Stop Ollama

  • Stopping/ unloading of just the running model can be effected via an Ollama API call with keep_alive=0, along with an empty message: 

        curl http://127.0.0.1:11434/api/chat -d '{"model": "llama3.2:1b","messages": [],"keep_alive": 0}'

        sudo pkill -9 ollama

     Snap will however restart Ollama snap the moment it is killed (recovery/ restart).

  • To completely shutdown/ disable Ollama:

        sudo snap disable ollama 

 

Tuesday, October 28, 2025

Gpt4All on Ubuntu-20

Notes from a rather tough, yet futile, attempt at getting Gpt4All to run locally on an old Ubuntu20.04 box, with Python-3.8.

* Pip Install: First up, the gpt4All installed via pip (ver 2..8.2) has changes incompatible with current/ recent model files & gguf (llama3.2, etc). Causing type, value, keyword, attribute errors etc at different stages of installation & execution.

* Custom Build: Alt. is to download the latest Gpt4All & build it.

This leads to issues with Ubuntu 20.04 library's being outdated/ missing & the hardware being outdated :

  • GLIBC_2.32 not found 
  • GLIBCXX_3.4.29 not found
  • CMake 3.23 or higher is required.  You are running version 3.16.3
  • Vulkan not found, version incompatible with Gpu, etc
  • CUDA Toolkit not found. 
  • CPU does not support AVX 

Anyway, after a lot of false steps the build did succeed with the following flags set did succeed:

    cmake -B build -DCMAKE_BUILD_TYPE=Rel -DLLMODEL_CUDA=OFF -DKOMPUTE_OPT_DISABLE_VULKAN_VERSION_CHECK=ON 

     Build files have been written to: .../gpt4all/gpt4all-backend/build 

 

Even after all that there were issues popping up with getting Llms to run from libraries like langchain, pygpt4all and so on. Clearly indicating that it was time to bid adieu to Ubuntu 20.04 & upgrade to more recent and better supported versions. 

References

  • https://python.langchain.com/docs/how_to/local_llms/
  • https://askubuntu.com/questions/1393285/how-to-install-glibcxx-3-4-29-on-ubuntu-20-04
  • https://stackoverflow.com/questions/71940179/error-lib-x86-64-linux-gnu-libc-so-6-version-glibc-2-34-not-found 

Sunday, October 26, 2025

Mlflow Java client

Mlflow is a leading open source framework for managing AI/ ML workflows. Mlflow allows tracking, monitoring and generally visualizing end-to-end ML project lifecycles. A handy ops side tool that improves over interpretability of AI/ ML projects.

Key Mlflow concepts include ML Projects, Models on which several Runs of Experiments conducted to name a few. Experiments can also be Tagged with meaningful humanly relevant labels.

While Mlflow is a Python native library with integrations with all the leading Python AI/ ML frameworks such as OpenAI, Langchain, Llamaindex, etc there are also Mlflow API endpoints for wider portability. 

There is also a specific Mlflow Java Api for use from the Java ecosystem. The corresponding Mlflow Java client (maven plugin, etc) works well with the API. To get started with the mlflow using Java:

(I) Install mlflow (Getting started guide)

        $ pip install mlflow 

 This installs mlflow to the users .local folder:

        ~/.local/bin/mlflow 

(II) Start Local mlflow server (simple without authentication)

        $ mlflow server --host 127.0.0.1 --port 8080

mlflow server should be running on 

        http://127.0.0.1:8080

(III) Download mlflower repo (sample Java client code)

Next clone the mlflower repo which has some sample code showing working of the mlflow Java client. 

  • The class Mlfclient shows a simple use case of Creating an Experiment:

            client.createExperiment(experimentName);

Followed by a few runs of logging some Parameters, Metrics, Artifacts:

     run.logParam();

      run.logMetric();

       run.logArtifact()

 

  • Run Hierarchy: Class NestedMlfClient shows nesting hierarchy of Mlflow runs

        Parent Run -> Child Run -> Grand Child Run ->.... & so on

(IV) Start Local mlflow server (with eBasic Authentication)

While authentication is crucial for managing workflows, Mlflow only provided Basic Auth till very recently. Version 3.5 onwards has better support for various auth provides, SSO, etc. For now only mlflow Basic Auth integration is shown.

           # Start server with Basic Auth
            mlflow server --host 127.0.0.1 --port 8080 --app-name basic-auth

Like previously, mlflow server should start running on

            http://127.0.0.1:8080

Only requiring a login credential this time to access the page. The default admin credentials are mentioned on mlflow basic-auth-http.

  • The class BasicAuthMlfclient shows the Java client using BasicMlflowHostCreds to connect to Mlflow with basic auth. 

            new MlflowClient(new BasicMlflowHostCreds(TRACKING_URI, USERNAME, PASSWORD));

(V) Deletes Soft/ Hard

  • Experiments, Runs, etc created within mlflow can be deleted from the ui (& client). The deletes are however only Soft, and get stored somewhere in a Recycle Bin, not visible on the UI.
  •  Hard/ permanent deletes can be effected from the mlflow cli

    # Set mlflow server tracking uri 

    export MLFLOW_TRACKING_URI=http://127.0.0.1:8080

    # Clear garbage

    mlflow gc

  (VI) Issues

  • MlflowContext.withActiveRun() absorbs exception without any logs, simply sets the run status to RunStatus.FAILED
    • So incase runs show failure on the mlflow UI, its best to put explicit try-catch on the client to find the cause.
  • Unable to upload artifacts since cli looks for python (& not python3) on path to run. 
    • Error message: Failed to exec 'python -m mlflow.store.artifact.cli', needed to access artifacts within the non-Java-native artifact store at 'mlflow-artifacts:
    • The dev box (Ubuntu ver 20.04) has python3 (& not python) installed.
    • Without changing the dev box a simple fix is to set/ export the environment variable MLFLOW_PYTHON_EXECUTABLE (within the IDE, shell, etc) to whichever python lib is installed on the box:
               MLFLOW_PYTHON_EXECUTABLE=/usr/bin/python3 
 
So with that keep the AI/ Ml projects flowing!

Wednesday, October 8, 2025

AI/ML '25

• GenAI
    - Text: Chat, Q&A, Compose, Summarize, Think, Search, Insights, Research
     - Image: Gen, Identify, Search (Image-Image, Text-Image, etc), Label, Multimodal
    - Code gen
    - Research: Projects, Science, Breakthroughs
    - MoE

• Agentic
    - Workflows: GenAI, DNN, Scripts, Tools, etc combined to fulfil Objectives
        -- Auto-Generated Plans & Objectives  
    - Standardization: MCP (API), Interoperability, Protocols
    - RAG
    - Tools: Websearch, DB, Invoke API/ Tools/ LLM, etc

• Context
    - Fresh/ Updated
    - Length: Cost vs Speed trade-off
    - RAG
    - VectorDB (Similarity/ Relevance)
    - Memory enhanced

• Fine Tune
    - Foundation models (generalists) -> Specialists
    - LoRA
    - Inference time scaling (compute, tuning, etc)
    - Prompts

• Multimodal: Text, Audio, Video, Image, Graph, Sensors

• Safety/ Security
    - Output Quality: Relevance, Accuracy, Correctness, Evaluation (Automated Rating, Ranking, JudgeLLM, etc)
        -- Hallucination
    - Privacy, Data Leak, Backdoor, Jailbreak
    - Guard Rails

Friday, April 18, 2025

AI Agentic Frameworks

With prolification of AI Agents, it's only logical that there will be attempts at standardization and building protocols & frameworks:

Thursday, April 17, 2025

On Quantization

  • Speed vs Accuracy trade off.
  • Reduce costs on storage, compute, operations .
  • Speed up output generation, inference, etc.
  • Work with lower precision data.
  • Cast/ map data from Int32, Float32, etc 32-bit or higher precision to lower precision data types such as 16-bit Brain Float (BFloat16) or 4-bit (NFloat)/ int4 or int8, etc.
    • East mapping Float32 (1-bit Sign, 7-bit Exponent, 23-bit Mantissa) => BFloat16 (1-bit Sign, 7-bit Exponent, 7-bit Mantissa). Just discard the higher 16-bits of mantissa. No overflow!
    • Straightforward mapping work out max, min, data distribution, mean, variance, etc & then sub-divide into equally sized buckets based on bit size of the lower precision data type. E.g int4 (4-bit) => 2^4 = 16 buckets. 
    • Handle outliers, data skew which can mess up the mapping, yet lead to loss of useful info if discarded randomly.
    • Work out Bounds wrt Loss of Accuracy.

LLMs, AI/ ML side:

  • https://newsletter.theaiedge.io/p/reduce-ai-model-operational-costs

Lucene, Search side:

  • https://www.elastic.co/search-labs/blog/scalar-quantization-101
  • https://www.elastic.co/search-labs/blog/scalar-quantization-in-lucene

Wednesday, April 16, 2025

Speculative Decoding

  • Ensemble of Weak + Strong model
  • Weak model has a quick first go at generating tokens/ inference (potentials)
  • Followed by the Strong, but slow model which catches up & uses the outputs of the weak model, samples them, grades them, accepting/ rejecting them to generate the final output
  • Overall making inferences via LLMs quicker and cheaper

More to follow..

  • https://pytorch.org/blog/hitchhikers-guide-speculative-decoding/ 
  • https://www.baseten.co/blog/a-quick-introduction-to-speculative-decoding/
  • https://research.google/blog/looking-back-at-speculative-decoding/
  • https://medium.com/ai-science/speculative-decoding-make-llm-inference-faster-c004501af120

Tuesday, April 8, 2025

Revisiting the Bitter Lesson

Richard Sutton's - The Bitter Lesson(s) continue to hold true. Scaling/ data walls could pose challenges to scaling AI general purpose methods (like searching and learning) beyond a point. And that's where human innovation & ingenuity would be needed. But hang on, wouldn't that violate the "..by our methods, not by us.." lesson?

Perhaps then something akin to human innovation/ discovery/ ingenuity/ creativity might be the next frontier of meta-methods. Machines in their typical massively parallel & distributed, brute-force, systematic trial & error fashion would auto ideate/ innovate/ discover solutions quicker, cheaper, better. Over & over again.

So machine discoveries shall be abound, just not Archimedes's Eureka kind, but Edison's 100-different ways style!

Sunday, April 6, 2025

Model Context Protocol (MCP)

Standardization Protocol for AI agents. Enables them to act, inter-connect, process, parse, invoke functions. In other words to Crawl, Browse, Search, click, etc. 

MCP re-uses well known client-server architecture using JSON-RPC. 

Apps use MCP Clients -> MCP Servers (abstracts the service)

Kind of API++ for an AI world!

Saturday, April 5, 2025

Open Weight AI

Inspired by Open Source Software (OSS), yet not fully open...

With Open Weight (OW) typically the final model weights (& the fully trained model) are made available under a liberal free to reuse, modify, distribute, non-discriminating, etc licence. This helps for anyone wanting to start with the fully trained Open Weight model & apply them, fine-tune, modify weights (LoRA, RAG, etc) for custom use-cases. To that extent, OW has a share & reuse philosophy.
 
On the other hand, wrt training data, data sources, detailed architecture, optimizations details, and so on OW diverges from OSS by not making it compulsory to share any of these. So these remain closed source with the original devs, with a bunch of pros & cons. Copyright material, IP protection, commercial gains, etc are some stated advantages for the original devs/ org. But lack of visibility to the wider community, white box evaluation of model internals, biases, checks & balances are among the downsides of not allowing a full peek into the model.

Anyway, that's the present, a time of great flux. As models stabilize over time OW may tend towards OSS...

References

  • https://openweight.org/    
  • https://www.oracle.com/artificial-intelligence/ai-open-weights-models/
  • https://medium.com/@aruna.kolluru/exploring-the-world-of-open-source-and-open-weights-ai-aa09707b69fc
  • https://www.forbes.com/sites/adrianbridgwater/2025/01/22/open-weight-definition-adds-balance-to-open-source-ai-integrity/
  • https://promptengineering.org/llm-open-source-vs-open-weights-vs-restricted-weights/
  • https://promptmetheus.com/resources/llm-knowledge-base/open-weights-model
  • https://www.agora.software/en/llm-open-source-open-weight-or-proprietary/

Wednesday, April 2, 2025

The Big Book of LLM

A book by Damien Benveniste of AIEdge. Though a work in progress, chapters 2 - 4 available for preview are fantastic. 

Look forward to a paperback edition, which I certainly hope to own...

Tuesday, April 1, 2025

Mozilla.ai

Mozilla pedigree, AI focus, Open-source, Dev oriented.

Blueprint Hub: Mozilla.ai's Hub of open-source templtaized customizable AI solutions for developers.

Lumigator: Platform for model evaluation and selection. Consists a Python FastAPI backend for AI lifecycle management & capturing workflow data useful for evaluation.

Friday, March 28, 2025

Streamlit

Streamlit is a web wrapper for Data Science projects in pure Python. It's a lightweight, simple, rapid prototyping web app framework for sharing scripts.

  • https://streamlit.io/playground
  • https://www.restack.io/docs/streamlit-knowledge-streamlit-vs-flask-vs-django
  • https://docs.streamlit.io/develop/concepts/architecture/architecture
  • https://docs.snowflake.com/en/developer-guide/streamlit/about-streamlit

Saturday, March 15, 2025

Scaling Laws

Quick notes around Chinchilla Scaling Law/ Limits & beyond for DeepLearning and LLMs.

Factors

  • Model size (N)
  • Dataset size (D)
  • Training Cost (aka Compute) (C)
  • Test Cross-entropy loss (L)

The intuitive way,

  • Larger data will need a larger model, and have higher training cost. In other words, N, D, C all increase together, not necessarily linearly, could be exponential, log-linear, etc.
  • Likewise Loss is likely to increase for larger datasets. So an inverse relationship between L & D (& the rest).
  • Tying them into equations would be some constants (scaling, exponential, alpha, beta, etc), unknown for now (identified later).

Beyond common sense, the theoretical foundations linking the factors aren't available right now. Perhaps the nature of the problem is it's hard (NP).

The next best thing then, is to somehow work out the relationships/ bounds empirically. To work with existing Deep Learning models, LLMs, etc using large data sets spanning TB/ PB of data, Trillions of parameters, etc using large compute budget cumulatively spanning years.

Papers by Hestness & Narang, Kaplan, Chinchilla are all attempts along the empirical route. So are more recent papers like Mosaic, DeepSeek, MoE, Llam3, Microsoft among many others. 

Key take away being,

  • The scale & bounds are getting larger over time. 
  • Models from a couple of years back, are found to be grossly under-trained in terms of volumes of training data used. They should have been trained on an order of magnitude larger training data for an optimal training, without risk of overfitting.
  • Conversely, the previously used data volumes are suited to much smaller models (SLMs), with inference capabilities similar to those older LLMs.

References

  • https://en.wikipedia.org/wiki/Neural_scaling_law
  • https://lifearchitect.ai/chinchilla/
  • https://medium.com/@raniahossam/chinchilla-scaling-laws-for-large-language-models-llms-40c434e4e1c1
  • https://bigscience.huggingface.co/blog/what-language-model-to-train-if-you-have-two-million-gpu-hours
  • https://medium.com/nlplanet/two-minutes-nlp-scaling-laws-for-neural-language-models-add6061aece7
  • https://lifearchitect.ai/the-sky-is-bigger/

Friday, February 28, 2025

Diffusion Models

Diffusion

  •     Forward, Backward (Learning), Sampling (Random)    
  •     Continous Diffusion
  •     VAE, Denoising Autoencoder
  •     Markov Chains
  •     U-Net
  •     DALL-E (OpenAI), Stable Diffusion,
  •     Imagen, Muse, VEO (Google)
  •     LLaDa, Mercury Coder (Inception)

Non-equilibrium Thermodynamics

  •     Langevin dynamics
  •     Thermodynamic Equilibrium - Boltzmann Distribution
  •     Wiener Process - Multidimensional Brownian Motion
  •     Energy Based Models

Gaussian Noise

  •     Denoising
  •     Noise/ Variance Schedule
  •     Derivation by Reparameterization

Variational Inference    

  •     Denoising Diffusion Probabilistic Model (DDPM)
  •     Noise Prediction Networks    
  •     Denoising Diffusion Implicit Model (DDIM)

Loss Functions

  •     Variational Lower Bound (VLB)
  •     Evidence Lower Bound (ELBO)
  •     Kullback-Leibler divergence (KL divergence)
  •     Mean Squared Error (MSE)

Score Based Generative Model

  •     Annealing
  •     Noise conditional score network (NCSN)
  •     Equivalence: DDPM and Score BBased Generative Models

Conditional (Guided) Generation

  •     Classifier Guidance    
  •     Classifier Free Guidance (CFG)

Latent Varible Generative Model

  •     Latent Diffusion Model (LDM)
  •     Lower Dimension (Latent) Space

References:

  • https://en.wikipedia.org/wiki/Diffusion_model
  • https://www.assemblyai.com/blog/diffusion-models-for-machine-learning-introduction
  • https://www.ibm.com/think/topics/diffusion-models
  • https://hackernoon.com/what-is-a-diffusion-llm-and-why-does-it-matter
  • Large Language Diffusion Models (LLaDA): https://arxiv.org/abs/2502.09992