• GenAI
- Text: Chat, Q&A, Compose, Summarize, Think, Search, Insights, Research
- Image: Gen, Identify, Search (Image-Image, Text-Image, etc), Label, Multimodal
- Code gen
- Research: Projects, Science, Breakthroughs
- MoE
• Agentic
- Workflows: GenAI, DNN, Scripts, Tools, etc combined to fulfil Objectives
-- Auto-Generated Plans & Objectives
- Standardization: MCP (API), Interoperability, Protocols
- RAG
- Tools: Websearch, DB, Invoke API/ Tools/ LLM, etc
• Context
- Fresh/ Updated
- Length: Cost vs Speed trade-off
- RAG
- VectorDB (Similarity/ Relevance)
- Memory enhanced
• Fine Tune
- Foundation models (generalists) -> Specialists
- LoRA
- Inference time scaling (compute, tuning, etc)
- Prompts
• Multimodal: Text, Audio, Video, Image, Graph, Sensors
• Safety/ Security
- Output Quality: Relevance, Accuracy, Correctness, Evaluation (Automated Rating, Ranking, JudgeLLM, etc)
-- Hallucination
- Privacy, Data Leak, Backdoor, Jailbreak
- Guard Rails
Wednesday, October 8, 2025
AI/ML '25
Friday, April 18, 2025
AI Agentic Frameworks
With the proliferation of AI Agents, it's only logical that there will be attempts at standardization and at building protocols & frameworks:
- MCP covered previously
- Any-Agent from Mozilla.ai to switch between agents, vendors, clouds, etc
- Agent2Agent interoperability protocol
Wednesday, April 16, 2025
Speculative Decoding
- Ensemble of Weak + Strong model
- The Weak (draft) model has a quick first pass at generating candidate tokens (cheap inference).
- The Strong but slow (target) model then catches up: it scores the weak model's candidate tokens in a single pass, accepting or rejecting each one to produce the final output (see the sketch below).
- Overall this makes LLM inference quicker and cheaper.
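A toy, self-contained sketch of the accept/ reject loop, assuming stand-in "models" that just return random next-token distributions; a real system plugs a small draft LLM and the large target LLM in here.

import numpy as np

# Purely illustrative: both "models" below return random distributions over a
# tiny vocabulary; real speculative decoding uses a small draft LLM (weak) and
# a large target LLM (strong).
rng = np.random.default_rng(0)
VOCAB = 16

def draft_model(ctx):                    # weak/ fast model: q(token | ctx)
    logits = rng.standard_normal(VOCAB)
    p = np.exp(logits)
    return p / p.sum()

def target_model(ctx):                   # strong/ slow model: p(token | ctx)
    logits = rng.standard_normal(VOCAB)
    p = np.exp(logits)
    return p / p.sum()

def speculative_step(ctx, gamma=4):
    """Draft gamma tokens cheaply, then accept/ reject them against the target."""
    drafts, q_probs = [], []
    for _ in range(gamma):               # 1) weak model proposes candidates
        q = draft_model(ctx + drafts)
        t = int(rng.choice(VOCAB, p=q))
        drafts.append(t)
        q_probs.append(q)
    accepted = []
    for t, q in zip(drafts, q_probs):    # 2) strong model verifies each candidate
        p = target_model(ctx + accepted)
        if rng.random() < min(1.0, p[t] / q[t]):
            accepted.append(t)           # accept the draft token
        else:                            # reject: resample from the residual
            residual = np.maximum(p - q, 0.0)
            residual /= residual.sum()
            accepted.append(int(rng.choice(VOCAB, p=residual)))
            break
    # Real implementations also sample one bonus token from the target
    # when every draft token is accepted.
    return ctx + accepted

print(speculative_step([1, 2, 3]))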
More to follow..
- https://pytorch.org/blog/hitchhikers-guide-speculative-decoding/
- https://www.baseten.co/blog/a-quick-introduction-to-speculative-decoding/
- https://research.google/blog/looking-back-at-speculative-decoding/
- https://medium.com/ai-science/speculative-decoding-make-llm-inference-faster-c004501af120
Tuesday, April 8, 2025
Revisiting the Bitter Lesson
Richard Sutton's The Bitter Lesson(s) continue to hold true. Scaling/ data walls could pose challenges to scaling AI general-purpose methods (like searching and learning) beyond a point. And that's where human innovation & ingenuity would be needed. But hang on, wouldn't that violate the "..by our methods, not by us.." lesson?
Perhaps then something akin to human innovation/ discovery/ ingenuity/ creativity might be the next frontier of meta-methods. Machines, in their typical massively parallel & distributed, brute-force, systematic trial & error fashion, would auto-ideate/ innovate/ discover solutions quicker, cheaper, better. Over & over again.
So machine discoveries shall abound, just not of the Archimedes Eureka kind, but in Edison's 100-different-ways style!
Sunday, April 6, 2025
Model Context Protocol (MCP)
Standardization protocol for AI agents. It enables them to act, inter-connect, process, parse & invoke functions; in other words, to Crawl, Browse, Search, Click, etc.
MCP re-uses the well-known client-server architecture, with JSON-RPC as the wire format.
Apps use MCP Clients -> MCP Servers (which abstract the underlying service).
Kind of API++ for an AI world!
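A minimal sketch of what one MCP tool call looks like on the wire, assuming the JSON-RPC 2.0 framing and the "tools/call" method from the public spec; the tool name "web_search" and its arguments are made up for illustration.

import json

# Hypothetical MCP-style exchange: the host app's MCP Client sends a JSON-RPC
# request to an MCP Server, asking it to invoke one of its tools.
call_tool = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "web_search",                          # hypothetical tool name
        "arguments": {"query": "speculative decoding"},
    },
}
print(json.dumps(call_tool, indent=2))
# The server replies with a JSON-RPC "result" (or "error") carrying the tool's
# output, which the host app feeds back to the LLM as fresh context.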
Saturday, April 5, 2025
Open Weight AI
Inspired by Open Source Software (OSS), yet not fully open...
With Open Weight (OW), typically the final model weights (& the fully trained model) are made available under a liberal licence: free to reuse, modify, distribute, non-discriminating, etc. This helps anyone wanting to start with the fully trained Open Weight model and apply it, fine-tune it, or adapt it (LoRA, RAG, etc.) for custom use-cases. To that extent, OW has a share & reuse philosophy.
On the other hand, with respect to training data, data sources, detailed architecture, optimization details, and so on, OW diverges from OSS by not making it compulsory to share any of these. These remain closed source with the original devs, with a bunch of pros & cons. Copyright material, IP protection, commercial gains, etc. are some stated advantages for the original devs/ org. On the downside, not allowing a full peek into the model means less visibility for the wider community and no white-box evaluation of model internals, biases, checks & balances.
Anyway, that's the present, a time of great flux. As models stabilize over time OW may tend towards OSS...
References
- https://openweight.org/
- https://www.oracle.com/artificial-intelligence/ai-open-weights-models/
- https://medium.com/@aruna.kolluru/exploring-the-world-of-open-source-and-open-weights-ai-aa09707b69fc
- https://www.forbes.com/sites/adrianbridgwater/2025/01/22/open-weight-definition-adds-balance-to-open-source-ai-integrity/
- https://promptengineering.org/llm-open-source-vs-open-weights-vs-restricted-weights/
- https://promptmetheus.com/resources/llm-knowledge-base/open-weights-model
- https://www.agora.software/en/llm-open-source-open-weight-or-proprietary/
Wednesday, April 2, 2025
The Big Book of LLM
A book by Damien Benveniste of AIEdge. Though a work in progress, the chapters available for preview (2 - 4) are fantastic.
Look forward to a paperback edition, which I certainly hope to own...
Tuesday, April 1, 2025
Mozilla.ai
Mozilla pedigree, AI focus, Open-source, Dev oriented.
Blueprint Hub: Mozilla.ai's hub of open-source, templatized, customizable AI solutions for developers.
Lumigator: Platform for model evaluation and selection. Consists of a Python FastAPI backend for AI lifecycle management & for capturing workflow data useful for evaluation.
Friday, March 28, 2025
Streamlit
Streamlit is a web wrapper for Data Science projects in pure Python. It's a lightweight, simple, rapid prototyping web app framework for sharing scripts.
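A minimal sketch of a Streamlit app, assuming a generic data-exploration use case; the widget and chart calls below are standard Streamlit APIs, the data is just random noise.

# Save as app.py and run with: streamlit run app.py
import numpy as np
import pandas as pd
import streamlit as st

st.title("Quick prototype")
n = st.slider("Number of points", 10, 500, 100)         # interactive widget
df = pd.DataFrame({"y": np.random.randn(n).cumsum()})   # toy data
st.line_chart(df)                                       # re-renders on each change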
- https://streamlit.io/playground
- https://www.restack.io/docs/streamlit-knowledge-streamlit-vs-flask-vs-django
- https://docs.streamlit.io/develop/concepts/architecture/architecture
- https://docs.snowflake.com/en/developer-guide/streamlit/about-streamlit
Saturday, March 15, 2025
Scaling Laws
Quick notes around Chinchilla Scaling Law/ Limits & beyond for DeepLearning and LLMs.
Factors
- Model size (N)
- Dataset size (D)
- Training Cost (aka Compute) (C)
- Test Cross-entropy loss (L)
The intuitive way,
- Larger data will need a larger model, and will have a higher training cost. In other words N, D, C all increase together, not necessarily linearly; the relationship could be exponential, log-linear, etc.
- Likewise, Loss is likely to decrease for larger datasets (and models). So an inverse relationship between L & D (& the rest).
- Tying them into equations would be some constants (scaling factors, exponents alpha, beta, etc.), unknown for now (identified empirically later).
Beyond common sense, the theoretical foundations linking the factors aren't available right now. Perhaps the nature of the problem is that it's hard (NP).
The next best thing, then, is to work out the relationships/ bounds empirically: to work with existing Deep Learning models, LLMs, etc. using large datasets spanning TBs/ PBs of data, trillions of parameters, etc., with large compute budgets cumulatively spanning years.
Papers by Hestness & Narang, Kaplan, and Chinchilla are all attempts along this empirical route. So are more recent papers from Mosaic, DeepSeek, MoE work, Llama 3, and Microsoft, among many others.
Key takeaways being,
- The scale & bounds are getting larger over time.
- Models from a couple of years back are found to be grossly under-trained in terms of the volume of training data used. They should have been trained on an order of magnitude more data for optimal training, without risk of overfitting.
- Conversely, the previously used data volumes are better suited to much smaller models (SLMs), with inference capabilities similar to those older LLMs. (A rough numeric sketch of the Chinchilla relation follows below.)
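A rough numeric sketch of the Chinchilla-style parametric fit L(N, D) = E + A/N^alpha + B/D^beta, using the approximate constants reported by Hoffmann et al. (2022) and the usual C ~ 6*N*D approximation; the brute-force search stands in for the paper's closed-form solution, and the numbers are illustrative only.

# Approximate published fits (Hoffmann et al., 2022); treat as illustrative.
E, A, B = 1.69, 406.4, 410.7
alpha, beta = 0.34, 0.28

def loss(N, D):
    """Predicted test cross-entropy loss for N parameters and D training tokens."""
    return E + A / N**alpha + B / D**beta

def compute_optimal(C):
    """Split a FLOP budget C ~ 6*N*D between model size N and data D
    by brute-force search over N for the lowest predicted loss."""
    best = None
    N = 1e7
    while N < 1e13:
        D = C / (6 * N)
        cand = (loss(N, D), N, D)
        if best is None or cand < best:
            best = cand
        N *= 1.1
    return best[1], best[2]

N_opt, D_opt = compute_optimal(C=5.76e23)   # roughly Chinchilla's training budget
print(f"N ~ {N_opt:.2e} params, D ~ {D_opt:.2e} tokens, "
      f"~{D_opt / N_opt:.0f} tokens per parameter")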
References
- https://en.wikipedia.org/wiki/Neural_scaling_law
- https://lifearchitect.ai/chinchilla/
- https://medium.com/@raniahossam/chinchilla-scaling-laws-for-large-language-models-llms-40c434e4e1c1
- https://bigscience.huggingface.co/blog/what-language-model-to-train-if-you-have-two-million-gpu-hours
- https://medium.com/nlplanet/two-minutes-nlp-scaling-laws-for-neural-language-models-add6061aece7
- https://lifearchitect.ai/the-sky-is-bigger/
Friday, February 28, 2025
Diffusion Models
Diffusion
- Forward, Backward (Learning), Sampling (Random)
- Continuous Diffusion
- VAE, Denoising Autoencoder
- Markov Chains
- U-Net
- DALL-E (OpenAI), Stable Diffusion,
- Imagen, Muse, VEO (Google)
- LLaDa, Mercury Coder (Inception)
Non-equilibrium Thermodynamics
- Langevin dynamics
- Thermodynamic Equilibrium - Boltzmann Distribution
- Wiener Process - Multidimensional Brownian Motion
- Energy Based Models
Gaussian Noise
- Denoising
- Noise/ Variance Schedule
- Derivation by Reparameterization (a minimal forward-process sketch follows after this outline)
Variational Inference
- Denoising Diffusion Probabilistic Model (DDPM)
- Noise Prediction Networks
- Denoising Diffusion Implicit Model (DDIM)
Loss Functions
- Variational Lower Bound (VLB)
- Evidence Lower Bound (ELBO)
- Kullback-Leibler divergence (KL divergence)
- Mean Squared Error (MSE)
Score Based Generative Model
- Annealing
- Noise conditional score network (NCSN)
- Equivalence: DDPM and Score Based Generative Models
Conditional (Guided) Generation
- Classifier Guidance
- Classifier Free Guidance (CFG)
Latent Variable Generative Model
- Latent Diffusion Model (LDM)
- Lower Dimension (Latent) Space
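A minimal sketch of the DDPM forward (noising) process, assuming a linear variance schedule; it shows the closed-form reparameterization x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps that makes training tractable.

import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # linear noise/ variance schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)          # cumulative product, alpha_bar_t

def q_sample(x0, t, rng=np.random.default_rng(0)):
    """Sample x_t ~ q(x_t | x_0) directly via the reparameterization trick."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

x0 = np.ones(4)                         # stand-in for an image or latent
x_t, eps = q_sample(x0, t=500)
print(x_t)
# A denoising network is trained to predict eps from (x_t, t), typically with
# an MSE loss (the simplified DDPM objective).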
References:
- https://en.wikipedia.org/wiki/Diffusion_model
- https://www.assemblyai.com/blog/diffusion-models-for-machine-learning-introduction
- https://www.ibm.com/think/topics/diffusion-models
- https://hackernoon.com/what-is-a-diffusion-llm-and-why-does-it-matter
- Large Language Diffusion Models (LLaDA): https://arxiv.org/abs/2502.09992
Sunday, January 26, 2025
Mechanistic Interpretability
- A clearer, white-box understanding of how Neural Networks work internally.
- Strong grounds for Superposition: n dimensions (neurons) can represent more than n features (a toy illustration follows below).
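A toy illustration of superposition, assuming random unit-norm feature directions: many more features than neurons can coexist with only modest interference between them.

import numpy as np

rng = np.random.default_rng(0)
n, m = 64, 512                                   # 64 neurons, 512 features
W = rng.standard_normal((m, n))
W /= np.linalg.norm(W, axis=1, keepdims=True)    # unit-norm feature directions
overlaps = W @ W.T - np.eye(m)                   # pairwise cosine similarities
print("max |cos| between distinct features:", np.abs(overlaps).max())
# The maximum overlap stays well below 1: features interfere a little, but each
# direction remains mostly distinct, which is the basic intuition behind
# representing more than n features in n dimensions.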
References
- https://dynalist.io/d/n2ZWtnoYHrU1s4vnFSAQ519J#z=EuO4CLwSIzX7AEZA1ZOsnwwF
- https://www.neelnanda.io/mechanistic-interpretability/glossary
- https://transformer-circuits.pub/2022/toy_model/index.html
- https://www.anthropic.com/research/superposition-memorization-and-double-descent
- https://transformer-circuits.pub/2023/toy-double-descent/index.html
Friday, January 24, 2025
State Space Models
- Vector Space of States (of the System)
- An alternative to Transformers; the two formulations are reducible to one another (a minimal recurrence sketch follows below)
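A minimal sketch of the discrete state-space recurrence x_k = A x_(k-1) + B u_k, y_k = C x_k with toy matrices; real SSM layers (S4, Mamba, etc.) learn A, B, C and compute this recurrence far more efficiently (convolutional or selective-scan forms).

import numpy as np

rng = np.random.default_rng(0)
d_state, seq_len = 8, 16
A = 0.9 * np.eye(d_state)               # stable state-transition matrix
B = rng.standard_normal((d_state, 1))   # input projection
C = rng.standard_normal((1, d_state))   # readout projection
u = rng.standard_normal(seq_len)        # 1-D input sequence

x = np.zeros(d_state)
ys = []
for u_k in u:
    x = A @ x + B[:, 0] * u_k           # state update
    ys.append(float(C[0] @ x))          # output at this step
print(ys[:4])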
References
- https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mamba-and-state
- https://huggingface.co/blog/lbourdois/ssm-2022
- https://huggingface.co/blog/lbourdois/get-on-the-ssm-train
- https://en.wikipedia.org/wiki/State-space_representation
Thursday, May 30, 2024
Mixture of Experts (MoE) Architecture
An enhancement to LLMs to align with the expert-models paradigm.
- Each expert is implemented as a separate Feed Forward Network (FFN) (though other ML models trainable via Backprop should work).
- The expert FFNs are introduced in parallel to the existing FFN layer, after the Attention Layer.
- The decision to route tokens to the experts is made by a router.
- The router is implemented as a linear layer followed by a Softmax giving a probability for each expert, from which the top few (top-k) are picked (a minimal routing sketch follows below).
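A minimal sketch of top-k routing for a single token, assuming 8 experts and k = 2; the router weights and the expert "FFNs" here are stand-ins, not trained components.

import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, k = 16, 8, 2
h = rng.standard_normal(d_model)              # token hidden state
W_router = rng.standard_normal((n_experts, d_model))

logits = W_router @ h                         # router: a linear layer
probs = np.exp(logits - logits.max())
probs /= probs.sum()                          # softmax over experts
top = np.argsort(probs)[-k:]                  # indices of the top-k experts
gates = probs[top] / probs[top].sum()         # renormalized gate weights

def expert_ffn(i, x):
    """Stand-in for expert i's feed-forward network."""
    return x * (i + 1)

out = sum(g * expert_ffn(i, h) for g, i in zip(gates, top))
print(out[:4])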