Algorithms, Design, Code and more: 2025

Friday, April 18, 2025

AI Agentic Frameworks

With prolification of AI Agents, it's only logical that there will be attempts at standardization and building protocols & frameworks:

MCP covered previously
Any-Agent from Mozilla.ai to switch between agents, vendors, clouds, etc
Agent2Agent interoperability protocol

Thursday, April 17, 2025

On Quantization

Speed vs Accuracy trade off.
Reduce costs on storage, compute, operations .
Speed up output generation, inference, etc.
Work with lower precision data.
Cast/ map data from Int32, Float32, etc 32-bit or higher precision to lower precision data types such as 16-bit Brain Float (BFloat16) or 4-bit (NFloat)/ int4 or int8, etc.

East mapping Float32 (1-bit Sign, 7-bit Exponent, 23-bit Mantissa) => BFloat16 (1-bit Sign, 7-bit Exponent, 7-bit Mantissa). Just discard the higher 16-bits of mantissa. No overflow!
Straightforward mapping work out max, min, data distribution, mean, variance, etc & then sub-divide into equally sized buckets based on bit size of the lower precision data type. E.g int4 (4-bit) => 2^4 = 16 buckets.
Handle outliers, data skew which can mess up the mapping, yet lead to loss of useful info if discarded randomly.
Work out Bounds wrt Loss of Accuracy.

LLMs, AI/ ML side:

https://newsletter.theaiedge.io/p/reduce-ai-model-operational-costs

Lucene, Search side:

https://www.elastic.co/search-labs/blog/scalar-quantization-101
https://www.elastic.co/search-labs/blog/scalar-quantization-in-lucene

Wednesday, April 16, 2025

Speculative Decoding

Ensemble of Weak + Strong model
Weak model has a quick first go at generating tokens/ inference (potentials)
Followed by the Strong, but slow model which catches up & uses the outputs of the weak model, samples them, grades them, accepting/ rejecting them to generate the final output
Overall making inferences via LLMs quicker and cheaper

More to follow..

https://pytorch.org/blog/hitchhikers-guide-speculative-decoding/
https://www.baseten.co/blog/a-quick-introduction-to-speculative-decoding/
https://research.google/blog/looking-back-at-speculative-decoding/
https://medium.com/ai-science/speculative-decoding-make-llm-inference-faster-c004501af120

Tuesday, April 8, 2025

Richard Sutton's - The Bitter Lesson(s) continue to hold true. Scaling/ data walls could pose challenges to scaling AI general purpose methods (like searching and learning) beyond a point. And that's where human innovation & ingenuity would be needed. But hang on, wouldn't that violate the "..by our methods, not by us.." lesson?

Perhaps then something akin to human innovation/ discovery/ ingenuity/ creativity might be the next frontier of meta-methods. Machines in their typical massively parallel & distributed, brute-force, systematic trial & error fashion would auto ideate/ innovate/ discover solutions quicker, cheaper, better. Over & over again.

So machine discoveries shall be abound, just not Archimedes's Eureka kind, but Edison's 100-different ways style!

Sunday, April 6, 2025

Model Context Protocol (MCP)

Standardization Protocol for AI agents. Enables them to act, inter-connect, process, parse, invoke functions. In other words to Crawl, Browse, Search, click, etc.

MCP re-uses well known client-server architecture using JSON-RPC.

Apps use MCP Clients -> MCP Servers (abstracts the service)

Kind of API++ for an AI world!

Saturday, April 5, 2025

Open Weight AI

Inspired by Open Source Software (OSS), yet not fully open...

With Open Weight (OW) typically the final model weights (& the fully trained model) are made available under a liberal free to reuse, modify, distribute, non-discriminating, etc licence. This helps for anyone wanting to start with the fully trained Open Weight model & apply them, fine-tune, modify weights (LoRA, RAG, etc) for custom use-cases. To that extent, OW has a share & reuse philosophy.

On the other hand, wrt training data, data sources, detailed architecture, optimizations details, and so on OW diverges from OSS by not making it compulsory to share any of these. So these remain closed source with the original devs, with a bunch of pros & cons. Copyright material, IP protection, commercial gains, etc are some stated advantages for the original devs/ org. But lack of visibility to the wider community, white box evaluation of model internals, biases, checks & balances are among the downsides of not allowing a full peek into the model.

Anyway, that's the present, a time of great flux. As models stabilize over time OW may tend towards OSS...

References

https://openweight.org/
https://www.oracle.com/artificial-intelligence/ai-open-weights-models/
https://medium.com/@aruna.kolluru/exploring-the-world-of-open-source-and-open-weights-ai-aa09707b69fc
https://www.forbes.com/sites/adrianbridgwater/2025/01/22/open-weight-definition-adds-balance-to-open-source-ai-integrity/
https://promptengineering.org/llm-open-source-vs-open-weights-vs-restricted-weights/
https://promptmetheus.com/resources/llm-knowledge-base/open-weights-model
https://www.agora.software/en/llm-open-source-open-weight-or-proprietary/

Wednesday, April 2, 2025

The Big Book of LLM

A book by Damien Benveniste of AIEdge. Though a work in progress, chapters 2 - 4 available for preview are fantastic.

Look forward to a paperback edition, which I certainly hope to own...

Tuesday, April 1, 2025

Mozilla.ai

Mozilla pedigree, AI focus, Open-source, Dev oriented.

Blueprint Hub: Mozilla.ai's Hub of open-source templtaized customizable AI solutions for developers.

Lumigator: Platform for model evaluation and selection. Consists a Python FastAPI backend for AI lifecycle management & capturing workflow data useful for evaluation.

Friday, March 28, 2025

Streamlit

Streamlit is a web wrapper for Data Science projects in pure Python. It's a lightweight, simple, rapid prototyping web app framework for sharing scripts.

https://streamlit.io/playground
https://www.restack.io/docs/streamlit-knowledge-streamlit-vs-flask-vs-django
https://docs.streamlit.io/develop/concepts/architecture/architecture
https://docs.snowflake.com/en/developer-guide/streamlit/about-streamlit

Saturday, March 15, 2025

Scaling Laws

Quick notes around Chinchilla Scaling Law/ Limits & beyond for DeepLearning and LLMs.

Factors

Model size (N)
Dataset size (D)
Training Cost (aka Compute) (C)
Test Cross-entropy loss (L)

The intuitive way,

Larger data will need a larger model, and have higher training cost. In other words, N, D, C all increase together, not necessarily linearly, could be exponential, log-linear, etc.
Likewise Loss is likely to increase for larger datasets. So an inverse relationship between L & D (& the rest).
Tying them into equations would be some constants (scaling, exponential, alpha, beta, etc), unknown for now (identified later).

Beyond common sense, the theoretical foundations linking the factors aren't available right now. Perhaps the nature of the problem is it's hard (NP).

The next best thing then, is to somehow work out the relationships/ bounds empirically. To work with existing Deep Learning models, LLMs, etc using large data sets spanning TB/ PB of data, Trillions of parameters, etc using large compute budget cumulatively spanning years.

Papers by Hestness & Narang, Kaplan, Chinchilla are all attempts along the empirical route. So are more recent papers like Mosaic, DeepSeek, MoE, Llam3, Microsoft among many others.

Key take away being,

The scale & bounds are getting larger over time.
Models from a couple of years back, are found to be grossly under-trained in terms of volumes of training data used. They should have been trained on an order of magnitude larger training data for an optimal training, without risk of overfitting.
Conversely, the previously used data volumes are suited to much smaller models (SLMs), with inference capabilities similar to those older LLMs.

References

https://en.wikipedia.org/wiki/Neural_scaling_law
https://lifearchitect.ai/chinchilla/
https://medium.com/@raniahossam/chinchilla-scaling-laws-for-large-language-models-llms-40c434e4e1c1
https://bigscience.huggingface.co/blog/what-language-model-to-train-if-you-have-two-million-gpu-hours
https://medium.com/nlplanet/two-minutes-nlp-scaling-laws-for-neural-language-models-add6061aece7
https://lifearchitect.ai/the-sky-is-bigger/

Friday, February 28, 2025

Diffusion Models

Diffusion

Forward, Backward (Learning), Sampling (Random)
Continous Diffusion
VAE, Denoising Autoencoder
Markov Chains
U-Net
DALL-E (OpenAI), Stable Diffusion,
Imagen, Muse, VEO (Google)
LLaDa, Mercury Coder (Inception)

Non-equilibrium Thermodynamics

Langevin dynamics
Thermodynamic Equilibrium - Boltzmann Distribution
Wiener Process - Multidimensional Brownian Motion
Energy Based Models

Gaussian Noise

Denoising
Noise/ Variance Schedule
Derivation by Reparameterization

Variational Inference

Denoising Diffusion Probabilistic Model (DDPM)
Noise Prediction Networks
Denoising Diffusion Implicit Model (DDIM)

Loss Functions

Variational Lower Bound (VLB)
Evidence Lower Bound (ELBO)
Kullback-Leibler divergence (KL divergence)
Mean Squared Error (MSE)

Score Based Generative Model

Annealing
Noise conditional score network (NCSN)
Equivalence: DDPM and Score BBased Generative Models

Conditional (Guided) Generation

Classifier Guidance
Classifier Free Guidance (CFG)

Latent Varible Generative Model

Latent Diffusion Model (LDM)
Lower Dimension (Latent) Space

References:

https://en.wikipedia.org/wiki/Diffusion_model
https://www.assemblyai.com/blog/diffusion-models-for-machine-learning-introduction
https://www.ibm.com/think/topics/diffusion-models
https://hackernoon.com/what-is-a-diffusion-llm-and-why-does-it-matter
Large Language Diffusion Models (LLaDA): https://arxiv.org/abs/2502.09992

Sunday, January 26, 2025

Mechanistic Interpretability

Clearer better understanding of Neural Networks working (white box).
Strong grounds for Superposition: n-dimensions (neurons) represent more than n-features

References

https://dynalist.io/d/n2ZWtnoYHrU1s4vnFSAQ519J#z=EuO4CLwSIzX7AEZA1ZOsnwwF
https://www.neelnanda.io/mechanistic-interpretability/glossary
https://transformer-circuits.pub/2022/toy_model/index.html
https://www.anthropic.com/research/superposition-memorization-and-double-descent
https://transformer-circuits.pub/2023/toy-double-descent/index.html

Friday, January 24, 2025

State Space Models

Vector Space of States (of the System)
Alt. to Transformers, reducible to one another

(Image source: https://en.wikipedia.org/wiki/State-space_representation)

References

https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mamba-and-state
https://huggingface.co/blog/lbourdois/ssm-2022
https://huggingface.co/blog/lbourdois/get-on-the-ssm-train
https://en.wikipedia.org/wiki/State-space_representation

Monday, January 6, 2025

Spark API Categorization

A way to categorize Spark API features:

Flow of data is generally across the category swim lanes, from creation of a New Spark Context to reading data using I/O to Filter, Map/ Transform, Reduce/ Agg etc Action.
Lazy processing upto Transformation.
Steps only get executed once an Action is invoke.
Post Actions (Reduce, Collect, etc) there could again be I/O, thus the reverse flow from Action
Partition is a cross cutting concern across all layers. For I/O, Transformations, Actions could be across all or a few Partitions.
forEach on the Stream could be at either at Transform or Action levels.

The diagram is based on code within various Spark test suites.

Thursday, January 2, 2025

Mocked Kinesis (Localstack) with PySpark Streaming

Continuing with the same PySpark (ver 2.1.0, Python3.5, etc.) setup explained in an earlier post. In order to connect to the mocked Kinesis stream on Localstack from PySpark use the kinesis_wordcount_asl.py script located in Spark external/ (connector/) folder.

(a) Update value of master in kinesis_wordcount_asl.py

Update value of master(local[n], spark://localhost:7077, etc) in SparkContext in kinesis_wordcount_asl.py:
sc = SparkContext(appName="PythonStreamingKinesisWordCountAsl",master="local[2]")

(b) Add aSpark compiled jars to Spark Driver/ Executor Classpath

As explained in step (III) of an earlier post, to work with Localstack a few changes were done to the KinesisReceiver.scala onStart() to explicitly set endPoint on kinesis, dynamoDb, cloudWatch clients. Accordingly the compiled aSpark jars with the modifications need to be added to Spark Driver/ Executor classpath.

For Spark local mode (master="local[n]"): additions to classpath can be exported in the SPARK_CLASSPATH variable.

export aSPARK_PROJ_HOME="/Downlaod/Location/aSpark"
export SPARK_CLASSPATH="${aSPARK_PROJ_HOME}/target/original-aSpark_1.0-2.1.0.jar:${aSPARK_PROJ_HOME}/target/scala-2.11/classes:${aSPARK_PROJ_HOME}/target/scala-2.11/jars/*"

For Spark Standalone mode: "spark.executor.extraClassPath" needs to be set in either spark-defaults.conf or added as a SparkConf to SparkContext (see (II)(a))

(c) Ensure SPARK_HOME, PYSPARK_PYTHON & PYTHONPATH variables are exported.

(d) Run kinesis_wordcount_asl

python3.5 ${SPARK_HOME}/external/kinesis-asl/src/main/python/examples/streaming/kinesis_wordcount_asl.py SampleKinesisApplication myFirstStream http://localhost:4566/ us-east-1

put-records to Localstack Kinesis

aws --endpoint-url=http://localhost:4566 kinesis put-record --stream-name myFirstStream --partition-key 123 --data "testdata abcd"

Count of the words streamed (put) will show up on the kinesis_wordcount_asl console

Wednesday, January 1, 2025

Spark Streaming with Kinesis mocked on Localstack

In this post we get a Spark streaming application working with AWS Kinesis stream, a mocked version of Kinesis running locally on Localstack. In earlier posts we have explained how to get Localstack running and various AWS services up on Localstack. The client connections to AWS services (Localstack) is done using AWS cli and AWS Java-Sdk v1.

Environment: This set-up continues on a Ubuntu20.04, with Java-8, Maven-3.6x, Docker-24.0x, Python3.5, PySpark/ Spark-2.1.0, Localstack-3.8.1, AWS Java-Sdk-v1 (ver.1.12.778),

Once the Localstack installation is done, steps to follow are:

(I) Start Localstack
    # Start locally
    localstack start

    That should get Localstack should be running on: http://localhost:4566

(II) Check Kinesis services from CLI on Localstack

    # List Streams
    aws --endpoint-url=http://localhost:4566 kinesis list-streams

    # Create Stream
    aws --endpoint-url=http://localhost:4566 kinesis create-stream --stream-name myFirstStream --shard-count 1

    # List Streams
    aws --endpoint-url=http://localhost:4566 kinesis list-streams

    # describe-stream-summary
    aws --endpoint-url=http://localhost:4566 kinesis describe-stream-summary --stream-name myFirstStream

    # Put Record
    aws --endpoint-url=http://localhost:4566 kinesis put-record --stream-name myFirstStream --partition-key 123 --data "testdata abcd"
    aws --endpoint-url=http://localhost:4566 kinesis put-record --stream-name myFirstStream --partition-key 123 --data "testdata efgh"

(III) Connect to Kinesis from Spark Streaming

Download & Build a sample aSpark - Java Kinesis application.
The code is similar to Spark's kinesis-asl from external (connector) module. Except for a few changes to KinesisReceiver.scala onStart() method to explicitly set endPoint on kinesis, dynamoDb, cloudWatch clients. This enables Localstack endPoint url to be plugged into kinesis, dynamoDb & cloudwatch.

# Build
mvn install -DskipTests=true -Dcheckstyle.skip

# Run JavaKinesisWordCountASL with Localstack

JavaKinesisWordCountASL SampleKinesisApplication myFirstStream http://localhost:4566/

runJavaKinesisWordCountASL.sh script located in sbin/ folder of the aSpark project can be used to run JavaKinesisWordCoundASL from the shell

(IV) Add Data to Localstack Kinesis & View Counts on Console
    a) Put Record from cli
    aws --endpoint-url=http://localhost:4566 kinesis put-record --stream-name myFirstStream --partition-key 123 --data "testdata abcd"
    aws --endpoint-url=http://localhost:4566 kinesis put-record --stream-name myFirstStream --partition-key 123 --data "testdata efgh"

    b) Alternatively Put records from Java Kinesis application
    Download, build & run AmazonKinesisRecordProducerSample.java

    c) Now check the output console of JavaKinesisWordCountASL run in step (III) above. Counts of the words streamed from Localstack Kinesis will be displayed on the console.