Wednesday, October 8, 2025
AI/ML '25
• GenAI
- Text: Chat, Q&A, Compose, Summarize, Think, Search, Insights, Research
- Image: Gen, Identify, Search (Image-Image, Text-Image, etc), Label, Multimodal
- Code gen
- Research: Projects, Science, Breakthroughs
- MoE
• Agentic
- Workflows: GenAI, DNN, Scripts, Tools, etc combined to fulfil Objectives
-- Auto-Generated Plans & Objectives
- Standardization: MCP (API), Interoperability, Protocols
- RAG
- Tools: Websearch, DB, Invoke API/ Tools/ LLM, etc
• Context
- Fresh/ Updated
- Length: Cost vs Speed trade-off
- RAG
- VectorDB (Similarity/ Relevance; toy sketch at the end of this outline)
- Memory enhanced
• Fine Tune
- Foundation models (generalists) -> Specialists
- LoRA
- Inference time scaling (compute, tuning, etc)
- Prompts
• Multimodal: Text, Audio, Video, Image, Graph, Sensors
• Safety/ Security
- Output Quality: Relevance, Accuracy, Correctness, Evaluation (Automated Rating, Ranking, JudgeLLM, etc)
-- Hallucination
- Privacy, Data Leak, Backdoor, Jailbreak
- Guard Rails
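Below, a toy NumPy sketch of the VectorDB lookup at the heart of RAG: document embeddings are stored as vectors and retrieval is a cosine-similarity top-k. The 4-dim embeddings and document snippets are made up for illustration; real systems use learned embeddings with hundreds of dimensions.

import numpy as np

# Toy in-memory "VectorDB": one embedding row per document (made-up values)
docs = ["spark streaming notes", "kinesis on localstack", "diffusion models"]
emb = np.random.default_rng(1).standard_normal((3, 4))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)      # unit-normalize rows

def top_k(query_vec, k=2):
    q = query_vec / np.linalg.norm(query_vec)
    scores = emb @ q                  # cosine similarity = dot of unit vectors
    best = np.argsort(-scores)[:k]    # most similar first
    return [(docs[i], float(scores[i])) for i in best]

print(top_k(np.random.default_rng(2).standard_normal(4)))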
Friday, April 18, 2025
AI Agentic Frameworks
With the proliferation of AI Agents, it's only logical that there will be attempts at standardization and at building protocols & frameworks:
- MCP covered previously
- Any-Agent from Mozilla.ai to switch between agents, vendors, clouds, etc
- Agent2Agent interoperability protocol
Thursday, April 17, 2025
On Quantization
- Speed vs Accuracy trade-off.
- Reduces costs on storage, compute & operations.
- Speeds up output generation, inference, etc.
- Works with lower precision data.
- Cast/ map data from Int32, Float32, etc 32-bit or higher precision to lower precision data types such as 16-bit Brain Float (BFloat16), 4-bit NormalFloat (NF4), int4 or int8, etc.
- Easy mapping: Float32 (1-bit Sign, 8-bit Exponent, 23-bit Mantissa) => BFloat16 (1-bit Sign, 8-bit Exponent, 7-bit Mantissa). Just discard the lower 16 bits of the mantissa. No overflow!
- Straightforward mapping: work out max, min, data distribution, mean, variance, etc & then sub-divide the range into equally sized buckets based on the bit size of the lower precision data type. E.g. int4 (4-bit) => 2^4 = 16 buckets. (See the sketch below.)
- Handle outliers & data skew, which can mess up the mapping, yet lead to loss of useful info if discarded blindly.
- Work out Bounds wrt Loss of Accuracy.
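A minimal NumPy sketch of both mappings above: BFloat16 by truncating the lower mantissa bits, and affine int8 bucketing worked out from the observed min/ max. Illustrative only; production quantizers add per-channel scales, outlier handling, etc.

import numpy as np

def to_bfloat16_bits(x):
    # BFloat16 via truncation: keep only the top 16 bits of each float32
    return (x.astype(np.float32).view(np.uint32) >> 16).astype(np.uint16)

def quantize_int8(x):
    # Affine scalar quantization: 2^8 = 256 equal buckets between min & max
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0
    zero_point = round(-lo / scale) - 128
    q = np.clip(np.round(x / scale + zero_point), -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

x = np.random.default_rng(0).standard_normal(1000).astype(np.float32)
q, s, z = quantize_int8(x)
print("max abs error:", np.abs(x - dequantize(q, s, z)).max())  # ~ scale/2
print(hex(to_bfloat16_bits(np.array([3.14], np.float32))[0]))   # 0x4048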
LLMs, AI/ ML side:
- https://newsletter.theaiedge.io/p/reduce-ai-model-operational-costs
Lucene, Search side:
- https://www.elastic.co/search-labs/blog/scalar-quantization-101
- https://www.elastic.co/search-labs/blog/scalar-quantization-in-lucene
Wednesday, April 16, 2025
Speculative Decoding
- Ensemble of a Weak (draft) + Strong (target) model
- The weak model has a quick first go at generating candidate tokens (potentials)
- The strong, but slow, model then catches up: it takes the weak model's outputs, samples & grades them, accepting/ rejecting each one to generate the final output
- Overall makes inference via LLMs quicker and cheaper, without changing the output distribution (sketch below)
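A toy sketch of the accept/ reject step, with deterministic stand-in distributions in place of real models (the names draft_probs/ target_probs and the tiny vocabulary are made up for illustration). The accept rule min(1, p_target/p_draft) plus the residual resample is what keeps the final samples distributed exactly as the target model would have produced.

import numpy as np

V = 8                        # toy vocabulary size
rng = np.random.default_rng(0)

def _dist(ctx, salt):        # deterministic toy distribution per context
    g = np.random.default_rng(hash((tuple(ctx), salt)) % (2**32))
    z = g.standard_normal(V)
    e = np.exp(z - z.max())
    return e / e.sum()

def draft_probs(ctx):  return _dist(ctx, "weak")    # fast, cheap model
def target_probs(ctx): return _dist(ctx, "strong")  # slow, accurate model

def speculative_step(ctx, k=4):
    # 1) Draft model proposes k tokens autoregressively
    c, proposed = list(ctx), []
    for _ in range(k):
        p = draft_probs(c)
        t = int(rng.choice(V, p=p))
        proposed.append((t, p[t]))
        c.append(t)
    # 2) Target model verifies each draft: accept with prob min(1, p/q)
    c, out = list(ctx), []
    for t, q in proposed:
        p = target_probs(c)
        if rng.random() < min(1.0, p[t] / q):
            out.append(t); c.append(t)
        else:
            # On rejection, resample from the residual max(0, p - q_full)
            residual = np.maximum(p - draft_probs(c), 0)
            out.append(int(rng.choice(V, p=residual / residual.sum())))
            break
    return out

print(speculative_step([1, 2, 3]))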
More to follow..
- https://pytorch.org/blog/hitchhikers-guide-speculative-decoding/
- https://www.baseten.co/blog/a-quick-introduction-to-speculative-decoding/
- https://research.google/blog/looking-back-at-speculative-decoding/
- https://medium.com/ai-science/speculative-decoding-make-llm-inference-faster-c004501af120
Tuesday, April 8, 2025
Revisiting the Bitter Lesson
Richard Sutton's The Bitter Lesson(s) continue to hold true. Scaling/ data walls could pose challenges to scaling AI general purpose methods (like searching and learning) beyond a point. And that's where human innovation & ingenuity would be needed. But hang on, wouldn't that violate the "..by our methods, not by us.." lesson?
Perhaps then something akin to human innovation/ discovery/ ingenuity/ creativity might be the next frontier of meta-methods. Machines in their typical massively parallel & distributed, brute-force, systematic trial & error fashion would auto ideate/ innovate/ discover solutions quicker, cheaper, better. Over & over again.
So machine discoveries shall abound, just not of the Archimedes Eureka kind, but in Edison's 100-different-ways style!
Sunday, April 6, 2025
Model Context Protocol (MCP)
A standardization protocol for AI agents. Enables them to act, inter-connect, process, parse & invoke functions. In other words, to Crawl, Browse, Search, Click, etc.
MCP re-uses the well known client-server architecture, using JSON-RPC.
Apps use MCP Clients -> MCP Servers (which abstract the service)
Kind of API++ for an AI world!
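A minimal sketch of what goes over the wire, as a Python dict serialized to JSON-RPC 2.0. The tools/call method & params shape follow the published MCP spec; the tool name web_search & its arguments are made up for illustration.

import json

# JSON-RPC 2.0 request an MCP client might send (tool name is hypothetical)
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "web_search", "arguments": {"query": "MCP spec"}},
}
print(json.dumps(request))
# The MCP server executes the tool & replies with a JSON-RPC "result"
# payload keyed to the same id.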
Saturday, April 5, 2025
Open Weight AI
Inspired by Open Source Software (OSS), yet not fully open...
With Open Weight (OW), typically the final model weights (& the fully trained model) are made available under a liberal licence: free to reuse, modify, distribute, non-discriminating, etc. This helps anyone wanting to start with the fully trained Open Weight model & apply it, fine-tune it (LoRA, etc), or pair it with retrieval (RAG) for custom use-cases. To that extent, OW has a share & reuse philosophy.
On the other hand, wrt training data, data sources, detailed architecture, optimization details, and so on, OW diverges from OSS by not making it compulsory to share any of these. These remain closed source with the original devs, with a bunch of pros & cons. Copyright material, IP protection, commercial gains, etc are some stated advantages for the original devs/ org. Lack of visibility for the wider community, white-box evaluation of model internals, biases, and checks & balances are among the downsides of not allowing a full peek into the model.
Anyway, that's the present, a time of great flux. As models stabilize over time OW may tend towards OSS...
References
- https://openweight.org/
- https://www.oracle.com/artificial-intelligence/ai-open-weights-models/
- https://medium.com/@aruna.kolluru/exploring-the-world-of-open-source-and-open-weights-ai-aa09707b69fc
- https://www.forbes.com/sites/adrianbridgwater/2025/01/22/open-weight-definition-adds-balance-to-open-source-ai-integrity/
- https://promptengineering.org/llm-open-source-vs-open-weights-vs-restricted-weights/
- https://promptmetheus.com/resources/llm-knowledge-base/open-weights-model
- https://www.agora.software/en/llm-open-source-open-weight-or-proprietary/
Wednesday, April 2, 2025
The Big Book of LLM
A book by Damien Benveniste of The AiEdge. Though a work in progress, the chapters (2-4) available for preview are fantastic.
Look forward to a paperback edition, which I certainly hope to own...
Tuesday, April 1, 2025
Mozilla.ai
Mozilla pedigree, AI focus, Open-source, Dev oriented.
Blueprint Hub: Mozilla.ai's hub of open-source, templatized, customizable AI solutions for developers.
Lumigator: Platform for model evaluation and selection. Consists of a Python FastAPI backend for AI lifecycle management & capturing workflow data useful for evaluation.
Friday, March 28, 2025
Streamlit
Streamlit is a web wrapper for Data Science projects in pure Python. It's a lightweight, simple, rapid prototyping web app framework for sharing scripts.
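A minimal sketch of a Streamlit app (st.title, st.slider & st.line_chart are standard Streamlit APIs; the file name app.py is arbitrary):

import numpy as np
import pandas as pd
import streamlit as st

st.title("Random walk demo")
n = st.slider("Points", min_value=10, max_value=1000, value=100)  # widget
df = pd.DataFrame({"y": np.random.randn(n).cumsum()})
st.line_chart(df)    # the chart re-renders on every slider change

# Run with: streamlit run app.py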
- https://streamlit.io/playground
- https://www.restack.io/docs/streamlit-knowledge-streamlit-vs-flask-vs-django
- https://docs.streamlit.io/develop/concepts/architecture/architecture
- https://docs.snowflake.com/en/developer-guide/streamlit/about-streamlit
Saturday, March 15, 2025
Scaling Laws
Quick notes around Chinchilla Scaling Law/ Limits & beyond for DeepLearning and LLMs.
Factors
- Model size (N)
- Dataset size (D)
- Training Cost (aka Compute) (C)
- Test Cross-entropy loss (L)
The intuitive way,
- Larger data will need a larger model, and have higher training cost. In other words, N, D, C all increase together, not necessarily linearly, could be exponential, log-linear, etc.
- Likewise, Loss is likely to decrease for larger datasets. So an inverse relationship between L & D (& the rest).
- Tying them into equations would be some constants (scaling, exponential, alpha, beta, etc), unknown for now (identified later).
Beyond common sense, theoretical foundations linking the factors aren't available right now. Perhaps the nature of the problem makes it inherently hard (NP).
The next best thing, then, is to somehow work out the relationships/ bounds empirically: working with existing Deep Learning models, LLMs, etc on large data sets spanning TB/ PB of data & Trillions of parameters, using compute budgets cumulatively spanning years.
Papers by Hestness & Narang, Kaplan, Chinchilla are all attempts along the empirical route. So are more recent papers like Mosaic, DeepSeek, MoE, Llama3, Microsoft among many others.
Key take aways being,
- The scale & bounds are getting larger over time.
- Models from a couple of years back are found to be grossly under-trained in terms of the volume of training data used. They should have been trained on an order of magnitude more data for optimal training, without risk of overfitting.
- Conversely, the previously used data volumes are suited to much smaller models (SLMs), with inference capabilities similar to those of the older LLMs. (See the sketch below.)
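To make the take aways concrete, a small sketch plugging in the parametric loss fit L(N, D) = E + A/N^alpha + B/D^beta with the constants reported in the Chinchilla paper (Hoffmann et al., 2022), plus its roughly 20 tokens-per-parameter compute-optimal rule of thumb:

E, A, B = 1.69, 406.4, 410.7      # fitted constants from the paper
alpha, beta = 0.34, 0.28

def loss(N, D):
    # Test cross-entropy loss as a function of params (N) & tokens (D)
    return E + A / N**alpha + B / D**beta

for N in (1e9, 10e9, 70e9):       # model sizes in parameters
    D = 20 * N                    # compute-optimal tokens ~ 20x params
    print(f"N={N:.0e} params, D={D:.0e} tokens, loss ~ {loss(N, D):.3f}")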
References
- https://en.wikipedia.org/wiki/Neural_scaling_law
- https://lifearchitect.ai/chinchilla/
- https://medium.com/@raniahossam/chinchilla-scaling-laws-for-large-language-models-llms-40c434e4e1c1
- https://bigscience.huggingface.co/blog/what-language-model-to-train-if-you-have-two-million-gpu-hours
- https://medium.com/nlplanet/two-minutes-nlp-scaling-laws-for-neural-language-models-add6061aece7
- https://lifearchitect.ai/the-sky-is-bigger/
Friday, February 28, 2025
Diffusion Models
Diffusion
- Forward, Backward (Learning), Sampling (Random)
- Continuous Diffusion
- VAE, Denoising Autoencoder
- Markov Chains
- U-Net
- DALL-E (OpenAI), Stable Diffusion
- Imagen, Muse, VEO (Google)
- LLaDa, Mercury Coder (Inception)
Non-equilibrium Thermodynamics
- Langevin dynamics
- Thermodynamic Equilibrium - Boltzmann Distribution
- Wiener Process - Multidimensional Brownian Motion
- Energy Based Models
Gaussian Noise
- Denoising
- Noise/ Variance Schedule
- Derivation by Reparameterization (sketch at the end of this outline)
Variational Inference
- Denoising Diffusion Probabilistic Model (DDPM)
- Noise Prediction Networks
- Denoising Diffusion Implicit Model (DDIM)
Loss Functions
- Variational Lower Bound (VLB)
- Evidence Lower Bound (ELBO)
- Kullback-Leibler divergence (KL divergence)
- Mean Squared Error (MSE)
Score Based Generative Model
- Annealing
- Noise conditional score network (NCSN)
- Equivalence: DDPM and Score Based Generative Models
Conditional (Guided) Generation
- Classifier Guidance
- Classifier Free Guidance (CFG)
Latent Variable Generative Model
- Latent Diffusion Model (LDM)
- Lower Dimension (Latent) Space
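A minimal NumPy sketch of the DDPM forward (noising) process via the reparameterization trick, using the linear variance schedule from the DDPM paper (Ho et al., 2020). Illustrative only, not a training loop:

import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)     # linear noise/ variance schedule
alpha_bar = np.cumprod(1.0 - betas)    # cumulative signal retention

def q_sample(x0, t, rng):
    # q(x_t | x_0): x_t = sqrt(a_bar)*x0 + sqrt(1 - a_bar)*eps, eps ~ N(0, I)
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal(16)           # stand-in for an image/ latent
xt, eps = q_sample(x0, t=500, rng=rng)
# A DDPM trains a noise-prediction network to recover eps from (xt, t);
# sampling then runs the chain in reverse to denoise.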
References:
- https://en.wikipedia.org/wiki/Diffusion_model
- https://www.assemblyai.com/blog/diffusion-models-for-machine-learning-introduction
- https://www.ibm.com/think/topics/diffusion-models
- https://hackernoon.com/what-is-a-diffusion-llm-and-why-does-it-matter
- Large Language Diffusion Models (LLaDA): https://arxiv.org/abs/2502.09992
Sunday, January 26, 2025
Mechanistic Interpretability
- A clearer, white-box understanding of how Neural Networks work.
- Strong grounds for Superposition: n dimensions (neurons) can represent more than n features. (Toy sketch below.)
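A toy NumPy illustration of superposition: 8 features squeezed into 4 dimensions along random, nearly-orthogonal directions, with a sparsely active feature still recoverable. Sizes & seed are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
n_features, n_dims = 8, 4                      # more features than neurons
W = rng.standard_normal((n_features, n_dims))
W /= np.linalg.norm(W, axis=1, keepdims=True)  # unit feature directions

x = np.zeros(n_features); x[3] = 1.0           # a single sparse feature on
h = x @ W                                      # compressed into 4 "neurons"
recovered = h @ W.T                            # read out via dot products
print(recovered.round(2))  # coordinate 3 comes back as 1.0; the smaller
                           # off values are interference from superposition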
References
- https://dynalist.io/d/n2ZWtnoYHrU1s4vnFSAQ519J#z=EuO4CLwSIzX7AEZA1ZOsnwwF
- https://www.neelnanda.io/mechanistic-interpretability/glossary
- https://transformer-circuits.pub/2022/toy_model/index.html
- https://www.anthropic.com/research/superposition-memorization-and-double-descent
- https://transformer-circuits.pub/2023/toy-double-descent/index.html
Friday, January 24, 2025
State Space Models
- Vector Space of States (of the System)
- Alt. to Transformers, reducible to one another
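A minimal sketch of the discrete state-space recurrence behind these models, x[t+1] = A x[t] + B u[t], y[t] = C x[t], with made-up 2x2 system matrices (SSM layers such as Mamba learn structured versions of A, B, C):

import numpy as np

A = np.array([[0.9, 0.1],
              [0.0, 0.8]])       # state transition
B = np.array([[1.0], [0.5]])     # input projection
C = np.array([[1.0, 0.0]])       # readout

x = np.zeros((2, 1))             # the state vector
for t, u in enumerate([1.0, 0.0, 0.0, 1.0]):   # toy input sequence
    y = C @ x                    # emit output from current state
    x = A @ x + B * u            # fold the new input into the state
    print(f"t={t} y={y.item():.3f}")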
References
- https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mamba-and-state
- https://huggingface.co/blog/lbourdois/ssm-2022
- https://huggingface.co/blog/lbourdois/get-on-the-ssm-train
- https://en.wikipedia.org/wiki/State-space_representation
Monday, January 6, 2025
Spark API Categorization
A way to categorize Spark API features:
- Flow of data is generally across the category swim lanes, from creation of a new Spark Context, to reading data via I/O, to Filter/ Map/ Transform, to Reduce/ Agg etc Actions.
- Lazy processing up to Transformation.
- Steps only get executed once an Action is invoked.
- Post Actions (Reduce, Collect, etc) there could again be I/O, thus the reverse flow from Action.
- Partition is a cross-cutting concern across all layers. I/O, Transformations & Actions could operate across all or a few Partitions.
- forEach on the Stream could be either at Transform or Action level (see the sketch below).
The diagram is based on code within various Spark test suites.
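A minimal PySpark (RDD API, as in Spark 2.x) sketch of the swim lanes above; input.txt is a placeholder path. Nothing executes until the Action:

from pyspark import SparkContext

sc = SparkContext(master="local[2]", appName="ApiCategories")
lines = sc.textFile("input.txt")                  # I/O: declared, still lazy
words = lines.flatMap(lambda l: l.split())        # Transform: lazy
pairs = words.map(lambda w: (w, 1))               # Transform: lazy
counts = pairs.reduceByKey(lambda a, b: a + b)    # Transform (wide): lazy
print(counts.collect())                           # Action: triggers the job
sc.stop()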
Thursday, January 2, 2025
Mocked Kinesis (Localstack) with PySpark Streaming
Continuing with the same PySpark (ver 2.1.0, Python3.5, etc.) setup explained in an earlier post. To connect to the mocked Kinesis stream on Localstack from PySpark, use the kinesis_wordcount_asl.py script located in Spark's external/ (connector/) folder.
(a) Update value of master in kinesis_wordcount_asl.py
Update value of master(local[n], spark://localhost:7077, etc) in SparkContext in kinesis_wordcount_asl.py:
sc = SparkContext(appName="PythonStreamingKinesisWordCountAsl",master="local[2]")
(b) Add aSpark compiled jars to Spark Driver/ Executor Classpath
As explained in step (III) of an earlier post, to work with Localstack a few changes were done to the KinesisReceiver.scala onStart() to explicitly set endPoint on kinesis, dynamoDb, cloudWatch clients. Accordingly the compiled aSpark jars with the modifications need to be added to Spark Driver/ Executor classpath.
- For Spark local mode (master="local[n]"): additions to classpath can be exported in the SPARK_CLASSPATH variable.
export aSPARK_PROJ_HOME="/Download/Location/aSpark"
export SPARK_CLASSPATH="${aSPARK_PROJ_HOME}/target/original-aSpark_1.0-2.1.0.jar:${aSPARK_PROJ_HOME}/target/scala-2.11/classes:${aSPARK_PROJ_HOME}/target/scala-2.11/jars/*"
- For Spark Standalone mode: "spark.executor.extraClassPath" needs to be set in either spark-defaults.conf or added as a SparkConf to SparkContext (see (II)(a))
(c) Ensure SPARK_HOME, PYSPARK_PYTHON & PYTHONPATH variables are exported.
(d) Run kinesis_wordcount_asl
python3.5 ${SPARK_HOME}/external/kinesis-asl/src/main/python/examples/streaming/kinesis_wordcount_asl.py SampleKinesisApplication myFirstStream http://localhost:4566/ us-east-1
aws --endpoint-url=http://localhost:4566 kinesis put-record --stream-name myFirstStream --partition-key 123 --data "testdata abcd"
- Count of the words streamed (put) will show up on the kinesis_wordcount_asl console
Wednesday, January 1, 2025
Spark Streaming with Kinesis mocked on Localstack
In this post we get a Spark streaming application working with an AWS Kinesis stream, a mocked version of Kinesis running locally on Localstack. In earlier posts we have explained how to get Localstack running and various AWS services up on Localstack. The client connections to AWS services (Localstack) are made using the AWS CLI and AWS Java-Sdk v1.
Environment: This set-up continues on Ubuntu 20.04, with Java-8, Maven-3.6x, Docker-24.0x, Python3.5, PySpark/ Spark-2.1.0, Localstack-3.8.1, AWS Java-Sdk-v1 (ver. 1.12.778).
Once the Localstack installation is done, steps to follow are:
(I) Start Localstack
# Start locally
localstack start
That should get Localstack running on: http://localhost:4566
(II) Check Kinesis services from CLI on Localstack
# List Streams
aws --endpoint-url=http://localhost:4566 kinesis list-streams
# Create Stream
aws --endpoint-url=http://localhost:4566 kinesis create-stream --stream-name myFirstStream --shard-count 1
# List Streams
aws --endpoint-url=http://localhost:4566 kinesis list-streams
# describe-stream-summary
aws --endpoint-url=http://localhost:4566 kinesis describe-stream-summary --stream-name myFirstStream
# Put Record
aws --endpoint-url=http://localhost:4566 kinesis put-record --stream-name myFirstStream --partition-key 123 --data "testdata abcd"
aws --endpoint-url=http://localhost:4566 kinesis put-record --stream-name myFirstStream --partition-key 123 --data "testdata efgh"
(III) Connect to Kinesis from Spark Streaming
- Download & Build a sample aSpark - Java Kinesis application.
- The code is similar to Spark's kinesis-asl from the external (connector) module, except for a few changes to the KinesisReceiver.scala onStart() method to explicitly set the endPoint on the kinesis, dynamoDb & cloudWatch clients. This enables the Localstack endPoint url to be plugged into kinesis, dynamoDb & cloudWatch.
# Build
mvn install -DskipTests=true -Dcheckstyle.skip
# Run JavaKinesisWordCountASL with Localstack
- JavaKinesisWordCountASL SampleKinesisApplication myFirstStream http://localhost:4566/
- The runJavaKinesisWordCountASL.sh script located in the sbin/ folder of the aSpark project can be used to run JavaKinesisWordCountASL from the shell
(IV) Add Data to Localstack Kinesis & View Counts on Console
a) Put Record from cli
aws --endpoint-url=http://localhost:4566 kinesis put-record --stream-name myFirstStream --partition-key 123 --data "testdata abcd"
aws --endpoint-url=http://localhost:4566 kinesis put-record --stream-name myFirstStream --partition-key 123 --data "testdata efgh"
b) Alternatively Put records from Java Kinesis application
Download, build & run AmazonKinesisRecordProducerSample.java
c) Now check the output console of JavaKinesisWordCountASL run in step (III) above. Counts of the words streamed from Localstack Kinesis will be displayed on the console.
