• GenAI
- Text: Chat, Q&A, Compose, Summarize, Think, Search, Insights, Research
- Image: Gen, Identify, Search (Image-Image, Text-Image, etc), Label, Multimodal
- Code gen
- Research: Projects, Science, Breakthroughs
- MoE
• Agentic
- Workflows: GenAI, DNN, Scripts, Tools, etc combined to fulfil Objectives
-- Auto-Generated Plans & Objectives
- Standardization: MCP (API), Interoperability, Protocols
- RAG
- Tools: Websearch, DB, Invoke API/ Tools/ LLM, etc
• Context
- Fresh/ Updated
- Length: Cost vs Speed trade-off
- RAG
- VectorDB (Similarity/ Relevance)
- Memory enhanced
• Fine Tune
- Foundation models (generalists) -> Specialists
- LoRA
- Inference time scaling (compute, tuning, etc)
- Prompts
• Multimodal: Text, Audio, Video, Image, Graph, Sensors
• Safety/ Security
- Output Quality: Relevance, Accuracy, Correctness, Evaluation (Automated Rating, Ranking, JudgeLLM, etc)
-- Hallucination
- Privacy, Data Leak, Backdoor, Jailbreak
- Guard Rails
Wednesday, October 8, 2025
AI/ML '25
Friday, April 18, 2025
AI Agentic Frameworks
With the proliferation of AI Agents, it's only logical that there will be attempts at standardization and at building protocols & frameworks:
- MCP covered previously
- Any-Agent from Mozilla.ai to switch between agents, vendors, clouds, etc
- Agent2Agent interoperability protocol
Wednesday, April 16, 2025
Speculative Decoding
- Ensemble of Weak + Strong model
- The Weak (draft) model has a quick first pass at generating candidate tokens (cheap inference).
- The Strong but slow (target) model then catches up: it scores the weak model's candidate tokens in a single pass, accepting or rejecting each one to produce the final output (see the sketch below).
- Overall this makes LLM inference quicker and cheaper.
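A toy, self-contained sketch of the accept/ reject loop, assuming stand-in "models" that just return random next-token distributions; a real system plugs a small draft LLM and the large target LLM in here.

import numpy as np

# Purely illustrative: both "models" below return random distributions over a
# tiny vocabulary; real speculative decoding uses a small draft LLM (weak) and
# a large target LLM (strong).
rng = np.random.default_rng(0)
VOCAB = 16

def draft_model(ctx):                    # weak/ fast model: q(token | ctx)
    logits = rng.standard_normal(VOCAB)
    p = np.exp(logits)
    return p / p.sum()

def target_model(ctx):                   # strong/ slow model: p(token | ctx)
    logits = rng.standard_normal(VOCAB)
    p = np.exp(logits)
    return p / p.sum()

def speculative_step(ctx, gamma=4):
    """Draft gamma tokens cheaply, then accept/ reject them against the target."""
    drafts, q_probs = [], []
    for _ in range(gamma):               # 1) weak model proposes candidates
        q = draft_model(ctx + drafts)
        t = int(rng.choice(VOCAB, p=q))
        drafts.append(t)
        q_probs.append(q)
    accepted = []
    for t, q in zip(drafts, q_probs):    # 2) strong model verifies each candidate
        p = target_model(ctx + accepted)
        if rng.random() < min(1.0, p[t] / q[t]):
            accepted.append(t)           # accept the draft token
        else:                            # reject: resample from the residual
            residual = np.maximum(p - q, 0.0)
            residual /= residual.sum()
            accepted.append(int(rng.choice(VOCAB, p=residual)))
            break
    # Real implementations also sample one bonus token from the target
    # when every draft token is accepted.
    return ctx + accepted

print(speculative_step([1, 2, 3]))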
More to follow..
- https://pytorch.org/blog/hitchhikers-guide-speculative-decoding/
- https://www.baseten.co/blog/a-quick-introduction-to-speculative-decoding/
- https://research.google/blog/looking-back-at-speculative-decoding/
- https://medium.com/ai-science/speculative-decoding-make-llm-inference-faster-c004501af120
Tuesday, April 8, 2025
Revisiting the Bitter Lesson
Richard Sutton's The Bitter Lesson(s) continue to hold true. Scaling/ data walls could pose challenges to scaling AI general-purpose methods (like searching and learning) beyond a point. And that's where human innovation & ingenuity would be needed. But hang on, wouldn't that violate the "..by our methods, not by us.." lesson?
Perhaps then something akin to human innovation/ discovery/ ingenuity/ creativity might be the next frontier of meta-methods. Machines, in their typical massively parallel & distributed, brute-force, systematic trial & error fashion, would auto-ideate/ innovate/ discover solutions quicker, cheaper, better. Over & over again.
So machine discoveries shall abound, just not of the Archimedes Eureka kind, but in Edison's 100-different-ways style!
Sunday, April 6, 2025
Model Context Protocol (MCP)
Standardization protocol for AI agents. It enables them to act, inter-connect, process, parse & invoke functions; in other words, to Crawl, Browse, Search, Click, etc.
MCP re-uses the well-known client-server architecture, with JSON-RPC as the wire format.
Apps use MCP Clients -> MCP Servers (which abstract the underlying service).
Kind of API++ for an AI world!
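A minimal sketch of what one MCP tool call looks like on the wire, assuming the JSON-RPC 2.0 framing and the "tools/call" method from the public spec; the tool name "web_search" and its arguments are made up for illustration.

import json

# Hypothetical MCP-style exchange: the host app's MCP Client sends a JSON-RPC
# request to an MCP Server, asking it to invoke one of its tools.
call_tool = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "web_search",                          # hypothetical tool name
        "arguments": {"query": "speculative decoding"},
    },
}
print(json.dumps(call_tool, indent=2))
# The server replies with a JSON-RPC "result" (or "error") carrying the tool's
# output, which the host app feeds back to the LLM as fresh context.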
Saturday, April 5, 2025
Open Weight AI
Inspired by Open Source Software (OSS), yet not fully open...
With Open Weight (OW), typically the final model weights (& the fully trained model) are made available under a liberal licence: free to reuse, modify, distribute, non-discriminating, etc. This helps anyone wanting to start with the fully trained Open Weight model and apply it, fine-tune it, or adapt it (LoRA, RAG, etc.) for custom use-cases. To that extent, OW has a share & reuse philosophy.
On the other hand, with respect to training data, data sources, detailed architecture, optimization details, and so on, OW diverges from OSS by not making it compulsory to share any of these. These remain closed source with the original devs, with a bunch of pros & cons. Copyright material, IP protection, commercial gains, etc. are some stated advantages for the original devs/ org. On the downside, not allowing a full peek into the model means less visibility for the wider community and no white-box evaluation of model internals, biases, checks & balances.
Anyway, that's the present, a time of great flux. As models stabilize over time OW may tend towards OSS...
References
- https://openweight.org/
- https://www.oracle.com/artificial-intelligence/ai-open-weights-models/
- https://medium.com/@aruna.kolluru/exploring-the-world-of-open-source-and-open-weights-ai-aa09707b69fc
- https://www.forbes.com/sites/adrianbridgwater/2025/01/22/open-weight-definition-adds-balance-to-open-source-ai-integrity/
- https://promptengineering.org/llm-open-source-vs-open-weights-vs-restricted-weights/
- https://promptmetheus.com/resources/llm-knowledge-base/open-weights-model
- https://www.agora.software/en/llm-open-source-open-weight-or-proprietary/
Wednesday, April 2, 2025
The Big Book of LLM
A book by Damien Benveniste of AIEdge. Though a work in progress, the chapters available for preview (2 - 4) are fantastic.
Look forward to a paperback edition, which I certainly hope to own...
Tuesday, April 1, 2025
Mozilla.ai
Mozilla pedigree, AI focus, Open-source, Dev oriented.
Blueprint Hub: Mozilla.ai's hub of open-source, templatized, customizable AI solutions for developers.
Lumigator: Platform for model evaluation and selection. Consists of a Python FastAPI backend for AI lifecycle management & for capturing workflow data useful for evaluation.
Friday, March 28, 2025
Streamlit
Streamlit is a web wrapper for Data Science projects in pure Python. It's a lightweight, simple, rapid prototyping web app framework for sharing scripts.
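A minimal sketch of a Streamlit app, assuming a generic data-exploration use case; the widget and chart calls below are standard Streamlit APIs, the data is just random noise.

# Save as app.py and run with: streamlit run app.py
import numpy as np
import pandas as pd
import streamlit as st

st.title("Quick prototype")
n = st.slider("Number of points", 10, 500, 100)         # interactive widget
df = pd.DataFrame({"y": np.random.randn(n).cumsum()})   # toy data
st.line_chart(df)                                       # re-renders on each change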
- https://streamlit.io/playground
- https://www.restack.io/docs/streamlit-knowledge-streamlit-vs-flask-vs-django
- https://docs.streamlit.io/develop/concepts/architecture/architecture
- https://docs.snowflake.com/en/developer-guide/streamlit/about-streamlit
Saturday, March 15, 2025
Scaling Laws
Quick notes around Chinchilla Scaling Law/ Limits & beyond for DeepLearning and LLMs.
Factors
- Model size (N)
- Dataset size (D)
- Training Cost (aka Compute) (C)
- Test Cross-entropy loss (L)
The intuitive way,
- Larger data will need a larger model, and will have a higher training cost. In other words N, D, C all increase together, not necessarily linearly; the relationship could be exponential, log-linear, etc.
- Likewise, Loss is likely to decrease for larger datasets (and models). So an inverse relationship between L & D (& the rest).
- Tying them into equations would be some constants (scaling factors, exponents alpha, beta, etc.), unknown for now (identified empirically later).
Beyond common sense, the theoretical foundations linking the factors aren't available right now. Perhaps the nature of the problem is that it's hard (NP).
The next best thing, then, is to work out the relationships/ bounds empirically: to work with existing Deep Learning models, LLMs, etc. using large datasets spanning TBs/ PBs of data, trillions of parameters, etc., with large compute budgets cumulatively spanning years.
Papers by Hestness & Narang, Kaplan, and Chinchilla are all attempts along this empirical route. So are more recent papers from Mosaic, DeepSeek, MoE work, Llama 3, and Microsoft, among many others.
Key takeaways being,
- The scale & bounds are getting larger over time.
- Models from a couple of years back are found to be grossly under-trained in terms of the volume of training data used. They should have been trained on an order of magnitude more data for optimal training, without risk of overfitting.
- Conversely, the previously used data volumes are better suited to much smaller models (SLMs), with inference capabilities similar to those older LLMs. (A rough numeric sketch of the Chinchilla relation follows below.)
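A rough numeric sketch of the Chinchilla-style parametric fit L(N, D) = E + A/N^alpha + B/D^beta, using the approximate constants reported by Hoffmann et al. (2022) and the usual C ~ 6*N*D approximation; the brute-force search stands in for the paper's closed-form solution, and the numbers are illustrative only.

# Approximate published fits (Hoffmann et al., 2022); treat as illustrative.
E, A, B = 1.69, 406.4, 410.7
alpha, beta = 0.34, 0.28

def loss(N, D):
    """Predicted test cross-entropy loss for N parameters and D training tokens."""
    return E + A / N**alpha + B / D**beta

def compute_optimal(C):
    """Split a FLOP budget C ~ 6*N*D between model size N and data D
    by brute-force search over N for the lowest predicted loss."""
    best = None
    N = 1e7
    while N < 1e13:
        D = C / (6 * N)
        cand = (loss(N, D), N, D)
        if best is None or cand < best:
            best = cand
        N *= 1.1
    return best[1], best[2]

N_opt, D_opt = compute_optimal(C=5.76e23)   # roughly Chinchilla's training budget
print(f"N ~ {N_opt:.2e} params, D ~ {D_opt:.2e} tokens, "
      f"~{D_opt / N_opt:.0f} tokens per parameter")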
References
- https://en.wikipedia.org/wiki/Neural_scaling_law
- https://lifearchitect.ai/chinchilla/
- https://medium.com/@raniahossam/chinchilla-scaling-laws-for-large-language-models-llms-40c434e4e1c1
- https://bigscience.huggingface.co/blog/what-language-model-to-train-if-you-have-two-million-gpu-hours
- https://medium.com/nlplanet/two-minutes-nlp-scaling-laws-for-neural-language-models-add6061aece7
- https://lifearchitect.ai/the-sky-is-bigger/
Friday, February 28, 2025
Diffusion Models
Diffusion
- Forward, Backward (Learning), Sampling (Random)
- Continuous Diffusion
- VAE, Denoising Autoencoder
- Markov Chains
- U-Net
- DALL-E (OpenAI), Stable Diffusion,
- Imagen, Muse, VEO (Google)
- LLaDa, Mercury Coder (Inception)
Non-equilibrium Thermodynamics
- Langevin dynamics
- Thermodynamic Equilibrium - Boltzmann Distribution
- Wiener Process - Multidimensional Brownian Motion
- Energy Based Models
Gaussian Noise
- Denoising
- Noise/ Variance Schedule
- Derivation by Reparameterization (a minimal forward-process sketch follows after this outline)
Variational Inference
- Denoising Diffusion Probabilistic Model (DDPM)
- Noise Prediction Networks
- Denoising Diffusion Implicit Model (DDIM)
Loss Functions
- Variational Lower Bound (VLB)
- Evidence Lower Bound (ELBO)
- Kullback-Leibler divergence (KL divergence)
- Mean Squared Error (MSE)
Score Based Generative Model
- Annealing
- Noise conditional score network (NCSN)
- Equivalence: DDPM and Score Based Generative Models
Conditional (Guided) Generation
- Classifier Guidance
- Classifier Free Guidance (CFG)
Latent Variable Generative Model
- Latent Diffusion Model (LDM)
- Lower Dimension (Latent) Space
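A minimal sketch of the DDPM forward (noising) process, assuming a linear variance schedule; it shows the closed-form reparameterization x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps that makes training tractable.

import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # linear noise/ variance schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)          # cumulative product, alpha_bar_t

def q_sample(x0, t, rng=np.random.default_rng(0)):
    """Sample x_t ~ q(x_t | x_0) directly via the reparameterization trick."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

x0 = np.ones(4)                         # stand-in for an image or latent
x_t, eps = q_sample(x0, t=500)
print(x_t)
# A denoising network is trained to predict eps from (x_t, t), typically with
# an MSE loss (the simplified DDPM objective).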
References:
- https://en.wikipedia.org/wiki/Diffusion_model
- https://www.assemblyai.com/blog/diffusion-models-for-machine-learning-introduction
- https://www.ibm.com/think/topics/diffusion-models
- https://hackernoon.com/what-is-a-diffusion-llm-and-why-does-it-matter
- Large Language Diffusion Models (LLaDA): https://arxiv.org/abs/2502.09992
Sunday, January 26, 2025
Mechanistic Interpretability
- A clearer, white-box understanding of how Neural Networks work internally.
- Strong grounds for Superposition: n dimensions (neurons) can represent more than n features (a toy illustration follows below).
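A toy illustration of superposition, assuming random unit-norm feature directions: many more features than neurons can coexist with only modest interference between them.

import numpy as np

rng = np.random.default_rng(0)
n, m = 64, 512                                   # 64 neurons, 512 features
W = rng.standard_normal((m, n))
W /= np.linalg.norm(W, axis=1, keepdims=True)    # unit-norm feature directions
overlaps = W @ W.T - np.eye(m)                   # pairwise cosine similarities
print("max |cos| between distinct features:", np.abs(overlaps).max())
# The maximum overlap stays well below 1: features interfere a little, but each
# direction remains mostly distinct, which is the basic intuition behind
# representing more than n features in n dimensions.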
References
- https://dynalist.io/d/n2ZWtnoYHrU1s4vnFSAQ519J#z=EuO4CLwSIzX7AEZA1ZOsnwwF
- https://www.neelnanda.io/mechanistic-interpretability/glossary
- https://transformer-circuits.pub/2022/toy_model/index.html
- https://www.anthropic.com/research/superposition-memorization-and-double-descent
- https://transformer-circuits.pub/2023/toy-double-descent/index.html
Friday, January 24, 2025
State Space Models
- Vector Space of States (of the System)
- An alternative to Transformers; the two formulations are reducible to one another (a minimal recurrence sketch follows below)
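A minimal sketch of the discrete state-space recurrence x_k = A x_(k-1) + B u_k, y_k = C x_k with toy matrices; real SSM layers (S4, Mamba, etc.) learn A, B, C and compute this recurrence far more efficiently (convolutional or selective-scan forms).

import numpy as np

rng = np.random.default_rng(0)
d_state, seq_len = 8, 16
A = 0.9 * np.eye(d_state)               # stable state-transition matrix
B = rng.standard_normal((d_state, 1))   # input projection
C = rng.standard_normal((1, d_state))   # readout projection
u = rng.standard_normal(seq_len)        # 1-D input sequence

x = np.zeros(d_state)
ys = []
for u_k in u:
    x = A @ x + B[:, 0] * u_k           # state update
    ys.append(float(C[0] @ x))          # output at this step
print(ys[:4])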
References
- https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mamba-and-state
- https://huggingface.co/blog/lbourdois/ssm-2022
- https://huggingface.co/blog/lbourdois/get-on-the-ssm-train
- https://en.wikipedia.org/wiki/State-space_representation
Thursday, May 30, 2024
Mixture of Experts (MoE) Architecture
An enhancement to LLMs to align with the expert-models paradigm.
- Each expert is implemented as a separate Feed Forward Network (FFN) (though other ML models trainable via Backprop should work).
- The expert FFNs are introduced in parallel to the existing FFN layer, after the Attention Layer.
- The decision to route tokens to the experts is made by a router.
- The router is implemented as a linear layer followed by a Softmax giving a probability for each expert, from which the top few (top-k) are picked (a minimal routing sketch follows below).
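A minimal sketch of top-k routing for a single token, assuming 8 experts and k = 2; the router weights and the expert "FFNs" here are stand-ins, not trained components.

import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, k = 16, 8, 2
h = rng.standard_normal(d_model)              # token hidden state
W_router = rng.standard_normal((n_experts, d_model))

logits = W_router @ h                         # router: a linear layer
probs = np.exp(logits - logits.max())
probs /= probs.sum()                          # softmax over experts
top = np.argsort(probs)[-k:]                  # indices of the top-k experts
gates = probs[top] / probs[top].sum()         # renormalized gate weights

def expert_ffn(i, x):
    """Stand-in for expert i's feed-forward network."""
    return x * (i + 1)

out = sum(g * expert_ffn(i, h) for g, i in zip(gates, top))
print(out[:4])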