Wednesday, October 8, 2025
AI/ML '25
• GenAI
- Text: Chat, Q&A, Compose, Summarize, Think, Search, Insights, Research
- Image: Gen, Identify, Search (Image-Image, Text-Image, etc), Label, Multimodal
- Code gen
- Research: Projects, Science, Breakthroughs
- MoE
• Agentic
- Workflows: GenAI, DNN, Scripts, Tools, etc combined to fulfil Objectives
-- Auto-Generated Plans & Objectives
- Standardization: MCP (API), Interoperability, Protocols
- RAG
- Tools: Websearch, DB, Invoke API/ Tools/ LLM, etc
• Context
- Fresh/ Updated
- Length: Cost vs Speed trade-off
- RAG
- VectorDB (Similarity/ Relevance)
- Memory enhanced
• Fine Tune
- Foundation models (generalists) -> Specialists
- LoRA
- Inference time scaling (compute, tuning, etc)
- Prompts
• Multimodal: Text, Audio, Video, Image, Graph, Sensors
• Safety/ Security
- Output Quality: Relevance, Accuracy, Correctness, Evaluation (Automated Rating, Ranking, JudgeLLM, etc)
-- Hallucination
- Privacy, Data Leak, Backdoor, Jailbreak
- Guard Rails
Friday, April 18, 2025
AI Agentic Frameworks
With the proliferation of AI Agents, it's only logical that there will be attempts at standardization and at building protocols & frameworks:
- MCP covered previously
- Any-Agent from Mozilla.ai to switch between agents, vendors, clouds, etc
- Agent2Agent interoperability protocol
Thursday, April 17, 2025
On Quantization
- Speed vs Accuracy trade-off.
- Reduce costs on storage, compute, operations.
- Speed up output generation, inference, etc.
- Work with lower precision data.
- Cast/ map data from 32-bit or higher precision types (Int32, Float32, etc) to lower precision data types such as 16-bit Brain Float (BFloat16), 4-bit NormalFloat (NF4), int4, int8, etc.
- Easy mapping: Float32 (1-bit Sign, 8-bit Exponent, 23-bit Mantissa) => BFloat16 (1-bit Sign, 8-bit Exponent, 7-bit Mantissa). Just discard the lower 16 bits of the mantissa. No overflow!
- Straightforward mapping: work out max, min, data distribution, mean, variance, etc & then sub-divide the range into equally sized buckets based on the bit size of the lower precision data type. E.g. int4 (4-bit) => 2^4 = 16 buckets (sketched in code below).
- Handle outliers & data skew, which can mess up the mapping, yet lead to loss of useful info if discarded blindly.
- Work out Bounds wrt Loss of Accuracy.
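A minimal sketch of the two mappings above, assuming a symmetric int8 scheme with a single scale taken from the max absolute value (real implementations add zero-points, per-channel scales, outlier handling, etc) and BFloat16 conversion by plain truncation rather than round-to-nearest:

```python
import numpy as np

def to_bfloat16_bits(x: np.ndarray) -> np.ndarray:
    """Float32 -> BFloat16 by truncation: keep sign, exponent & top 7 mantissa bits."""
    return (x.astype(np.float32).view(np.uint32) >> 16).astype(np.uint16)

def quantize_int8(x: np.ndarray):
    """Symmetric scalar quantization: one scale maps the largest magnitude to +/-127."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Map int8 codes back to approximate float32 values."""
    return q.astype(np.float32) * scale

x = np.random.randn(1000).astype(np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize_int8(q, scale)
print("bfloat16 bits of x[0]:", hex(to_bfloat16_bits(x[:1])[0]))
print("max abs quantization error:", np.abs(x - x_hat).max())  # bounded by ~scale/2
```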
LLMs, AI/ ML side:
- https://newsletter.theaiedge.io/p/reduce-ai-model-operational-costs
Lucene, Search side:
- https://www.elastic.co/search-labs/blog/scalar-quantization-101
- https://www.elastic.co/search-labs/blog/scalar-quantization-in-lucene
Wednesday, April 16, 2025
Speculative Decoding
- Ensemble of Weak + Strong model
- Weak model has a quick first go at generating candidate tokens (a cheap draft pass of inference)
- Followed by the Strong but slow model, which catches up, scores the weak model's candidate tokens & accepts/ rejects them to generate the final output (see the sketch below)
- Overall making inference via LLMs quicker and cheaper
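A toy sketch of the draft-and-verify loop, assuming a simplified greedy acceptance rule (keep the draft's tokens until the first disagreement with the strong model) rather than the full rejection-sampling scheme; draft_next and target_next are stand-in functions, not a real model API:

```python
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def target_next(context):
    """Stand-in for the strong/slow model: deterministic next-token choice."""
    return VOCAB[hash(tuple(context)) % len(VOCAB)]

def draft_next(context):
    """Stand-in for the weak/fast model: usually agrees with the target, sometimes not."""
    return target_next(context) if random.random() < 0.8 else random.choice(VOCAB)

def speculative_generate(prompt, num_tokens=20, k=4):
    out = list(prompt)
    while len(out) - len(prompt) < num_tokens:
        # 1. Weak model drafts k candidate tokens cheaply.
        draft = []
        for _ in range(k):
            draft.append(draft_next(out + draft))
        # 2. Strong model checks each drafted position; keep tokens until the first mismatch.
        accepted = []
        for i, tok in enumerate(draft):
            if target_next(out + draft[:i]) == tok:
                accepted.append(tok)
            else:
                # Replace the first rejected token with the strong model's own choice.
                accepted.append(target_next(out + draft[:i]))
                break
        out.extend(accepted)
    return out[:len(prompt) + num_tokens]

print(" ".join(speculative_generate(["the"], num_tokens=10)))
```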
More to follow..
- https://pytorch.org/blog/hitchhikers-guide-speculative-decoding/
- https://www.baseten.co/blog/a-quick-introduction-to-speculative-decoding/
- https://research.google/blog/looking-back-at-speculative-decoding/
- https://medium.com/ai-science/speculative-decoding-make-llm-inference-faster-c004501af120
Tuesday, April 8, 2025
Revisiting the Bitter Lesson
Richard Sutton's Bitter Lesson(s) continue to hold true. Scaling/ data walls could pose challenges to scaling AI's general-purpose methods (like search and learning) beyond a point. And that's where human innovation & ingenuity would be needed. But hang on, wouldn't that violate the "..by our methods, not by us.." lesson?
Perhaps then something akin to human innovation/ discovery/ ingenuity/ creativity might be the next frontier of meta-methods. Machines, in their typical massively parallel & distributed, brute-force, systematic trial-&-error fashion, would auto-ideate/ innovate/ discover solutions quicker, cheaper, better. Over & over again.
So machine discoveries shall abound, just not of Archimedes' Eureka kind, but in Edison's hundred-different-ways style!
Sunday, April 6, 2025
Model Context Protocol (MCP)
A standardization protocol for AI agents. Enables them to act, inter-connect, process, parse & invoke functions; in other words, to crawl, browse, search, click, etc.
MCP re-uses the well-known client-server architecture, with JSON-RPC as the wire format.
Apps use MCP Clients -> MCP Servers (which abstract the service)
Kind of API++ for an AI world!
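As a rough illustration, here is the shape of a JSON-RPC request an MCP client might send to a server to invoke a tool; the "tools/call" method follows the published MCP spec as I understand it, while the tool name & arguments below are hypothetical:

```python
import json

# A JSON-RPC 2.0 request an MCP client might send to an MCP server to invoke a tool.
# The tool name ("web_search") and its arguments are made up, for illustration only.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "web_search",
        "arguments": {"query": "Model Context Protocol"},
    },
}
print(json.dumps(request, indent=2))
```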
Saturday, April 5, 2025
Open Weight AI
Inspired by Open Source Software (OSS), yet not fully open...
With Open Weight (OW), typically the final model weights (& the fully trained model) are made available under a liberal licence: free to reuse, modify, distribute, non-discriminating, etc. This helps anyone wanting to start with the fully trained Open Weight model & apply it, fine-tune it, or adapt it (LoRA, RAG, etc) for custom use-cases. To that extent, OW has a share & reuse philosophy.
On the other hand, wrt training data, data sources, detailed architecture, optimization details, and so on, OW diverges from OSS by not making it compulsory to share any of these. So these remain closed source with the original devs, with a bunch of pros & cons. Copyright material, IP protection, commercial gains, etc are some stated advantages for the original devs/ org. But the lack of visibility for the wider community, of white-box evaluation of model internals, biases, checks & balances, is among the downsides of not allowing a full peek into the model.
Anyway, that's the present, a time of great flux. As models stabilize over time OW may tend towards OSS...
References
- https://openweight.org/
- https://www.oracle.com/artificial-intelligence/ai-open-weights-models/
- https://medium.com/@aruna.kolluru/exploring-the-world-of-open-source-and-open-weights-ai-aa09707b69fc
- https://www.forbes.com/sites/adrianbridgwater/2025/01/22/open-weight-definition-adds-balance-to-open-source-ai-integrity/
- https://promptengineering.org/llm-open-source-vs-open-weights-vs-restricted-weights/
- https://promptmetheus.com/resources/llm-knowledge-base/open-weights-model
- https://www.agora.software/en/llm-open-source-open-weight-or-proprietary/
Wednesday, April 2, 2025
The Big Book of LLM
A book by Damien Benveniste of AIEdge. Though a work in progress, chapters 2-4, which are available for preview, are fantastic.
Look forward to a paperback edition, which I certainly hope to own...
Tuesday, April 1, 2025
Mozilla.ai
Mozilla pedigree, AI focus, Open-source, Dev oriented.
Blueprint Hub: Mozilla.ai's hub of open-source, templatized, customizable AI solutions for developers.
Lumigator: Platform for model evaluation and selection. Consists of a Python FastAPI backend for AI lifecycle management & for capturing workflow data useful for evaluation.
Friday, March 28, 2025
Streamlit
Streamlit is a web wrapper for Data Science projects in pure Python. It's a lightweight, simple, rapid prototyping web app framework for sharing scripts.
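A minimal, hypothetical example of the kind of pure-Python app Streamlit enables (run with `streamlit run app.py`); the random-walk data is made up for illustration:

```python
import numpy as np
import pandas as pd
import streamlit as st

st.title("Random Walk Demo")

# Simple interactive control rendered as a web widget.
steps = st.slider("Number of steps", min_value=100, max_value=5000, value=1000)

# The script re-runs top-to-bottom on every interaction; chart a synthetic random walk.
walk = pd.DataFrame(np.random.randn(steps).cumsum(), columns=["value"])
st.line_chart(walk)
```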
- https://streamlit.io/playground
- https://www.restack.io/docs/streamlit-knowledge-streamlit-vs-flask-vs-django
- https://docs.streamlit.io/develop/concepts/architecture/architecture
- https://docs.snowflake.com/en/developer-guide/streamlit/about-streamlit
Saturday, March 15, 2025
Scaling Laws
Quick notes around Chinchilla Scaling Law/ Limits & beyond for DeepLearning and LLMs.
Factors
- Model size (N)
- Dataset size (D)
- Training Cost (aka Compute) (C)
- Test Cross-entropy loss (L)
The intuitive way,
- Larger data will need a larger model, and will have a higher training cost. In other words, N, D, C all increase together; not necessarily linearly, could be power-law, log-linear, etc.
- Likewise, Loss is likely to decrease for larger datasets & models. So an inverse relationship between L & D (& the rest).
- Tying them into equations would be some constants (scaling factors, exponents alpha, beta, etc), unknown a priori & fitted empirically (see the sketch after the key takeaways below).
Beyond common sense, the theoretical foundations linking the factors aren't available right now. Perhaps the problem is inherently hard (NP?).
The next best thing, then, is to work out the relationships/ bounds empirically: to work with existing Deep Learning models, LLMs, etc on large datasets spanning TBs/ PBs of data & trillions of parameters, using compute budgets cumulatively spanning years.
Papers by Hestness & Narang, Kaplan, and Chinchilla are all attempts along this empirical route. So are more recent works like Mosaic, DeepSeek, MoE papers, Llama 3, and Microsoft's, among many others.
Key take away being,
- The scale & bounds are getting larger over time.
- Models from a couple of years back are found to be grossly under-trained in terms of the volume of training data used. They should have been trained on an order of magnitude more data for optimal training, without risk of overfitting.
- Conversely, the previously used data volumes are suited to much smaller models (SLMs), with inference capabilities similar to those older LLMs.
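A small sketch of the Chinchilla-style parametric fit, L(N, D) = E + A/N^alpha + B/D^beta, using the approximate constants reported in the Chinchilla paper (treat the exact numbers as indicative, not authoritative), plus the popular ~20-tokens-per-parameter rule of thumb for compute-optimal training:

```python
# Chinchilla-style parametric loss fit (constants approximate, as reported in the paper).
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28

def loss(n_params: float, n_tokens: float) -> float:
    """Predicted test loss for a model with n_params parameters trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

def chinchilla_optimal_tokens(n_params: float) -> float:
    """Rule-of-thumb compute-optimal data budget: roughly 20 training tokens per parameter."""
    return 20.0 * n_params

n = 70e9  # a 70B-parameter (Chinchilla-sized) model
print(f"optimal tokens ~ {chinchilla_optimal_tokens(n):.2e}")            # ~1.4e12 tokens
print(f"predicted loss ~ {loss(n, chinchilla_optimal_tokens(n)):.3f}")
```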
References
- https://en.wikipedia.org/wiki/Neural_scaling_law
- https://lifearchitect.ai/chinchilla/
- https://medium.com/@raniahossam/chinchilla-scaling-laws-for-large-language-models-llms-40c434e4e1c1
- https://bigscience.huggingface.co/blog/what-language-model-to-train-if-you-have-two-million-gpu-hours
- https://medium.com/nlplanet/two-minutes-nlp-scaling-laws-for-neural-language-models-add6061aece7
- https://lifearchitect.ai/the-sky-is-bigger/
Friday, February 28, 2025
Diffusion Models
Diffusion
- Forward, Backward (Learning), Sampling (Random)
- Continuous Diffusion
- VAE, Denoising Autoencoder
- Markov Chains
- U-Net
- DALL-E (OpenAI), Stable Diffusion
- Imagen, Muse, VEO (Google)
- LLaDa, Mercury Coder (Inception)
Non-equilibrium Thermodynamics
- Langevin dynamics
- Thermodynamic Equilibrium - Boltzmann Distribution
- Wiener Process - Multidimensional Brownian Motion
- Energy Based Models
Gaussian Noise
- Denoising
- Noise/ Variance Schedule
- Derivation by Reparameterization (forward-noising sketch below)
Variational Inference
- Denoising Diffusion Probabilistic Model (DDPM)
- Noise Prediction Networks
- Denoising Diffusion Implicit Model (DDIM)
Loss Functions
- Variational Lower Bound (VLB)
- Evidence Lower Bound (ELBO)
- Kullback-Leibler divergence (KL divergence)
- Mean Squared Error (MSE)
Score Based Generative Model
- Annealing
- Noise conditional score network (NCSN)
- Equivalence: DDPM and Score-Based Generative Models
Conditional (Guided) Generation
- Classifier Guidance
- Classifier Free Guidance (CFG)
Latent Variable Generative Model
- Latent Diffusion Model (LDM)
- Lower Dimension (Latent) Space
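A minimal sketch of the DDPM forward (noising) process using the closed form from the reparameterization above, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, with a simple linear variance schedule; the schedule endpoints below are common defaults chosen here just for illustration:

```python
import numpy as np

T = 1000                                   # number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)         # linear noise/ variance schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)            # cumulative product: alpha_bar_t

def q_sample(x0: np.ndarray, t: int, rng=np.random):
    """Sample x_t ~ q(x_t | x_0) in closed form via the reparameterization trick."""
    eps = rng.standard_normal(x0.shape)    # Gaussian noise epsilon
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps                         # eps is the target a noise-prediction network learns

x0 = np.random.randn(8, 8)                 # stand-in "image"
for t in (0, 499, 999):
    xt, _ = q_sample(x0, t)
    print(f"t={t:4d}  remaining signal fraction sqrt(alpha_bar) = {np.sqrt(alpha_bars[t]):.3f}")
```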
References:
- https://en.wikipedia.org/wiki/Diffusion_model
- https://www.assemblyai.com/blog/diffusion-models-for-machine-learning-introduction
- https://www.ibm.com/think/topics/diffusion-models
- https://hackernoon.com/what-is-a-diffusion-llm-and-why-does-it-matter
- Large Language Diffusion Models (LLaDA): https://arxiv.org/abs/2502.09992
Thursday, May 30, 2024
Mixture of Experts (MoE) Architecture
An enhancement to LLMs to align with the expert-models paradigm.
- Each expert is implemented as a separate Feed Forward Network (FFN) (though other ML models trainable via backprop should work).
- The expert FFNs are introduced in parallel to the existing FFN layer after the Attention Layer.
- Decision to route tokens to the expert is by a router.
- The Router is implemented as a linear layer followed by a Softmax giving a probability for each expert, from which the top few are picked (see the sketch below).
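A tiny numpy sketch of the router described above (linear layer + softmax + top-k); the dimensions and weights are arbitrary, and real MoE layers also add load-balancing losses, capacity limits, etc:

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def route(tokens: np.ndarray, w_router: np.ndarray, top_k: int = 2):
    """Return per-token expert indices and renormalized gate weights.

    tokens:   (num_tokens, d_model) hidden states after the attention layer
    w_router: (d_model, num_experts) linear routing layer
    """
    logits = tokens @ w_router                           # (num_tokens, num_experts)
    probs = softmax(logits)                              # probability of each expert
    topk_idx = np.argsort(-probs, axis=-1)[:, :top_k]    # pick the top few experts
    topk_p = np.take_along_axis(probs, topk_idx, axis=-1)
    gates = topk_p / topk_p.sum(axis=-1, keepdims=True)  # renormalized gate weights
    return topk_idx, gates

rng = np.random.default_rng(0)
tokens = rng.standard_normal((4, 16))                    # 4 tokens, d_model = 16
w_router = rng.standard_normal((16, 8))                  # 8 experts
idx, gates = route(tokens, w_router)
print(idx)    # which experts each token is sent to
print(gates)  # how much weight each selected expert's output gets
```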
Wednesday, March 31, 2021
Flip side to Technology - Extractivism, Exploitation, Inequality, Disparity, Ecological Damage
Anatomy of an AI system is a real eye-opener. This helps us to get a high level view of the enormous complexity and scale of the supply chains, manufacturers, assemblers, miners, transporters and other links that collaborate at a global scale to help commercialize something like an Amazon ECHO device.
The authors explain how the extreme exploitation of human labour, environment and resources that happens at various levels largely remains unacknowledged and unaccounted for. Right from mining of rare elements, to smelting and refining, to shipping and transportation, to component manufacture and assembly, etc., these mostly happen under inhuman conditions with complete disregard for the health, well-being & safety of workers, who are paid miserable wages. These processes also cause irreversible damage to the ecology and environment at large.
Though the Amazon Echo, as an AI-powered, self-learning device connected to cloud-based web services, opens up several privacy, safety, intrusion and digital exploitation concerns for the end-user, focusing solely on the Echo would amount to missing the forest for the trees! Most issues highlighted here would be equally true of technologies from many other traditional and non-AI, or not-yet-AI, powered sectors like automobiles, electronics, telecom, etc. Time to give a thought to these issues and bring a stop to the irreversible damage to human lives, well-being, finances, equality, and to the environment and planetary resources!
Friday, February 28, 2020
Defence R&D Organisation Young Scientists Lab (DYSL)
Recently there was quite a lot of buzz in the media about the launch of DRDO Young Scientists Lab (DYSL). 5 such labs have been formed by DRDO each headed by a young director under the age of 35! Each lab has its own specialized focus area from among fields such as AI, Quantum Computing, Cognitive Technologies, Asymmetric Technologies and Smart Materials.
When trying to look for specifics on what these labs are doing, particularly the AI lab, there is very little to go by for now. While a lot of information about the vintage DRDO Centre of AI and Robotics (CAIR) lab is available on the DRDO website, there's practically nothing there regarding the newly formed DRDO Young Scientists Lab on AI (DYSL-AI). Neither are the details available anywhere else in the public domain, till end-Feb 2020 at least. While these would certainly get updated soon, for now there are just these interviews with the directors of the DYSL labs:
- Doordarshan's Y-Factor Interview with the 5 DYSL Directors Mr. Parvathaneni Shiva Prasad, Mr. Manish Pratap Singh, Mr. Ramakrishnan Raghavan, Mr. Santu Sardar, Mr. Sunny Manchanda
- Rajya Sabha TV Interview with DYSL-AI Director Mr. Sunny Manchanda
Wednesday, February 26, 2020
Sampling Plan for Binomial Population with Zero Defects
- Large n (> 15), large p (>0.1) => Normal Approximation
- Large n (> 15), small p (<0.1) => Poisson Approximation
- Small n (< 15), small p (<0.1) => Binomial Table
On the other side, there are derivatives of the Bayes Success Run theorem, such as Acceptance Sampling, Zero Defect Sampling, etc., used to work out statistically valid sampling plans. These approaches are based on a successful run of n tests in which either zero or an upper-bounded k failures are seen.
These approaches are used in various industries like healthcare, automotive, military, etc. for performing inspections, checks and certifications of components, parts and devices. The sampling could be single sampling (one sample of size n with confidence c), or double sampling (a first smaller sample n1 with confidence c1 & a second larger sample n2 with confidence c2, to be used if the test on sample n1 shows more than c1 failures), and other sequential sampling versions of it. A few rule-of-thumb approximations have also emerged in practice based on the success run technique:
- Rule of 3: provides an upper bound of p ≈ 3/n, with 95% confidence, for a given success run of length n with zero defects.
- Success run sample size (n) using Confidence Level (C) & Reliability (R = 1 - p), when sampling with replacement (sampled item replaced and maybe selected again), assuming a Binomial Distribution:
n = ln(1-C)/ ln(R), where ln is the natural log, based on a probability of R^n for a successful run of length n with zero defects. This can be further extended to the case when a maximum of k defects/ failures is acceptable.
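A quick worked example of the success-run formula above, n = ln(1 - C) / ln(R), plus a Rule-of-3 cross-check; the confidence and reliability values are arbitrary inputs chosen for illustration:

```python
import math

def success_run_sample_size(confidence: float, reliability: float) -> int:
    """Zero-defect success-run size: smallest n with R^n <= 1 - C."""
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))

C, R = 0.95, 0.90
n = success_run_sample_size(C, R)
print(f"n = {n} zero-defect samples for {C:.0%} confidence that reliability >= {R:.0%}")
# -> n = 29

# Rule of 3 cross-check: with n tests and zero failures, p is bounded by ~3/n at 95% confidence.
print(f"Rule of 3 bound on p for n = {n}: ~{3 / n:.3f}")   # ~0.10, consistent with p = 1 - R
```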
Footnote on Distributions:
- The Poisson distribution is used to model events within time/ space that are rare (small p) but show up a large number of times (large n) & occur independently of the time since the last event. Inter-arrival times are iid exponential random variables.
- The Poisson confidence interval is derived from the Gamma Distribution, which is defined using two parameters, shape & scale. Exponential, Erlang & Chi-Squared are all special cases of the Gamma Distribution. The Gamma distribution is used in areas such as prediction of wait times, insurance claims, wireless communication signal power fading, age distribution of cancer events, inter-spike intervals, genomics. Gamma is also the conjugate prior (in Bayesian statistics) for the Poisson & exponential distributions.
- The Bayesian Success Run can be derived using the Beta Distribution, which is the conjugate prior for the Binomial. The Beta Distribution is defined via two shape parameters. Beta Distribution applications are found in order statistics (selection of the k-th smallest from a Uniform distribution), subjective logic, wavelet analysis, project management (PERT), etc.
Wednesday, September 18, 2019
Sim Swap Behind Twitter CEO's Account Hack
SIM swap fraud can be carried out through some form of social engineering and by stealing/ illegally sharing the personal data the user uses to authenticate with the telecom operator. The other way is via malware or a virus-infected app or hardware taking over the user's device, or by plain old manipulation of the telecom company's personnel through pressure tactics, bribes, etc.
In order to limit cases of fraud, DoT India has brought in a few mandatory checks into the process of swapping/ upgrading SIM cards, to be followed by all telecom operators. These include an IVRS-based confirmation call to the subscriber on the current working SIM, a confirmation SMS to the current working SIM, and blocking of SMS features for 24 hours after the SIM swap.
The 24-hour window is reasonably sized to allow the actual owner to react in case of a fraud, thanks to these checks. Once they realize that their phone has mysteriously gone completely out of network coverage for long, and doesn't seem to work even after restarting and switching to a location known to have good coverage, alarm bells ought to go off. They should immediately contact the telecom operator's helpline number or visit the official store.
At the same time, the 24-hour window is not excessively long, so it doesn't discomfort a genuine user wanting to swap/ upgrade. Since SMS services remain disabled, SMS-based OTP authentication for apps, banking, etc. does not work within this period, thereby preventing misuse by fraudsters.
Perhaps, telecom regulators & players elsewhere need to follow suit. Twitter meanwhile has chosen to apply a band-aid solution by turning off their tweet via SMS feature post the hack. Clearly a lot more needs to be done to put an end to the menace.
Tuesday, March 26, 2019
Opinions On A Topic
There's a real need to automatically detect, flag & block misleading information from propagating. Though at the moment the technology doesn't exist, offerings are very likely to come up soon & get refined over time to nail the problem well enough. While we await breakthroughs on that front, for now the best bet is to depend on traditional human judgment.
- Make use of a set (not just one or two) of trusted media sources that employ professional & expert journalists. Rely on their expertise to do the job of collecting & presenting the facts correctly. Assuming (hopefully) that these people/ organizations behave professionally, the information that gets through these sources would be far better.
- Fact-check details across the entire set of sources. This helps mitigate a temporary (or permanent), deliberate/ inadvertent faltering, manipulation, or influencing of the odd source. Use the set as a weak quorum that collectively highlights & prevents propagation of misinformation. Even if a few members falter, it's unlikely that all would; the majority would not allow the fakes to make it into their respective channels.
- The challenging part is when a certain piece shows up as breaking news on one channel & not the others. One could default to labeling it as fake/ unverified, with the following considerations for the news piece:
Case 1: Turns out fake, doesn't show up on the other sources
=> Remains Correctly Marked Fake
Case 2: Turns out to be genuine & eventually shows up on other/ majority sources
=> Gets Correctly Marked True
Case 3: Is genuine, but acquired via some form of journalistic brilliance (expose, criminal/ undercover journalism, etc.) that can't be re-run, or is about a region/ issue largely ignored by the mainstream media unwilling to do the verification, or for some other reason can't be verified
=> Remains Incorrectly Marked Fake
Case 3 is obviously the toughest to crack. While some specifics may be impossible to verify, other allied details could be easier to access & verify. Once some other media groups (beyond the one that reported it) get involved in the secondary verification, there is some likelihood of the true facts emerging.
For the marginalized, there are social groups & organizations, governmental & non-governmental, that publish reports on issues from ground zero. At the same time, as connectivity improves, citizens themselves would be able to bring forth local issues onto national & international platforms. In the interim, these will have to be relied upon until commercial interests & mainstream media eventually bring the marginalized into the fold. Nonetheless, much more thought & effort is needed to check the spread of misinformation.
Finally, here's a little script 'op-on.sh' / 'op-on.py' (works/ tested on *nix desktop) to look up opinions (buzz) on any given topic across a set of media agencies of repute. Alternatively, a bookmarklet could be added to the browser, which would enable looking up the opinions across the sites. The op-on bookmarklet (tested on Firefox & Chrome) can be installed by right clicking & adding it as a bookmark in the browser (or by copying the script into the URL of a new bookmark). Pop-up blockers in the browser will need to be temporarily disabled (e.g. by clicking allow pop-ups in Firefox) for the script to work.
The set of media agencies that these scripts look up include groups like TOI, IE, India Today, Times Now, WION, Ndtv, Hindu, HT, Print, Quint, Week, Reuters, BBC, and so on. This might help the curious human reader to look up all those sources for opinions on any topic of interest.
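The original op-on.sh / op-on.py scripts are not reproduced here; below is a minimal, hypothetical sketch of the same idea in Python: open a site-restricted web search for the topic across a few of the listed media sites (the site list is truncated for brevity).

```python
import sys
import urllib.parse
import webbrowser

# A few of the media sites the op-on scripts cover (truncated; extend as needed).
SITES = [
    "timesofindia.indiatimes.com",
    "indianexpress.com",
    "thehindu.com",
    "reuters.com",
    "bbc.com",
]

def opinions_on(topic: str) -> None:
    """Open one site-restricted web search per media site for the given topic."""
    for site in SITES:
        query = urllib.parse.quote_plus(f"site:{site} {topic}")
        webbrowser.open_new_tab(f"https://www.google.com/search?q={query}")

if __name__ == "__main__":
    opinions_on(" ".join(sys.argv[1:]) or "election results")
```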
Update 1 (16-Sep-19): Some interesting developments:
- Google giving priority to original news article above the ones that are derived/ sourced from the original. A step that may prove beneficial to the spirit of original fearless journalism. (Sep'19)
- Economist's model revealing the underlying biases, or lack of it, with the Google news reporting algorithm. The title captures the crux of the article "Google rewards reputable reporting, not left-wing politics". (Jun'19)
- Ravish Kumar's speech on winning the Magsaysay award for 2019. He touches upon the various challenges, including issues like fake news and propaganda, among many others confronting the professional journalist in today's times, particularly out here in the largest democracy. (Sep'19)
- CSDS report on the impact of social media & others (paper, tv) on the voting behaviour of the Indian voter based on a survey of 24K+ voters from 211 constituencies. (Jun'19)
- NGO Digital Empowerment Foundation report (Fighting Fake News, WHOSE RESPONSIBILITY IS IT?) on the awareness of fake news, disinformation, misuse of technology & so on among Indians from Tier II & Tier III cities. The study is based on a sample of 3K users across 11 states. A good starting point for those looking to solve the problem, though the sample set is small. Also, behaviour would differ vastly across regions, languages, metros, villages, etc., so a one-size-fits-all approach is not likely to work. (Apr'19)
- BBC World Service study titled Duty, Identity, Credibility: 'Fake News' and the Ordinary Citizen in India, under the Beyond Fake News programme (Nov'18).
- Time to make Hopi-like language features widespread. The Hopi language has grammatical markers that specify whether you witnessed the event yourself, heard about it from someone else, or consider it to be an unchanging truth. Hopi speakers are forced by Hopi grammar to habitually frame all descriptions of reality in terms of the source and reliability of their information. (Source: Chapter 1: The Promise and Politics of the Mother Tongue from the book "The Horse, The Wheel & Language")