Showing posts with label Artificial Neural Network. Show all posts

Friday, April 18, 2025

AI Agentic Frameworks

With the proliferation of AI Agents, it's only logical that there will be attempts at standardization and building protocols & frameworks:

Thursday, April 17, 2025

On Quantization

  • Speed vs Accuracy trade off.
  • Reduce costs on storage, compute, operations.
  • Speed up output generation, inference, etc.
  • Work with lower precision data.
  • Cast/ map data from Int32, Float32, etc 32-bit or higher precision to lower precision data types such as 16-bit Brain Float (BFloat16), 4-bit NormalFloat (NF4), int4 or int8, etc.
    • Easy mapping Float32 (1-bit Sign, 8-bit Exponent, 23-bit Mantissa) => BFloat16 (1-bit Sign, 8-bit Exponent, 7-bit Mantissa). Just discard the lower 16 bits of the mantissa. Same exponent range, so no overflow!
    • Straightforward mapping: work out max, min, data distribution, mean, variance, etc & then sub-divide the range into equally sized buckets based on bit size of the lower precision data type. E.g. int4 (4-bit) => 2^4 = 16 buckets.
    • Handle outliers, data skew which can mess up the mapping, yet lead to loss of useful info if discarded randomly.
    • Work out Bounds wrt Loss of Accuracy.
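
A minimal Python sketch of the two mappings above, assuming NumPy is available. The function names are made up for illustration, and the 4-bit part is a plain min/max (affine) scheme with 16 evenly spaced levels, not what any particular library does. Outlier handling is left out on purpose.

```python
import numpy as np

def float32_to_bfloat16_bits(x):
    """Truncate float32 to bfloat16 bit patterns by dropping the lower 16 mantissa bits."""
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    return np.right_shift(bits, np.uint32(16)).astype(np.uint16)

def bfloat16_bits_to_float32(b):
    """Expand bfloat16 bit patterns back to float32 (the lower 16 bits become zero)."""
    wide = np.left_shift(b.astype(np.uint32), np.uint32(16))
    return wide.view(np.float32)

def quantize_uniform(x, n_bits=4):
    """Map values onto 2**n_bits evenly spaced levels between min and max."""
    levels = 2 ** n_bits
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (levels - 1)
    if scale == 0.0:
        scale = 1.0  # all values identical; any scale works
    q = np.clip(np.round((x - lo) / scale), 0, levels - 1).astype(np.uint8)
    return q, scale, lo

def dequantize_uniform(q, scale, lo):
    """Approximate reconstruction of the original values."""
    return q.astype(np.float32) * scale + lo

weights = np.linspace(-1.0, 1.0, 50, dtype=np.float32)
q, scale, lo = quantize_uniform(weights, n_bits=4)
restored = dequantize_uniform(q, scale, lo)
print("max int4 round-trip error:", float(np.abs(restored - weights).max()))
```

With this scheme the round-trip error per value is bounded by half a step (scale / 2), which is one simple way to put a bound on the loss of accuracy. A single large outlier stretches min/max and therefore the step size for everything else.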

LLMs, AI/ ML side:

  • https://newsletter.theaiedge.io/p/reduce-ai-model-operational-costs

Lucene, Search side:

  • https://www.elastic.co/search-labs/blog/scalar-quantization-101
  • https://www.elastic.co/search-labs/blog/scalar-quantization-in-lucene

Wednesday, April 16, 2025

Speculative Decoding

  • Ensemble of Weak + Strong model
  • Weak model has a quick first go at generating tokens/ inference (potentials)
  • Followed by the Strong, but slow model which catches up & uses the outputs of the weak model, samples them, grades them, accepting/ rejecting them to generate the final output
  • Overall making inferences via LLMs quicker and cheaper
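
A toy sketch of the accept/reject loop in Python, assuming the greedy variant (the strong model accepts a drafted token only if it matches its own top choice). The two "models" here are made-up deterministic functions, not real LLMs. In a real system the strong model checks all drafted tokens in one parallel forward pass, which is where the speed-up comes from; the loop below only simulates that. Sampling-based variants use a probabilistic acceptance test instead of an exact match.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]

def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: draft proposes k tokens, target verifies them."""
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1. Weak (draft) model quickly proposes k tokens.
        ctx = list(out)
        proposal = []
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. Strong (target) model grades each proposed token in order.
        accepted_all = True
        for tok in proposal:
            expected = target(out)
            out.append(expected)        # output always follows the target
            if tok != expected:         # first mismatch: drop the rest of the draft
                accepted_all = False
                break
        # 3. If every draft token was accepted, the target adds one bonus token.
        if accepted_all:
            out.append(target(out))
    return out[:goal]

# Hypothetical toy models: the draft disagrees whenever the last token is a multiple of 4.
def target_model(seq: List[int]) -> int:
    return (seq[-1] * 3 + 1) % 11

def draft_model(seq: List[int]) -> int:
    tok = (seq[-1] * 3 + 1) % 11
    return tok if seq[-1] % 4 else (tok + 1) % 11

print(speculative_decode(target_model, draft_model, prompt=[1], max_new=10))
```

The design point is that the final output is identical to what the strong model would have produced on its own; the weak model only changes how many strong-model passes are needed.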

More to follow..

  • https://pytorch.org/blog/hitchhikers-guide-speculative-decoding/ 
  • https://www.baseten.co/blog/a-quick-introduction-to-speculative-decoding/
  • https://research.google/blog/looking-back-at-speculative-decoding/
  • https://medium.com/ai-science/speculative-decoding-make-llm-inference-faster-c004501af120

Sunday, April 6, 2025

Model Context Protocol (MCP)

Standardization Protocol for AI agents. Enables them to act, inter-connect, process, parse, invoke functions. In other words to Crawl, Browse, Search, click, etc. 

MCP reuses the well-known client-server architecture, with JSON-RPC as the message format.

Apps use MCP Clients -> MCP Servers (abstracts the service)

Kind of API++ for an AI world!
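
To make the JSON-RPC part concrete, here is a small Python sketch of what an MCP client might send to a server. The envelope fields follow JSON-RPC 2.0. The method names `tools/list` and `tools/call` are taken from my reading of the MCP specification and should be checked against it; the tool name `web_search` and its arguments are purely hypothetical. No transport (stdio, HTTP) is shown.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched

def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request envelope as a plain dict."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg

# Ask the server which tools it exposes (method name assumed from the MCP spec).
list_tools = jsonrpc_request("tools/list")

# Invoke one tool; "web_search" and its arguments are hypothetical.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "web_search", "arguments": {"query": "model context protocol"}},
)

print(json.dumps(list_tools))
print(json.dumps(call_tool, indent=2))
```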

Wednesday, April 2, 2025

The Big Book of LLM

A book by Damien Benveniste of AIEdge. Though a work in progress, chapters 2 - 4 available for preview are fantastic. 

Look forward to a paperback edition, which I certainly hope to own...

Tuesday, April 1, 2025

Mozilla.ai

Mozilla pedigree, AI focus, Open-source, Dev oriented.

Blueprint Hub: Mozilla.ai's Hub of open-source, templatized, customizable AI solutions for developers.

Lumigator: Platform for model evaluation and selection. Consists of a Python FastAPI backend for AI lifecycle management & capturing workflow data useful for evaluation.

Saturday, March 15, 2025

Scaling Laws

Quick notes around Chinchilla Scaling Law/ Limits & beyond for DeepLearning and LLMs.

Factors

  • Model size (N)
  • Dataset size (D)
  • Training Cost (aka Compute) (C)
  • Test Cross-entropy loss (L)

The intuitive way,

  • Larger data will need a larger model, and have higher training cost. In other words, N, D, C all increase together, not necessarily linearly, could be exponential, log-linear, etc.
  • Likewise Loss is likely to decrease for larger datasets. So an inverse relationship between L & D (& the rest).
  • Tying them into equations would be some constants (scaling, exponential, alpha, beta, etc), unknown for now (identified later).

Beyond common sense, the theoretical foundations linking the factors aren't available right now. Perhaps the nature of the problem is it's hard (NP).

The next best thing then, is to somehow work out the relationships/ bounds empirically. To work with existing Deep Learning models, LLMs, etc using large data sets spanning TB/ PB of data, Trillions of parameters, etc using large compute budget cumulatively spanning years.

Papers by Hestness & Narang, Kaplan, Chinchilla are all attempts along the empirical route. So are more recent papers like Mosaic, DeepSeek, MoE, Llama 3, Microsoft among many others.

Key take away being,

  • The scale & bounds are getting larger over time. 
  • Models from a couple of years back, are found to be grossly under-trained in terms of volumes of training data used. They should have been trained on an order of magnitude larger training data for an optimal training, without risk of overfitting.
  • Conversely, the previously used data volumes are suited to much smaller models (SLMs), with inference capabilities similar to those older LLMs.
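
A rough numerical sketch of the Chinchilla-style relationship between N, D, C and L, in Python. The parametric form L = E + A/N^alpha + B/D^beta and the constants are the fitted values reported for the Chinchilla paper (as also listed on the Wikipedia page in the references). C ≈ 6·N·D and "about 20 tokens per parameter" are common rules of thumb, not exact laws. Function names are mine.

```python
import math

# Fitted constants as reported for the Chinchilla parametric loss.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

def chinchilla_loss(n_params, n_tokens):
    """Estimated test loss L for model size N and dataset size D."""
    return E + A / n_params ** ALPHA + B / n_tokens ** BETA

def training_flops(n_params, n_tokens):
    """Rule-of-thumb training compute: C is roughly 6 * N * D."""
    return 6.0 * n_params * n_tokens

def compute_optimal(flops, tokens_per_param=20.0):
    """Split a compute budget into (N, D), assuming D = tokens_per_param * N."""
    n = math.sqrt(flops / (6.0 * tokens_per_param))
    return n, tokens_per_param * n

budget = training_flops(70e9, 1.4e12)       # a 70B model on 1.4T tokens
n_opt, d_opt = compute_optimal(budget)
print(f"N ~ {n_opt:.3g} params, D ~ {d_opt:.3g} tokens")

# Same budget spent on a 4x larger model leaves 4x less data.
big_n = 280e9
big_d = budget / (6.0 * big_n)
print("loss, smaller model + more data:", round(chinchilla_loss(70e9, 1.4e12), 3))
print("loss, larger model + less data :", round(chinchilla_loss(big_n, big_d), 3))
```

Under these assumptions the smaller model trained on more data comes out with the lower estimated loss for the same compute, which is the "older models were under-trained" takeaway in numbers.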

References

  • https://en.wikipedia.org/wiki/Neural_scaling_law
  • https://lifearchitect.ai/chinchilla/
  • https://medium.com/@raniahossam/chinchilla-scaling-laws-for-large-language-models-llms-40c434e4e1c1
  • https://bigscience.huggingface.co/blog/what-language-model-to-train-if-you-have-two-million-gpu-hours
  • https://medium.com/nlplanet/two-minutes-nlp-scaling-laws-for-neural-language-models-add6061aece7
  • https://lifearchitect.ai/the-sky-is-bigger/

Friday, February 28, 2025

Diffusion Models

Diffusion

  •     Forward, Backward (Learning), Sampling (Random)    
  •     Continuous Diffusion
  •     VAE, Denoising Autoencoder
  •     Markov Chains
  •     U-Net
  •     DALL-E (OpenAI), Stable Diffusion,
  •     Imagen, Muse, VEO (Google)
  •     LLaDa, Mercury Coder (Inception)

Non-equilibrium Thermodynamics

  •     Langevin dynamics
  •     Thermodynamic Equilibrium - Boltzmann Distribution
  •     Wiener Process - Multidimensional Brownian Motion
  •     Energy Based Models

Gaussian Noise

  •     Denoising
  •     Noise/ Variance Schedule
  •     Derivation by Reparameterization
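
A short Python sketch of the forward (noising) step with a variance schedule, assuming NumPy and a DDPM-style linear beta schedule. The schedule endpoints (1e-4 to 0.02 over 1000 steps) are commonly used defaults, taken here as an assumption. The reparameterization lets x_t be sampled directly from x_0 in one shot: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.

```python
import numpy as np

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Noise/variance schedule: one beta per diffusion step."""
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(x0, t, alpha_bar, rng):
    """Sample x_t from x_0 in closed form; also return the noise that was used."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)   # cumulative product of (1 - beta)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))      # stand-in for an image
xt, eps = q_sample(x0, t=500, alpha_bar=alpha_bar, rng=rng)
print("signal fraction kept at t=500:", float(np.sqrt(alpha_bar[500])))
```

A noise prediction network would be trained to recover `eps` from `xt` and `t`, typically with an MSE loss between the true and predicted noise.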

Variational Inference    

  •     Denoising Diffusion Probabilistic Model (DDPM)
  •     Noise Prediction Networks    
  •     Denoising Diffusion Implicit Model (DDIM)

Loss Functions

  •     Variational Lower Bound (VLB)
  •     Evidence Lower Bound (ELBO)
  •     Kullback-Leibler divergence (KL divergence)
  •     Mean Squared Error (MSE)

Score Based Generative Model

  •     Annealing
  •     Noise conditional score network (NCSN)
  •     Equivalence: DDPM and Score Based Generative Models

Conditional (Guided) Generation

  •     Classifier Guidance    
  •     Classifier Free Guidance (CFG)
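
Classifier Free Guidance boils down to blending two noise predictions from the same network, one with the condition (e.g. a text prompt) and one without. A minimal Python sketch, assuming NumPy and the convention where a guidance scale of 1 gives the purely conditional prediction (other write-ups parameterize the scale differently):

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Move the prediction from unconditional towards (and past) conditional."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Stand-in predictions; a real model would produce these from x_t, t and the prompt.
eps_u = np.array([0.0, 1.0, -1.0])
eps_c = np.array([1.0, 1.0, 0.0])
print(cfg_combine(eps_u, eps_c, guidance_scale=3.0))
```

Scales above 1 push the sample harder towards the condition, usually at some cost in diversity.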

Latent Variable Generative Model

  •     Latent Diffusion Model (LDM)
  •     Lower Dimension (Latent) Space

References:

  • https://en.wikipedia.org/wiki/Diffusion_model
  • https://www.assemblyai.com/blog/diffusion-models-for-machine-learning-introduction
  • https://www.ibm.com/think/topics/diffusion-models
  • https://hackernoon.com/what-is-a-diffusion-llm-and-why-does-it-matter
  • Large Language Diffusion Models (LLaDA): https://arxiv.org/abs/2502.09992



Sunday, January 26, 2025

Mechanistic Interpretability

  • Clearer, better understanding of how Neural Networks work (white box).
  • Strong grounds for Superposition: n-dimensions (neurons) represent more than n-features
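
A tiny Python illustration of superposition, loosely in the spirit of the toy models linked below: 3 features squeezed into 2 dimensions. The setup (unit vectors 120 degrees apart, ReLU read-out, no bias) is my own simplification, assuming NumPy.

```python
import numpy as np

# Three feature directions in a 2-D hidden space, 120 degrees apart.
angles = np.deg2rad([90.0, 210.0, 330.0])
W = np.stack([np.cos(angles), np.sin(angles)])   # shape (2, 3)

def encode(x):
    """Compress a 3-feature vector into 2 hidden dimensions."""
    return W @ x

def decode(h):
    """Read features back out; ReLU suppresses the negative interference."""
    return np.maximum(W.T @ h, 0.0)

sparse = np.array([1.0, 0.0, 0.0])   # only one feature active
dense = np.array([1.0, 1.0, 0.0])    # two features active at once

print("sparse recovered:", decode(encode(sparse)))
print("dense recovered :", decode(encode(dense)))
```

When only one feature is active it is recovered cleanly, because the interference with the other directions is negative and gets clipped. With several features active at once the read-out degrades, which is why superposition relies on features being sparse.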

References

  • https://dynalist.io/d/n2ZWtnoYHrU1s4vnFSAQ519J#z=EuO4CLwSIzX7AEZA1ZOsnwwF
  • https://www.neelnanda.io/mechanistic-interpretability/glossary
  • https://transformer-circuits.pub/2022/toy_model/index.html
  • https://www.anthropic.com/research/superposition-memorization-and-double-descent
  • https://transformer-circuits.pub/2023/toy-double-descent/index.html 

Friday, January 24, 2025

State Space Models

  • Vector Space of States (of the System)
  • Alt. to Transformers, reducible to one another 
 
        (Image source: https://en.wikipedia.org/wiki/State-space_representation)
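
A minimal Python sketch of a discrete linear state space model, assuming NumPy, a scalar input per step and already-discretized matrices (the values of A, B, C below are arbitrary). It shows the two equivalent ways of computing the output: as a step-by-step recurrence over the state, and as a convolution with a kernel built from C·A^k·B.

```python
import numpy as np

def ssm_recurrent(A, B, C, x):
    """h_t = A h_{t-1} + B x_t ;  y_t = C h_t  (state starts at zero)."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A @ h + B * x_t
        ys.append(C @ h)
    return np.array(ys)

def ssm_kernel(A, B, C, length):
    """Convolution kernel K_k = C A^k B for k = 0 .. length-1."""
    kernel, v = [], B.copy()
    for _ in range(length):
        kernel.append(C @ v)
        v = A @ v
    return np.array(kernel)

def ssm_convolutional(A, B, C, x):
    """Same output as the recurrence, computed as a causal convolution."""
    return np.convolve(x, ssm_kernel(A, B, C, len(x)))[: len(x)]

A = np.array([[0.9, 0.1], [0.0, 0.8]])   # state transition (arbitrary, stable)
B = np.array([1.0, 0.5])                 # input projection
C = np.array([0.3, -0.2])                # output projection
x = np.array([1.0, 0.0, 2.0, -1.0, 0.5])

print(ssm_recurrent(A, B, C, x))
print(ssm_convolutional(A, B, C, x))
```

The recurrent form is cheap at inference time (constant work per new token), while the convolutional form can be parallelized over the whole sequence during training.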

References

  • https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mamba-and-state
  • https://huggingface.co/blog/lbourdois/ssm-2022
  • https://huggingface.co/blog/lbourdois/get-on-the-ssm-train
  • https://en.wikipedia.org/wiki/State-space_representation

Thursday, May 30, 2024

Mixture of Experts (MoE) Architecture

Enhancement to LLMs to align with expert models paradigm. 

  • Each expert is implemented as a separate Feed Forward Network (FFN) (though other ML models trainable via Backprop should work).
  • The expert FFNs are introduced in parallel to the existing FFN layer after the Attention Layer.
  • Decision to route tokens to the expert is by a router. 
  • Router is implemented as a linear layer followed by a Softmax for probability of each expert, to pick the top few.
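
A compact Python sketch of the router and expert mixing described above, assuming NumPy. The sizes, the random weights and the helper names are illustrative only. Re-normalizing the gate weights over the selected top-k experts is one common choice and is an assumption here; load balancing and batching by expert are left out.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def route_top_k(tokens, w_router, k=2):
    """Linear layer + softmax, then keep the k most probable experts per token."""
    probs = softmax(tokens @ w_router)                    # (n_tokens, n_experts)
    top_idx = np.argsort(-probs, axis=-1)[:, :k]          # expert ids, best first
    top_p = np.take_along_axis(probs, top_idx, axis=-1)
    gates = top_p / top_p.sum(axis=-1, keepdims=True)     # re-normalize over top-k
    return top_idx, gates

def make_ffn(rng, d_model, d_hidden):
    """A toy expert: two-layer feed forward network with ReLU."""
    w1 = rng.standard_normal((d_model, d_hidden)) * 0.1
    w2 = rng.standard_normal((d_hidden, d_model)) * 0.1
    return lambda x: np.maximum(x @ w1, 0.0) @ w2

def moe_layer(tokens, w_router, experts, k=2):
    """Each token's output is the gate-weighted sum of its top-k experts."""
    top_idx, gates = route_top_k(tokens, w_router, k)
    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        for e, g in zip(top_idx[i], gates[i]):
            out[i] += g * experts[e](tok)
    return out

rng = np.random.default_rng(0)
d_model, n_experts = 8, 4
tokens = rng.standard_normal((5, d_model))                # 5 token embeddings
w_router = rng.standard_normal((d_model, n_experts))
experts = [make_ffn(rng, d_model, 16) for _ in range(n_experts)]

print(route_top_k(tokens, w_router)[0])                   # chosen experts per token
print(moe_layer(tokens, w_router, experts).shape)
```

Only k of the experts run for any given token, so the parameter count grows with the number of experts while the compute per token stays roughly that of k FFNs.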

Friday, February 28, 2020

Defence R&D Organisation Young Scientists Lab (DYSL)


Recently there was quite a lot of buzz in the media about the launch of DRDO Young Scientists Lab (DYSL). 5 such labs have been formed by DRDO each headed by a young director under the age of 35! Each lab has its own specialized focus area from among fields such as AI, Quantum Computing, Cognitive Technologies, Asymmetric Technologies and Smart Materials.

When trying to look for specifics on what these labs are doing, particularly the AI lab, there is very little to go by for now. While a lot of information about the vintage DRDO Centre of AI and Robotics (CAIR) lab is available on the DRDO website, there's practically nothing there regarding the newly formed DRDO Young Scientists Lab on AI (DYSL-AI). Neither are the details available anywhere else in the public domain, till end-Feb 2020 at least. While these would certainly get updated soon, for now there are just these interviews with the directors of the DYSL labs:

  • Doordarshan's Y-Factor Interview with the 5 DYSL Directors Mr. Parvathaneni Shiva Prasad, Mr. Manish Pratap Singh, Mr. Ramakrishnan Raghavan, Mr. Santu Sardar, Mr. Sunny Manchanda







  • Rajya Sabha TV Interview with DYSL-AI Director Mr. Sunny Manchanda





Monday, April 9, 2018

Learning Deep

Head straight to KDnuggets' Top 20 Deep Learning Papers of 2018. It has a good listing of research publications spanning the last 4-5 years. You could further go on to read the papers referred to within these papers & then those referred to in the referred papers & so on for some really deep learning!