
Sunday, April 6, 2025

Model Context Protocol (MCP)

A standardization protocol for AI agents. It enables them to act, inter-connect, process, parse, and invoke functions; in other words, to crawl, browse, search, click, etc.

MCP re-uses the well-known client-server architecture, with JSON-RPC as the message format.

Apps use MCP Clients -> MCP Servers (which abstract the underlying service).
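For a flavour of the wire format, here is a rough Python sketch of what a JSON-RPC 2.0 tool-invocation exchange between an MCP client and server could look like. The tools/call method name follows the MCP spec's naming; the web_search tool and its arguments are made-up placeholders, not part of any real server.

    import json

    # Hypothetical MCP-style tool invocation over JSON-RPC 2.0.
    # "web_search" and its arguments are illustrative placeholders.
    request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",          # client asks the MCP server to invoke a tool
        "params": {
            "name": "web_search",        # hypothetical tool exposed by the server
            "arguments": {"query": "Model Context Protocol"},
        },
    }

    response = {
        "jsonrpc": "2.0",
        "id": 1,                         # correlates with the request id
        "result": {
            "content": [
                {"type": "text", "text": "Top results for 'Model Context Protocol' ..."}
            ],
        },
    }

    print(json.dumps(request, indent=2))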

Kind of API++ for an AI world!

Wednesday, April 2, 2025

The Big Book of LLM

A book by Damien Benveniste of AIEdge. Though a work in progress, chapters 2-4, available for preview, are fantastic.

Look forward to a paperback edition, which I certainly hope to own...

Tuesday, April 1, 2025

Mozilla.ai

Mozilla pedigree, AI focus, Open-source, Dev oriented.

Blueprint Hub: Mozilla.ai's hub of open-source, templatized, customizable AI solutions for developers.

Lumigator: Platform for model evaluation and selection. It consists of a Python FastAPI backend for AI lifecycle management and for capturing workflow data useful for evaluation.

Saturday, March 15, 2025

Scaling Laws

Quick notes around the Chinchilla scaling law, its limits, and beyond, for deep learning and LLMs.

Factors

  • Model size (N)
  • Dataset size (D)
  • Training Cost (aka Compute) (C)
  • Test Cross-entropy loss (L)

The intuitive way,

  • Larger datasets need larger models and incur higher training cost. In other words, N, D, and C all increase together, though not necessarily linearly; the relationship could be exponential, log-linear, etc.
  • Likewise, loss is likely to decrease for larger datasets (and models), i.e. an inverse relationship between L and D (and the rest).
  • Tying them into equations requires some constants (scaling factors, exponents such as alpha and beta, etc.), unknown a priori and identified empirically later (see the sketch below).
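For concreteness, here is a minimal sketch of the parametric form such empirical fits typically take, a Chinchilla-style loss L(N, D) = E + A/N^alpha + B/D^beta. The constant values below are approximately those reported in the Chinchilla paper (Hoffmann et al., 2022) and are quoted here as illustrative assumptions, not authoritative figures.

    # Chinchilla-style parametric loss: L(N, D) = E + A / N**alpha + B / D**beta
    # Constants are roughly the published Chinchilla fits; treat them as illustrative.
    def chinchilla_loss(N: float, D: float,
                        E: float = 1.69, A: float = 406.4, B: float = 410.7,
                        alpha: float = 0.34, beta: float = 0.28) -> float:
        """Predicted test cross-entropy loss for N parameters and D training tokens."""
        return E + A / N**alpha + B / D**beta

    # Example: a 70B-parameter model trained on 1.4T tokens (roughly Chinchilla's setup)
    print(chinchilla_loss(N=70e9, D=1.4e12))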

Beyond common sense, theoretical foundations linking these factors aren't available right now. Perhaps the problem is, by its nature, simply hard (NP).

The next best thing, then, is to work out the relationships/bounds empirically: train and evaluate existing deep-learning models, LLMs, etc. on datasets spanning TBs/PBs of data, with up to trillions of parameters, using compute budgets that cumulatively span years.

Papers by Hestness & Narang, Kaplan, and Chinchilla are all attempts along this empirical route, as are more recent papers from Mosaic, DeepSeek, MoE, Llama 3, and Microsoft, among many others.

Key takeaways:

  • The scale & bounds are getting larger over time. 
  • Models from a couple of years back are found to be grossly under-trained in terms of the volume of training data used. For optimal training, they should have been trained on an order of magnitude more data, without risk of overfitting.
  • Conversely, the previously used data volumes are better suited to much smaller models (SLMs), with inference capabilities similar to those older LLMs (see the compute-optimal sketch below).
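As a back-of-the-envelope sketch of that takeaway: combining the common approximation C ≈ 6·N·D (training FLOPs for transformers) with the Chinchilla finding of roughly 20 training tokens per parameter gives a compute-optimal split of a given budget. Rule-of-thumb figures only, not exact values.

    # Rough compute-optimal split under the Chinchilla rule of thumb (D ≈ 20 * N),
    # using the common approximation C ≈ 6 * N * D training FLOPs for transformers.
    def compute_optimal_split(C: float, tokens_per_param: float = 20.0):
        N = (C / (6.0 * tokens_per_param)) ** 0.5   # parameters
        D = tokens_per_param * N                    # training tokens
        return N, D

    # Example: a 1e24 FLOP budget -> roughly a 90B-parameter model on ~1.8T tokens
    N, D = compute_optimal_split(1e24)
    print(f"~{N/1e9:.0f}B params, ~{D/1e12:.1f}T tokens")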

References

  • https://en.wikipedia.org/wiki/Neural_scaling_law
  • https://lifearchitect.ai/chinchilla/
  • https://medium.com/@raniahossam/chinchilla-scaling-laws-for-large-language-models-llms-40c434e4e1c1
  • https://bigscience.huggingface.co/blog/what-language-model-to-train-if-you-have-two-million-gpu-hours
  • https://medium.com/nlplanet/two-minutes-nlp-scaling-laws-for-neural-language-models-add6061aece7
  • https://lifearchitect.ai/the-sky-is-bigger/

Friday, February 28, 2025

Diffusion Models

Diffusion

  •     Forward, Backward (Learning), Sampling (Random)    
  •     Continuous Diffusion
  •     VAE, Denoising Autoencoder
  •     Markov Chains
  •     U-Net
  •     DALL-E (OpenAI), Stable Diffusion,
  •     Imagen, Muse, VEO (Google)
  •     LLaDA, Mercury Coder (Inception)

Non-equilibrium Thermodynamics

  •     Langevin dynamics
  •     Thermodynamic Equilibrium - Boltzmann Distribution
  •     Wiener Process - Multidimensional Brownian Motion
  •     Energy Based Models

Gaussian Noise

  •     Denoising
  •     Noise/ Variance Schedule
  •     Derivation by Reparameterization (see the sketch below)
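A minimal Python/NumPy sketch of the reparameterized forward (noising) step q(x_t | x_0), assuming a simple linear beta (variance) schedule. All values and shapes here are illustrative.

    import numpy as np

    # Linear variance (beta) schedule; the endpoint values are illustrative.
    T = 1000
    betas = np.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)          # cumulative product: alpha_bar_t

    def q_sample(x0: np.ndarray, t: int, rng=np.random.default_rng()):
        """Sample x_t ~ q(x_t | x_0) via reparameterization:
        x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)."""
        eps = rng.standard_normal(x0.shape)
        x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
        return x_t, eps

    x0 = np.zeros((3, 32, 32))               # toy "image"
    x_t, eps = q_sample(x0, t=500)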

Variational Inference    

  •     Denoising Diffusion Probabilistic Model (DDPM)
  •     Noise Prediction Networks    
  •     Denoising Diffusion Implicit Model (DDIM)

Loss Functions

  •     Variational Lower Bound (VLB)
  •     Evidence Lower Bound (ELBO)
  •     Kullback-Leibler divergence (KL divergence)
  •     Mean Squared Error (MSE) (see the sketch below)
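In the DDPM simplification, the VLB/ELBO objective reduces to a plain MSE between the actual and predicted noise. A minimal sketch, where noise_pred_model is just a stand-in for any noise-prediction network.

    import numpy as np

    def simple_ddpm_loss(noise_pred_model, x_t: np.ndarray, t: int, eps: np.ndarray) -> float:
        """L_simple = E[ || eps - eps_theta(x_t, t) ||^2 ] (the simplified DDPM objective).
        noise_pred_model is a placeholder for any noise-prediction network."""
        eps_pred = noise_pred_model(x_t, t)
        return float(np.mean((eps - eps_pred) ** 2))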

Score Based Generative Model

  •     Annealing
  •     Noise conditional score network (NCSN)
  •     Equivalence: DDPM and Score-Based Generative Models

Conditional (Guided) Generation

  •     Classifier Guidance    
  •     Classifier-Free Guidance (CFG) (sketched below)
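The classifier-free guidance combination itself is one line. A sketch, where eps_cond and eps_uncond are the conditional and unconditional predictions of the same noise-prediction network and w is the guidance scale (the default value is illustrative).

    def cfg_noise(eps_uncond, eps_cond, w: float = 7.5):
        """Classifier-free guidance: extrapolate from the unconditional prediction
        towards the conditional one by the guidance scale w."""
        return eps_uncond + w * (eps_cond - eps_uncond)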

Latent Variable Generative Model

  •     Latent Diffusion Model (LDM)
  •     Lower Dimension (Latent) Space

References:

  • https://en.wikipedia.org/wiki/Diffusion_model
  • https://www.assemblyai.com/blog/diffusion-models-for-machine-learning-introduction
  • https://www.ibm.com/think/topics/diffusion-models
  • https://hackernoon.com/what-is-a-diffusion-llm-and-why-does-it-matter
  • Large Language Diffusion Models (LLaDA): https://arxiv.org/abs/2502.09992



Sunday, January 26, 2025

Mechanistic Interpretability

  • A clearer, white-box understanding of how neural networks work internally.
  • Strong grounds for superposition: n dimensions (neurons) can represent more than n features (see the toy sketch below).
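A toy numerical sketch of the superposition idea, in the spirit of the toy-models work cited below but not a reproduction of it: more sparse features than dimensions can still be stored and approximately read back out.

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 50, 20                                    # 50 features squeezed into 20 dimensions
    W = rng.standard_normal((m, n))
    W /= np.linalg.norm(W, axis=1, keepdims=True)    # unit-norm feature directions

    # A sparse feature vector: only a few features active at once.
    f = np.zeros(m)
    active = rng.choice(m, size=3, replace=False)
    f[active] = 1.0

    h = f @ W                                        # compress into the n-dim "neuron" space
    f_hat = h @ W.T                                  # naive linear read-out
    print(sorted(active), sorted(np.argsort(f_hat)[-3:]))   # active features tend to score highest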

References

  • https://dynalist.io/d/n2ZWtnoYHrU1s4vnFSAQ519J#z=EuO4CLwSIzX7AEZA1ZOsnwwF
  • https://www.neelnanda.io/mechanistic-interpretability/glossary
  • https://transformer-circuits.pub/2022/toy_model/index.html
  • https://www.anthropic.com/research/superposition-memorization-and-double-descent
  • https://transformer-circuits.pub/2023/toy-double-descent/index.html 

Friday, January 24, 2025

State Space Models

  • Vector Space of States (of the System)
  • An alternative to Transformers; the two formulations are reducible to one another
 
        (Image source: https://en.wikipedia.org/wiki/State-space_representation)
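For reference, a minimal sketch of the discrete-time state-space recurrence these models build on; the A, B, C, D matrices below are arbitrary placeholders, not a trained model.

    import numpy as np

    # Discrete-time linear state-space model:
    #   x[k+1] = A x[k] + B u[k]
    #   y[k]   = C x[k] + D u[k]
    rng = np.random.default_rng(0)
    n_state, n_in, n_out = 4, 1, 1
    A = 0.9 * np.eye(n_state) + 0.01 * rng.standard_normal((n_state, n_state))
    B = rng.standard_normal((n_state, n_in))
    C = rng.standard_normal((n_out, n_state))
    D = np.zeros((n_out, n_in))

    def run_ssm(u_seq):
        """Roll the recurrence over an input sequence, returning the outputs."""
        x = np.zeros(n_state)
        ys = []
        for u in u_seq:
            y = C @ x + D @ u
            x = A @ x + B @ u
            ys.append(y)
        return np.array(ys)

    y = run_ssm(rng.standard_normal((10, n_in)))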

References

  • https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mamba-and-state
  • https://huggingface.co/blog/lbourdois/ssm-2022
  • https://huggingface.co/blog/lbourdois/get-on-the-ssm-train
  • https://en.wikipedia.org/wiki/State-space_representation

Friday, February 28, 2020

Defence R&D Organisation Young Scientists Lab (DYSL)


Recently there was quite a lot of buzz in the media about the launch of the DRDO Young Scientists Labs (DYSL). Five such labs have been formed by DRDO, each headed by a young director under the age of 35! Each lab has its own specialized focus area from among fields such as AI, Quantum Computing, Cognitive Technologies, Asymmetric Technologies and Smart Materials.

When trying to look for specifics on what these labs are doing, particularly the AI lab, there is very little to go by for now. While a lot of information about the vintage DRDO Centre of AI and Robotics (CAIR) lab is available on the DRDO website, there's practically nothing there regarding the newly formed DRDO Young Scientists Lab on AI (DYSL-AI). Nor are the details available anywhere else in the public domain, as of end-Feb 2020 at least. While these will certainly get updated soon, for now there are just these interviews with the directors of the DYSL labs:

  • Doordarshan's Y-Factor Interview with the 5 DYSL Directors Mr. Parvathaneni Shiva Prasad, Mr. Manish Pratap Singh, Mr. Ramakrishnan Raghavan, Mr. Santu Sardar, Mr. Sunny Manchanda







  • Rajya Sabha TV Interview with DYSL-AI Director Mr. Sunny Manchanda





Monday, April 9, 2018

Learning Deep

Head straight to KDnuggets' Top 20 Deep Learning Papers of 2018. It has a good listing of research publications spanning the last 4-5 years. You could go on to read the papers referenced within these papers, and then those referenced in the referred papers, and so on, for some really deep learning!