Thursday, May 30, 2024

Mixture of Experts (MoE) Architecture

An enhancement to LLMs that aligns them with the expert-models paradigm.

  • Each expert is implemented as a separate Feed Forward Network (FFN), though any other trainable model that supports backpropagation should work.
  • The expert FFNs are introduced in parallel at the position of the existing FFN layer that follows the Attention Layer.
  • Tokens are routed to the experts by a router. 
  • The router is implemented as a linear layer followed by a Softmax, which yields a probability for each expert; the top few experts are selected per token (see the sketch after this list).
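
To make the routing concrete, here is a minimal PyTorch-style sketch under the assumptions above. Names such as ExpertFFN, MoELayer, num_experts, and top_k are illustrative, not from the post: a linear router produces softmax probabilities over experts, the top-k experts are picked per token, and their FFN outputs are combined, weighted by the renormalized router probabilities.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExpertFFN(nn.Module):
    """One expert: a standard feed-forward network (FFN)."""
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x):
        return self.net(x)

class MoELayer(nn.Module):
    """Router (linear layer + softmax) picks the top-k experts per token."""
    def __init__(self, d_model, d_hidden, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            ExpertFFN(d_model, d_hidden) for _ in range(num_experts)
        )
        self.router = nn.Linear(d_model, num_experts)  # router: a single linear layer
        self.top_k = top_k

    def forward(self, x):
        # x: (num_tokens, d_model) -- tokens flattened across batch and sequence
        probs = F.softmax(self.router(x), dim=-1)            # probability per expert
        topk_probs, topk_idx = probs.topk(self.top_k, dim=-1)
        topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)  # renormalize over the chosen experts

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # tokens that selected expert e, and the slot (rank) it occupies in their top-k
            token_ids, slot = (topk_idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            # weight each selected token's expert output by its renormalized router probability
            out[token_ids] += topk_probs[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out
```

Because only the top-k experts run for each token, the per-token compute stays close to that of a single FFN even as the total parameter count grows with the number of experts.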