Sunday, January 26, 2025

Mechanistic Interpretability

  • Aims for a clearer understanding of how neural networks work internally (a white-box view).
  • Strong evidence for superposition: n dimensions (neurons) can represent more than n features.
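
A minimal sketch of the superposition idea above, assuming a hand-built toy setup rather than a trained network: 5 features are stored as 5 unit directions in a 2-dimensional hidden space (a pentagon layout), and a ReLU readout recovers which single feature is active. The function names and the bias value of -0.5 are hypothetical choices made for illustration.

```python
import numpy as np

def pentagon_directions(n_features=5):
    """Place n_features unit vectors evenly around the unit circle (2 hidden dims)."""
    angles = 2 * np.pi * np.arange(n_features) / n_features
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)  # shape (n_features, 2)

def encode(features, W):
    """Compress a feature vector into the 2-dimensional hidden space."""
    return features @ W  # (n_features,) @ (n_features, 2) -> (2,)

def decode(hidden, W, bias=-0.5):
    """Read features back out: project onto each direction, shift by bias, apply ReLU.

    The bias is hand-picked (an assumption) so that interference from
    neighbouring directions (about 0.31) is pushed below zero.
    """
    return np.maximum(W @ hidden + bias, 0.0)

W = pentagon_directions()

for i in range(5):
    one_hot = np.zeros(5)
    one_hot[i] = 1.0  # exactly one feature is active (sparse input)
    recovered = decode(encode(one_hot, W), W)
    print(i, np.flatnonzero(recovered > 0))
```

The readout only works here because a single feature is active at a time. When several features fire together, their directions interfere, which is why sparsity matters for superposition.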

References

  • https://dynalist.io/d/n2ZWtnoYHrU1s4vnFSAQ519J#z=EuO4CLwSIzX7AEZA1ZOsnwwF
  • https://www.neelnanda.io/mechanistic-interpretability/glossary
  • https://transformer-circuits.pub/2022/toy_model/index.html
  • https://www.anthropic.com/research/superposition-memorization-and-double-descent
  • https://transformer-circuits.pub/2023/toy-double-descent/index.html 
