Algorithms, Design, Code and more: Mechanistic Interpretability

Sunday, January 26, 2025

Mechanistic Interpretability

A clearer understanding of how neural networks work internally (treating them as a white box rather than a black box).
Strong grounds for superposition: n dimensions (e.g. n neurons) can represent more than n features.
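To make the superposition claim concrete, here is a minimal sketch of the ReLU output model studied in Anthropic's Toy Models of Superposition work (listed in the references), x' = ReLU(WᵀWx + b). The weights here are hand-picked rather than trained, which is an assumption for illustration. Five features are squeezed into two dimensions by placing their directions at the corners of a regular pentagon, and the bias is chosen to cancel the overlap between neighbouring directions. The names `W`, `b`, `scale` and `reconstruct` are hypothetical and not taken from the paper's code.

```python
import numpy as np

# Hand-picked (not trained) weights: 5 feature directions spread evenly
# around a circle in 2 dimensions, i.e. the corners of a regular pentagon.
N_FEATURES, N_DIMS = 5, 2
angles = 2 * np.pi * np.arange(N_FEATURES) / N_FEATURES
c = np.cos(2 * np.pi / N_FEATURES)        # overlap between neighbouring directions (~0.309)
scale = np.sqrt(1.0 / (1.0 - c))          # chosen so a single active feature decodes to exactly 1
W = scale * np.stack([np.cos(angles), np.sin(angles)])  # shape (N_DIMS, N_FEATURES)
b = -(scale ** 2) * c * np.ones(N_FEATURES)             # negative bias cancels neighbour interference


def reconstruct(x: np.ndarray) -> np.ndarray:
    """ReLU output model: x' = ReLU(W^T W x + b)."""
    h = W @ x                               # compress 5 features into 2 dimensions
    return np.maximum(W.T @ h + b, 0.0)     # decode; ReLU zeroes out negative interference


if __name__ == "__main__":
    eye = np.eye(N_FEATURES)
    # Sparse input (one feature active): recovered despite having only 2 dimensions.
    print(np.round(reconstruct(eye[0]), 3))
    # Denser input (features 0 and 2 active): interference corrupts the output.
    print(np.round(reconstruct(eye[0] + eye[2]), 3))
```

With a single active feature, every one of the five inputs is recovered exactly from only two dimensions. With two non-adjacent features active (0 and 2), their interference suppresses both and falsely activates feature 1 (value 1/√5 ≈ 0.447). This is consistent with the toy-model finding that superposition pays off when features are sparse.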

References

https://dynalist.io/d/n2ZWtnoYHrU1s4vnFSAQ519J#z=EuO4CLwSIzX7AEZA1ZOsnwwF
https://www.neelnanda.io/mechanistic-interpretability/glossary
https://transformer-circuits.pub/2022/toy_model/index.html
https://www.anthropic.com/research/superposition-memorization-and-double-descent
https://transformer-circuits.pub/2023/toy-double-descent/index.html
