Deep Learning – Wisdombox

Between Memorization and Meaning: When Neural Networks Learn, But Not the Way We Expect

An empirical exploration of why arithmetic models sometimes neither memorize nor generalize (Part 2 of an ongoing exploration into how small language models learn arithmetic) In the previous post, I described an experiment that started with a simple goal:to observe memorization and eventual grokking in a small language model trained on arithmetic operations. The setupContinueContinue reading “Between Memorization and Meaning: When Neural Networks Learn, But Not the Way We Expect”

Why arithmetic models look dumb long after they’ve learned the rule

An experiment in memorization, grokking, and misleading loss curves This post documents an experiment that didn’t go the way I expected.What started as a simple attempt to observe memorization and grokking in arithmetic models turned into a deeper lesson about how misleading loss curves can be — especially for algorithmic tasks. What I expected toContinueContinue reading “Why arithmetic models look dumb long after they’ve learned the rule”