Category Archives: AI

Between Memorization and Meaning: When Neural Networks Learn, But Not the Way We Expect
An empirical exploration of why arithmetic models sometimes neither memorize nor generalize (Part 2 of an ongoing exploration into how small language models learn arithmetic). In the previous post, I described an experiment that started with a simple goal: to observe memorization and eventual grokking in a small language model trained on arithmetic operations. The setup… Continue reading “Between Memorization and Meaning: When Neural Networks Learn, But Not the Way We Expect”
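The excerpt only hints at the training setup, so here is a minimal, hypothetical sketch of the kind of arithmetic dataset such an experiment might use. The operation, operand range, prompt formatting, and train/test split below are illustrative assumptions, not code from the post.

```python
# Hypothetical sketch: arithmetic problems rendered as (prompt, answer) string
# pairs for a small language model. Everything here (addition only, operand
# range, "a+b=" formatting, 80/20 split) is an assumption for illustration.
import random


def make_examples(max_operand: int = 99, seed: int = 0):
    """Enumerate all addition problems up to max_operand and shuffle them."""
    pairs = [(f"{a}+{b}=", str(a + b))
             for a in range(max_operand + 1)
             for b in range(max_operand + 1)]
    random.Random(seed).shuffle(pairs)
    return pairs


# Holding out a test split is what lets memorization and generalization be
# told apart: a purely memorizing model does well only on pairs it has seen.
data = make_examples()
split = int(0.8 * len(data))
train, test = data[:split], data[split:]
print(train[0])  # a (prompt, answer) pair, e.g. ('41+7=', '48')
```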
Why arithmetic models look dumb long after they’ve learned the rule
An experiment in memorization, grokking, and misleading loss curves. This post documents an experiment that didn’t go the way I expected. What started as a simple attempt to observe memorization and grokking in arithmetic models turned into a deeper lesson about how misleading loss curves can be, especially for algorithmic tasks. What I expected to… Continue reading “Why arithmetic models look dumb long after they’ve learned the rule”
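One concrete way to see how a loss curve can mislead on an algorithmic task is to track exact-match accuracy on the answer tokens alongside the token-level loss, since the two can diverge. The sketch below is a hypothetical, PyTorch-style evaluation; the model interface and the -100 masking convention are assumptions for illustration, not the post’s actual code.

```python
# Hypothetical evaluation sketch: report mean token loss *and* exact-match
# accuracy, because for algorithmic tasks they can tell different stories.
import torch
import torch.nn.functional as F


def evaluate(model, inputs: torch.Tensor, targets: torch.Tensor):
    """inputs/targets: LongTensors of shape (batch, seq_len).
    Targets are assumed to use -100 at prompt positions so only answer tokens
    are scored; model(inputs) is assumed to return logits of shape
    (batch, seq_len, vocab_size)."""
    with torch.no_grad():
        logits = model(inputs)
        loss = F.cross_entropy(
            logits.view(-1, logits.size(-1)),
            targets.view(-1),
            ignore_index=-100,
        )
        preds = logits.argmax(dim=-1)
        answer_mask = targets != -100
        # A problem counts as solved only if every answer token is correct;
        # prompt positions are ignored via the mask.
        solved = ((preds == targets) | ~answer_mask).all(dim=1)
        exact_match = solved.float().mean()
    return loss.item(), exact_match.item()
```

Plotting both quantities over training can make it easier to tell whether a model whose loss still looks unimpressive has in fact already learned the rule.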