An experiment in memorization, grokking, and misleading loss curves This post documents an experiment that didn’t go the way I expected.What started as a simple attempt to observe memorization and grokking in arithmetic models turned into a deeper lesson about how misleading loss curves can be — especially for algorithmic tasks. What I expected toContinueContinue reading “Why arithmetic models look dumb long after they’ve learned the rule”