An in-depth exploration inspired by Yann LeCun and Bill Dally's GTC 2025 discussion, detailing my thinking on integrating JEPA (Joint-Embedding Predictive Architecture) with transformer models and on developing dynamic learning and memory management for AGI systems.
A deep dive into Ring Attention, which shards attention computation across devices so no single host holds the full sequence, and into how long-context models such as Gemini and Claude are believed to manage extended contexts, alongside complementary memory tricks like sliding windows, compressed memory, and selective token referencing.
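To make the core idea concrete, here is a minimal single-process NumPy sketch of the Ring Attention pattern: keys and values are split into blocks that would circulate among hosts in a ring, and each query block folds every incoming block into a running (online) softmax. The names `ring_attention`, `num_hosts`, and `softmax_block_update`, and the toy dimensions, are illustrative assumptions, not from any library or from the production systems mentioned above.

```python
# Minimal single-process sketch of the Ring Attention idea (NumPy only).
# "Hosts" are simulated by a Python list of K/V blocks; in a real ring,
# each block would be passed device-to-device at every step.
import numpy as np

def softmax_block_update(acc_num, acc_den, max_so_far, scores, v_block):
    # Online (streaming) softmax: merge a new block of scores/values into
    # the running numerator, denominator, and row-wise max.
    block_max = scores.max(axis=-1, keepdims=True)
    new_max = np.maximum(max_so_far, block_max)
    scale_old = np.exp(max_so_far - new_max)  # rescale old accumulators
    p = np.exp(scores - new_max)
    acc_num = acc_num * scale_old + p @ v_block
    acc_den = acc_den * scale_old + p.sum(axis=-1, keepdims=True)
    return acc_num, acc_den, new_max

def ring_attention(q, k, v, num_hosts=4):
    # Shard K/V along the sequence axis; every query block eventually sees
    # every key/value block without any host holding the full sequence.
    k_blocks = np.array_split(k, num_hosts, axis=0)
    v_blocks = np.array_split(v, num_hosts, axis=0)
    n, d = q.shape
    acc_num = np.zeros((n, d))
    acc_den = np.zeros((n, 1))
    max_so_far = np.full((n, 1), -np.inf)
    for step in range(num_hosts):
        # In a real ring, this block arrives from the neighboring device.
        kb, vb = k_blocks[step], v_blocks[step]
        scores = (q @ kb.T) / np.sqrt(d)
        acc_num, acc_den, max_so_far = softmax_block_update(
            acc_num, acc_den, max_so_far, scores, vb)
    return acc_num / acc_den

# Sanity check against ordinary full attention.
rng = np.random.default_rng(0)
q = rng.normal(size=(8, 16))
k = rng.normal(size=(32, 16))
v = rng.normal(size=(32, 16))
scores = (q @ k.T) / np.sqrt(16)
full = np.exp(scores - scores.max(-1, keepdims=True))
full = (full / full.sum(-1, keepdims=True)) @ v
assert np.allclose(ring_attention(q, k, v), full)
```

The online-softmax bookkeeping is what lets the blocks be consumed in any order with memory proportional to one block rather than the whole context; sliding windows and compressed memory are separate techniques that reduce which tokens are attended to at all.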