The Curved Spacetime of Transformer Architectures

2 points | posted 7 hours ago
by luis_likes_math

1 comment

luis_likes_math

7 hours ago

We introduce a geometric framework for understanding Transformer language models through an analogy with General Relativity. In this view, keys and queries define a curved "space of meaning," and attention acts like gravity, transporting information across it. Layers play the role of discrete time steps in which token representations evolve along curved, rather than straight, paths shaped by context. Through visualization and simulation experiments, we show that these trajectories do bend and reorient, consistent with attention-induced curvature in embedding space.
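
If you want to poke at the core claim yourself, here is a minimal NumPy sketch (my illustration, not the paper's code): it stacks a few random single-head attention layers with residual connections, tracks one token's representation layer by layer, and reports the turning angle between successive displacement vectors as a crude proxy for the "curvature" the abstract describes. The shapes, the random weights, and the turning-angle measure are all assumptions for illustration.

    import numpy as np

    # Sketch of the "curved trajectory" idea (not the authors' experiment):
    # push token representations through random attention layers and measure
    # how much each token's per-layer displacement vector turns.
    rng = np.random.default_rng(0)
    d, n_tokens, n_layers = 32, 8, 6

    def attention_layer(X, Wq, Wk, Wv):
        """One single-head self-attention update with a residual: X <- X + Attn(X)."""
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(d)
        A = np.exp(scores - scores.max(axis=-1, keepdims=True))
        A /= A.sum(axis=-1, keepdims=True)   # row-wise softmax over keys
        return X + A @ V                     # residual step = one "tick of time"

    X = rng.normal(size=(n_tokens, d))
    trajectory = [X.copy()]
    for _ in range(n_layers):
        Wq, Wk, Wv = (rng.normal(scale=d**-0.5, size=(d, d)) for _ in range(3))
        X = attention_layer(X, Wq, Wk, Wv)
        trajectory.append(X.copy())

    # Curvature proxy: angle between consecutive displacement vectors of token 0.
    disp = np.diff(np.stack([T[0] for T in trajectory]), axis=0)
    for t in range(len(disp) - 1):
        cos = disp[t] @ disp[t + 1] / (np.linalg.norm(disp[t]) * np.linalg.norm(disp[t + 1]))
        angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
        print(f"layer {t}->{t+1}: turning angle = {angle:.1f} deg")

A straight (flat) trajectory would give turning angles near zero, so the bending described in the abstract shows up here as consistently nonzero angles between successive layer updates.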