CAD: Disaggregating Core Attention for Efficient Long-Context LLM Training

6 points, posted 10 hours ago by ginda307

1 comment
