CAD: Disaggregating Core Attention for Efficient Long-Context LLM Training

6 points, posted 2 months ago
by ginda307

1 Comment

user

2 months ago

[deleted]