Has Cursor started training models on private user data?

4 points, posted 5 months ago by jg2007

4 Comments

jg2007

5 months ago

Cursor recently published a new blog post outlining how they train models. Interestingly, the post does not clarify how they handle opt-out user data or business user data. The exact phrasing: "[Cursor's] model runs on every user action, handling over 400 million requests per day. As a result, we have a lot of data about which suggestions users accept and reject. This post describes how we use this data to improve Tab using online reinforcement learning."

In fact, the wording suggests that all Cursor user data, opt-in and opt-out alike, is being used.

Does anyone know what's going on behind the scenes?

NitpickLawyer

5 months ago

If you read the fine print, they all say some variation of "we do not train foundational models on your data". That doesn't mean they won't train other models, or use your signals to train other models. It just means your data itself doesn't get copied into the training set.

And this makes sense: you train on your own data and use the signals to tell whether your run was good or not.

reasonableklout

5 months ago

Don't they say openly that they train models on your actions by default unless you opt out during the install flow?

fithisux

5 months ago

That is why I use VSCodium or Theia and Positron.

No AI features enabled.