When machine learning tells the wrong story

304 points, posted 5 days ago
by jackcook

31 Comments

Upvoter33

4 days ago

Good article, neat research behind it.

I think the paper's contributions really don't have anything to do with ML; it's about the new side channel with interrupts, which is a cool find. ML just gets more people to read it, which I guess is ok. I mean, you could just use "statistics" here in much the same way.

I remember an advisor once telling me: once you figure out what a paper is really about, rewrite it, and remove the stuff you used to think it was about. The title of this paper should be about the new side channel, not about the ML story, imho.

But this is just a nitpick. Great work!

jackcook

4 days ago

Thanks for reading! The two stories are of course deeply intertwined: we wouldn’t have found the new side channel without the cautionary tale about machine learning.

But the finding about ML misinterpretation is particularly notable because it calls a lot of existing computer architecture research into question. In the past, attacks like this were very difficult to pull off without an in-depth understanding of the side channel being exploited. But ML models (in this case, an LSTM) generally go a bit beyond “statistics” because they unlock much greater accuracy, making it much easier to develop powerful attacks that exploit side channels that aren’t really understood. And there are a lot of ML-assisted attacks created in this fashion today: the Shusterman et al. paper alone has almost 200 citations, a huge amount for a computer architecture paper.

The point of publishing this kind of research is to better understand our systems so we can build stronger defenses — the cost of getting this wrong and misleading the community is pretty high. And this would technically still be true even if we ultimately found that the cache was responsible for the prior attack. But of course, it helps that we discovered a new side channel along the way — this really drove our point home. I probably could have emphasized this more in my blogpost.

albert_e

4 days ago

yes - I also feel this does not have strong new findings about ML except some common sense that all ML practitioners should have: that is, do not interpret ML results as cause-and-effect explanations when the data you have captured and modelled does not warrant it.

Maybe in the real world, this common sense gets lost in the deluge of correlations when people are immersed in a sea of data -- but good experiment design and peer review should ideally sift out any unsound conclusions and interpretations -- which, to be fair, this replication study does an excellent job of!
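To make that concrete, here is a toy sketch on purely synthetic data (not the paper's data or model). Each trace is the sum of a site-independent "cache" component and a site-dependent "interrupt" component; both names, the bump pattern, and the nearest-centroid classifier are assumptions chosen for illustration. High accuracy alone does not say which component carries the signal; only the ablation (switching the interrupt component off) does.

```python
import random


def make_trace(site, rng, with_interrupts=True, length=50):
    """Synthetic trace for site 0 or 1. Both components are made up."""
    trace = []
    for i in range(length):
        value = rng.gauss(0.0, 1.0)  # "cache" component: pure noise, same for both sites
        if with_interrupts and (i // 10) % 2 == site:
            value += 3.0  # "interrupt" component: the only site-dependent part
        trace.append(value)
    return trace


def centroid(traces):
    """Per-position mean of a list of equal-length traces."""
    return [sum(col) / len(col) for col in zip(*traces)]


def sq_dist(a, b):
    """Squared Euclidean distance between two traces."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def accuracy(with_interrupts, seed=0, n_train=100, n_test=100):
    """Train a nearest-centroid classifier and return its test accuracy."""
    rng = random.Random(seed)
    centroids = [
        centroid([make_trace(s, rng, with_interrupts) for _ in range(n_train)])
        for s in (0, 1)
    ]
    correct = 0
    for s in (0, 1):
        for _ in range(n_test):
            t = make_trace(s, rng, with_interrupts)
            guess = min((0, 1), key=lambda c: sq_dist(t, centroids[c]))
            correct += guess == s
    return correct / (2 * n_test)


if __name__ == "__main__":
    print("with interrupt component:   ", accuracy(True))
    print("without interrupt component:", accuracy(False))
```

The classifier never sees the two components separately, so a good score on the first run says nothing about mechanism; the second run is the experiment that does.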

Well done, and good luck to the OP!

thunderbong

5 days ago

Wonderful article. I never thought I'd be able to understand side-channel attacks so easily.

The article read like a murder mystery where you know who the villain is right at the beginning but you need to find out how they did it!

Marked as favorite.

8n4vidtmkvmk

4 days ago

Almost didn't read it because of the length and how it starts. Generally I just want the meat, not the backstory. But your comment convinced me to read it, and it is indeed great!

brabel

4 days ago

> Next year, I will start my six-year PhD in computer science back at MIT, and I could not be more thrilled!

Incredible... and all started because the author had a "lucky idea" to try something random like using a counter instead of the much more advanced cache-evicting attack of the original side channel attack... which only worked because of concepts they had no idea about at the time :D

I am one of the probably thousands of others who were not so lucky and quickly abandoned the idea of staying in academia and went to work in the industry for a mediocre career.

I started an Honours degree (roughly a Master's, in Australia) in Computer Science, where I wanted to write a thesis on Artificial Intelligence (circa 2010, well before the current AI hype) based on AI applications I had studied in the regular AI course: wineries were using AI to improve their wine quality and production, and I wanted to apply their techniques to more "general" applications. But the supervisor I got had zero interest in helping, and I had zero support from anyone else, so it was impossible to continue, especially when I had a full-time job offer with quite a good salary. Had I continued, I would probably never have gotten anywhere. As the author mentions, it was thanks to their supervisor and to others who helped along the way that everything worked out for them. Alone, you must be extremely driven and talented to get anywhere, and I don't think I was either.

tokinonagare

4 days ago

Being in the right place with the right people is indeed a very important factor in whether you succeed. In my first PhD experience in Japan, my professor and the others just kept criticizing whatever I proposed for 3 years without giving me actionable ideas. The prof in the lab next door loved my research; sadly, I found him too late to switch labs. Now I'm at a place with half of the people in the country who can fully understand another project of mine and give a shit (that's a grand total of 2 people), and my project has already benefited from some of their data. Plus the director likes me too and includes me in the lab's activities even though I'm not officially affiliated with the lab. Now, that's an environment I can succeed in. My takeaway is that finding the right environment and people may be difficult, but it's crucial; otherwise even very good work is done for nothing.

albert_e

4 days ago

Great read!

Tangent and the smallest of nitpicks about the page: the <HR> element's styling with a line of big dots confused me into thinking it was the position indicator of an image carousel!

pandaxtc

5 days ago

This article is awesome, your writing is super approachable and the interactive demos are really cool. I also appreciate the background on how you got into doing this sort of thing.

jackcook

5 days ago

Thank you! Really appreciate it

netaustin

5 days ago

Very interesting and well-explained. Given that the research has been out for two years, any interested data collectors have surely considered this! Forget hackers, this is an exploit for enterprises and governments!

Could websites concerned with privacy deploy a package that triggers interrupts randomly? Could a browser extension do it for every site?

jackcook

4 days ago

Websites doing this would have to be careful about it: they might become the only website triggering a lot of interrupts randomly, which then makes them easy to identify.

Our countermeasure, which triggers interrupts randomly, is implemented as a browser extension. The source code is available here: https://github.com/jackcook/bigger-fish

I'm not sure I would recommend it for daily use, though: I think our tests showed it slowed page load times by about 10%.
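For anyone curious about the general shape of "trigger activity at random times", here is a minimal sketch. It is not the extension's code: `noise_schedule`, `run_noise`, and the `emit` callback are hypothetical names, and the exponential gaps between bursts are an assumption. In a real countermeasure, `emit` would be whatever actually causes interrupts.

```python
import random
import time


def noise_schedule(duration_s, mean_gap_s, seed=None):
    """Return sorted timestamps (seconds from start) for noise bursts.

    Gaps between bursts are drawn from an exponential distribution, so the
    bursts do not fall on a regular, easily filtered grid.
    """
    rng = random.Random(seed)
    t = 0.0
    times = []
    while True:
        t += rng.expovariate(1.0 / mean_gap_s)
        if t >= duration_s:
            return times
        times.append(t)


def run_noise(times, emit, sleep=time.sleep):
    """Call emit() at each scheduled time; emit is a stand-in for real activity."""
    last = 0.0
    for t in times:
        sleep(t - last)  # wait out the gap since the previous burst
        emit()
        last = t


if __name__ == "__main__":
    schedule = noise_schedule(duration_s=1.0, mean_gap_s=0.05, seed=1)
    run_noise(schedule, emit=lambda: None)
    print(len(schedule), "bursts scheduled in 1 second")
```

Lowering `mean_gap_s` adds more noise but also more overhead, which is the same trade-off as the page-load slowdown mentioned above.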

vessenes

4 days ago

I'm on Safari/macOS, and many of the counting-related demonstrations did not vary as much as claimed. Some did, with significant computer use, but I'd bet some mitigations have been implemented already in Safari.

Nevertheless, EXTREMELY cool paper.
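For reference, the counting idea behind those demonstrations can be sketched roughly as follows: increment a counter for a fixed time window, record how far it got, and repeat. This is a Python approximation rather than the article's JavaScript; `collect_trace`, the window length, and the use of `time.perf_counter` are assumptions. Browsers deliberately coarsen their timers, which is one reason results can differ across browsers.

```python
import time


def collect_trace(num_windows=20, window_s=0.005):
    """Count loop iterations completed in each fixed-length time window.

    Anything that steals CPU time from this loop during a window (for
    example, interrupt handling) shows up as a lower count for that window.
    """
    trace = []
    for _ in range(num_windows):
        count = 0
        deadline = time.perf_counter() + window_s
        while time.perf_counter() < deadline:
            count += 1
        trace.append(count)
    return trace


if __name__ == "__main__":
    print(collect_trace())
```

Running it while the machine is idle and again while loading a heavy page should give visibly different traces on most systems, though how different depends on the OS, the hardware, and the timer resolution.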

j_crick

4 days ago

This was interesting, well written and not difficult to understand, and we need more stuff like that. Thank you!

bsenftner

4 days ago

So well written, if you were not a successful researcher, I'd suggest you go into writing. That was a pleasure to read.

gregdeon

5 days ago

Wonderfully written article! Thanks for sharing.

Everdred2dx

4 days ago

One of the most approachable distillations of CS research I've ever read. Thanks!

amadeoeoeo

4 days ago

This is awesome, interesting and well explained. Thank you! I would love to hear more about it. I wonder which concrete real life uses this might have (had).

jll29

4 days ago

Great piece of technical writing.

I hope when Jack is in Oxford he'll also visit Cambridge to give a guest talk in the late Ross Anderson's former group.

shakna

5 days ago

I wonder if adopting io_uring on Linux might allow a browser to preserve the privacy a little, in this specific case. (Though it is very hard to get right, unfortunately.)

astrobe_

4 days ago

As far as browsers are concerned, the actual solution is banning JavaScript from the regular Web. JS is basically remote code execution (even more so since JIT became the norm); it is a terrible idea that will continue to create all sorts of problems.

pjdesno

5 days ago

My suspicion is that io_uring itself mitigates syscall overhead but doesn't do anything to change interrupts.

You could probably do things at the OS level to change interrupt behavior in a way that would mitigate this attack significantly; I'll need to read the paper to see if they discuss this.

pjdesno

4 days ago

Interrupt noise can be eliminated by eliminating the interrupts themselves using user-space drivers like SPDK and DPDK for storage and networking, but (a) that would require a massive change in application architecture, and (b) it wouldn't help non-movable interrupts like softirqs or IPIs for rescheduling and TLB shootdown.

Softirqs aren't really interrupts, and they're totally under kernel control, so it might be possible to spread them out across cores or otherwise reduce their signal.

Eliminating noise from IPIs for rescheduling and TLB shootdown might require crazy architectural changes to the CPU - for instance an architecturally isolated fast timer which is basically a separate CPU, polls a queue of TLB shootdown requests and a wakeup request flag, and can exit without waking the CPU from a halt.

Fuzzing the timer seems like a hack - it doesn't eliminate the information leakage, but just makes it harder to measure. You can eliminate the signal by only reporting the amount of time that passes in user mode, but that results in a clock that can be arbitrarily slower than wall clock time. I suppose you could add a correction factor that's heavily filtered, so the final timer is never off by more than a constant amount, but this would have to be implemented as a new OS timer type with instrumentation in every interrupt handler, and then Javascript would have to be updated to use that new timer.
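One possible reading of that last idea, as a simulation only: keep a clock that advances with user-mode time, leak the accumulated lag back in slowly, and hard-clamp it so it never trails wall-clock time by more than a constant. `FilteredUserClock`, `alpha`, and `max_lag` are hypothetical names, and the first-order filter is an assumption rather than a worked-out design. Note that the clamp itself leaks information whenever it kicks in.

```python
class FilteredUserClock:
    """Simulated clock: user-mode time plus a slow, bounded correction."""

    def __init__(self, alpha=0.01, max_lag=0.005):
        self.alpha = alpha      # fraction of the lag recovered per step (heavy filter)
        self.max_lag = max_lag  # hard bound on wall - reported, in seconds
        self.wall = 0.0         # true elapsed time
        self.reported = 0.0     # what user code would be shown

    def advance(self, user_dt, interrupt_dt=0.0):
        """Advance by user_dt of user-mode time and interrupt_dt of kernel time."""
        self.wall += user_dt + interrupt_dt
        self.reported += user_dt  # interrupt time is hidden at first
        lag = self.wall - self.reported
        self.reported += self.alpha * lag  # slow correction toward wall time
        if self.wall - self.reported > self.max_lag:
            self.reported = self.wall - self.max_lag  # enforce the constant bound
        return self.reported


if __name__ == "__main__":
    clock = FilteredUserClock()
    for step in range(1000):
        # Hypothetical workload: 1 ms of user time, a 0.5 ms interrupt every 10th step.
        clock.advance(0.001, 0.0005 if step % 10 == 0 else 0.0)
    print("wall:", clock.wall, "reported:", clock.reported)
```

A smaller `alpha` hides individual interrupts better but lets the clock drift further before the clamp takes over, so the two parameters trade leakage against accuracy.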

vlovich123

5 days ago

Correct. Probably the only way to mitigate interrupt stuff today is what they mentioned - you inject noise into the system intentionally with their example being to make network requests to local addresses. Fundamentally though the challenge is that if you start doing that, you probably start degrading performance fairly quickly for your neighbors. It’s really hard to balance mitigations that retain good performance. A more comprehensive solution probably involves a redesign of how we build CPUs and operating systems rather than trying to keep fighting this in software.

vlovich123

5 days ago

Let me correct something I stated. To get this right, you would in theory still need a timer into which you can inject noise to throw off measurements.

atoav

4 days ago

I come from embedded audio programming, where e.g. the variable loads of UI code can be problematic (=audible) for audio quality if you don't do things right.

Maybe we need to do things the other way around? So instead of trying to mask everything we are doing, we run browsers/tabs in a processing environment where the noise can't be measured because it does not occur during the same time window. In audio that is done by using a high priority fixed timer that interrupts the rest of the processing.

My OS knowledge is too marginal to know whether that would be truly feasible, but I can't help but think: yeah, it is possible to fix that at a more fundamental level.

rsktaker

5 days ago

It takes a lot of time and effort to decide how to best explain something. Thank you, this was a wonderful read!

danhudlow

3 days ago

Is there a significant difference in accuracy if the victim website is loaded offscreen?

teleforce

4 days ago

>I don’t want to name an actual social media platform, so I’ll just make one up: let’s call it Facebook

I see what you did there