How CERN serves 1EB of data via FUSE [video]

154 points, posted 13 hours ago
by pabs3

58 Comments

niemandhier

7 hours ago

People here keep claiming “Anything is possible with unlimited budget”.

CERN's budget is 1.4 billion euros, with 50 million euros for all IT infrastructure.

https://cds.cern.ch/record/2888205/files/English.pdf#page18

It’s not the money, it’s the people. Update: Added source.

atoav

6 hours ago

That kind of place can draw a certain kind of employee. This finding is hard to transfer to commercial projects. Sure employees will always claim to be really motivated, especially in the marketing material, but are they we-are-nerds-working-on-the-bleeding-edge-of-human-knowledge-motivated?

Probably not, but there is surely some manager out there who made themselves believe they can motivate their employees to show the same devotion for the self-made hardships of some mostly pointless SaaS product. If you want to grab that kind of spirit, what you do needs to fundamentally make sense beyond just making somebody money.

sligor

4 hours ago

That's exactly how we were able to go to the moon 55 years ago. And why it's complicated today. It was of course a lot of money. But it was mostly a lot of highly skilled, motivated, devoted people working toward an ultimate common goal. Money would not have been sufficient by itself.

HPsquared

14 minutes ago

Since then, a LOT of the smart motivated people have been lured into either banking or adtech. The pay is good and the technical problems can be pretty interesting but the end result lacks that "wow factor".

wvh

an hour ago

In other words, if you permit, pure capitalism isn't a sufficiently good motive to get something significant done. But of course most of us don't work towards an ultimate common goal – and neither did most people in those times. One wonders if there is enough meaning left these days to go 'round and ensure most of us feel passionate about the stuff we (have to) do. Maybe we really need a god or war or common enemy to unite all strands into a strong rope.

jedrek

4 hours ago

Also, CERN does not have a profit motive.

How much good work have the people reading this thread had to trash because it didn't align with Q3 OKRs? How much time and energy did they put into garbage solutions because they had to hit a KPI by the last day of June?

bayindirh

2 hours ago

> Also, CERN does not have a profit motive.

This is a great point. We work with CERN on a project, and we're all volunteers, but we work on something we need, and contribute back to it wholeheartedly.

At the end of the day, CERN wants to give away project governance to someone else in the group, because they don't want to be the BDFL of anything they help create. It allows them to pursue new and shiny things and build them.

There's an air consisting of "it's done when it's done", and "we need this, so we should build this without watering it down", so projects move at a steady pace, but the code and product is always as high quality as possible.

niemandhier

4 hours ago

CERN buddy of mine suggested that exposing a colony of physicists to elevated ambient levels of helium would trigger excessive infrastructure building behavior.

quailfarmer

5 hours ago

That’s a great observation, and I think generally correct, but there are private companies where that sort of motivation exists, for basically the same reason

guappa

2 hours ago

Then they get bought by some megacorp which kills the motivation.

lokimedes

7 hours ago

Also, the in-kind contributions from hundreds of institutes around the world. Much can, and has, been said about physicist code, but CERN is the center of a massive community of “pre-dropout” geniuses. I can’t count the number of former students that later joined Google and the likes. Many are frequenting HN.

adev_

6 hours ago

CERN was a good example of how much can be done with how little when you have the right people.

For a long time, the entire Linux distribution (Scientific Linux) used for ~15K collaborators, the infra and the grid computing was managed by a team of around 4-5 people.

The teams managing the network access (LanDB), the distributed computing system, the scientific framework (ROOT) and the storage are also small, dedicated skilled teams.

And the result speaks for itself.

Unfortunately, most of that went to shit quite recently when they replaced the previous head of IT by a Microsoft fanboy/girl coming from outside of the scientific environment. The first thing he/she did was to force Microsoft bloatware everywhere to replace existing working OSS solutions.

wuming2

4 hours ago

> Unfortunately, most of that went to shit quite recently when they replaced the previous head of IT by a Microsoft fanboy(girl?) coming from outside of the scientific environment.

Painful to read so I did a short check. From a news post I don’t want to link here, but easily found searching “CERN, the famous scientific lab where the web was born, tells us why it's ditching Microsoft and helping others do the same”, the direction taken in 2019 seemed quite the opposite. I am not sure how the current head of IT at CERN, Enrica Porcari, fits into the story. Insider info will be appreciated.

adev_

4 hours ago

> direction taken in 2019 seemed quite the opposite

The head of IT changed in 2021 if it answers your question.

wuming2

4 hours ago

Don’t see any previous experience at Microsoft [2]. Just a self taught fan then?

Edit: “Partnership is the art of understanding shared value. In WFP we have a number of partnerships, not many, but the ones that we have are deep, are sustained, are long-term. And definitely UNICC is one of them. Enrica Porcari, Chief Information Officer and Director Technology Division at the WFP” [1]

United Nations International Computing Centre (UNICC) is a Microsoft shop. Legit to assume, if OP statement holds true, she got the business sponsorship going while CIO at the World Food Program (WFP).

This kind of attempted executive takeover is always the strategy of a team. Who sponsored and voted for her at CERN is the real person of interest.

1. https://www.unicc.org/our-values/what-makes-us-unique/

2. https://cgnet.com/blog/former-cgnet-employee-enrica-porcari-...

wuming2

2 hours ago

Joachim Mnich, Director for Research and Computing and her boss [4], holds the position also since January 2021 [1]. Mike Lamont, Director for Accelerators and Technology, also got the job at the same time [2]. Finally Fabiola Gianotti, Director-General, in 2019 extended her tenure for a second term “to start on 1st January 2021” [3].

So in 2019 the initiative to remove Microsoft began. With the renewal and promotions taking effect, it stopped. Interesting. Feeling a strong Microsoft US vs Munich DE vibe. With a twitch of IT.

1. https://home.cern/about/who-we-are/our-people/biographies/jo...

2. https://home.cern/about/who-we-are/our-people/biographies/mi...

3. https://home.cern/about/who-we-are/our-people/biographies/fa...

4. https://german-dac.web.cern.ch/sites/default/files/2022.01%2...

wuming2

7 minutes ago

“newly created CERN Venture Connect programme (CVC), launched in 2023 […] In establishing CVC, CERN’s Entrepreneurship team entered discussions with Microsoft, with the aim to better leverage the Microsoft for Startups Founders Hub“ [1].

Under the purview of Christopher Hartley, Director of Industry, Procurement & Knowledge Transfer (IPT) [2], Microsoft is gaining more footholds at CERN. Won’t be too far fetched to consider Mr Hartley and Ms Porcari as working together to achieve some sort of common good.

1. https://home.cern/news/news/knowledge-sharing/journey-cern-e...

2. https://german-dac.web.cern.ch/sites/default/files/2022.01%2...

dguest

3 hours ago

There was a huge initiative at CERN to move to non-MS products.

It was great actually: suddenly we were leaving behind a bunch of bloated MS cruft and working with nice stuff. As someone working at CERN I was really inspired, not just by the support for open source but by how well it all worked.

Then next thing I knew we were doubling down on MS stuff. I don't know what happened. It was sad though, and the user experience did not improve in the end.

I'm not close enough to CERN-IT to know the details. But for what it's worth, no one I knew in IT could think of a good reason for going back.

amelius

4 hours ago

> Cerns budget is 1.4 billion Euro

Kind of weird that a company like Uber has a valuation of $150 billion.

dguest

3 hours ago

Most of the people who make CERN work aren't working for CERN. The IT department is under CERN, but there are many thousands of "users" who don't get paid by CERN at all. Quite a lot of the fabrication and most of the physics analysis is done by national labs and universities around the world.

elashri

3 hours ago

CERN's budget at the experiment level is paid mostly by contributions from the institutions that are part of the experiment. I am talking about operations and R&D, and this also includes personnel contributions to different aspects. There is also service work that each user must do besides doing physics. I, for example, work on the software development stack alongside my current physics analysis. Some of my colleagues work on hardware.

Then there are country-level contributions that pay for CERN infrastructure and maintenance (and inter-experiment stuff) and direct employees' salaries.

dguest

13 minutes ago

The important point here is that (I believe) the 1.4 billion above doesn't account for all the work done directly by institutes. Institutes pay CERN, but they also channel government grants to fund a huge amount of work directly.

Most of the people I know who "worked at" CERN never got a pay check that said CERN on it.

yccs27

4 hours ago

Apples to oranges. Budget is per year, valuation is total.

A better comparison would be Uber's revenue of $37 billion in 2023.

amelius

3 hours ago

I don't see why it's Apples to oranges. Uber could pay for 150 CERN-years.

chmod775

3 hours ago

No, they could not.

Valuation is not money in the bank. It does not even represent an amount that is convertible to an equal amount of liquid currency.

It's a number that is hardly useful for anything and I'm tired of people cooking up all sorts of nonsense with it.

amelius

3 hours ago

Ok, maybe it's 75 CERN years or maybe even 10. The point still stands.

PS: Sorry if you got tired, but I'm tired of people explaining what valuation isn't when we're just talking orders of magnitude.

exe34

2 hours ago

it's only useful for getting loans that you'll pay back with a bigger loan. it's how rich people are always cash-poor but wealthy and live wealthy lifestyles.

gwervc

3 hours ago

How many people order a meal (often out of laziness) per day vs. think about and search for the mysteries of the universe? Economically it makes sense that Uber generates a lot more cash.

chrisandchris

2 hours ago

I think you are mistakenly assuming there must be a correlation between _valuation_ and _earnings_. Uber's _first_ ever profitable year was 2023, after 15 years in business [1]. Uber may be generating cash, but it was also losing cash a lot faster than it was generating it. Taking 2023 as a reference (~2 billion), it needs another 5 of those years just to recover from its losses in 2022 (9 billion). I understand the economics behind it, but its valuation is way out of reality.

[1] https://www.theverge.com/2024/2/8/24065999/uber-earnings-pro...

dauertewigkeit

3 hours ago

Good hiring managers can find the hidden gems. These are typically people who don't have the resume to join FAANG immediately, due to lacking the pedigree, but who have lots of potential. Also these same people typically don't last long because they do eventually move on.

Also it helps that Europe is so behind in tech that if you want to do some cutting edge tech you are almost forced to join a public institution because private ones are not doing anything exciting.

guappa

2 hours ago

Because doing the millionth CRUD in USA is very exciting?

wvh

an hour ago

One wonders if things win because they really are better, or because there's sufficient financial momentum behind them. I have worked in the public sector for some years, and I don't think Europe is behind, just that the budgets are a lot smaller. If you want to capture a lot of people in an ecosystem or walled garden, you're going to need money, and lots of it. For all that's good and bad about it, most of that excess is concentrated in the US, in a few hotspots. No need to get distracted and put a flag on somebody like a Zuckerberg or Jobs or Gates though.

rob_c

6 hours ago

Yes, but that still covers infrastructure (cables) and a lot of equipment for the experiments, including but not limited to massive storage and tape backup, distributed local compute, and local cluster management, all with users busy trying to pummel it with the latest and greatest ideas of how they can use it faster and better... Not to mention specialist software and licences. 50M doesn't go that far when you factor all of this in.

udev4096

7 hours ago

This is fascinating. How are they managing or even taking backup for this gigantic storage?

ephimetheus

7 hours ago

For experiment data, there is a layer on top of all of this that distributes datasets across the computing grid. That system has a way to handle replication at the dataset level.

rob_c

6 hours ago

Tape and off-site replicas at globally distributed data centres for science. Of the 1EB, a huge amount is probably in automated recall and replication, with "users" running staged processing of the data at different sites, ultimately with the data being reduced to a "manageable" GB-TB level for scientists to do science.

fnands

3 hours ago

Yup, lots of tape for stuff in cold storage, and then some subset of that on disk spread out over several sites.

It's kinda interesting to watch anything by Alberto Pace, the head of storage at CERN to get an understanding of the challenges and constraints: https://www.youtube.com/watch?v=ym2am-FumXQ

I was basically on the helpdesk for the system for a few years so had to spend a fair amount of time helping people replicate data from one place to another, or from tape onto disk.

qwertox

8 hours ago

IIRC I had issues with inotify when I was editing files on a remote machine via SSHFS while those files were being used inside a Docker container. inotify inside the container did not trigger the notifications, whereas it did when editing a file with an editor directly on that host.

I think this was related to FUSE, that Docker just didn't get notified.
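Whatever the root cause, a fallback that does not depend on inotify at all is to poll modification times. A minimal sketch, assuming polling latency is acceptable for the use case; `snapshot` and `changed_paths` are hypothetical helper names for illustration, not part of any library:

```python
import os
from typing import Dict, Iterable, List


def snapshot(paths: Iterable[str]) -> Dict[str, int]:
    """Record the modification time (in ns) of every path that currently exists."""
    result: Dict[str, int] = {}
    for path in paths:
        try:
            result[path] = os.stat(path).st_mtime_ns
        except FileNotFoundError:
            # A missing file is simply left out of the snapshot.
            pass
    return result


def changed_paths(before: Dict[str, int], after: Dict[str, int]) -> List[str]:
    """Return paths that were added, modified, or removed between two snapshots."""
    changed = [p for p in after if before.get(p) != after[p]]
    changed += [p for p in before if p not in after]
    return sorted(changed)
```

Calling `snapshot` on a timer and diffing consecutive results with `changed_paths` gives a crude change feed. It costs one `stat` per file per poll, so it only suits small sets of watched files.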

a-dub

8 hours ago

does modern fuse still context switch too much or does it now use io_uring or similar?

mappu

7 hours ago

FUSE over io_uring is still WIP: https://lwn.net/Articles/988186/

FUSE Passthrough landed in kernel 6.9, which also reduces context switching in some cases: https://www.phoronix.com/news/Linux-6.9-FUSE-Passthrough . The benchmarks in this article are pretty damning for regular FUSE.

Dwedit

4 hours ago

FUSE Passthrough is only useful for filesystems that wrap an existing filesystem, such as union mounts. Otherwise, you don't have an open file to hand over.

a-dub

6 hours ago

yeah but still not great for metadata operations, no?

i remember it was really not great for large sets of search paths because it defeated the kernel's built-in metadata caches with excessive context switching?

Dwedit

8 hours ago

Last I read about FUSE, adding a 128KB read-ahead buffer drastically reduced context switching.
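To see why a larger read size helps, here is a rough back-of-the-envelope model, assuming every read request costs one round trip to the userspace daemon. It ignores the page cache and whatever request size the kernel and daemon actually negotiate, and the 4 KiB baseline is my assumption, not something from the talk. `fuse_read_requests` is a hypothetical helper name:

```python
def fuse_read_requests(file_size: int, request_size: int) -> int:
    """Number of read requests needed to stream a file sequentially.

    Simplified model: one request (one kernel/userspace round trip)
    per chunk of `request_size` bytes.
    """
    if file_size < 0 or request_size <= 0:
        raise ValueError("file_size must be >= 0 and request_size must be > 0")
    # Integer ceiling division, avoids float rounding on large sizes.
    return -(-file_size // request_size)


KIB = 1024
GIB = 1024 ** 3

small = fuse_read_requests(1 * GIB, 4 * KIB)    # 262144 requests
large = fuse_read_requests(1 * GIB, 128 * KIB)  # 8192 requests
print(small // large)                           # 32x fewer round trips
```

Under this model, going from 4 KiB to 128 KiB requests cuts the round trips for a sequential read by a factor of 32. Real gains depend on the access pattern.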

jgalt212

2 hours ago

I'm convinced CERN could greatly benefit from "middle out".

synicalx

9 hours ago

1EB with only 30k users, that's a wild TB-per-user ratio. My frame of reference: the largest storage platform I've ever worked on was a combined ~60PB (give or take) and that had hundreds of millions of users.
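Rough numbers for that ratio, as a sketch using decimal units (1 EB = 10^18 bytes). The 300 million figure is only an assumed stand-in for "hundreds of millions":

```python
TB = 10 ** 12
PB = 10 ** 15
EB = 10 ** 18
MB = 10 ** 6


def bytes_per_user(total_bytes: int, users: int) -> float:
    """Average storage per user, in bytes."""
    return total_bytes / users


# CERN figures quoted in this thread: 1 EB, 30k users.
cern_tb_per_user = bytes_per_user(1 * EB, 30_000) / TB

# The ~60 PB platform, with an ASSUMED 300 million users.
other_mb_per_user = bytes_per_user(60 * PB, 300_000_000) / MB

print(f"CERN:  ~{cern_tb_per_user:.1f} TB per user")
print(f"Other: ~{other_mb_per_user:.0f} MB per user")
```

That works out to roughly 33 TB per user versus about 200 MB per user, a gap of around five orders of magnitude.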

shric

7 hours ago

My frame of reference; the largest storage platform I've ever worked on was a combined ~tens of EB (give or take) and that had over a billion users.

chipdart

7 hours ago

Most humans don't handle sensor and simulation data for a living, though. CERN just so happens to employ thousands who do that for a living.

hackernewds

8 hours ago

That's the scale of the universe, compared to data generated by humans

maybeben

11 hours ago

i mean, they also have one of the largest ceph deployments. anything is scalable with no budget.

pas

10 hours ago

Slide 22 states that the cost is 1 CHF/TB/month (on 10+2 erasure-coded disks), though it would be interesting to see a breakdown of costs (development, hardware, maintenance, datacenter, servicing, management, etc.).
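For scale, a quick sketch of what those two numbers imply. It assumes decimal units (1 EB = 1,000,000 TB) and that the quoted price is per usable TB, which the slide may or may not mean. The function names are made up for illustration:

```python
def raw_capacity_tb(usable_tb: int, data_chunks: int, parity_chunks: int) -> float:
    """Raw disk needed to hold `usable_tb` under k+m erasure coding."""
    return usable_tb * (data_chunks + parity_chunks) / data_chunks


def monthly_cost_chf(capacity_tb: float, chf_per_tb_month: float) -> float:
    """Monthly bill for a given capacity at a flat per-TB rate."""
    return capacity_tb * chf_per_tb_month


USABLE_TB = 1_000_000  # 1 EB in decimal TB

# 10+2 erasure coding stores 12 chunks for every 10 chunks of data.
raw_tb = raw_capacity_tb(USABLE_TB, data_chunks=10, parity_chunks=2)

per_month = monthly_cost_chf(USABLE_TB, 1.0)
per_year = per_month * 12

print(f"raw disk: {raw_tb:,.0f} TB")
print(f"cost:     {per_month:,.0f} CHF/month, {per_year:,.0f} CHF/year")
```

So 10+2 means 20% extra raw disk on top of the data, and survives losing any two of the twelve chunks. At the quoted rate, a full exabyte would come to about 1 million CHF a month.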

pclmulqdq

8 hours ago

1 CHF/TB/month is a bit expensive for storage at that scale, so it would definitely be interesting to see what they're spending the money on and what they are (and aren't) counting in that price.

rob_c

6 hours ago

Tape backup, accessibility, networking, availability... At 1CHF/TB that's a lot better than my local university still charging >100x that for such services internally

hackernewds

8 hours ago

No budget often tags along with no accountability

hi-v-rocknroll

9 hours ago

They probably consume Panasas, IBM, DDN, and BeeGFS gear and licensing too.

adev_

6 hours ago

Nope.

Most internal data is spread between Ceph and a home-made distributed storage system named EOS (https://indico.cern.ch/event/138478/contributions/149912/att...) running over commodity hardware.

The only commercially backed storage system is the long-term tape storage system. Still, it has a home-made overlay API over it to interface with the rest of the systems.

rob_c

6 hours ago

Good god no. Nowhere near anything so crass. CEPH and EOS all the way

InDubioProRubio

5 hours ago

The things you can build when everyone is a rockstar :D