The future of software engineering is SRE

108 points, posted 11 hours ago
by Swizec

51 Comments

solatic

2 hours ago

I think there are two kinds of software-producing organizations:

There's the small shops where you're running some kind of monolith generally open to the Internet, maybe you have a database hooked up to it. These shops do not need dedicated DevOps/SRE. Throw it into a container platform (e.g. AWS ECS/Fargate, GCP Cloud Run, fly.io, the market is broad enough that it's basically getting commoditized), hook up observability/alerting, maybe pay a consultant to review it and make sure you didn't do anything stupid. Then just pay the bill every month, and don't over-think it.

Then you have large shops: the ones where you're running at the scale where the cost premium of container platforms is higher than the salary of an engineer to move you off it, the ones where you have to figure out how to get the systems from different companies pre-M&A to talk to each other, where you have N development teams organizationally far away from the sales and legal teams signing SLAs yet need to be constrained by said SLAs, where you have some system that was architected to handle X scale and the business has now sold 100X and you have to figure out what band-aids to throw at the failing system while telling the devs they need to re-architect, where you need to build your Alertmanager routing tree configuration dynamically because YAML is garbage and the routing rules change based on whether or not SRE decided to return the pager, plus ensuring that devs have the ability to self-service create new services, plus progressive rollout of new alerts across the organization, etc., so even Alertmanager config needs to be owned by an engineer.
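For readers who have not had to do this, here is a minimal sketch of what "building the Alertmanager routing tree dynamically" might look like. The team list, the `sre_returned_pager` flag, and the receiver names are hypothetical, invented for illustration; only the `route`, `routes`, `matchers`, `receiver`, and `group_by` keys come from Alertmanager's configuration format. The sketch emits only the `route` block as JSON (which YAML parsers generally accept), and a real config would also need matching entries under `receivers`.

```python
import json


def build_routing_tree(teams, default_receiver="sre-pager"):
    """Build an Alertmanager-style `route` block from per-team settings.

    `teams` is a hypothetical list of dicts with a `name` and an
    `sre_returned_pager` flag; nothing here is a real Alertmanager API.
    """
    routes = []
    for team in teams:
        name = team["name"]
        # If SRE has handed the pager back, alerts go to the dev team's
        # own receiver; otherwise they stay with the SRE receiver.
        if team["sre_returned_pager"]:
            receiver = f"{name}-pager"
        else:
            receiver = default_receiver
        routes.append({
            "matchers": [f'team="{name}"'],
            "receiver": receiver,
        })
    return {
        "receiver": default_receiver,  # fallback for unmatched alerts
        "group_by": ["alertname", "team"],
        "routes": routes,
    }


teams = [
    {"name": "payments", "sre_returned_pager": False},
    {"name": "search", "sre_returned_pager": True},
]
tree = build_routing_tree(teams)
print(json.dumps({"route": tree}, indent=2))
```

Generating the tree from data like this is what lets new services be added self-service: a team appends an entry and the routing follows, instead of someone hand-editing YAML.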

I really can't imagine LLMs replacing SREs in large shops. SREs debugging production outages to find a proximate "root" technical cause is a small fraction of the SRE function.

augusteo

7 hours ago

stackskipton makes a good point about authority. SRE works at Google because SREs can block launches and demand fixes. Without that organizational power, you're just an on-call engineer who also writes tooling.

The article's premise (AI makes code cheap, so operations becomes the differentiator) has some truth to it. But I'd frame it differently: the bottleneck was never really "writing code." It was understanding what to build and keeping it running. AI helps with one of those. Maybe.

nasretdinov

29 minutes ago

> because SREs can block launches and demand fixes

I didn't find that particularly true during my tenure, but obviously Google is huge, so there probably exist teams that actually can afford to behave this way...

pcj-github

3 hours ago

If the agent swarm is collectively smarter and better than the SRE, they'll be replaced just like other types of workers. There is no domain that has special protection.

bronlund

3 hours ago

My thoughts exactly. This is just some guy grasping at straws before he understands that he will have to bow to our new overlords sooner or later.

Edit: Or maybe he is fully aware and just needs to push some books before it's too late.

measurablefunc

3 hours ago

What about C-suite executives & shareholders? Are they safe from automation?

bjt12345

3 hours ago

The thing about C-suite executives is they usually have short tenures. The management levels below them, however, are often cozy in their bureaucracy and resist change, often trying to outlast the new management.

I actually argue that AI will therefore impact these levels of management the most.

Think about it: if you were employed as a transformational CEO, would you risk trying to fight existing managers, or just replace them with AI?

joe_mamba

2 hours ago

>I actually argue that AI will therefore impact these levels of management the most.

Not AI, but a bad economy and mass layoffs, tend to wipe out management positions the most. As a decent IC, in case of layoffs in a bad economy, you'll always find some place to work if you're flexible with location and salary, because everyone still needs people who know how to actually build shit, but nobody needs to add more managers to their ranks to consume payroll and add no value.

bjt12345

2 hours ago

A lot of large companies lay off swags of technical staff regularly (or watch them leave) and rotate CEOs, but their middle managers have jobs for life: as the Peter Principle states, they are promoted to their respective level of incompetence, and they stay there because no CEO has time to replace them.

AI will transform this.

joe_mamba

2 hours ago

Disagree with the "jobs for life" part for management. Only managers who are there thanks to connections, nepotism or cronyism are there for life, as long as those shielding them also stay in place. Those who got in or got promoted to management meritocratically don't have that protection and are the first to be let go.

At all the large MNCs I worked at, managers got hired and fired mostly on their connections (or lack thereof) and less on what they actually did. Once they got let go, they had a near-impossible time finding another management position elsewhere without connections in other places.

mraza007

2 hours ago

This is so true, especially with middle managers. They are the ones that are hit the hardest.

joe_mamba

an hour ago

Yes I was talking about middle managers mostly. Upper management, C-suite, execs are mostly protected from firing unless they F-up big time like sexual assault, hate speech, etc.

rcbdev

16 minutes ago

Yes. The AI cannot be the child/other type of beneficiary of a well-connected person, yet.

p_v_doom

2 hours ago

Generally yes. The more power one holds in an organization the more safe they are from automation.

vkou

an hour ago

Automating away shareholders can't come soon enough.

vjvjvjvjghv

an hour ago

They make the decisions, so I doubt they will allow themselves to be automated away. Their main risk will be that nobody can buy their products once everything is automated.

I wonder if capitalism and democracy will be just a short chapter in history that will be replaced by something else. Autocratic governments seem to be the most prevalent form of government in history.

silisili

2 hours ago

I was an old school SRE before the days of containerization and such. Today, we have one who is a YAML wizard and I won't even pretend to begin to understand the entire architecture between all the moving pieces (kube, flux, helm, etc).

That said, Claude has absolutely no problem not only answering questions, but finding bugs and adding new features to it.

In short, I feel they're just as screwed as us devs.

Sparkyte

6 hours ago

As an SRE I can tell you AI can't do everything. I have done a little software development, and even there AI can't do everything. What we are likely to see is operational engineering become the consolidated role between the two. Knows enough about software development and knows enough about site reliability... blamo, operational engineer.

adelmotsjr

8 hours ago

For those who were oblivious to what SRE means, just like me: SRE is _site reliability engineering_

F7F7F7

6 hours ago

I knew what an SRE was and found the article somewhat interesting, with a slightly novel (throwaway), more realistic take on the "why need Salesforce when you can vibe your own Salesforce" convo.

But not defining what an SRE is feels like a glaring, almost suffocating, omission.

joshuaisaact

3 hours ago

Couldn't disagree with this article more. I think the future of software engineering is more T-shaped.

Look at the 'Product Engineer' roles we are seeing spreading in forward-thinking startups and scaleups.

That's the future of SWE I think. SWEs take on more PM and design responsibilities as part of the existing role.

reeredfdfdf

an hour ago

I agree. In many cases it's probably easier for a developer to become more of a product person than for a product person to become a dev. Even with LLMs you still need to have some technical skills and be able to read code to handle technical tasks effectively.

Of course things might look different when the product is something that requires really deep domain knowledge.

zahlman

3 hours ago

> And you definitely don't care how a payments network point of sale terminal and your bank talk to each other... Good software is invisible.

> ...

> Are you keeping up with security updates? Will you leak all my data? Do I trust you? Can I rely on you?

IMO, if the answers to those questions matter to you, then you damn well should care how it works. Because even if you aren't sufficiently technically minded to audit the system, having someone be able to describe it to you coherently is an important starting point in building that trust and having reason to believe that security and privacy will work as advertised.

ivan_gammel

2 hours ago

Operational excellence was always part of the job, regardless of what fancy term described it, be it DevOps, SRE or something else. The future of software engineering is software engineering, with emphasis on engineering.

stackskipton

7 hours ago

As someone who works in an Ops role (SRE/DevOps/Sysadmin): SRE is something that only works at Google, mainly because for devs to do SRE they need the ability to reject or demand code fixes, which means you need someone acting as a prompt engineer who understands the code, and now they are back to being a developer.

As for the more dedicated Ops side, it's garbage in, garbage out. I've already had too many outages caused by AI slop being fed into production. Calling all developers SREs won't change the fact that AI can't program right now without highly experienced people controlling it.

bionsystem

4 hours ago

Most devs can't do SRE, in fact the best devs I've met know they can't do SRE (and vice versa). If I may get a bit philosophical, SRE must be conservative by nature and I feel that devs are often innovative by nature. Another argument is that they simply focus on different problems. One sets up an IDE and clicks play, has some ephemeral devcontainer environment that "just works", and the hard part is to craft the software. The other has the software ready and sometimes very few instructions on how to run it, + your typical production issues, security, scaling, etc. The brain of each gets wired differently over time to solve those very different issues effectively.

zinodaur

3 hours ago

I don’t understand this take - if all engineers go on call, they learn real quick what happens when their coworkers are too innovative. It is a good feedback loop that teaches them not to make unreliable software.

SREs are great when the problem is “the network is down” or “kubernetes won’t run my pods”, but expecting a random engineer to know all the failure modes of software they didn’t build and don’t have context on never seems to work out well.

rincebrain

4 hours ago

It's possible to do both, you just need to be cognizant of what you're doing in both positions.

A tricky part comes when you don't have both roles for something, like SRE-developed tools that are maintained by the ones writing them, and you need to strike the balance yourselves until/unless you wind up with that split. If you're not aware of both hats and juggling wearing them intentionally, in that case, you can wind up with tools out of SRE that are worse than any SWE-only tool might ever be, because the SREs sometimes think they won't make the same mistakes, but all the same feature-focused things apply for SRE-written tools too...

chubot

4 hours ago

Yeah, I think that when writing code becomes cheap, then all the COMPLEMENTS become more valuable:

    - testing
    - reviewing, and reading/understanding/explaining
    - operations / SRE

nbevans

2 hours ago

Surely SRE is just a .md file like everything else? :upside-down-face:

willtemperley

4 hours ago

This may be true about SaaS. Not all software is SaaS, thankfully.

alexgotoi

2 hours ago

There were several cheaper-than-programmers options to automate things, Robotic Process Automation being probably the best known, but they never got the expected traction.

Why (imo)? Senior leaders still like to say: I run a 500 headcount finance EMEA organization for Siemens, I am the Chief People Officer of Meta and I lead an org of 1000 smart HR pros. Most of their status is still tied to the org headcount.

almosthere

7 hours ago

Until you find out there are 40 - 80 startups writing agents in the SRE space :/

ozim

2 hours ago

Basically that’s what people are doing with YOLO mode letting Claude do everything in the system.

Nextgrid

7 hours ago

It only matters if any of those can promise reliability and either put their own money where their mouth is or convince (and actually get them to pay up) a bigger player to insure them.

Ultimately hardware, software, QA, etc is all about delivering a system that produces certain outputs for certain inputs, with certain penalties if it doesn’t. If you can, great, if you can’t, good luck. Whether you achieve the “can” with human development or LLM is of little concern as long as you can pay out the penalties of “can’t”.

ikiris

4 hours ago

And I wish them luck, because the thought of current ai bots doing SRE work effectively is laughable.

deadbabe

6 hours ago

CRE - Code Reliability Engineering

AI will not get much better than what we have today, and what we have today is not enough to totally transform software engineering. It is a little easier to be a software engineer now, but that’s it. You can still fuck everything up.

falcor84

6 hours ago

> AI will not get much better than what we have today

Wow, where did this come from?

From what just comes to my mind based on recent research, I'd expect at least the following this or next year:

* Continuous learning via an architectural change like Titans or TTT-E2E.

* Advancement in World Models (many labs focusing on them now)

* Longer-running agentic systems, with Gas Town being a recent proof of concept.

* Advances in computer and browser usage - tons of money being poured into this, and RL with self-play is straightforward

* AI integration into robotics, especially when coupled with world models

jayd16

3 hours ago

What does robotics have to do with writing better code? Is this just a random AI wishlist?

giancarlostoro

7 hours ago

What? Maybe OP's future. SWE is just going to replace QA and maybe architects if the industry adopts AI more, but there are a lot of holdouts. There are plenty of projects out there that are 'boring' and will not bother.

hahahahhaah

4 hours ago

Operational excellence will always be needed but part of that is writing good code. If the slop machine has made bad decisions it could be more efficient to rewrite using human expertise and deploy that.

dionian

5 hours ago

But there is bad code and good code, and SREs can't tell you which is which, nor fix it.

bionsystem

4 hours ago

My take (I'm an SRE) is that SRE should work pre-emptively to provide reproducible prod-like environments so that QA can test DEV code closer to real-life conditions. Most prod platforms I've seen are nowhere near that level of automation, which makes it really hard to detect or even reproduce production issues.

And no, as an SRE I won't read DEV code, but I can help my team test it.

dmoy

2 hours ago

> And no, as an SRE I won't read DEV code, but I can help my team test it.

I mean, to each their own. Sometimes if I catch a page and the rabbit hole leads to the devs' code, I look under the covers.

And sometimes it's a bug I can identify and fix pretty quickly. Sometimes faster than the dev team because I just saw another dev team make the same mistake a month prior.

You gotta know when to cut your losses and stop searching the rabbit hole though, that's true.

bionsystem

an hour ago

I agree with your nuance, but that's not my default mode. Unless I know the language and the domain well, I am not going to write an MR. I'm going to read the stack trace to see if it's a conf issue though.

VirusNewbie

3 hours ago

Why not? I'm a SWE SRE and I'm arguably better at telling good code from bad code than many of the pure devs I've worked with.

ks2048

6 hours ago

This says nothing about why, if AI can write software, AI cannot do these other things too.