bilekas
16 hours ago
I'm so surprised there is so much pushback against this.. AWS is extremely expensive. The use cases for setting up your system or service entirely in AWS are rarer than people seem to realise. Maybe I'm just the old man screaming at cloud (no pun intended), but when did people forget how to run a baremetal server?
> We have 730+ days with 99.993% measured availability and we also escaped AWS region wide downtime that happened a week ago.
This is a very nice brag. Given they are using their DDoS-protection ingress via Cloudflare there is that dependency, but in that case I can 100% agree that DNS and ingress can absolutely be a full-time job. Running some microservices and a database absolutely is not. If your teams are constantly monitoring and adjusting them, such as for scaling, then the problem is the design, not the hosting.
Unless you're a small company serving up billions of heavy requests an hour, I would put money on the bet AWS is overcharging you.
fulafel
16 hours ago
The direct cost is the easy part. The more insidious part is that you're now cultivating a growing staff of technologists whose careers depend on doing things the AWS way, getting AWS certified to ensure they build your systems the AWS Well-Architected Way instead of thinking for themselves, and who can upsell you on AWS lock-in solutions using AWS-provided soundbites and sales arguments.
("Shall we make the app very resilient to failure? Yes running on multiple regions makes the AWS bill bigger but you'll get much fewer outages, look at all this technobabble that proves it")
And of course AWS lock-in services are priced to look cheaper compared to their overpricing of standard stuff[1] - if you just spend the engineering and IaC coding effort to move onto them, the "savings" can be put toward more AWS cloud engineering effort, which again makes your cloud eng org bigger and more important.
[1] (For example, moving your app off containers to Lambda, or the db off PostgreSQL to DynamoDB, etc.)
Hilift
16 hours ago
> The direct cost is the easy part
I don't think it is easy. I see most organizations struggle with the fact that everything is throttled in the cloud: CPU, storage, network. Tenants often discover large amounts of activity they were previously unaware of, which contributes to the usage and cost. And there may be individuals or teams creating new usage that grossly impacts their allocation. Did you know there is a setting in MS SQL Server that impacts performance by an order of magnitude when sending/receiving data from the Cloud to your on-premises servers? It's the default in the ORM generated settings.
Then you can start adding in the Cloud value, such as incomprehensible networking diagrams that are probably non-compliant in some way (guess which ones!), and security? What is it?
_the_inflator
9 hours ago
Yes. Cloud sellers knew this: happy path for the flagship project, the shiny new object, and some additional services. After the point of no return, what usually happens is that the cloud becomes a replica of the bare-metal environment.
As a Computer Science dude and former C64/Amiga coder in senior management of a large international bank, I saw first-hand how costs balloon simply because the bank recreates and replicates its bare-metal environment in the cloud.
So costs increase while nothing changes. Imagine that: fixed resources, no test environments, because virtualisation was out of the equation in the cloud due to policies and SDLC processes. And it goes on: automated releases? Nope - a request per email, with a scanned paper document attached as sign-off.
Of course you can buy a Ferrari and use it as a farm tractor. I bet it is possible with a little modification here and there.
Another fact is that lock-in plays a huge role. Once you are in, no matter what you subscribe to, everything magically slows down a bit. But since I am a guy who uses a time tracker to test and monitor apps, I could easily draw a line even without utilizing my Math background: enforced throttling.
There is a difference between 100, 300 and 500ms for SaaS websites - people without prior knowledge of perceptual psychology feel it but cannot put their finger on it. But since we are in the cloud, suddenly a cloud manager will offer you a speed upgrade - just catered for your needs! Here, have a free trial period of 3 months and experience the difference for your business!
I am a bit opinionated here and really suspect that the cloud provider analysed the bank's traffic and service usage to willingly slow it down in a way only professionals could detect. Were we promised lightning speed in the first place? No, that's not what the contract says. We fed you with it, but a "normal" speed was agreed upon. It is like getting a Porsche as a free rental car when you take your VW Beetle to the dealer for a checkup. Hooked, of course. A car is a car after all. How to boil a frog? Slowly.
Of course there will be more sales, and this is the Achilles' heel of every business with indifferent customers - easy prey.
It is a vicious cycle, almost like taxation. You cannot hide from it, no escape and it is always on the rise.
franktankbank
4 hours ago
Ferrari actually makes tractors.
m-gasser
14 hours ago
> Did you know there is a setting in MS SQL Server that impacts performance by an order of magnitude when sending/receiving data from the Cloud to your on-premises servers? It's the default in the ORM generated settings.
Sounds interesting, which setting is that?
Hilift
8 hours ago
Multiple Active Result Sets (MARS). During large query responses or bulk loads, "full" packets cause an additional packet to be sent over the wire with about five bytes to hold the MARS "wrapper". The net result is one full packet and one near-empty packet on the wire, alternating. The performance impact at LAN latency is negligible. However, at the higher latency between AWS and your premises it has a terrible performance impact.
MARS isn't strictly needed for most things. Some features that require it are ORM (EF) proxies and lazy loading. If you need MARS, there are third-party "accelerators" that work around this madness.
"MARS Acceleration significantly improves the performance of connections that use the Multiple Active Result Sets (MARS) connection option."
https://documentation.nitrosphere.com/resources/release-note...
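For concreteness, MARS lives in the client connection string rather than on the server; a rough pyodbc sketch, with the server, credentials and driver version as placeholders:

    # Hedged sketch: MARS is a connection-string option on the client side.
    # Server name, credentials and driver version below are placeholders.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=your-sql-host;DATABASE=your-db;UID=app_user;PWD=app_password;"
        "MARS_Connection=no;"          # leave MARS off unless a feature truly needs it
        "Encrypt=yes;TrustServerCertificate=yes;"
    )
    # The .NET/EF equivalent is MultipleActiveResultSets=False in the SqlClient
    # connection string; EF lazy loading and some proxy features expect it to be on.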
twodave
5 hours ago
Yeah, honestly most lazy loading and EF proxy use I have seen is more aptly named lazy coding instead. There are times when you might be running 3-4 queries to project some combination of them and want to do that in parallel, but in general if you have lazy loading enabled in EF you are holding up a sign that says “inconsistent reads happening in here”.
I use and love EF, but generally leave MARS off when possible because it is responsible for more trouble than performance gains nearly every time.
infecto
8 hours ago
Is that not a client connection flag? MARS does not require a setting change on the server?
anonymars
7 hours ago
I think you may have misinterpreted what he said. I can see why it seems to imply a server setting but that isn't the case
> Did you know there is a setting in MS SQL Server that impacts performance by an order of magnitude when sending/receiving data from the Cloud to your on-premises servers? It's the default in the ORM generated settings
infecto
6 hours ago
You are right. For some reason when I initially sped through the post I read it as if RDS was doing something wrong.
infecto
14 hours ago
Would love to know as well.
vidarh
15 hours ago
I was about to rage at you over the first sentence, because this is so often how people start trying to argue bare metal setups are expensive. But after reading the rest: 100% this. I see so many people push AWS setups not because it's the best thing - it can be if you're not cost sensitive - but because it is what they know and they push what they know instead of evaluating the actual requirements.
kelnos
17 minutes ago
> people push AWS setups not because it's the best thing - it can be if you're not cost sensitive
This is so weird to me, because if you're running a company, you should be cost-sensitive. Sure, you might be willing to spend extra money on AWS in the very beginning if it helps you get to market faster. But after that, there's really no excuse: profit margin should be a very important consideration in how you run your infrastructure.
Of course, if you're VC backed, maybe that doesn't matter... that kind of company seems to mainly care about user growth, regardless of how much money is being sent to the incinerator to get it.
hibikir
15 hours ago
Well, they aren't wrong about the bare metal either: Every organization ends up tied to their staff, and said staff was hired to work on the stack you are using. People end up in quite the fights because their supposed experts are more fond of uniformity and learning nothing new.
Many a company was stuck with a datacenter unit that was unresponsive to the company's needs, and people migrated to AWS to avoid dealing with them. This happened right in front of my eyes multiple times. At the same time, you also end up in AWS, or even within AWS, using tools that are extremely expensive, because the cost-benefit analysis by the individuals making the decision, who often don't know much beyond what they use right now, is just wrong for the company. The executive on top is often either not much of a technologist or 20 years out of date, so they have no way to discern the quality of their staff. Technical disagreements? They might only know who they like to hang out with, but that's where it ends.
So for path-dependent reasons, companies end up making a lot of decisions that in retrospect seem very poor. In startups it often just kills the company. Just don't assume the error is always in one direction.
whstl
14 hours ago
Sure but I have seen the exact same thing happen with AWS.
In a large company I worked at, the Ops team that had the keys to AWS was taking literal months to push things to the cloud, causing problems with bonuses and promotions. Security measures were not in place, so there were cyberattacks. Passwords of critical services lapsed because they were not paying attention.
At some point it got so bad that the entire team was demoted, lost privileges, and contractors had to jump in. The CTO was almost fired.
It took months to recover and even to get to an acceptable state, because nothing was really documented.
LPisGood
5 hours ago
I can’t believe the CTO wasn’t fired for that.
Edman274
14 hours ago
The entire value proposition of AWS vs running one's own server is basically this: is it easier to ask for permission, or forgiveness? You're asking for permission to get a million dollars worth of servers / hardware / power upgrades now, or you're asking for forgiveness for spending five million dollars in AWS after 10 months. Which will be easy: permission or forgiveness?
chasemp
2 hours ago
I had not thought of it this way, but interesting point. I have seen this as well.
infecto
14 hours ago
Your comment also jogged my memory of how terrible the bare-metal days used to be. I think now with containers it can be better, but the other reason so many switched to cloud is that we don't need to think about buying the bare metal ahead of time. We don't need to justify it to a DevOps gatekeeper.
vidarh
13 hours ago
That so many people remember bare metal as of 20+ years ago is a large part of the problem.
A modern server can be power cycled remotely, can be reinstalled remotely over networked media, can have its console streamed remotely, can have fans etc. checked remotely without access to the OS it's running etc. It's not very different from managing a cloud - any reasonable server hardware has management boards. Even if you rent space in a colo, most of the time you don't need to set foot there other than for an initial setup (and you can rent people to do that too).
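For what it's worth, most of that management-board functionality is a standard HTTP API these days (Redfish); a rough Python sketch, where the BMC address, credentials and exact resource paths are placeholders that vary by vendor:

    # Hedged sketch: talking to a server's management board (BMC) over Redfish.
    # BMC address, credentials and resource paths are placeholders; vendors differ.
    import requests

    BMC = "https://10.0.0.42"       # out-of-band management interface, not the host OS
    AUTH = ("admin", "changeme")    # BMC credentials (placeholders)
    VERIFY = False                  # BMCs commonly ship with self-signed certs

    # List the systems this BMC manages and read the power state of the first one
    systems = requests.get(f"{BMC}/redfish/v1/Systems", auth=AUTH, verify=VERIFY).json()
    sys_path = systems["Members"][0]["@odata.id"]
    system = requests.get(f"{BMC}{sys_path}", auth=AUTH, verify=VERIFY).json()
    print("power:", system["PowerState"])

    # Power-cycle the machine without ever touching the OS it runs
    requests.post(f"{BMC}{sys_path}/Actions/ComputerSystem.Reset",
                  auth=AUTH, verify=VERIFY, json={"ResetType": "ForceRestart"})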
But for most people, bare metal will tend to mean renting bare metal servers already configured anyway.
When the first thing you then tend to do is to deploy a container runtime and an orchestrator, you're effectively usually left with something more or less (depending on your needs) like a private cloud.
As for "buying ahead of time", most managed server providers and some colo operators also offer cloud services, so that even if you don't want to deal with a multi-provider setup, you can still generally scale into cloud instances as needed if your provider can't bring new hardware up fast enough (but many managed server providers can do that in less than a day too).
I never think about buying ahead of time. It hasn't been a thing I've had to worry about for a decade or more.
mmarq
9 hours ago
> A modern server can be power cycled remotely, can be reinstalled remotely over networked media, can have its console streamed remotely, can have fans etc. checked remotely without access to the OS it's running etc. It's not very different from managing a cloud - any reasonable server hardware has management boards. Even if you rent space in a colo, most of the time you don't need to set foot there other than for an initial setup (and you can rent people to do that too).
All of this was already possible 20 years ago, with iLO and DRAC cards.
vidarh
6 hours ago
Yes, that's true, but 20 years ago a large proportion of the lower-end servers people were familiar with didn't have anything like it, so even a lot of developers who remember "pre-cloud" servers have never experienced machines with them.
infecto
10 hours ago
You are right, but I just think people miss the history when we talk about moving to the cloud. It was not that long ago that, at a reasonably sized Bay Area company, I would need to justify having new metal provisioned to stand up a service I was tasked with.
kelnos
10 minutes ago
That memory is part of the problem: it doesn't reflect today's reality. You can have an IT ops team that buys and sets up servers, and then sets up (perhaps) Kubernetes and a nice CI/CD pipeline on top of it. They can fairly easily bill individual teams for usage, and teams have to justify their costs, just like they (hopefully!) do in any sane org that's running in the cloud.
The bad old days of begging an IT ops person for a server, and then throwing a binary over the fence at them so they can grumble while they try to get it running safely in production... yeah, no, that doesn't have to be a thing anymore.
The "we" you speak of is the problem: if your org hires actual real sysadmins and operations people (not people who just want to run everything on AWS), then "you" don't have to worry about it.
dumbledoren
11 hours ago
The catch is that bare metal is SO cheap and performant that you can buy legions of it and have it lying around. And datacenters, their APIs and whatnot advanced so much that you can even have automations that automatically provision and set up your bare metal servers. With containers, it gets even better.
And, let's face it - aren't you already overprovisioning on the cloud because you can't risk your users waiting 1-2 minutes until your new nodes and pods come up? So basically the "autoscaling" of the cloud has always been a myth.
baq
15 hours ago
> Many a company was stuck with a datacenter unit that was unresponsive to the company's needs
I'd like to +1 here - it's an understated risk if you've got datacenter-scale workloads. But! You can host a lot of compute on a couple racks nowadays, so IMHO it's a problem only if you're too successful and get complacent. In the datacenter, creative destruction is a must and crucially finance must be made to understand this, or they'll give you budget targets which can only mean ossification.
alemanek
11 hours ago
In orgs where I have seen this, it is usually a symptom of the data center unit being starved of resources. It's like they have only been given the choice of on-prem with ridiculous paperwork and long lead times, or paying 20x for cloud.
Like, can't we just give the data center org more money so they can over-provision hardware? Or can we not have them use that extra money to rent servers from OVH/Hetzner during the discovery phase to keep things going while we are waiting on things to get sized or arrive?
Aeolun
5 hours ago
I feel like companies are unreasonably afraid of up-front cost: never mind that they're going to pay more for cloud over the next 6 months, spending 6x the monthly cloud cost on a single server makes them hesitate.
It's how they always refuse to spend half my monthly salary on the computer I work on, and instead insist I use an underpowered Windows machine.
jjmarr
an hour ago
opex vs capex.
LPisGood
5 hours ago
The problem is that if you over-provision and buy 2x as many resources as you need, it looks bad from a utilization standpoint. If you buy cloud solutions that are 2x as expensive and "auto scale", you will have much higher utilization for the same cost.
dumbledoren
6 hours ago
> Or can we not have them use that extra money to rent servers from OVH/Hetzner
Or just use Hetzner for major performance at low cost... Their APIs and tooling make it feel like it's your own datacenter.
vidarh
14 hours ago
It's simple enough to hire people with experience with both, or pay someone else to do it for you. These skills aren't that hard to find.
If you hire people that are not responsive to your needs, then, sure, that is a problem that will be a problem irrespective of what their pet stack is.
embedding-shape
14 hours ago
> said staff was hired to work on the stack you are using
Looking back at the hiring decisions I've made at various levels of organizations, this is probably the single biggest mistake I've made, multiple times: hiring people for a specific technology because that's what we were specifically using.
You'll end up with a team unwilling to change, because "you hired me for this; even if something else is best for the business, this is what I do".
Once I and the organizations shifted our mindset to hiring people who are more flexible - even if they have expertise in one or two specific technologies, they won't put their heads in the sand whenever change comes up - everything became a lot easier.
vidarh
12 hours ago
Exactly. If someone has "Cloud Engineer" in the headline of their resume instead of "DevOps Engineer", it's already a warning sign and worth probing. If someone has "AWS|VMWare Engineer" in their bio, it's a giant red flag to me. Sometimes it's people just being aware of where they'll find demand, but often it's indicative of someone who will push their pet stack - and it doesn't matter if it's VMWare on-prem or AWS (both purely as examples; it doesn't matter which specific tech it is), it's equally bad if they identify with a specific stack irrespective of what the stack is.
I'll also tend to look closely at whether people have "gotten stuck" specialising in a single stack. It won't make me turn them down, but it will make me ask extra questions to determine how open they are to alternatives when suitable.
torginus
14 hours ago
The weird thing is I'm old enough to have grown up in the pre-cloud world, and most of the stuff - file servers, proxies, DBs, etc. - isn't any more difficult to set up than AWS stuff; it's just that the skills are different.
Also there's a mindset difference - if I gave you a server with 32 cores you wouldn't design a microservice system on it, would you? After all there's nowhere to scale to.
But with AWS, you're sold the story of infinite compute you can just expect to be there, but you'll quickly find out just how stingy they can get with giving you more hardware automatically to scale to.
I don't dislike AWS, but I feel this promise of false abundance has driven the growth in complexity and resource use of the backend.
Reality tends to be you hit a bottleneck you have a hard time optimizing away - the more complex your architecture, the harder it is, then you can stew.
vidarh
11 hours ago
> But with AWS, you're sold the story of infinite compute you can just expect to be there, but you'll quickly find out just how stingy they can get with giving you more hardware automatically to scale to.
This is key.
Most people never scale to a size where they hit that limit, and in most organisations where that happens, someone else has to deal with it, so most developers are totally unaware of just how fictional the "infinite scalability" actually is.
Yet it gets touted as a critical advantage.
At the same time, most developers have never tried to manage modern server hardware, and seem to think it is somewhat like managing the hardware they're using at home.
torginus
6 hours ago
But that limit is well below what you could get even in a gaming machine (AWS cpus are SMT threads, so a 32-core machine is actually 64 CPUs in AWS terms) - you can get that in a high-end workstation, and I'd guess that's way more power than most people end up using even in their largish AWS projects.
Aeolun
5 hours ago
> AWS cpus are SMT threads
Not on the AMD machines from m7 (and the others which share the same architecture)
ApolloFortyNine
11 hours ago
>I see so many people push AWS setups not because it's the best thing - it can be if you're not cost sensitive - but because it is what they know and they push what they know instead of evaluating the actual requirements.
I kinda feel like this argument could be used against programming in essentially any language. Your company, or you yourself, likely chose to develop using (whatever language it is) because that's what you knew and what your developers knew. Maybe it would have been some percentage more efficient to use another language, but then you and everyone else has to learn it.
It's the same with cloud vs bare metal, though at least in the cloud, if you're using the right services, if someone asked you tomorrow to scale 100x you likely could during the workday.
And generally speaking, if your problem is at a scale where bare metal is trivial to implement, it's likely we're only talking about a few hundred dollars a month being 'wasted' in AWS. Which is nothing to most companies, especially when they'd have to consider developer/devops time.
vidarh
11 hours ago
> if someone asked you tomorrow to scale 100x you likely could during the workday.
I've never seen a cloud setup where that was true.
For starters: most cloud providers will impose limits on you that often mean going 100x would involve pleading with account managers to have limits lifted and/or scrounging together a new, previously untested combination of instance sizes.
But secondly, you'll tend to run into unknown bottlenecks long before that.
And so, in fact, if that is a thing you actually want to be able to do, you need to actually test it.
But it's also generally not a real problem. I more often come across the opposite: Customers who've gotten hit with a crazy bill because of a problem rather than real use.
But it's also easy enough to set up a hybrid setup that will spin up cloud instances if/when you have a genuine need to scale up faster than you can provision new bare-metal instances. You'll typically run an orchestrator and run everything in containers on a bare-metal setup too, so typically it only requires having an auto-scaling group scaled down to 0, warming it up if load nears a critical level on your bare-metal environment, and then flipping a switch in your load balancer to start directing traffic there. It's not a complicated thing to do.
Now, incidentally, your bare metal setup is even cheaper because you can get away with a higher load factor when you can scale into cloud to take spikes.
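To make that concrete, a minimal sketch of the "warm up the overflow pool" step, assuming an auto-scaling group normally kept at 0 and some external measurement of bare-metal load; the ASG name, region, threshold and burst size are placeholders, and the load-balancer traffic flip is a separate step:

    # Hedged sketch: wake a cloud overflow pool when bare-metal load nears critical.
    # ASG name, region, threshold and how load_ratio is measured are placeholders.
    import boto3

    asg = boto3.client("autoscaling", region_name="eu-west-1")

    def scale_overflow(load_ratio: float, asg_name: str = "overflow-pool",
                       threshold: float = 0.8, burst_capacity: int = 10) -> None:
        # load_ratio comes from whatever monitoring runs on the bare-metal side
        desired = burst_capacity if load_ratio >= threshold else 0
        asg.set_desired_capacity(AutoScalingGroupName=asg_name,
                                 DesiredCapacity=desired,
                                 HonorCooldown=False)
        # Shifting traffic is then a load-balancer change (e.g. weighted forwarding
        # on the listener), done once the overflow instances report healthy.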
> And generally speaking, if your problem is at a scale where bare metal is trivial to implement, it's likely we're only talking about a few hundred dollars a month being 'wasted' in AWS. Which is nothing to most companies, especially when they'd have to consider developer/devops time.
Generally speaking, I only relatively rarely work on systems that cost less than in the tens of thousands per month and up, and what I consistently see with my customers is that the higher the cost, the bigger the bare-metal advantage tends to be as it allows you to readily amortise initial setup costs of more streamlined/advanced setups. The few places where cloud wins on cost is the very smallest systems, typically <$5k/month.
12_throw_away
9 hours ago
> if you're using the right services, if someone asked you tomorrow to scale 100x you likely could during the workday.
"The right services" is I think doing a lot of work here. Which services specifically are you thinking of?
- S3? sure, 100x, 1000x, whatever, it doesn't care about your scale at all (your bill is another matter).
- Lambdas? On their own sure you can scale arbitrarily, but they don't really do anything unless they're connected to other stuff both upstream and downstream. Can those services manage 100x the load?
- Managed K8s? Managed DBs? EC2 instances? Really anything where you need to think about networking? Nope, you are not scaling this 100x without a LOT of planning and prep work.
vidarh
6 hours ago
> Nope, you are not scaling this 100x without a LOT of planning and prep work.
You're not getting a 100x increase in instances without justifying it to your account manager anyway, long before you figure out how to get it to work.
EC2 has limits on the number of instances you can request, and it certainly won't let you 100x unless you've done it before and already gone through the hassle to get them to raise your limits.
On top of that, it is not unusual to hit availability issues with less common instance types. Been there, done that, had to provision several different instance types to get enough.
raw_anon_1111
7 hours ago
I only work at companies that are using cloud because I hate administering systems and I hate dealing with system administrators when I need resources.
torginus
14 hours ago
Unfortunately it's not, and it gets more difficult the more cloud-y your app gets.
You can pay for EC2+EBS+network costs, or you can have a fancy cloud-native solution where you pay for Lambda, ALBs, CloudWatch, metrics, Secrets Manager... things you'd assume they would just give you - like eating at a restaurant, where you probably wouldn't expect to pay separately for the parking, the toilet, or rent for the table and seats.
So cloud billing is its own science and art - and in most orgs devs don't even know how much the stuff they're building costs, until finance people start complaining about the monthly bills.
jmaker
12 hours ago
We run regular FinOps meetings within departments, so everyone’s aware. I think everyone should. But it’s a lot of overhead of course. So a dev is concerned not only with DevOps anymore but with DevSecFinOps. Not everyone can cope with so many aspects at once. There’s a lot of complexity creep in that.
torginus
11 hours ago
Yeah, AWS has the billing panel - that's where I usually discover that, after I make a rough estimate of how much the thing I'm building should cost by studying the relevant pricing tables, I end up with stuff costing twice as much, because on top of the expected items there's always a ton of miscellaneous stuff I never thought about.
UltraSane
12 hours ago
I have Claude, ChatGPT, and Gemini analyze our AWS bills and usage metrics once a month and they are surprisingly good at finding savings.
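The raw numbers are easy enough to pull yourself; a rough boto3 sketch of the month-by-service breakdown you'd feed to the models (dates and grouping are illustrative):

    # Hedged sketch: month-to-date spend per service from Cost Explorer.
    import boto3

    ce = boto3.client("ce")  # Cost Explorer is served out of us-east-1
    resp = ce.get_cost_and_usage(
        TimePeriod={"Start": "2025-11-01", "End": "2025-12-01"},  # illustrative dates
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )
    for group in resp["ResultsByTime"][0]["Groups"]:
        service = group["Keys"][0]
        amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
        print(f"{service}: ${amount:,.2f}")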
jmaker
13 hours ago
It’s a marketing trap. But also a job guarantee since everyone’s in the same trap. You got a couple cloud engineers or "DevOps" that lobby for AWS or any other hyperscaler, NaiveDate managers that write down some decision report littered with logical fallacies, and a few years in the sink cost is so high you can’t get off of it, and instead of doing productivity work you’re sitting in myriads of FinOps meetings, where even fewer understand what’s going on.
Engineering mangers are promised cost savings on the HR level. Corporate finance managers are promised OpEx for CapEx trade-off, the books look better immediately. Cloud engineers are embarking on their AWS journey of certification being promised an uptick to their salaries. It’s a win/win for everyone, in isolation, a local optimum for everyone, but the organization now has to pay way more than it—hypothetically—would have been paying for bare metal ops. And hypothetical arguments are futile.
And it lends itself well to overengineering and the microservices cargo cult. Your company ends up with a system distributed around the globe across multiple AZs per region of business operations, striving to shave off those 100ms latency off your clients’ RTT. But it’s outgrown your comprehension, and it’s slow anyway, and you can’t scale up because it’s expensive. And instead of having one problem, you now have 99 and your bill is one.
geodel
8 hours ago
All great points. I have seen, in companies of smart people, the CIO/CTO freely admit: "Look, we know cloud may not be cheaper or easier to manage, but this is the direction we have taken, since we are getting out of owning or managing hardware/datacenters."
So it is not like one can dazzle decision makers with any logic or hard data. They are just announcing the decision while calling it a robust discussion of the pros and cons of on-prem vs cloud placement.
jmaker
7 hours ago
Yep. I’ve also seen managerial people worship AWS sales reps as oracles, misconstruing ordinary sales meetings with them as something divine, in which they would disclose a lot of company’s IP in awe for them, just to listen to some blabbing superficial truisms. I mean, ChatGPT could tell you more. To add insult to that, the managerial people wouldn’t listen to their own senior, staff, principal engineers, and prefer to follow what the AWS reps told them.
It’s really disturbing how the human factor controls decision making in corporations.
For my peace of mind, I chose a sane path - if the company as an entity decides to do AWS, I will do my best to meet its goals. I've got all the Professional and Specialty certs. It's human nature. No point in tilting at windmills.
geodel
an hour ago
> For my peace of mind, I chose a sane path - if the company as an entity decides to do AWS, I will do my best to meet its goals...
Amen to that.
Any kind of performance-improvement or monitoring work I did for my applications was met with indifference or derision from managers. Because if only I had put the effort into cloud migration, we could have "Horizontal Pod Scaling" for performance and a fully managed Datadog console for monitoring the services.
hinkley
13 hours ago
My last team decided to hand-manage a Memcached cluster because, run as an unmanaged service, it cost half as much as AWS's alternative. Don't know how much we really saved versus the opportunity cost of dev time though. But it's close to negative.
jmaker
12 hours ago
One of the issues there is that picking a managed service deprives your people of gaining extra experience. There's a synergy over time the more you manage yourself. But it's totally justified to pick a managed service if it checks out for your budget. The problem I often saw was bad decision making and bad opportunity-cost estimation. In other words, there's an opportunity cost to picking the managed service too, and they offset each other more or less.
hinkley
9 hours ago
I wonder if there’s enough space for a Do Well By Doing Good company out there to provide a ladder from cheap self managed up to fully automated rolling upgrades.
Because it was mostly fine at first, but later we had some close calls when there were changes that needed to be made on the servers. By the time we managed to mess up our hand managed incremental restart process, we had several layers of cache and so accidentally wiping one didn’t murder our backend, but did throw enough alerts to cause a P2. And because we were doing manual bucketing of caches instead of consistent hashing we hit the OOMKiller a couple times while dialing in.
But at this point it was difficult to move back to managed.
This feels closest to digital ocean’s business model.
winrid
37 minutes ago
That should be extremely low maintenance...
hinkley
16 minutes ago
As long as we didn’t need to touch the machines, nearly zero. But you gotta touch them sometime.
anal_reactor
15 hours ago
My manager wants me to get this silly AWS certification.
Let me go on a tangent about trains. In Spain, before you board a high-speed train you need to go through a full security check, like at an airport. In all other EU countries you just show up and board, but in Spain there's the security check. The problem is that even though the security check is an expensive, inefficient theatre, just in case something does blow up, nobody wants to be the politician that removed the security check. There will be no reward for a politician that makes life marginally easier for lots of people, but there will be severe punishment for a politician that is involved in a potential terrorist attack, even if the chance of that happening is ridiculously small.
This is exactly why so many companies love to be balls deep into AWS ecosystem, even if it's expensive.
rsav
14 hours ago
Nobody gets fired for buying IB^H^H AWS
embedding-shape
14 hours ago
> In all other EU countries you just show up and board, but in Spain there's the security check
Just for curiosity's sake, did any other EU countries have any recent terrorist attacks involving bombs on trains in the capital, or is Spain so far alone with this experience?
gtr
14 hours ago
London had the tube bombings, but there is no security scanning there.
embedding-shape
13 hours ago
AFAIK, there is no security scanning on the metro/"tube" in Spain either, it's on the national train lines.
Edit: Also, after looking it up, it seems like London did add temporary security scanners at some locations in the wake of those bombings, although they weren't permanent.
Russia is the only other European country besides Spain that after train bombings added permanent security scanners. Belgium, France and a bunch of other countries have had train bombings, but none of them added permanent scanners like Spain or Russia did.
rixed
3 hours ago
Not true, France had this on the train to the Netherlands (Thalys) after a crazed attacker went after passengers on that train. They also added electronic gates for most high-speed trains in many large stations.
Notice how these inefficient processes create large, compact lines of passengers, which would make the casualties much worse in case of an actual bomb.
iberator
12 hours ago
Check out the Madrid 2004 terror attacks... so deadly that Spain left Afghanistan and Iraq, afaik.
embedding-shape
11 hours ago
That's exactly the event I was alluding to, good detective work :)
kleiba
14 hours ago
How does Spain deal with trains that come in from a neighboring country?
hedora
14 hours ago
The security check has nothing to do with protecting trains or passengers, so your question is irrelevant.
kleiba
14 hours ago
Thanks for letting me know that my question is irrelevant. Sorry for taking up your time.
snovv_crash
14 hours ago
French trains come in without any security checks.
mrits
14 hours ago
AWS doesn’t have to be expensive.
embedding-shape
14 hours ago
Sure, but you outgrow the free ("trial") resources in a blink, and then it starts being expensive compared to the alternatives.
kelnos
26 minutes ago
> when did people forget how to run a baremetal server ?
I don't think people have forgotten, but I think Amazon has done an amazingly excellent job of marketing and developer relations over the years, to the point that they've convinced most developers that doing your own thing is 1) a lot of expensive, specialized work, and 2) actively dangerous for your business, whether it's because of security, uptime, or some other ops bogeyman of the day.
Note that I said "developers". Most developers are not sysadmins or IT operations people. Most of them have never set up Linux on a desktop or laptop, let alone on real server hardware (that they've also set up themselves). Most of them never had the chance to forget how to run a bare-metal server; they never knew how in the first place. (Hell, I've been running desktop Linux for 25+ years, and I don't think I've ever set up Linux on actual server hardware. Closest I've come is bare-metal Solaris, but that was like 25 years ago.)
"DevOps" today usually means that you know how to run a CLI tool or drive a web interface to deploy your automatically-built container artifact to some cloud-based production system that someone else manages, hiding the details from you. (This bit also can be true for shops that run on bare metal, depending on how advanced their own sysadmin/ops team is.) While developers are often not decision-makers in a larger org, they can be at smaller orgs, and once those developers get on the cloud, you probably will stay on the cloud (companies like OneUptime are the exception, not the rule), even if you've gotten much larger and it's stupid expensive to continue running that way.
esskay
16 hours ago
> I'm so surprised there is so much pushback against this
I'm not. It seems to be happening a lot. Any time a topic about not using AWS comes up here or on Reddit, there's a sudden surge of people appearing out of nowhere shouting down anyone who suggests other options. It's honestly starting to feel like paid shilling.
Spooky23
15 hours ago
I don’t think it’s paid shilling, it’s dogma that reflects where people are working here. The individual engineers are hammers and AWS is the nail.
AWS/Azure/GCP is great, but like any tool or platform you need to do some financial/process engineering to make an optimal choice. For small companies, time to market is often key, hence AWS.
Once you’re a little bigger, you may develop frameworks to operate efficiently. I have apps that I run in a data center because they’d cot 10-20x at a cloud provider. Conversely, I have apps that get more favorable licensing terms in AWS that I run there, even though the compute is slower and less efficient.
You also have people who treat AWS with the old “nobody gets fired for buying IBM” mentality.
briffle
10 hours ago
The tooling should be getting close to being able to manage this on-prem now, with VMs, K8s clusters, networking, storage, etc. I know that Oxide Computer exists, and they look fantastic, but there have got to be more "open" ways to run things on your own Dell/HP/Supermicro servers with NVMe drives. Especially since VMware has jacked up their prices since being acquired.
Talos OS looks really interesting. But I also need the storage parts, networking parts, etc.
glitchcrab
9 hours ago
I run several Talos clusters (provisioned by Cluster API) on commodity hardware which is part of a Proxmox cluster in my homelab
nostrebored
4 hours ago
teams can't even run k8s in the cloud. teams I've seen running k8s on prem have always been disaster shows. productivity in the gutter.
the tooling to manage on prem is truly awful, and attempts to port the nice semantics of cloud have all slowly died (who remembers localstack?)
dangus
15 hours ago
I think a lot of engineers who remember the bare metal days have legitimate qualms about going back to the way that world used to work especially before containerization/Kubernetes.
I imagine a lot of people who use Linux/AWS now started out with bare metal Microsoft/VMWare/Oracle type of environments where AWS services seemed like a massive breath of fresh air.
baq
15 hours ago
I remember having to put in orders for pallets of servers which then ended up in storage somewhere because there were not enough people to carry and wire them up and/or there wasn't enough rack space to install them.
Having an ability to spin up a server or a vm when you need it without having to ask a single question is very liberating. Sometimes such elasticity is exactly what's needed. OTOH other people's servers aren't always the wise choice, but you have to know both environments to make the right choice, and nowadays I feel most people don't really know anything about bare metal.
kaydub
4 hours ago
Not just spin up a server, you can spin up whole regions, even in foreign countries, at the click of a button.
iso1631
13 hours ago
I spin up a VM on my xen vm estate whenever I want it with just some clickops or teraform (depending on the environment)
baq
9 hours ago
What do you think the pallets of servers were intended for
lazyfanatic42
14 hours ago
the best is having rackspace & power but not enough cooling, hahaha murder me
snark42
13 hours ago
That only happens when you have your own data center. That's a whole different issue and most people with their own hardware don't have their own data centers as it's not particularly cost efficient except at incredibly large scale.
kijin
14 hours ago
That's the beauty of VMs.
Luckily, Amazon is far from the only VM provider out there, so this discussion doesn't need to be polarized between "AWS everything" and "on-premise everything". You can rent VMs elsewhere for a fraction of the cost. There are many places that will rent you bare metal servers by the hour, just as if they were VMs. You can even mix VMs and bare metal servers in the same datacenter.
Spooky23
12 hours ago
No doubt -- there are plenty of downsides to running your own stuff. I'm not anti-AWS. I'm pro-efficiency, and pro making deliberate choices. If the choice is to spend $10M extra on AWS because the engineers get a good vibe, there should be a compelling reason why that vibe is worth $10M. (And there may well be.)
Look at what Amazon/Google/Microsoft does. If you told me you advocate running your own power plants, I'd eyeroll. But... if you're as large a power consumer as a hyper-scaler, totally different story. Google and Microsoft are investing in lighting up old nuclear plants.
array_key_first
10 hours ago
My company runs all their own bare metal data centers but it's containerized, and it's basically magic.
tayo42
13 hours ago
Containers with k8s and bare metal aren't mutually exclusive.
If anything it enables a hybrid environment
parliament32
10 hours ago
I don't think it's paid shilling, I think it's people who got bamboozled into learning cloud-provider-clickops over actual systems work and feel threatened when you suggest hyperscalers aren't the future.
sgarland
2 hours ago
Nailed it. I’ve found that people get super defensive when you inadvertently reveal that they don’t know the fundamentals of their job.
TheCondor
15 hours ago
It’s the current version of CCIE or some of the other certs. People pay money to learn how to operate AWS, other thing erode the value of their investment.
indymike
12 hours ago
A lot of people here have made their careers by moving into AWS. A lot of people's future careers will be made by moving out of AWS. That's just the tech treadmill in action.
Do what works best for your situation.
BirAdam
15 hours ago
I'm not either. I used to do fully managed hosting solutions at a datacenter. I had to do everything from hardware through debugging customer applications. Now, people pay me to do the same but on cloud platforms and the occasional on-prem stuff. In general, the younger people I've come across have no idea how to set anything up. They've always just used awscli, the AWS Console, or terraform. I've even been ridiculed for suggesting people not use AWS. Thing is, public cloud really killed my passion for the industry in general.
Beyond public cloud being bad for the planet, I also hate that it drains companies of money, centralizes everyone's risk, and helps to entrench Amazon as yet another tech oligarchic fiefdom. For most people, these things just don't matter apparently.
palata
14 hours ago
> Thing is, public cloud really killed my passion for the industry in general.
Similar here, I think. I got into Computer Science because I liked software... the way it was. Now I truly think that most software completely sucks.
The thing is that it has grown so much since then, that most developers come from a different angle.
ecshafer
13 hours ago
I think in 5-10 years there is going to be very profitable consulting on setting up data center infrastructure, and de-clouding for companies.
alphager
13 hours ago
Why do you think public cloud is worse for the environment than a private dc? I'd expect the larger dcs to be more energy efficient.
red-iron-pine
11 hours ago
> It's honestly starting to feel like paid shilling.
the companies selling Cloud are also massive IT giants with unlimited compute resources and extensive online marketing operations.
like of fucking course they're using shillbots, they run the backend shillbot infrastructure.
they literally have LLM chatbot agents as an offering, and it's trivially easy to create fake users and repost / retweet last week's comments to create realistic-looking accounts, which then shill hard for whatever their goals are.
7thaccount
15 hours ago
I think some of that is a certain group of people will do anything to play with the new shiny stuff. In my org it's cloud and now GPU.
The cloud stuff is extremely expensive and doesn't work any better than our existing solutions. Like a commenter said below, it's insidious, as your entire organization later becomes dependent on it. If you buy a cloud solution, you're also stuck with the vendor deciding to double the cost of the product once you're locked in.
The GPU stuff is annoying as all of our needs are fine with normal CPU workloads today. There are no performance issues, so again...what's the point? Well... somebody wants to play with GPUs I guess.
ghaff
15 hours ago
Resume-driven development. It's probably pretty much always been a thing.
dumbledoren
10 hours ago
Possible. However, what is more likely is that a lot of long-time tech workers have vested stock or investments in Amazon and they don't want the cash cow (AWS) to get hampered. And similarly, a lot of tech workers have invested in AWS skills, so they can't risk those skills becoming less valued in the marketplace due to alternatives.
sneak
12 hours ago
If your spend is less than a few thousand per month, using cloud services is a no-brainer. For most startups starting up, their spend is minimal, so launching on the cloud is the default (and correct!) option.
Migrating to lower cost options thereafter when scaling is prudent, but you "build one to throw away", as it were.
mrits
14 hours ago
I think people that lived through the time when their servers were down because the admin forgot to turn them back on after he drove 50 miles back from the colo might not want to live through that again.
steelegbr
15 hours ago
AWS may be overcharging but it's a balancing act. Going on-prem (well, shared DC) will be cheaper but comes with requirements for either jack of all trades sysadmins or a bunch of specialists. It can work well if your product is simple and scalable. A lot of places quietly achieve this.
That said, I've seen real world scenarios where complexity is up the wazoo and an opex cost focus means you're hiring under skilled staff to manage offerings built on components with low sticker prices. Throw in a bit of the old NIH mindset (DIY all the things!) and it's large blast radii with expensive service credits being dished out to customers regularly. On a human factors front your team will be seeing countless middle of the night conference calls.
While I'm not 100% happy with the AWS/Azure/GCP world, the reality is that on-prem skillsets are becoming rarer and more specialist. Hiring good people can be either really expensive or a bit of a unicorn hunt.
mhitza
14 hours ago
It's a chicken-and-egg problem. If the cloud hadn't become such a prominent thing, the last decade and a half would have seen the rise of much better tools to manage on-premise servers (= requiring less in-depth sysadmin expertise). I think we're starting to see such tools appear in the last few years, after enough people got burned by cloud bills and lock-in.
hibikir
15 hours ago
And don't forget the real crux of the problem: Do I even know whether a specialist is good or not? Hiring experts is really difficult if you don't have the skill in the topic, and if you do, you either don't need an expert, or you will be biased towards those that agree with you.
It's not even limited to sysadmins, or to tech. How do you know whether a mechanic is very good, or iffy? Is a financial advisor giving you good advice, or basically robbing you? It's not as if many companies are going to hire 4 business units' worth of on-prem admins and then decide which one does better after running for 3 years, or something empirical like that. You might be the poor sob that hires the very expensive, yet incompetent and out-of-date specialist, whose only remaining good skill is selling confidence to employers.
dns_snek
15 hours ago
> Do I even know whether a specialist is good or not?
Of course but unless I misunderstood what you meant to say, you don't escape that by buying from AWS. It's just that instead of "sysadmin specialists" you need "AWS specialists".
If you want to outsource the job then you need to go up at least 1 more layer of abstraction (and likely an order of magnitude in price) and buy fully managed services.
everfrustrated
15 hours ago
This only gets worse as you go higher in management. How does a technical founder know what good sales or marketing looks like? They are often swayed by people who can talk a good talk and deliver nothing.
ambicapter
15 hours ago
The good news with marketing and sales is that you want the people who talk a good talk, so you're halfway there, you just gotta direct them towards the market and away from bilking you.
PenguinCoder
15 hours ago
I'm proudly a 100% on-prem Linux sysadmin. There are no openings for my skills, and they do not pay as well as whatever cloud hotness is "needed".
marcosdumay
14 hours ago
Nobody is hiring generalists nowadays.
At the same time, the incredible complexity of the software infrastructure is making specialists more and more useless. To the point that almost every successful specialist out there is just a disguised generalist who decided to focus their presentation on a single area.
NDizzle
13 hours ago
Maybe everyone is retaining generalists. I keep being given retention bonuses every year, without asking for a single one so far.
As mentioned below, never labeled "full stack", never plan on it. "Generalist" is what my actual title became back in the mid 2000s. My career has been all over the place... the key is being stubborn when confronted with challenges and being able to scale up (mentally and sometimes physically) to meet the needs, when needed. And chill out when it's not.
zer00eyz
13 hours ago
> Nobody is hiring generalists nowadays.
What?
I throw up in my mouth every time I see "full stack" in a job listing.
We got rid of roles... DBA's, QA teams, Sysadmins, then front and back end. Full Stack is the "webmaster" of the modern era. It might mean front and back end, it might mean sysadmin and DBA as well.
marcosdumay
10 hours ago
Even full stack listings come with a list of technologies that the candidate must have deep knowledge of.
> We got rid of roles... DBA's, QA teams, Sysadmins, then front and back end.
To a first approximation, those roles were all wrong. If your people don't wear many of those hats at the same time, they won't be able to create software.
But yeah, we did get rid of roles. And still require people to be specialized to the point it's close to impossible to match the requirements of a random job.
whstl
14 hours ago
That's the crazy thing.
Most AWS-only Ops engineers I know are making bank and in high demand, and Ops teams are always HUGE in terms of headcount outside of startups.
The "AWS is cheaper" thing is the biggest grift in our industry.
haik90
14 hours ago
I think this is driven by the market itself and the way cloud promotes their product.
After being fully in the cloud for some time, we're moving to hybrid solutions. Upper management is happy with the costs and the cloud engineers have new toys.
devnullbrain
6 hours ago
1. large, homogenous domain where the budget for your department is large
2. niche, bespoke domain primarily occupied by companies looking to cut costs
hedora
14 hours ago
I wonder how vibe coding will impact this.
You can easily get your service up by asking claude code or whatever to just do it
It produces AWS YAML that's better than many devops people I've worked with. In other words, it absolutely should not be trusted with trivial tasks, but you could easily blow $100Ks per year for worse.
throwforfeds
13 hours ago
I've been contemplating this a lot lately, as I just did code review on a system that was moving all the AWS infrastructure into CDK, and it was very clear the person doing it was using an LLM which created a really complicated, over engineered solution to everything. I basically rewrote the entire thing (still pairing with Claude), and it's now much simpler and easier to follow.
So I think for developers that have deep experience with systems LLMs are great -- I did a huge migration in a few weeks that probably would have taken many months or even half a year before. But I worry that people that don't really know what's going on will end up with a horrible mess of infra code.
whstl
12 hours ago
To me it's clear that most Ops engineers are vibe coding their scripts/yamls today.
The time it takes to have a script ready has decreased dramatically in the last 3 years. The number of problems when deploying it for the first time has also increased in the same period.
The difference between the ones who actually know what they're doing and the ones who don't is whether they will refactor and test.
bcrosby95
11 hours ago
It depends upon how many resources your software needs. At 20 servers we spend almost zero time managing our servers, and with modern hardware 20 servers can get you a lot.
It's easier than ever to do this, but people are doing it less and less.
canucktrash669
14 hours ago
Managed servers reduce the on-prem skillset requirement and can also deliver a lot of value.
The most frustrating part of hyperscalers is that it's so easy to make mistakes. Active tracking of your bill is a must, but the data is 24-48h late in some cases. So a single engineer can cause a 5-figure regrettable spend very quickly.
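One cheap guardrail, as a rough sketch: a CloudWatch alarm on estimated charges, so the billing lag at least has a ceiling. The threshold and SNS topic below are placeholders, billing metrics have to be enabled on the account, and the metric only exists in us-east-1:

    # Hedged sketch: alarm when month-to-date estimated charges pass a threshold.
    import boto3

    cw = boto3.client("cloudwatch", region_name="us-east-1")
    cw.put_metric_alarm(
        AlarmName="monthly-estimated-charges",
        Namespace="AWS/Billing",
        MetricName="EstimatedCharges",
        Dimensions=[{"Name": "Currency", "Value": "USD"}],
        Statistic="Maximum",
        Period=21600,              # the metric only updates a few times a day
        EvaluationPeriods=1,
        Threshold=50000.0,         # placeholder: alert when the month passes $50k
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:billing-alerts"],  # placeholder
    )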
tayo42
13 hours ago
What size companies are we talking about
canucktrash669
5 hours ago
I've seen this in startups with 7-figure ARR (where annual cloud costs were also 7 figures).
Also seen it at an F500 where a single architect caused a 5-figure mistake, which removed cloud privileges from the entire architecture team. Can't make it up.
tayo42
an hour ago
I mean, 5 figures is nothing to most companies when you're spending 10s of millions on AWS.
dumbledoren
10 hours ago
> AWS may be overcharging but it's a balancing act. Going on-prem (well, shared DC) will be cheaper but comes with requirements for either jack of all trades sysadmins or a bunch of specialists
Much easier to find. Even better, they are skills much easier for existing engineers to learn. Best of all, they are fundamental skills that will never lose their value, as those systems are what everything else is built on.
Aurornis
14 hours ago
> I'm so surprised there is so much pushback against this.. AWS is extremely expensive.
I see more comments in favor than pushing back.
The problem I have with these stories is the confirmation bias that comes with them. Going self-hosted or on-premises does make sense in some carefully selected use cases, but I have dozens of stories of startup teams spinning their wheels with self-hosting strategies that turn into a big waste of time and headcount that they should have been using to grow their businesses instead.
The shared theme of all of the failure stories is missing the true cost of self-hosting: The hours spent getting the servers just right, managing the hosting, debating the best way to run things, and dealing with little issues add up but are easily lost in the noise if you’re not looking closely. Everyone goes through a honeymoon phase where the servers arrive and your software is up and running and you’re busy patting yourselves on the back about how you’re saving money. The real test comes 12 months later when the person who last set up the servers has left for a new job and the team is trying to do forensics to understand why the documentation they wrote doesn’t actually match what’s happening on the servers, or your project managers look back at the sprints and realize that the average time spent on self-hosting related tasks and ideas has added up to a lot more than anyone would have guessed.
Those stories aren’t shared as often. When they are, they’re not upvoted. A lot of people in my local startup scene have sheepish stories about how they finally threw in the towel on self-hosting and went to AWS and got back to focusing on their core product. Few people are writing blog posts about that because it’s not a story people want to hear. We like the heroic stories where someone sets up some servers and everything just works perfectly and there are no downsides.
You really need to weigh the tradeoffs, but many people are not equipped to do that. They just think their chosen solution will be perfect and the other side will be the bad one.
mjr00
13 hours ago
> I have dozens of stories of startup teams spinning their wheels with self-hosting strategies that turn into a big waste of time and headcount that they should have been using to grow their businesses instead.
Funnily enough, the article even affirms this, though most people seemed to have skimmed over it (or not read it at all).
> Cloud-first was the right call for our first five years. Bare metal became the right call once our compute footprint, data gravity, and independence requirements stabilised.
Unless you've got uncommon data egress requirements, if you're worried about optimizing cloud spend instead of growing your business in the first 5 years you're almost certainly focusing on the wrong problem.
> You really need to weigh the tradeoffs, but many people are not equipped to do that. They just think their chosen solution will be perfect and the other side will be the bad one.
This too. Most of the massive AWS savings articles in the past few days have been from companies that do a massive amount of data egress i.e. video transfer, or in this case log data. If your product is sending out multiple terabytes of data monthly, hosting everything on AWS is certainly not the right choice. If your product is a typical n-tier webapp with database, web servers, load balancer, and some static assets, you're going to be wasting tons of time reinventing the wheel when you can spin up everything with redundancy & backups on AWS (or GCP, or Azure) in 30 minutes.
chickensong
8 hours ago
All valid and important points, but missing a painful one, also rarely represented in threads like this: flaky hardware.
Almost every bare metal success story paints a rosy picture of perfect hardware (which thankfully is often the case), or basic hard failures which are easily dealt with. Disk replacement or swapping 1U compute nodes is expected and you probably have spares on hand. But it's a special feeling to debug the more critical parts that likely don't have idle spares just sitting around. The RAID controller that corrupts its memory, reboots, and rolls back to its previous known-good state. The network equipment that locks up with no explanation. Critical components that worked flawlessly for months or years, then shit the bed, but reboot cleanly.
Of course everyone built a secure management vlan and has remote serial consoles hooked up to all such devices right? Right? Oh good, they captured some garbled symbols. The vendor's first tier of support will surely not be outsourced offshore or read from a script, and will have a quick answer that explains and fixes everything. Right?
The cloud isn't always the right choice, but if you can make it work, it sure is nice to not deal with entire categories of problems when using it.
sgarland
2 hours ago
Not saying those things don’t happen, but having worked with on-prem for 2 years, and having run ancient (13 years old currently) servers in my homelab for 5 years, I’ve never seen them. Bad CPU, bad RAM, yes - and modern servers are extremely good at detecting these and alerting you.
In my homelab, in 5 years of running the aforementioned servers (3x Dell R620, and some various Supermicros) 24/7/365, the only thing I had fail was a power supply. Turns out they’re redundant, so I ordered another one, and the spare kept the server up in the meantime. If I was running these for a business, I’d keep hot spares around.
DrewADesign
14 hours ago
> The shared theme of all of the failure stories is missing the true cost of self-hosting: The hours spent getting the servers just right, managing the hosting, debating the best way to run things, and dealing with little issues add up but are easily lost in the noise if you’re not looking closely.
What the modern software business seems to have lost is the understanding that ops and dev are two different universes. DevOps was a reaction to the fact that even outsourcing ops to AWS doesn’t entirely solve all of your ops problems and the role is absolutely no substitute for a systems administrator. Having someone that helps derive the requirements for your infrastructure, then designs it, builds it, backs it up, maintains it, troubleshoots it, monitors performance, determines appropriate redundancy, etc. etc. etc. and then tells the developers how to work with it is the missing link. Hit-by-a-bus documentation, support and update procedures, security incident response… these are all problems we solved a long time ago, but sort of forgot about when moving everything to cloud architecture.
mjr00
13 hours ago
> DevOps was a reaction to the fact that even outsourcing ops to AWS doesn’t entirely solve all of your ops problems and the role is absolutely no substitute for a systems administrator.
This is revisionist history. DevOps was a reaction to the fact that many/most software development organizations had a clear separation between "developers" and "sysadmins". Developers' responsibility ended when they compiled an EXE/JAR file/whatever, then they tossed it over the fence to the sysadmins who were responsible for running it. DevOps was the realization that, huh, software works better when the people responsible for building the software ("Dev") are also the same people responsible for keeping it running ("Ops").
tstrimple
9 hours ago
It was very much this for me. I knew the hosting side of things because my second job as a programmer was at a small ISP that hosted custom websites. I got used to maintaining Linux web and email servers by hand over SSH. There were some common scripts, but for the most part the pattern was SSH into the server and make the changes you need to make. Most of my early startup career was like this. Closely working with hardware, the server installs, hosting configs as well as the code that actually powered things.
Jump to my first "enterprise" job and suddenly I can't fix things anymore. I have to submit tickets to other teams to look at why the thing I built isn't running as expected. That, to me, was pure insanity. The sysadmins knew fuck all about my app and as far as I was concerned barely knew how to admin systems. I knew a lot more in my 20's after all. But the friction of not running what I wrote was absolutely real and one of the main killers of productivity versus my startup days.
I also have seen this from most of the "enterprise" companies that do "DevOps" when really they just mean they have a sysadmin team who uses modern tools and IaC. The same exact friction and issues exist between dev and ops as before DevOps days. Those companies are explicitly doing DevOps wrong. When you look at the troubleshooting steps during an incident, it's identical. Bring in the devs and the ops team so we can figure out what's going on. I do think startups are more likely to get DevOps right because they aren't trying to force it on the only mental model they seem to be able to understand.
I've also found that dev teams who run and maintain their own stacks are better about automatic failure recovery and overall more reliable solutions. Whether that's due to better alignment between the app code and the app stack during development or because the dev team is now the first call when things aren't working I'm not entirely sure. Likely a mix of both.
hndc
14 hours ago
> DevOps was a reaction to the fact that even outsourcing ops to AWS doesn’t entirely solve all of your ops problems
DevOps, conceptually, goes back to the 90s. I was using the term in 2001. If memory serves, AWS didn't really start to take off until the mid/late aughts, or at least not until they launched S3.
DevOps was a reaction to the software lifecycle problem and didn't have anything to do with AWS. If anything it's the other way around: AWS and cloud hosting gained popularity in part due to DevOps culture.
wredcoll
14 hours ago
> What the modern software business seems to have lost is the understanding that ops and dev are two different universes.
This is a fascinating take, if you ask me, treating them as separate is the whole problem!
The point of being an engineer is to solve real world problems, not to live inside your own little specialist world.
Obviously there's a lot to be said for being really good at a specialized set of skills, but that's only relevant to the part where you're actually solving problems.
sgarland
2 hours ago
The issue is that precious few devs are actually good at ops. There are a ton of abstractions that have sprung up that attempt to paper over this, and they all suck for various reasons.
If you think you need to actually know your programming language of choice to be a good dev, I have news for you about actually knowing how computers work to be good at ops.
tetha
9 hours ago
To me it feels like nuance has been lost.
Personally, I would never self-host some B2C or B2B application if you have less than 50 - 100 techies in a healthy org. You can get just too much from a few VMs and/or a few dedicated servers at like Hetzner, OVH, or AWS managed services. At least for the average web rest thingy with a DB and some file storage. I'm sure it's possible to find counter-examples.
On the other hand, we are about 120 devs at work now, couple thousand B2B customers, 10 Platform Ops, 7 HW & DC Ops. I guess we have more ops-people than a startup may have people. Once we get rid of VMWare licensing, our colos are ridiculously cheap when amortized across 5 years compared to AWS or cloud hosting. Once EOL, they'll also reduce cloud-costs on cheaper providers for test systems and provide spontaneous failover and disaster recovery tests.
We're now also getting good cross-team scaling processes going and at this point the big barriers are actually getting enough power and cooling, not buying/racking/maintaining systems. That will be a big price tag next year, but we've not paid that money to AWS the last two years, so it's fine.
As I keep saying internally, self-hosting is like buying a 40 ton excavator, like Large Marge or a 40 ton truck. If you have enough stuff to utilize a 40 ton truck, it's good. If you need to move food around in an urban environment, or need to move an organ transplant between hospitals, a 40 ton truck tends to be rather inefficient and very expensive to maintain and run.
rustystump
3 hours ago
Rarely do startups fail because of a decision like self-hosting or not. In many cases it isn't even a few bad decisions but a long series of them, plus outside factors which are uncontrollable.
In my experience the AWS problem isn't so much that AWS is that costly relative to bare metal, but that most people do not execute well on AWS, over-provisioning like mad to solve design issues.
There is a perverse benefit for sales at AWS to push nonsense products too, because of incentives. But people forget that AWS isn't even as bad as a ton of other company spend. I have seen a Fortune 100 add a few million to their annual Salesforce contract "for funsies" because at their scale it wasn't that much money.
vb-8448
16 hours ago
> Maybe I'm just the old man screaming at cloud (no pun intended) but when did people forget how to run a baremetal server ?
It's a way to "commoditize" engineers. You can run on-premise or mixed infra better and cheaper, but only if you know what you are doing. This requires experienced people and doesn't work with new grads hired by big consultancies and sold as "cloud experts".
calgoo
16 hours ago
Also, when something breaks, you are responsible. If you put it in AWS like everyone else and it breaks, then it's their problem, not yours. We will still implement workarounds and fixes when it happens, but we are not responsible. The basic enterprise rule these days is to always pay someone else to be responsible.
vb-8448
15 hours ago
Actually nothing new here; this was the same in the pre-cloud era, where everyone in enterprises preferred big names (IBM, Microsoft, Oracle, etc.) to pass the responsibility to them in case of failure ... aka "nobody gets fired for buying IBM".
marcosdumay
14 hours ago
And the big name companies always refuse to take responsibility, and have worse reliability metrics than the lean alternatives...
but somehow that is never a problem.
nickstinemates
13 hours ago
Reality matters less than perception.
snoman
9 hours ago
This fired off some warning bells in my head. Is the data available to actually make a verifiable claim about those reliability metrics, like you're making?
marcosdumay
6 hours ago
Microsoft and Oracle were at the vanguard of suing into bankruptcy people who published metrics about them... So, do you trust the metrics they publish?
IBM is older, and it's incredibly well documented how mainframes are more expensive to run than normal servers.
iso1631
10 hours ago
The only metric that's important is the CTO's bonus
When everyone is suffering because AWS is having its bi-yearly 8 hour outage, the CTO isn't blamed, bonus all round, and maybe the AWS sales team takes him for an apology lunch
When the CTO is up for 1500 days straight then has a 2 hour downtime when nobody else does, the CTO is blamed, no bonus, and more likely to get fired
vidarh
15 hours ago
Unless you put someone on retainer to be responsible, which you can do cheaper than to keep your AWS setup from breaking...
(I do that for people; my AWS using customers consistently end up needing more help)
bbarnett
11 hours ago
It's always your problem. The difference is, if you control things, you can fix it, work around it, resolve it.
If not, you're at the mercy of others.
chasd00
15 hours ago
> then its their problem not yours
this is the main advantage of cloud, no one cares if the site/service/app is down as long as it's someone else's fault and responsibility.
fabian2k
16 hours ago
A large part of the different views on this topic are due to the way people estimate the amount of saved effort and money because you're pushing some admin duties to the cloud provider instead of doing this yourself. And people come to vastly different conclusions on this aspect.
It's also that the requirements vary a lot, discussions here on HN often seem to assume that you need HA and lots of scaling options. That isn't universally true.
nicce
16 hours ago
> A large part of the different views on this topic are due to the way people estimate the amount of saved effort and money because you're pushing some admin duties to the cloud provider instead of doing this yourself. And people come to vastly different conclusions on this aspect
This applies only if you have an extra customer that pays the difference. Basically, the argument only holds if you can't take on more customers because keeping the infrastructure up takes too much time, or you need to hire an extra person, which costs more than the AWS bill difference.
tstrimple
9 hours ago
> discussions here on HN often seem to assume that you need HA and lots of scaling options.
Funny how our perceptions differ. I seem to mostly see people saying all you need is a cheap Hetzner instance and Postgres to solve all technical problems. We clearly all have different working environments and requirements. That's why I roll my eyes at the suggestions I see in threads of going all in on colo. My last two major cloud migrations were due to colo facilities shutting down. They were getting kicked out and had a deadline. In one of the cases, the company I was working with was the second largest client at the colo, but when the largest client decided to pull out, the owners decided the economics of running the datacenter didn't make sense to them anymore. Switching colo facilities when you have a few servers isn't a big deal. It's annoying but manageable. When you have hundreds to thousands of servers, it becomes a major operational risk and is enormously disruptive to business as usual.
rdtsc
11 hours ago
> I'm so surprised there is so much pushback against this.. AWS is extremely expensive.
Basic rationalization. People will go to extraordinary lengths to justify and defend the choices they made. It's a defense mechanism: if they spent millions on AWS they are not going to sit idly while HN discusses saving hundreds of thousands with everyone nodding and agreeing. It's important for their own sanity to defend the choice they made.
yomismoaqui
16 hours ago
> Maybe I'm just the old man screaming at cloud (no pun intended) but when did people forget how to run a baremetal server ?
We should coin the term "Cloud Learned Helplessness"
JCM9
16 hours ago
As the author points out AWS can provide a few things that you wouldn’t want to try and replicate (like CloudFront) but for most other things you’re very much correct. AWS is ultimately very expensive for what it is. The complicated billing that’s full of surprises also makes cost management a head-banging experience.
tyingq
16 hours ago
Fair, though using AWS solely for CloudFront would mean you should compare to Cloudflare, Akamai, Fastly, etc. I'm not sure if the value prop for it looks so great if you don't include the "integrated with your other AWS stuff" benefit.
JCM9
14 hours ago
Agree, CloudFront isn’t super competitive with CDN focused vendors. It’s basically the “well you’re already on AWS so may as well just use this” play.
vidarh
15 hours ago
I mean, AWS egress is so expensive that I'd put something else in front of it for anyone who has any decent amount of traffic.
maccard
14 hours ago
I work for a small company owned by a huge company. We are entirely independent except for purchasing, IT, and budget approval. We run our CI on AWS, and it’s slow and flaky for a variety of reasons (compiling large c++ projects combined with instance type pressure). It’s also expensive.
We planned a migration to move from 4 on-demand instances to one on-prem machine, and we guessed we'd save $1000/mo, our builds would be faster, and we'd have fewer failures due to capacity issues. We even had a spare workstation and a rack in the office, so the capex was 0.
I plugged the machine into the rack and no internet connectivity. Put in an IT ticket which took 2 days for a reply, only to be told that this was an unauthorised machine and needed to be imaged by IT. The back and forth took 4 weeks, multiple meetings and multiple approvals. My guess is that 4 people spent probably 10 hours arguing whether we should do this or not.
On AWS I can write a python script and have a running windows instance in 15 minutes.
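For comparison, the CLI equivalent of that script is only a handful of lines - a sketch, where the AMI, key and security group IDs are placeholders:
# launch a Windows build machine and wait for it to come up
INSTANCE_ID=$(aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type c5.2xlarge \
  --count 1 \
  --key-name ci-key \
  --security-group-ids sg-0123456789abcdef0 \
  --query 'Instances[0].InstanceId' --output text)
aws ec2 wait instance-running --instance-ids "$INSTANCE_ID"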
wredcoll
14 hours ago
This is the root success of aws, it lets internal teams bypass sysadmin departments.
maccard
12 hours ago
The same story applies to software. If I want to buy a license of X for someone, I have to go through procurement, and it takes weeks even for <$50 purchases. Yet if it's on the AWS marketplace it's pre-approved, as long as it doesn't breach the AWS budget.
ghaff
14 hours ago
Working around official IT was certainly a significant factor early on. I'm less convinced it is nearly as big a driver (or a downside depending on your perspective) today.
mrktf
10 hours ago
It depends on organization size. Just an anecdotal example: I would say the moment the IT department becomes its own island (for example, it can totally ignore requests with excuses like overbooked staff, "we need extra planning", or 6 months of extra meetings; or even worse, it processes the request, but only up to the point where it can show something to upper management and blame you for wasting resources), you can go full cloud - at least there it is possible to get something working in a reasonable time.
whstl
12 hours ago
Especially considering that outside of startups (where approval would be fast with or without cloud), virtual infrastructure also got its own bureaucratic process.
ghaff
12 hours ago
A lot of people forget that, when server virtualization was still gaining momentum in a lot of circles, it wasn't uncommon at less technically savvy customers--say a regional bank at the time--to be told that it might take 2 months to provision a new server.
whstl
12 hours ago
I don't think anyone is forgetting that in this thread, as there's dozens of answers mentioning this.
But as an example: It took about 3 months to provision an AWS server in a recent company I consulted for due to their own bureaucracy and ineptitude of the Ops team.
On the other hand, when I needed a few CI servers for a startup I worked at, I just collected them from AppleStore during lunch hour.
Now this above is what people are "forgetting" and don't want to listen to.
maccard
7 hours ago
For us the problem is every device that gets plugged into our network is disabled by default, IT need to enable the port and they'll only enable it on machines that they've imaged.
But because AWS isn't in the office, it's fine. We could probably use Hetzner or OVH, but then we have to go through procurement which is as much of as hassle as going through IT.
izacus
15 hours ago
A lot of people here have built their whole professional careers around knowing AWS and deploying to it.
Moving away is an existential issue for them - this is why there's such pushback. A huge % of new developer and devops generation doesn't know anything about deploying software on bare metal or even other clouds and they're terrified about being unemployed.
goalieca
15 hours ago
Meanwhile, skills in operating systems, networking, and optimization are declining. Every system I've seen in the last 10 years or so has left huge cash on the table by not being aware of the basics.
snoman
9 hours ago
That could have more to do with containerization than the cloud - and that was a goal if I recall.
jsight
8 hours ago
> I'm so surprised there is so much pushback against this..
Same, this trend towards "AWS all the things" has really amazed me.
We've all mocked small companies copying big companies by trying to make their app super-duper scalable from the very start. After all, everyone thinks they are the next Google, despite their 5 total users right now.
But this is really the opposite. AWS is phenomenal for the startup that would readily trade high opex for lower capex. Servers aren't the cheapest things in the world to buy and they depreciate. It makes total sense for startups to start this way.
But why are big companies, with an actual budget for staff, copying the behavior of their favorite startups?
guax
7 hours ago
Opex looks nicer on the sheets than capex for large deployments. Incredibly high investment from AWS in luring in the C level with "white-papers" and promises of magical cost and governance revolutions. I've heard the promise of cheaper, faster, where you can focus on "innovation". I have yet to see any of it become a reality.
mk89
7 hours ago
How would you do multi-region deployments with your own DC?
This is an issue for several companies that start small and within 5 years find the need to expand abroad. Be it for data sovereignty or similar, which has become more important than ever in the last 10 years.
Duplicating a region is "a few clicks away" on AWS. This is what the provider enables you to do.
This and a lot of other things. And for such things, yes, you gotta pay.
Hikikomori
6 hours ago
I mean, it's not that complicated. Rent space in another location, get separate fibers/wavelengths between them, and a redundant internet connection.
But if you're in a growth/startup phase it doesn't make much sense to spend engineering time on this - not that multi-region setups in AWS are one button either. Once you're past that and paying AWS a million per week or so, I think it can make sense to offload expensive services to your own hardware.
SJC_Hacker
12 hours ago
> I'm so surprised there is so much pushback against this.. AWS is extremely expensive. The use cases for setting up your system or service entirely in AWS are more rare than people seem to realise. Maybe I'm just the old man screaming at cloud (no pun intended) but when did people forget how to run a baremetal server ?
Long term yes you can save money rolling your own.
But with cloud you can get something up and running within maybe a few days, sometimes even faster. Often with built in scalability.
This is a much easier sell to the non-tech (i.e., money) people.
If the project continues, the path of least resistance is often to just continue with the cloud solution. At a certain point, there will be so much tech debt that any savings in long-term costs from traditional on-premises, co-location or managed hosting are vastly outweighed by the cost of migration.
vidarh
16 hours ago
There is this belief that it is not extremely expensive and/or that the ops cost of bare metal will outpace it. It is a belief, and it is very rarely supported by facts.
Having done consulting in this space for a decade, and worked with containerised systems since before AWS existed, my experience is that managing an AWS system is consistently more expensive and that in fact the devops cost is part of what makes AWS an expensive option.
neves
16 hours ago
It's always nice to remember that AWS is responsible for 70% of Amazon profits.
vidarh
15 hours ago
As Jeff Bezos has been quoted as saying "your margin is my opportunity"...
The biggest difficulty in eating into AWS market share is that believing it is cheap has become religion.
speleding
15 hours ago
The complexity of AWS versus bare metal depends on what you are doing. Setting up an apache app server: just as easy on bare metal. Setting up high availability MySQL with hot failover: much easier on AWS. And a lot of businesses need a highly available database.
PenguinCoder
15 hours ago
Most businesses really don't need that complexity. They think they do. Premature optimization.
speleding
14 hours ago
If your database has a hardware failure then you could lose all sales and customer data since your last backup, plus the cost of the downtime while you restore. I struggle to think of a business where that is acceptable.
evanelias
14 hours ago
Why are you ignoring the huge middle ground between "HA with fully automated failover" and "no replication at all"?
Basic async logical replication in MySQL/MariaDB is extremely easy to set up, literally just a few commands to type.
Ditto for doing failover manually the rare times it is needed. Sure, you'll have a few minutes of downtime until a human can respond to the "db is down" alert and initiates failover, but that's tolerable for many small to medium sized businesses with relatively small databases.
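Roughly, the "few commands" look like this on MySQL 8 - a sketch only, with placeholder hostnames and credentials, assuming GTID is enabled and a 'repl' user already exists on the primary (older MySQL/MariaDB uses CHANGE MASTER TO / START SLAVE instead):
# on the replica: point it at the primary and start replicating
mysql -u root -p -e "
  CHANGE REPLICATION SOURCE TO
    SOURCE_HOST='db-primary.internal',
    SOURCE_USER='repl',
    SOURCE_PASSWORD='********',
    SOURCE_AUTO_POSITION=1;
  START REPLICA;"
# sanity check: Seconds_Behind_Source should trend toward 0
mysql -u root -p -e "SHOW REPLICA STATUS\G"
# manual failover, roughly: stop replication on the surviving replica,
# make it writable, and repoint the application at it
mysql -u root -p -e "STOP REPLICA; SET GLOBAL read_only = OFF;"
That, plus an alert on replica lag, is the whole "few minutes of human-driven failover" posture.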
That approach was extremely common ~10-15 years ago, and online businesses didn't have much worse availability than they do today.
nostrebored
4 hours ago
When I worked at AWS, the majority of customers who thought they had database backups had not tested recovery. The majority of them could not recover. At that point, RDS sells itself.
The other huge middle ground here is developer competency and meticulousness.
People radically overestimate how competent the average company writing software is.
evanelias
3 hours ago
Putting aside the fact that replication and backups are separate operational topics -- even if a company has no competent backend engineers, there are plenty of good database consultancies that can help with this sort of thing, as a one-time cost, which ends up being cheaper than the ongoing markup of a managed cloud database product.
There's also a big difference between incompetent and inexperienced. Operational incidents are how your team gains experience!
Leaning on managed cloud services can definitely make sense when you're a small startup, but as a company matures and grows, it becomes a crutch -- and an expensive one at that.
speleding
13 hours ago
I've done quite a few MySQL setups with replication. I would not call the setup "extremely easy", but then, I'm not a full-time DB admin. MySQL upgrades and general troubleshooting are so much more painful than AWS Aurora, where everything just takes a few clicks. And things like blue/green deployment, where you replicate your entire setup to try out a DB upgrade, are really hard to do on-prem.
evanelias
12 hours ago
Without specifics it's hard to respond. But speaking as a software engineer who has been using MySQL for 22 years and learned administrative tasks as-needed over the years, personally I can't relate to anything you are saying here! What part of async replication setup did you find painful? How does Aurora help with troubleshooting? Why use blue/green for upgrade testing when there are much simpler and less expensive approaches using open source tools?
danhor
14 hours ago
My "Homeserver" with its database running on an old laptop has less downtime than AWS.
I expect most, if not 99%, of all businesses can cope with a hardware failure and the associated downtime while restoring to a different server, judging from the impact of the recent AWS outage and the collective shrug in response. With a proper RAID setup, data loss should be quite rare; if more is required, a primary + secondary setup with manual failover isn't hard.
wredcoll
14 hours ago
That's not the same as a "high availability hot swap redundant multi region database".
Running mysqldump to a USB disk in the office once a day is pretty cheap.
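Concretely, the whole setup is a daily cron script along these lines - a sketch, with placeholder paths and credentials:
#!/bin/sh
# /etc/cron.daily/db-backup - dump everything, compress, keep 30 days of history
set -eu
BACKUP_DIR=/mnt/usb-backup
mysqldump --single-transaction --routines --all-databases -u backup -p'********' \
  | gzip > "$BACKUP_DIR/all-dbs-$(date +%F).sql.gz"
find "$BACKUP_DIR" -name 'all-dbs-*.sql.gz' -mtime +30 -delete
As noted elsewhere in the thread, the part people skip is periodically restoring one of these dumps to prove the backup actually works.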
spwa4
15 hours ago
A high availability MySQL server on AWS is about the same difficulty as on your own kubernetes instance (I've got a play one on one of those $100 N100 machines, got one with 16G mem). Then:
helm repo add mariadb-operator https://mariadb-operator.github.io/mariadb-operator
helm install mariadb-operator mariadb-operator/mariadb-operator
And then you can just provision a MariaDB "kind", i.e. you kubectl apply something specifying the database name, maximum memory, type of high availability (single primary or multi-master) and a secret reference, and there you go: a new database, ready to be plugged into other pods.
papichulo2023
15 hours ago
Don't you need ECC in your db nodes?
dd_xplore
14 hours ago
N100 supports DDR5 memory (although 1 channel) but I believe DDR5 has some error correction... May not be full ECC
spwa4
7 hours ago
N100 is my homelab, for playing. For instance I have a kubernetes cluster running KubeVirt, which runs 5 VMs, which ... have a kubernetes installation (so I have multiple worker nodes doing a "distributed filesystem" all of which is resharing disks from the same SSD). My production servers are generally older Xeons with ECC ram, which are also running kubernetes.
1oooqooq
14 hours ago
Amazing how nobody even knows about ECC these days.
I see so many Series B+ companies running DB and storage without a care in the world.
comprev
11 hours ago
I'm on a Platform team of <8 people and only 3 of us (most experienced too) come from sysadmin backgrounds. The rest have only ever known containers/cloud and never touched (both figuratively and literally :-) bare metal servers in their careers.
They've never used tools like Ansible (or Anaconda) or been in situations where they couldn't destroy the container and start afresh instantly.
eek2121
11 hours ago
I once moved a small site from AWS to Digital Ocean + Cloudflare.
$100-$300 on AWS -> $35/mo for DO + CF. Coincidentally, AWS had an outage soon after, which was avoided thanks to the move.
I have used DO for both clients and myself, and have not had any huge problems with them.
axegon_
7 hours ago
I am not - I hate AWS (and cloud in general) with a passion - overpriced, and you are getting locked into a closed ecosystem the moment you say "hey, this feature is neat, it will save me so much work", only to realize that you are stuck paying for it for years if you decide to move away from it. But people are inclined to jump on a hype train and become evangelists for life. Truth is, AWS (or GCP or Azure or anything else) is a viable option in two cases:
1. You are making a product with 3 friends in the evenings and you want to ship asap without having the capacity to invest in and set up infrastructure.
2. You are a huge corporation with tens of thousands of employees and hardware needs that you simply cannot source yourself easily or sort out the colocation of the hardware.
Everyone else - get a dozen second-hand servers, shove them in a rack in a data center and you will own the hardware and everything associated with it at half the price of what you'd be paying AWS in a year.
realitysballs
16 hours ago
For my org, I don't have the budget for a dedicated in-house opsec team, so going on-prem triggers an additional salary burden for security. How would I overcome this?
Ensorceled
15 hours ago
You can't. That's the use case FOR AWS/GCP. Once the differential between having an in-house team and the AWS premium becomes positive is when you make the switch.
A lot of the discussion here is that the cost of the in-house team is less than people think.
For instance: at a former gig, we used a service in the EU that handled weekends, holidays and night time issues and escalated to our team as needed. It was pretty cheap, approximately $10K monthly fee for availability and hourly rate when there were any issues to be resolved. There were a few mornings I had an email with a post-mortem report and an invoice for a hundred euros or so. We came pretty close to 5 9's uptime but we didn't have to worry about SLA's or anything.
spwa4
15 hours ago
There is also the factor that the idea that you don't need administrators for AWS is bullshit. Cool idea, bro. Go to your favorite jobs portal. Search for "devops" ... 1000s of jobs. I click on the first link.
Well, well, they have a whole team doing "devops administration" on AWS and require extra people. So not having the money for an in-house team ... no AWS for you.
I've worked for 2 large-ish firms in the past 3 years. One huge telco, one "medium" telco (still 100s of people). BOTH had a team just for AWS IAM administration. Only for that one thing, because that was company-wide (and was regularly demonstrated to be a single point of failure). And they had AWS administrator teams, yes teams, for every department (even HR had one, though in the medium telco all of management had a shared team, but the networking and development departments still had their own AWS teams, who, btw, also did IAM). The company-wide IAM team maintained AWS IAM and some solution they'd bought that also worked for their Windows domain, ticketing system (I hate you, IBM Remedy), equipment ordering portal and ...
AND there were "devops" positions on every development team, and on the network engineering team, and even a small one for the building "technics" team.
Oh and they both had an internal cluster on top of AWS, part on-premise, part rented DC space, which did at least half the compute work (but presumably a lot fewer of the weird edge-cases); that one ran the company services that are just insane to run on AWS, like any kind of video.
Ensorceled
13 hours ago
Yeah, you need less admin (depending), but not none. And AWS pushes you towards devops-heavy solutions.
1oooqooq
13 hours ago
Exactly. This is the margin AWS thrives on.
They sell "you don't need a team"... which is true in your prototype and MVP phase. And you assume that when you grow you will have an ops team and maybe move out.
But in the very long middle period... you will be supporting clients and SLAs etc., and will end up paying both AWS AND an ops team without even realizing it.
izacus
15 hours ago
Use the same people who are now maintaining your complex AWS setup. It's not like that doesn't need maintenance or oncall.
Msurrow
15 hours ago
Familiarize yourself with your company’s decision process on strategic decisions like this. Ensure you have a way to submit a proposal for a decision on making the change (or find someone who has that access to sponsor your proposal), build a business case that shows cost of opsec team, hardware and everything else is lower than AWS (or if cost is higher then some other business value is gained from making the change — currently digital sovereignty could be a strong argument if you are EU based).
If you cant build a positive business case then its not the correct move. Cash is king. Sadly.
vidarh
15 hours ago
If you don't have budget for someone to handle this for you, you can't afford AWS either, as you still need to handle the same things and they're generally more complex when you use AWS.
ownagefool
16 hours ago
The consequence of running ingress and DNS poorly is downtime.
The consequence of running a database poorly is lost data.
At the end of the day they're all just processes on a machine somewhere, none of it is particularly difficult, but storing, protecting, and traversing state is pretty much _the_ job and I can't really see how you'd think ingress and DNS would be more work than the datastores done right.
Now with AWS, I have a SaaS that makes 6 figures and the AWS bill is <$1000 a month. I'm entirely capable of doing this on-prem, but the vast majority of the bill is s3 state, so what we're actually talking about is me being on-call for an object store and a database, and the potential consequences of doing so.
With all that said, there's definitely a price point and staffing point where I will consider doing that, and I'm pretty down for the whole on-prem movement generally.
vidarh
15 hours ago
I'm generally strongly in favour of bare metal (not so much actually on-prem), but your case is one of the rare ones where AWS makes sense. Even for cheap setups like that, bare metal could likely be cheaper even factoring in someone on call to handle issues for you, but the amounts are so small it's a perfectly reasonable choice to just pick whatever you're comfortable with.
That's the sweet spot for AWS customers. Not so much for AWS.
The key thing for AWS is trying to get you locked in by "helping you" depend on services that are hard to replicate elsewhere, so that if your costs grow to a point where moving elsewhere is worth it, it's hard for you to do so.
jagged-chisel
16 hours ago
Forget? You have to hire people for that. We are a software organization. We build software. If we rent in the cloud, there is less HR hassle - hiring, raises, bonuses, benefits, firing … none of that headache involved with the cloud.
Technically? Totally doable. But the owners prefer renting in the cloud over the people-related issues of hiring.
jsiepkes
16 hours ago
This is exactly the rhetoric Microsoft used in the 00's with its "Get the facts" marketing campaign against Linux and open source: "Never mind the costs, think about the people hours you are saving!".
It wasn't as simple as that then, and it's still not as simple as that now.
ecshafer
13 hours ago
This is true, but also really funny considering that even today the average windows sysadmin can still barely use powershell and relies on console clicking and batch scripts. A good unix admin can easily admin 10-100x the machines as a windows admin, and this was more true back in the early 00s. So the marketing on getting the facts was absolutely false.
bigstrat2003
8 hours ago
Citation needed on that one. I've only worked with a minority of Windows sysadmins who are as incompetent as you say. And yeah, of course a good unix admin can run circles around a bad windows one, but the converse is just as true. A good Windows admin can run circles around a bad unix one. It has nothing to do with the operating systems and everything to do with technical competence of the individual.
ecshafer
3 hours ago
There are a LOT more bad Windows admins than bad unix admins though. The floor for being a unix admin is so much higher that it already filters out a lot of people. There are so many MSPs and small businesses with a Windows admin that does everything through a console it's crazy. You are right that it's all about the admin, but on average, the average Linux admin is far more comfortable scripting than the Windows admin.
noir_lord
16 hours ago
Nope, and it never has been, but to (some of) both sides "it depends" means you are on the other side.
It’s become polarised (as everything seems to).
I’ve specced bare metal, I’ve specced AWS; which is used is entirely a matter of the problem/costs and relative trade-offs.
That is all it is.
foldr
15 hours ago
In fairness to Microsoft, this argument should have been correct. It ought to be possible for Microsoft to offer products with better polish and better support than open source alternatives, and that ought to more than compensate for any licensing costs. Whether Microsoft actually managed to do this is debatable, but the principle is sound enough.
ghaff
15 hours ago
It sort of was, especially with respect to desktop software. The licensing costs associated with Microsoft Office etc. were probably not really that much compared to the disruption of switching offices full of people, who just wanted to do their jobs, over to open source alternatives.
9cb14c1ec0
16 hours ago
This is the fallacy that Amazon sold everyone on: that the cloud has no headache or management needed. This is manifestly untrue. It's also untrue that bare metal takes lots of management time. I have multiple Dell rack servers colocated in several different datacenters, and I don't spend any time at all managing them. They just run.
al_borland
16 hours ago
> This is the fallacy that Amazon sold everyone on
I’ve been working at a place for a long time and we have our own data centers. Recently there has been a push to move to the public cloud and we were told to go through AWS training. It seems like the first thing AWS does in its training is spend a considerable amount of time selling their model. As an employee who works in infrastructure, hearing Amazon sell so hard that the company doesn’t need me anymore is not exactly inspiring.
After that section they seem to spend a considerable amount of time on how to control costs. These are things no one really thinks about currently, as we manage our own infra. If I want to spin up a VM and write a bunch of data to it, no one really cares. The capacity already exists and is paid for, adding a VM here or there is inconsequential. In AWS I assume we’ll eventually need to have a business justification for every instance we stand up. Some servers I run today have value, but it would be impossible to financially justify in any real terms when running in AWS where everything has a very real cost assigned to it. What I do is too detached from profit generation, and the money we could save is mostly theoretical, until something happens. I don’t know how this will play out, but I’m not excited for it.
whstl
14 hours ago
I can confirm this.
The AWS mandatory training I did in the past was 100% marketing of their own solutions, and tests are even designed to make you memorize their entire product line.
The first two levels are not designed for engineers: they're designed for "internal salespeople". Even Product Managers were taking the certification, so they would be able to recommend AWS products to their teams.
snoman
9 hours ago
As a business owner that pays the hardware bill, what you see as the benefit of your current environment - or a downside of moving to the cloud - I see in a completely different light. To some extent I’d be upset with arbitrary amounts of paid-for capacity just lying around with zero accountability for that spend.
tonypapousek
9 hours ago
> I don't spend any time at all managing them
Who does, then? Even with automatic updates, one can assume some level of maintenance is required for long-term deployments.
Don’t get me wrong, I love running stuff bare metal for my side projects, but scaling is difficult without any ops.
9cb14c1ec0
7 hours ago
No one. I have automatic backups with proxmox backup server. Updates are automatic and deployments are automated.
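For the "updates are automatic" part, on Debian-family hosts (Proxmox is Debian-based) a minimal version is just unattended-upgrades - a sketch, not necessarily this exact setup:
apt-get install -y unattended-upgrades
cat > /etc/apt/apt.conf.d/20auto-upgrades <<'EOF'
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
EOF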
JackSlateur
13 hours ago
You're missing the good times spent debugging a firmware issue that leads to packet drops on the NIC (or data corruption on the NVMe).
I do not miss that crap
chasd00
16 hours ago
Every company I’ve consulted for has hired a team dedicated to just setting up and monitoring AWS for the software devs. Hell, you’d probably reduce headcount running on bare metal.
papichulo2023
15 hours ago
Pretty much this. Most companies have the "devops" folks fully dedicated to maintaining the cloud stuff.
JackSlateur
13 hours ago
In more than 15 years of experience, in various companies, the number of people who can build and run an on-premise infrastructure sanely can be counted on the fingers of my right hand.
These people exist, but we have far more stupid "admins" around here
When you are not in the infrastructure business (I work in retail at the moment), the public cloud is the sane way to go (which is sad, but anyway)
hobs
16 hours ago
I have spent about 1 day waiting for every 5 days doing stuff at my last 3 jobs, all of which were growing companies thinking that they needed the power of the cloud, but they sure as hell were not paying to make it fast or easy to use.
Pay some "devops" folks, then underfund them, give them a mandate of all ops but with fewer people, and also make them manage the constant churn of AWS services on top of dealing with normal outages and dumb dev things.
vidarh
15 hours ago
I help people run their systems.
Clients that use cloud consistently end up spending more on devops resources, because their setups tend to be vastly more complex and involve more people.
whstl
14 hours ago
I've worked at both kinds of companies over almost 25 years and I can confirm this is true.
The biggest ops teams I worked alongside were always dedicated to running AWS setups. The slowest, too, were dedicated to AWS. Proportionally, I mean, of course.
People here are comparing the worst possible of Bare Metal with "hosting my startup on AWS".
nostrebored
4 hours ago
This is a toupee situation. Every effective company I've worked at has a slim platform team that might make some nice company specific templates for how to deploy, but individual teams were responsible for creating and owning their infra. The idea of having an AWS ops team is absurd if you're not at a truly massive company (XX,000+)
sgarland
2 hours ago
I have never, ever seen dev-created infra that was well done, much less with repeatable IaC. It’s always résumé-driven nonsense based on whatever someone read on blogs, and they have no clue how any of it works, only that the output is what they expect.
wredcoll
14 hours ago
> The biggest ops teams I worked alongside were always dedicated to running AWS setups. The slowest too were dedicated to AWS.
I wish I could come up with some kind of formalization of this issue. I think it has something to do with communication explosions across multiple people.
bryanlharris
12 hours ago
Increases in complexity exponentially increase mistakes, and MS Teams meetings are just a glorified game of telephone.
Don't make perfect the enemy of the good.
qaq
15 hours ago
Just because AWS abstracted something doesn't mean you don't need people who understand all the quirks of the black box you supposedly don't have to worry about. Guess what: those people are expensive. You also have to deal with a ton of crap like hard per-account resource limits, which on any meaningful size project will push complexity up by forcing you to use multiple accounts.
embedding-shape
15 hours ago
> We build software
Right, doesn't that include figuring out the right and best way of running it, regardless if it runs on client machines or deployed on servers?
At least I take "software engineering" to mean the full end-to-end process, from "Figure out the right thing to build" to "runs great wherever it's meant to run". I'm not a monkey that builds software on my machine and then hands it off to some deployment engineer who doesn't understand what they're deploying. If I'm building server software, part of my job is ensuring it's deployed in the right environment and runs perfectly there too.
canucktrash669
15 hours ago
Ultimately these owners hire me to cut their 6-figure AWS bill by 50%. It's mostly rearchitecting mistakes. Amongst them is taking AWS blog propaganda at face value. Those savings could be 80% if they chose managed bare metal (no racking and stacking).
bilekas
16 hours ago
> Forget? You have to hire people for that. We are a software organization. We build software.
You don't need to hire dedicated people full time. It could even be outsourced and then a small contract for maintenance.
It's the same argument you could say for "accounting persons", or "HR persons" - "We are a software organisation!" - Personally I don't buy the argument.
Foobar8568
16 hours ago
Outsourcing and cloud cost are always underestimated.
theta_d
16 hours ago
> It could even be outsourced and then a small contract for maintenance.
Yeah, those people we outsourced to happen to work at AWS.
vidarh
15 hours ago
They don't though. You still need devops when you use AWS, and most organisations end up needing more time spent on devops when they use AWS.
base698
15 hours ago
Until you factor in the legions of devops writing terraform, iam, and cicd scripts.
ericd
6 hours ago
You can just set up your own cloud on leased machines, and pocket the huge difference in cost. Devops languages are pretty easy to learn, IME, and the infra stuff takes less maintenance than the AWS proponents seem to think. I guess it depends on your usage profile, but like bandwidth especially is ruinously expensive compared to what you get with leased machines.
Aldipower
15 hours ago
Forgot? Running something on AWS also needs a lot of people. In my experience, even more. The term SRE did not exist before.
rcxdude
15 hours ago
I really dislike the fallacy that just because you're buying something it means that you're not building anything. In practice this is never true: there's always some people-in-your-org time cost of buying something just as much as there's some giving-money-to-other-orgs cost to building something. So often organisations wind up buying something and spending way more time in the process than it would cost for them to build it themselves.
With AWS I think this tradeoff is very weak in most cases: the tasks that you are paying AWS for are relatively cheap in time-of-people-in-your-org, and AWS also takes up a significant amount of that time with new tasks as well. Of the organisations I'm personally aware of, the ones who hosted on-prem spent less money on their compute and had smaller teams managing it, with more effective results than those who were cloud-based (to various degrees of egregiousness, from 'well, I can kinda see how it's worth it because they're growing quickly' to 'holy shit, they're setting money on fire and compromising their product because they can't just buy some used tower PCs and plug them in in a closet in the office').
shishcat
16 hours ago
Don't you have cloud architects and similar figures already?
dumbledoren
11 hours ago
> when did people forget how to run a baremetal server ?
Bigger question: When did people forget that doing that is much easier than AWS...
UltraSane
12 hours ago
I'm not going to argue that AWS can be expensive, but in my experience its biggest advantage is SPEED. In every company I worked for that ran their own data centers, every damn thing took FOREVER. New servers took months to buy and rack. Any network change, like a new VLAN, took days to weeks. It was so annoying. But in AWS almost anything is just an API call and a few minutes at most from being enabled. It is so much more productive.
mberning
16 hours ago
It’s expensive and the “design” of the services, if you could call it that, is such that you are forced to pay a lot, or play a lot of games to get around it. If you are going to spend your engineering time working around their ridiculous pricing schemes, you might as well spend the money on building things out yourself.
Perfect example - MSK. The brokers are config locked at certain partition counts, even if your CPU is 5%. But their MSK replicator is capped on topic count. So now I have to work around topic counts at the cluster level, and partition counts at the broker level. Neither of which are inherent limits in the underlying technologies (kafka and mirrormaker)
j45
11 hours ago
The cloud is incredibly profitable for the efficiencies and improvements it's introduced and held onto.
Easy to push back against what is now the unknown (bare metal), when the layers extending bare metal to cloud service have become better and better, as well as more accessible.
citizenpaul
11 hours ago
The "value add" of AWS has never been what it can do or does. It has always appealed to weak/incompetent/sociopathic managers and execs desire to not have to deal with capable employees.
As far as they are concerned AWS is taking care of computing AND hiring for them.
I've never worked anywhere where at least some sort of power holder wouldn't instantly go to consultants or outsourcing rather than in-house, because they believe that if you work for the company you must be incompetent, dumb or below average. If you don't work for them, you must be exceptional.
zjaffee
14 hours ago
AWS (along with the vast majority of B2B services in the software development industry) is good because it allows you to focus on building your product or business without needing to worry about managing servers nearly as much.
The problems here are no different than with using SaaS anywhere else in a business. You can also run all your sales tracking through Excel; it's just that once you have more than a few people doing sales, that becomes a major bottleneck, in the same way not having an easy-to-manage infrastructure system does.