voodooEntity
16 hours ago
As someone who has worked with serverless for multiple years (mostly Amazon Lambda, but others too), I can absolutely confirm the author's points.
While it "takes away" some work from you, it adds that work back elsewhere, solving artificially induced problems.
Another example I hit was a hard upload limit. I ported an application to a serverless variant; it had an import API for huge customer exports. Shouldn't be a problem, right? Just set up an ingest endpoint and some background workers to process the data.
Then I learned: you can't upload more than 100 MB at a time through the "API Gateway" (basically their proxy that invokes your code), and when I asked whether that could be changed somehow, I was just told to tell our customers to upload smaller file chunks.
While from a "technical" perspective this sounds logical, our customers are not going to swap out all their software just so we get a "nicer upload strategy".
For me this is comparable to "it works in a vacuum" situations. It's cool in theory, but as soon as it hits reality you realize quite fast that the time and money you saved by moving from permanently running machines to serverless, you will spend in other ways working around the serverless peculiarities.
akdev1l
15 hours ago
The way to work around this issue is to provide a presigned S3 URL.
Have users upload to S3 directly; then they can either POST you what they uploaded, or you can find some other means of correlating the input (e.g., files in S3 prefixed with the request ID or something).
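A minimal sketch of that flow with boto3 (bucket name, key layout, and the request-ID correlation scheme are all hypothetical):

    import uuid

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "customer-imports"  # hypothetical bucket name

    def make_upload_url() -> tuple[str, str]:
        """Return (request_id, presigned PUT URL) for a direct-to-S3 upload."""
        request_id = uuid.uuid4().hex
        url = s3.generate_presigned_url(
            "put_object",
            Params={"Bucket": BUCKET, "Key": f"uploads/{request_id}.csv"},
            ExpiresIn=3600,  # URL is valid for one hour
        )
        return request_id, url

    # The client PUTs the file straight to S3, bypassing the gateway's body
    # limit; the request_id prefix is what lets you correlate the upload later.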
I agree this is annoying, and maybe I've been in the AWS ecosystem for too long.
However, an API that accepts an unbounded amount of data is a good recipe for DoS attacks. I suppose the 100 MB limit is outdated now that the internet has gotten faster, but eventually we do need some limit.
voodooEntity
15 hours ago
Well, I partly agree, and if I were the one building the counterpart, I probably would have used presigned S3 URLs too.
In this specific case I'm getting old-school file upload requests from software that was partly written before the 2000s; no one is going to adjust anything anymore.
And yes, just accepting giant uploads is far from good in terms of security (DoS and the like), but we're talking about CSV files somewhere between 100 and 300 MB (I called them "huge" because, in terms of product data, 200-300 MB of text covers quite a lot). Not great, but we try to satisfy our customers' needs.
But as with all the other points: everything is solvable somehow; it just means spending more time to solve something that technically wasn't a real problem in the first place.
Edit: Another funny example. In a similar process on another provider, I downloaded files in a similar size range from S3 to parse them, and the process died again and again. After contacting the host (because their logs literally just stopped: no error trace, nothing), they told me that their setup only allows 10 MB of local storage, and the default client (in this case the AWS S3 adapter for PHP) always downloads the whole file, even if you tell it to "stream". So I built a solution that used HTTP range requests to "fake stream" the file into memory in smaller chunks, so I could process it afterwards without downloading it completely. Just another example of: yes, it's solvable, but annoying.
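The range-request trick, sketched here in Python with boto3 rather than the PHP adapter (bucket, key, and the process() helper are hypothetical):

    import boto3

    s3 = boto3.client("s3")
    BUCKET, KEY = "customer-exports", "export.csv"  # hypothetical names
    CHUNK = 5 * 1024 * 1024  # 5 MB per range request, well under the 10 MB cap

    def process(chunk: bytes) -> None:
        ...  # hypothetical: feed this chunk to the CSV parser

    # Total object size, so we know when to stop.
    size = s3.head_object(Bucket=BUCKET, Key=KEY)["ContentLength"]

    offset = 0
    while offset < size:
        end = min(offset + CHUNK, size) - 1
        # Fetch only bytes offset..end instead of the whole object.
        body = s3.get_object(
            Bucket=BUCKET, Key=KEY, Range=f"bytes={offset}-{end}"
        )["Body"]
        process(body.read())
        offset = end + 1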
conductr
10 hours ago
I find with these types of customers it's always easier to just ask them to save files locally and grant me privileges to read the data. Sometimes they'll be on Google, Dropbox, Microsoft, etc., and I also run an SFTP server for this in case they want to move files over to my service.
Then I either batch/schedule the processing or give them an endpoint just to trigger it (/data/import?filename=demo.csv), as in the sketch below.
It's actually so common that I just have the "data exchange" conversation and let them decide which option fits their needs best. Most of it is available for self-service configuration.
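A trigger endpoint like that can stay tiny; a sketch assuming Flask, with enqueue_import standing in for whatever kicks off the batch job:

    from flask import Flask, abort, request

    app = Flask(__name__)

    def enqueue_import(filename: str) -> str:
        # Hypothetical hook into the batch pipeline; it reads the file from
        # wherever the customer dropped it (SFTP, Drive, Dropbox, ...).
        return f"job-{filename}"

    @app.route("/data/import")
    def trigger_import():
        filename = request.args.get("filename")
        if not filename:
            abort(400, description="filename query parameter is required")
        job_id = enqueue_import(filename)
        return {"status": "queued", "job": job_id}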
reactordev
10 hours ago
Uploads to an S3 bucket can trigger a Lambda… don't complicate things. The upload trigger can tell the system about the upload, and the client can carry on with their day.
The uploader on the client uses a presigned URL. S3 triggers a Lambda. The Lambda function takes the file path and tells the background workers about it via a queue, MQ, REST, gRPC, or does the lifting in workflow ETL functions.
Easy peasy. /s
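For concreteness, the wiring described above, sketched as a Python Lambda handler (the queue URL and names are hypothetical):

    import json

    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/imports"  # hypothetical

    def handler(event, context):
        # S3 "object created" notifications arrive as a list of records.
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            # Hand the object's location to the background workers.
            sqs.send_message(
                QueueUrl=QUEUE_URL,
                MessageBody=json.dumps({"bucket": bucket, "key": key}),
            )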
stuartjohnson12
8 hours ago
> Uploads to an S3 bucket can trigger a lambda… don’t complicate things.
I read this and was getting ready to angrily start beating my keyboard. The best satire is hard to detect.
Dylan16807
7 hours ago
I don't really get the joke. S3 triggering a lambda doesn't sound meaningfully more complicated than using a lambda by itself. What am I missing?
akdev1l
5 hours ago
It gets really complex in this workflow to even surface something like "file processed successfully" on the client side.
How will your client know if your backend Lambda crashed or whatever? All it knows is that the upload to S3 succeeded.
Basically you're turning a synchronous process into an asynchronous one.
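The usual patch is to have the workers record per-upload status somewhere the client can poll, which is exactly the extra machinery being complained about. A sketch (the status endpoint and its states are hypothetical):

    import time

    import requests

    def wait_for_import(base_url: str, upload_id: str, timeout: float = 300.0) -> dict:
        """Poll a hypothetical status endpoint until the async import settles."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            status = requests.get(f"{base_url}/imports/{upload_id}").json()
            if status["state"] in ("succeeded", "failed"):
                return status
            time.sleep(5)  # back off between polls
        raise TimeoutError(f"import {upload_id} still pending after {timeout}s")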
reactordev
6 hours ago
Solving a serverless limitation with more serverless so you can continue doing serverless, when you can't form-upload a simple 101 MB zip file as an application/octet-stream. Doubling down on it for a triple beat.
Dylan16807
5 hours ago
I wouldn't really call it "more" serverless to rearrange the order a bit. Which makes it "solving a serverless limitation so you can continue doing serverless". And that's just a deliberately awkward way of saying "solving a serverless limitation", because if you can solve it easily, why would you not continue? Spite?
So I still don't see how it's notably worse than the idea of using serverless at all.
reactordev
3 hours ago
The controversy here is that API Gateway limits the upload size, so you have to engineer a workaround workflow using S3 and triggers (even if this is the serverless way) when all you want to do is upload a file: a POST call with an application/octet-stream body. Let HTTP handle resume. But you can't, and you end up going in through the side door, when all you really want is client_max_body_size.
The sarcasm of being correct while playing down the complexity is entirely my own. We used to be able to do things easily.
MrDarcy
6 hours ago
If you don’t do it this way you fail the system design interview.
reactordev
3 hours ago
Nope, you didn't use terracottax, so you failed anyway. Six months before you can reapply, in case the first humiliation wasn't enough. The boss was looking for AWS Glue in there and you didn't use it.
isoprophlex
8 hours ago
Every day we stray further from the light
themafia
7 hours ago
> Easy peasy. /s
It actually is, though. I don't need to build a custom upload client, I don't need to manage restart behavior, I get automatic restarts if any of the background workers fail, I have a dead-letter queue built in to catch unusual failures, and I can tie it all together with a common API that's a first-class component of the system.
Working in the cloud forces you to address the hard problems first. If you actually take the time to do this everything else becomes _absurdly_ easy.
I want to write programs. I don't want to manage failures and fix bad data in the DB directly. I personally love the cloud and this separation of concerns.
sunrunner
7 hours ago
> Working in the cloud forces you to address the hard problems first.
It also forces you to address all the non-existent problems first: the ones you just wish you had, like all the larger companies that genuinely have to deal with thousands of file uploads per second.
And don't forget all the new infrastructure you added to do the job of just receiving the file in your app server and putting it where it was going to go anyway, but via separate components that always seem to end up with individual repositories and separate deployment pipelines, and that can't be effectively tested in isolation without deploying to their target environment.
And all the additional monitoring you need on each of the individual components that were added, particularly on those helpful background workers to make sure they're actually getting triggered (you won't know they're failing if they never got called in the first place due to misconfiguration).
And you're now likely locked into your upload system being directly coupled to your cloud vendor. Oh wait, you used Minio to provide a backend-agnostic intermediate layer? Great, that's another layer that needs managing.
Is a content delivery network better suited than your app server to handling file uploads from millions of concurrent users? I'd honestly hope so; that's what it's designed for. Was it necessary? I'd like to see the numbers first.
At the end of the day, every system design decision is a trade-off and almost always involves some kind of additional complexity for some benefit. It might be worth the cost, but a lot of these system designs don't need this many moving parts to achieve the same results, and the extra parts only add complexity without solving a direct problem.
If you're actually that company, good for you and genuinely congratulations on the business success. The problem is that companies that don't currently and may never need that are being sold system designs that, while technically more than capable, are over-designed for the problem they're solving.
themafia
6 hours ago
> the ones you just wish you had
You will have these problems. Not as often as the larger companies do, but to imagine they simply don't exist is the opposite of sound engineering.
> if they never got called in the first place due to misconfiguration
Centralized logging is built into all these platforms. Debugging these issues is one of the things that becomes absurdly easy.
> likely locked into your upload system
The protocol provided by S3 is available through dozens of vendors.
> Was it necessary?
It only matters if it is of equivalent or lesser cost.
> every system design decision is a trade off
Yet you explicitly ignore these.
> are being sold system designs
No, I just read the documentation, and then built it. That's one of those "trade offs" you're willingly ignoring.
sunrunner
4 hours ago
> You will have these problems. Not as often as the larger companies but to imagine that they simply don't exist is the opposite of sound engineering.
A lot of those failure-mode examples seem well suited to client-side retries and appropriate rate limiting. If we're talking file uploads then sure, there absolutely are cases where having clients go to the third party is more beneficial than costly (high variance in allowed upload size would be one to consider), but for simple upload cases I'm not convinced that high-level client retries wouldn't work.
> if they never got called in the first place due to misconfiguration
I find it hard to believe that having more components to monitor will ever be simpler than having fewer. If we're being specific about vendors, the AWS console is IMHO the absolute worst place to go for a good centralized logging experience, so you almost certainly end up shipping your logs into a better centralized logging system that has more useful monitoring and visualisation features than CloudWatch, with the added benefit of not being the AWS console. The cost here? Financial, time, and complexity/moving parts for moving data from one to the other. Oh, and don't forget to keep monitoring on the log-shipping component too; that can also fail (and needs updates).
> The protocol provided by S3 is available through dozens of vendors.
It's become a de facto standard for sure, and is helpful for other vendors to re-implement it but at varying levels of compatibility.
> It only matters if it is of equivalent or lessor cost.
This is precisely the point: I'm saying that adding boxes to the system diagram is a guaranteed cost as much as a potential benefit.
> Yet you explicitly ignore these
I repeatedly mentioned things that to me count as complexity that should be considered. Additional moving parts/independent components, the associated monitoring required, repository sprawl, etc.
> No, I just read the documentation, and then built it.
I also just 'read the documentation and built it', but other comments in the thread allude to vendor-specific training pushing not only vendor-specific solutions (no surprise) but also the use of vendor-specific technology that maybe wasn't necessary for a reliable system. Why use a simple pull-based API with open standards when you can tie everything up in the world of proprietary vendor solutions that have their own common API?
pluto_modadic
5 hours ago
This kinda proves the point: you have to know a silly workaround.
hinkley
9 hours ago
We became the flagship customer for a division of AWS that was responsible for managing SSL certificates. We were doing vanity URLs, and vanity URLs generally require an individual SSL certificate for each domain name. We needed thousands, and AWS's tools for cert management at the time were really only happy with hundreds; they had backlog items to fix that, but those were behind a year or two of other work. It took them about three months to get far enough along for our immediate needs. It's surprising which parts of AWS have not adjusted to outliers that don't really seem all that exceptional.
markstos
2 hours ago
I also thought Lambda looked promising at first, but we ultimately abandoned all our Lambda projects and started using containers as needed.
Lambda still requires you to update the Node runtime every year or two, while with your own containers you can decide on your own upgrade schedule.
mulmen
10 hours ago
The hardest problem in computer science is copying a file from one computer to another.
hinkley
9 hours ago
There are architectural arguments where I kick myself for not having kept a bibliography of all of my justifications. The thing about mastering something is that you copy the rules into the intuitive part of your brain and no longer have to reason through them step by step like one of Socrates's lectures. You just know, and you do.
The biggest one I regret is "communicating through the file system is 10x dumber than you think it is, even if you think you know how dumb it is." I should have a three-page bibliography on that. Mostly people don't challenge you on this, but I had one brilliant moron at my last job who did, and all I could do was stare at him like he had three heads.
jasonjayr
8 hours ago
Just to help future readers: there is an ecosystem of "tus" uploaders and endpoints that chunk uploads and support resumable uploads, which would be ideal for this kind of restriction:
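A sketch with the tuspy client (the endpoint URL, file name, and chunk size are placeholders); the protocol uploads in small chunks and resumes from the last acknowledged offset:

    # pip install tuspy -- a Python client for the tus resumable-upload protocol
    from tusclient import client

    tus = client.TusClient("https://uploads.example.com/files/")  # hypothetical endpoint

    uploader = tus.uploader(
        "export.csv",                # the big customer file
        chunk_size=5 * 1024 * 1024,  # stay well under any gateway body limit
        metadata={"filename": "export.csv"},
    )
    uploader.upload()  # on retry, resumes from the last acknowledged offset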