arter45
2 days ago
>But building this taught me something that I think about constantly: technical correctness is worthless if you’re solving the wrong problem.
>You can write perfect code. You can build flawless systems. You can optimize the sh*t out of your cost function. And you can still end up with something that sucks.
>The important part isn’t the optimization algorithm. The important part is figuring out what you should be optimizing for in the first place.
>Most of the time, we don’t even ask that question. We just optimize for whatever’s easy to measure and hope it works out.
>Spoiler: it probably doesn’t.
As tech people it's kinda hard to admit, but it's totally correct. Sometimes you actually have to optimize for X, sometimes you don't. It's totally ok to optimize stuff just for passion or to try out new things, but if you expect external validation you should do it for things people actually care about.
As an aside, this is also related to the way random companies carry out technical interviews, cargo-culting FAANG practices.
FAANGs tend to care about optimizing a lot of stuff, because when you have billions of customers even the smallest savings add up to a lot of money.
If you are a random company, even a random tech company, in many domains you can go a long way with minimal tuning before you have to turn to crazy optimization tricks.
For example, a day has roughly 86,400 seconds (call it ~100k), so if you get 100k requests per day (which is still a lot!), that averages out to about 1 req/s; even with 10x peaks during the day you're looking at roughly 10 requests per second. It's not that much.
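A rough back-of-envelope (the numbers are illustrative, Python just used as a calculator):

    # 100k requests spread over an 86,400-second day, with a 10x peak factor
    SECONDS_PER_DAY = 24 * 60 * 60                # 86,400
    daily_requests = 100_000
    avg_rps = daily_requests / SECONDS_PER_DAY    # ~1.16 req/s on average
    peak_rps = 10 * avg_rps                       # ~11.6 req/s at a 10x peak
    print(f"avg {avg_rps:.2f} req/s, peak ~{peak_rps:.1f} req/s")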
silvestrov
2 days ago
Yep.
You can take it one step further: imagine you live in a smallish country (10 million people).
If your market share is 10% of the population and each user makes 1 request per day, that's about a million requests per day, i.e. roughly 10 requests per second.
And 10% is a large market share for everyday use. So take a 1% market share and 10 requests per user per day instead, and it's still only about 10 req/s.
In fact, with a 1% share of 10 million people you have ~100,000 users, which is close to the ~86,400 seconds in a day, so the number of requests each user makes per day is roughly the number of requests per second your server will see on average.
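The coincidence that makes this rule of thumb work, as a quick sketch (illustrative numbers only):

    # 1% of a 10M-person country is ~100,000 users, close to the 86,400 seconds in a day
    users = 10_000_000 // 100          # 100,000
    SECONDS_PER_DAY = 86_400
    for req_per_user_per_day in (1, 10, 50):
        rps = users * req_per_user_per_day / SECONDS_PER_DAY
        print(f"{req_per_user_per_day} req/user/day -> ~{rps:.1f} req/s on average")
    # prints roughly 1.2, 11.6 and 57.9 req/s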
There is a lot of business in small countries that never need to scale (or business in narrow sectors, e.g. a lot of B2B).
arter45
2 days ago
Exactly. In those cases, getting 100 req/s or 1,000 req/s probably means you're being DoS'd. Any rate limiter is enough.
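At that scale even a single-process token bucket is plenty. A minimal sketch (the limits are made up, not production code):

    import time

    class TokenBucket:
        """Allow up to `rate` requests per second, with bursts up to `capacity`."""
        def __init__(self, rate=10.0, capacity=20):
            self.rate, self.capacity = rate, capacity
            self.tokens = float(capacity)
            self.last = time.monotonic()

        def allow(self) -> bool:
            now = time.monotonic()
            # refill proportionally to elapsed time, capped at capacity
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    bucket = TokenBucket(rate=10, capacity=20)
    allowed = sum(bucket.allow() for _ in range(100))
    print(f"{allowed} of 100 back-to-back requests allowed")   # ~20, the burst capacity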
Or you could still use multiple instances, not for scaling but for patching without downtime and so on.
Availability can be way more important than sheer performance or number of concurrent requests.
johnmwilkinson
2 days ago
Of course, they make 90% of requests between 6 and 7 PM, with a general peak of 4 thousand req/s.
arter45
a day ago
If an application gets 4 thousand req/s for an hour, that alone is 4,000 × 3,600 ≈ 14.4 million requests; add the remaining 10% for the rest of the day and it's handling roughly 16 million reqs/day, which is a completely different situation and of course requires scaling in most cases.
That said, even then, there are a lot of business cases where you are not constrained by the time required to sort or traverse a custom data structure, because you spend more time waiting for an answer from a database (in which case you may want to tune the DB or add a cache), for a response from another server or another user, for a third-party library, or for a payment-processing endpoint.
There are also use cases (think offline mobile apps) where the number of concurrent requests is basically 1, because each offline app serves a single user, so as long as you can process stuff before a user notices the app is sluggish (hundreds of milliseconds at least) you're good.
What do you do with those 4 thousand req/s? That's what makes the difference between "processing everything independently is fast enough for our purposes", "we need to optimize database or network latency", or "we need to optimize our data structures".
glemion43
2 days ago
I don't need to optimize if my design is well thought out from the get-go, and that happens when I'm good at it.
I would just never write code which struggles with n.
And having some hashmap added at one point because I know how stuff works properly doesn't cost me anything.
arter45
2 days ago
>And having some hashmap added at one point because I know how stuff works properly doesn't cost me anything.
Sure, if it costs nothing, go for it.
With that said,
1) Time complexity is just one kind of complexity. In real life you may be interested in space complexity too: hashmaps tend to use more memory than plain arrays, which might be an issue in some cases. Also, some key distributions cause a lot of collisions in a hashmap, which erodes exactly the lookup performance you were counting on.
A well-thought-out design with respect to performance and scalability rests on a few assumptions like these, and different assumptions can lead to different solutions.
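As a rough illustration of the memory point (CPython-specific; sys.getsizeof measures only the container itself, not the items it references):

    import sys

    n = 100_000
    as_list = list(range(n))
    as_dict = dict.fromkeys(range(n), True)   # hash map keyed by the same integers

    print("list container:", sys.getsizeof(as_list), "bytes")
    print("dict container:", sys.getsizeof(as_dict), "bytes")  # typically several times larger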
2) A real-world application is not necessarily constrained (in space or time) by traversing an array or a hashmap. Unless your application mostly consists of processing and sorting in-memory data structures, that's probably not where the time goes.
For example, consider a simple application which lets users click and reserve a seat at a theater/conference/stadium/train/whatever.
The application is essentially a button which triggers a database write and returns a 'Success' message (or maybe a PDF). In this case you are mostly constrained by the time needed to write to that database, and maybe by the time a PDF generation library needs to do its thing. You are in fact interacting with two "APIs" (not necessarily web/REST APIs!): the database API and the third-party PDF library API. I don't have any special knowledge of PDF libraries, but I suspect their performance depends on the amount of data you have to convert, which is more or less the same for every user. And when it comes to databases, your performance is mostly limited by the database size and the number of concurrent requests.
If you think this is too simple, consider additional features like authentication, sending an email notification, or choosing between different seat categories. In most cases your code does very little processing of its own: it mostly asks other libraries/endpoints for something and waits for an answer.
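A minimal sketch of such a handler (hypothetical table and function names; standard-library sqlite3 stands in for whatever database a real app would use, with no web framework around it):

    import sqlite3

    conn = sqlite3.connect("reservations.db")
    conn.execute("CREATE TABLE IF NOT EXISTS reservations (user_id TEXT, seat_id TEXT)")

    def reserve_seat(user_id: str, seat_id: str) -> dict:
        # Nearly all of this handler's time is spent inside these calls,
        # i.e. waiting on the database, not running our own logic.
        conn.execute(
            "INSERT INTO reservations (user_id, seat_id) VALUES (?, ?)",
            (user_id, seat_id),
        )
        conn.commit()
        return {"status": "Success", "seat": seat_id}

    print(reserve_seat("alice", "A12"))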
Consider another example. You want to find the distance between a user (who is assumed to have a GPS receiver) and a known place like Times Square. You have a mobile app which gets the GPS position from the phone and computes the distance between the user and the known coordinates using a known formula (e.g. the haversine great-circle formula). The input size is always the same (a pair of coordinates), the formula is always the same, so processing time is essentially constant.
Now let's say you have N well-known places. The app computes the distance to all N places, populating an array (or an array of dicts, or whatever) of length N, and maybe sorts it to find the 5 closest places. How long will that take? How many places do you need to compute and sort before a user notices the app is kinda slow (say, beyond 200 milliseconds) or exceedingly slow (say, beyond 1 second, or even 500 ms)?
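A quick sketch to put numbers on that (haversine distance over random points, timed with the standard library; a phone is slower than a laptop and exact timings vary, the order of magnitude is the point):

    import heapq, math, random, time

    def haversine_km(lat1, lon1, lat2, lon2):
        # great-circle distance between two (lat, lon) pairs, in kilometers
        r = 6371.0
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    user = (40.7580, -73.9855)  # Times Square
    places = [(random.uniform(-90, 90), random.uniform(-180, 180)) for _ in range(10_000)]

    start = time.perf_counter()
    distances = [(haversine_km(*user, lat, lon), (lat, lon)) for lat, lon in places]
    closest5 = heapq.nsmallest(5, distances)
    print(f"{len(places)} places in {(time.perf_counter() - start) * 1000:.0f} ms")
    # typically on the order of tens of milliseconds for 10k places on a laptop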
There are a lot of scenarios and real-world applications where, with modern hardware and/or external APIs and reasonable expectations from clients (users don't care about microseconds; sending an email or push notification within a second is totally acceptable in most cases), you are not constrained by the data structure you are using unless you're working at a really large scale.