Ask HN: When has a "dumb" solution beaten a sophisticated one for you?

55 points, posted a month ago

Item id: 46572329

72 Comments

atrettel

a month ago

I recently wrote a command-line full-text search engine [1]. I needed to implement an inverted index. I chose what seems like the "dumb" solution at first glance: a trie (prefix tree).

There are "smarter" solutions like radix tries, hash tables, or even skip lists, but for any design choice, you also have to examine the tradeoffs. A goal of my project is to make the code simpler to understand and less of a black box, so a simpler data structure made sense, especially since other design choices would not have been all that much faster or use that much less memory for this application.

I guess the moral of the story is to just examine all your options during the design stage. Machine learning solutions are just that, another tool in the toolbox. If another simpler and often cheaper solution gets the job done without all of that fuss, you should consider using it, especially if it ends up being more reliable.

[1] https://github.com/atrettel/wosp

zahlman

21 days ago

> I chose what seems like the "dumb" solution at first glance: a trie (prefix tree).

> There are "smarter" solutions like... hash tables.... A goal of my project is to make the code simpler to understand and less of a black box, so a simpler data structure made sense, especially since other design choices would not have been all that much faster or use that much less memory for this application.

Strangely, my own software-related answer is the opposite for the same reason.

I was implementing something for which I wanted to approximate a https://en.wikipedia.org/wiki/Shortest_common_supersequence , and my research at the time led me to a trie-based approach. But I was working in Python, and didn't want to actually define a node class and all the logic to build the trie, so I bodged it together with a dict (i.e., a hash table).
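As a rough sketch of what a dict-based trie can look like in Python (this is not the code from the project described above; the `END` sentinel and the function names are made up for illustration):

```python
# A trie built from nested dicts instead of a node class.
# END is a hypothetical sentinel key marking the end of a stored word.
END = object()


def trie_insert(root, word):
    """Walk or create one nested dict per character, then mark the word's end."""
    node = root
    for ch in word:
        node = node.setdefault(ch, {})
    node[END] = True


def trie_contains(root, word):
    """Return True only if the exact word was inserted (not just a prefix)."""
    node = root
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return END in node


root = {}
for w in ("car", "cart", "cat"):
    trie_insert(root, w)

print(trie_contains(root, "cart"))  # True
print(trie_contains(root, "ca"))    # False: "ca" is only a prefix
```

Each level of the trie is just a hash table keyed by character, so there is no node class to define.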

bawis

a month ago

What body of knowledge (books, tutorials etc) did you use while developing it?

atrettel

a month ago

Before I started the project, I was already vaguely familiar with the notion of an inverted index [1]. That small bit of knowledge meant that I knew where to start looking for more information and saved me a ton of time. Inverted indices form the bulk of many search engines, with the big unknown being how you implement it. I just had to find an adequate data structure for my application.

To figure that out, I remember searching for articles on how to implement inverted indices. Once I had a list of candidate strategies and data structures, I used Wikipedia supplemented by some textbooks like Skiena's [2] and occasionally some (somewhat outdated) information from NIST [3]. I found Wikipedia quite detailed for all of the data structures for this problem, so it was pretty easy to compare the tradeoffs between different design choices here. I originally wanted to implement the inverted index as a hash table but decided to use a trie because it makes wildcard search easier to implement.
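To illustrate why a trie makes wildcard search easy, here is a minimal sketch in Python. It is not the implementation from the project above; the class names, the whitespace tokenization, and the trailing-`*` syntax are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)  # char -> TrieNode
    postings: set = field(default_factory=set)    # ids of documents whose term ends here


class TrieIndex:
    """Inverted index stored in a trie: each term's path ends at its posting set."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, doc_id, text):
        # Naive tokenization (lowercase, split on whitespace) keeps the sketch short.
        for term in text.lower().split():
            node = self.root
            for ch in term:
                node = node.children.setdefault(ch, TrieNode())
            node.postings.add(doc_id)

    def search(self, query):
        """Exact term lookup, or prefix lookup if the query ends with '*'."""
        wildcard = query.endswith("*")
        prefix = query.lower().rstrip("*")
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return set()
        if not wildcard:
            return set(node.postings)
        # Wildcard: every term sharing the prefix lives in this subtree.
        found, stack = set(), [node]
        while stack:
            current = stack.pop()
            found |= current.postings
            stack.extend(current.children.values())
        return found


index = TrieIndex()
index.add(1, "simple search engine")
index.add(2, "searching with tries")

print(sorted(index.search("search")))   # [1]
print(sorted(index.search("search*")))  # [1, 2]
```

With a hash table, the prefix query would need a scan over every key; in the trie it is a walk to one node followed by a subtree traversal.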

After I developed most of the backend, I looked for books on "information retrieval" in general. I found a history book (Bourne and Hahn 2003) on the development of these kinds of search systems [4]. I read some portions of this book, and that helped confirm many of the design choices that I made. I actually was just doing what people traditionally did when they first built these systems in the 1960s and 1970s, albeit with more modern tools and much more information on hand.

The harder part of this project for me was writing the interpreter. I actually found YouTube videos on how to write recursive descent parsers to be the most helpful there, particularly this one [5]. Textbooks were too theoretical and not concrete enough, though Crafting Interpreters was sometimes helpful [6].

[1] https://en.wikipedia.org/wiki/Inverted_index

[2] https://doi.org/10.1007/978-3-030-54256-6

[3] https://xlinux.nist.gov/dads/

[4] https://doi.org/10.7551/mitpress/3543.001.0001

[5] https://www.youtube.com/watch?v=SToUyjAsaFk

[6] https://craftinginterpreters.com/

bawis

a month ago

Thanks for the details. How much time have you invested in it?

atrettel

a month ago

I have spent around 170 hours on this so far, with only 60% of that being coding. The rest was mostly research or writing.

an-allen

21 days ago

Similarly, I have a script that takes input in the following format: “q replace all instances of http: with https: in all txt files recursively”

It goes to ChatGPT, which comes back with the appropriate command, and the script runs it.

jakevoytko

21 days ago

When I was on Google Docs, I watched the Google Forms team build a sophisticated ML model that attempted to detect when people were using it for nefarious purposes.

It underperformed banning the word "password" from a Google Form.

So that's what they went with.

demaga

20 days ago

I wonder if this is just an example of Goodhart's law. How did they measure performance of those models? I would imagine they tried measuring against known cases of forms misuse, aka those forms that contained 'password' field.

eastoeast

a month ago

I’m mostly a hardware engineer.

I needed to test pumping water through a special tube, but didn’t have access to a pump. I spent days searching how to rig a pump to this thing.

Then I remembered I could just hang a bucket of water up high to generate enough head pressure. Free instant solution!

trueismywork

21 days ago

You would have made Max Faget proud

Zanfa

21 days ago

This was while I was working at an influencer marketing company a while ago, back when Instagram still allowed pretty much complete access through their API. As we were indexing the entire Instagram universe for our internal tooling, we had this graph traversal setup to crawl Instagram profiles, then each of their followers, etc. We’d need to keep track of visited profiles to not loop and had an Apache Storm cluster for the entire scraping pipeline. It worked, but it was cumbersome to work with and monitor, and it couldn’t reach our desired throughput.

Given there were about a billion IG profiles total at the time, I just replaced the entire setup with a single Go script that iterated from 1 to a billion and tried to scrape every id in between. That gave us 10k requests per second on a single machine, which was more than enough.

anon_cow1111

21 days ago

>an influencer marketing company

I really, really, really wish this sequence of words did not exist in modern society.

/my unsubstantiated reddit-tier comment which I'm only posting because I'm sure someone will piggyback off of it with something related and actually insightful.

sjducb

21 days ago

People forget that a billion rows isn’t big data anymore.

nojs

20 days ago

How long ago? I’m surprised you got anywhere near their servers with 10k requests per second from a single machine.

userbinator

21 days ago

Several times I have rewritten overly-multithreaded (and intermittently buggy) processes with a single-threaded version, and both reduced LoC to roughly 1/20th and binary size to 1/10th, while also obtaining a few times speedup and reduced memory usage, and entirely eliminating many bugs.

conditionnumber

a month ago

Still happens all the time in certain finance tasks (eg trying to predict stock prices), but I'm not sure how long that will hold. As for why that might be, I don't think I can do any better than linking to this comment about a comment about your question: <https://news.ycombinator.com/item?id=45306256>.

I suspect that locating the referenced comment would require a semantic search system that incorporates "fancy models with complex decision boundaries". A human applying simple heuristics could use that system to find the comment.

In the "Dictionary of Heuristic" chapter, Polya's "How to Solve It" says this: *The feeling that harmonious simple order cannot be deceitful guides the discoverer both in the mathematical and in the other sciences, and is expressed by the Latin saying simplex sigillum veri (simplicity is the seal of truth).*

viraptor

21 days ago

It was a very long time ago, but during a programming competition one of the warm-up questions was something to do with a modified sudoku puzzle. The naive algorithmic solution was too slow, the fancy algorithm took quite a bit of effort... and then there were people who realised that the threshold for max points was higher than you needed for a brute force check of all possible boards. (I wasn't one of them)

This generalises to a few situations where going faster just doesn't matter. For example for many cli tools it matters if they finish in 1s or 10s. But once you get to 10ms vs 100ms, you can ask "is anyone ever likely to run this in a loop on a massive amount of data?" And if the answer is yes, "should they write their own optimised version then?"

al_borland

a month ago

I have a silly little internal website I use for bookmarks, searching internal tools, and some little utilities. I keep getting pressure to put it into our heavy and bespoke enterprise CICD process. I’ve seen people quit over trying to onboard into this thing… more than one. It’s complete overkill for my silly little site.

My “dumb” solution is a little Ansible job that just runs a git pull on the server. It gets the new code and I’m done. The job also has an option to set everything up, so if the server is wiped out for some reason I can be back up and running in a couple minutes by running the job with a different flag.

zahlman

21 days ago

Aside from https://news.ycombinator.com/item?id=46665611, way back in my engineering classes in university we had this design project... I'm not sure I've ever told the story publicly before and it brings a smile to remember it more than 20 years later.

My group (and some others) had to design a device to transport an egg from one side of a very simple "obstacle course" to the other, with the aid of beacons (to indicate the egg location and target, each along opposite ends) and light sensors. There was basically a single obstacle, a barrier running most of the way across the middle. The field was fairly small, I think 4 metres across by 3 metres wide.

The other teams followed tutorials, created beacons that emitted high-frequency light pulses and circuitry to filter out 60Hz ambient light and detect the pulse; various robots (I think at least one repurposed a remote-control car) and feedback control to steer them toward the beacons, etc. There were a few different microcontrollers on offer to us for this task, and groups generally had three people: someone responsible for the mechanical parts, someone doing circuitry, and someone doing assembly programming.

My group was just the two of us.

I designed extenders for the central barrier, a carriage to straddle the barrier, and a see-saw the length of the field. The machine would find the egg, scoop it into one end, tilt the see-saw (the other person's innovation: by releasing a stop allowing the counterweighted far side to fall), find the target and release the scoop on the other end. Our light sensors were pointed directly at the ceiling (the source of the "noise"), and put through a simple RC circuit to see that light as more or less constant. Our "beacons" were pieces of construction paper used to block the light physically. All controlled by a 3-bit finite state machine implemented directly in TTL/CMOS (I forget which).

And it worked in testing (praise for my partner; I would never have gotten the mechanics robust enough), but on presentation day the real barrier (made sloppily out of wood) was noticeably wider than specified and the carriage didn't fit on it.

As I recall, in later years the obstacle course was made considerably more complex, ruling out solutions like mine entirely. (There were other projects to choose from, for my year and later years, that as far as I know didn't require modification.)

jmholla

20 days ago

Mine was my senior design project as well. My group got assigned to a competition to wirelessly harness energy in the GHz range. The competition used a ratio of energy to size (power / (longest edge * mass)) to rate the entries and so we decided to focus on the denominator making ours as small as possible.

We finished design and production in the first month using off the shelf parts. That left just presentations as our work for the rest of the semester. The professors kept telling us to design large complicated antennas but we double checked that a small denominator against the minimum power requirement was a solid strategy and stuck it out. At the end of the semester, our final presentation and demonstration had them applauding our decision to focus on the size over energy.

I took our tiny little thing to the competition and we hit middle of the pack against larger and much more complicated designs, some of which couldn't even support themselves (but the supports didn't count toward your size). And most of the competitors were graduate teams. We probably would've done even better if the banana clips we had to use weren't part of the size calculations; they were significantly bigger than the rest of our contraption.

seanhunter

21 days ago

A famous example of this is the “lego batch scheduler” that is the stuff of hacker legend, but I’m struggling to find a writeup about it online.

The story goes some company/university/whatever in the early days of computing wanted a batch scheduler[1] to run jobs at specific times on their big IBM mainframe. They spoke to IBM who quoted them an eye-watering amount for it and said it would take months to implement. The main system operator told them to just hold fire and he’d see what he could come up with. The next day they had a working batch scheduler for zero dollars. He had set up the jobs so they would run on a keypress on a particular keyboard, then taken some of his kids’ lego and made a long finger on a hinge. He wrapped some string around the winder of a wind-up alarm clock then attached it to the lego and set the alarm clock to go off at the time they wanted to run the job. This had the effect of unwinding the string, lowering the finger that then pressed the key on the keyboard, running the job.

Not only that, but the jobs had a problem if you tried to run them twice, so he made it so the lego brick snapped off when pressing the key, making the job idempotent.

[1] Think “cron”, but for a mainframe

hliyan

21 days ago

I may have written about this before on HN, but once I wrote a simple Perl script that could run the daily trade reconciliation for an entire US primary exchange. It could run on my laptop and complete the process in under 20 minutes. Ten years later, I watched a team spending days setting up a Spark cluster to handle a comparable amount of data in a somewhat simpler business domain.

acheong08

a month ago

For me, CP-SAT is the "dumb" solution that works in a lot of situations. Whenever a hackathon has a problem definable in constraints, that tends to be the first path I take, and it generally scores top 5.

avidiax

21 days ago

Heuristics often work well enough that an AI/ML approach isn't needed. If it is needed, you still need the heuristics. If you were writing a chess engine, you wouldn't just pass the board state and history to a model. You'd still work with chess experts to come up with scores and heuristics for the material and strategic state of the board. You'd come up with detectors for certain conditions or patterns that experts have noted. Along with the board state, that's the input. And you'd still have a long way to go.

----

For storage, people often overcomplicate things. Maybe you do need RAID 5 in a NAS, etc. Maybe what you need is a simple server with a single disk and an offsite backup that rsyncs every night. That RAID 5 doesn't stop 'rm -rf' from destroying everything.

For databases, people often shove a database into an app or product much too early. The rule of thumb that I use is that you should switch to a database (from flat files) when you would have to implement foreign keys, or when data won't fit in memory anymore and memory-mapped files aren't sufficient. Using a database before that just complicates your data model, and introducing an ORM too early seriously complicates your code.

For algorithms, there are an awful lot of O(nLogn) solutions deployed for problems with small n. An O(n) solution is often faster to write, and still solves the problem. O(n) is often actually faster when things fit in L1 or L2 cache.

For software architecture, we often forget that the client has CPU and storage (and network) that we can use. Even if you don't trust the client, you can sign a cache entry to be saved on the client, and let the client forward it later. Greatly reduces the need for consistency on the backend. If you don't trust the client to compute, you can have the server compute a spot check at lower resolution, a subset, etc.
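A minimal sketch of the signed-cache-entry idea, using Python's standard `hmac` module. The key, the payload fields, and the function names are hypothetical, and a real system would also want an expiry time and replay protection, which are left out here.

```python
import hashlib
import hmac
import json

SECRET_KEY = b"server-side-secret"  # hypothetical; stays on the server only


def sign_entry(payload):
    """Serialize the payload and attach an HMAC tag the client cannot forge."""
    body = json.dumps(payload, sort_keys=True)
    tag = hmac.new(SECRET_KEY, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def verify_entry(entry):
    """Return the payload if the tag matches, or None if it was tampered with."""
    expected = hmac.new(SECRET_KEY, entry["body"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, entry["tag"]):
        return None
    return json.loads(entry["body"])


# The server hands this to the client, which stores it and sends it back later.
entry = sign_entry({"user": 42, "plan": "pro"})
print(verify_entry(entry))     # {'plan': 'pro', 'user': 42}

# A client that edits the stored body invalidates the tag.
tampered = dict(entry, body=entry["body"].replace("pro", "free"))
print(verify_entry(tampered))  # None
```

The server keeps no per-client state here: anything it needs later travels with the client and is checked on return.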

groundzeros2015

21 days ago

- before ML try linear or polynomial regression

- buying a bigger server is almost always better than distributed system

- Few lines of bash can often wipe out hundreds of lines of python.
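As an example of how little code the first point needs, here is a sketch of an ordinary least-squares line fit in plain Python. The data points are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data lying exactly on y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

If a baseline like this already explains the data, the fancier model has to beat it by enough to justify its cost.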

kotaKat

20 days ago

> Few lines of bash...

Windows admin in the room -- you'd be amazed what you can batch together with DOS batch. I provision APC UPS monitoring cards with a sub-15 line script to bring them into our management.

PowerShell? Hardly knew 'er.

iamflimflam1

a month ago

I wrote a clone of Battlezone, the old Atari tank game. For the enemy tank “AI” I just used a simple state machine with some basic heuristics.

This gave a great impression of an intelligent adversary with very minimal code and low CPU overhead.
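A sketch of what such a state machine might look like, in Python. The states, thresholds, and inputs are all invented for illustration and are not taken from the actual game.

```python
from enum import Enum, auto


class State(Enum):
    PATROL = auto()
    CHASE = auto()
    ATTACK = auto()
    RETREAT = auto()


def next_state(state, distance, health):
    """Pick the enemy tank's next state from a few hand-tuned thresholds."""
    if health < 25:
        return State.RETREAT
    if state is State.RETREAT:
        # Stay away until the player is far off, then resume patrolling.
        return State.PATROL if distance > 300 else State.RETREAT
    if state is State.PATROL:
        return State.CHASE if distance < 200 else State.PATROL
    if state is State.CHASE:
        if distance < 50:
            return State.ATTACK
        # Give up the chase only once the player is well out of range.
        return State.CHASE if distance < 250 else State.PATROL
    # Remaining case: State.ATTACK. Keep firing while the player is close.
    return State.ATTACK if distance < 80 else State.CHASE


state = State.PATROL
for distance in (400, 150, 40, 60, 120):
    state = next_state(state, distance, health=100)
    print(distance, state.name)
```

The different enter and leave thresholds (for example 50 to start attacking, 80 to stop) keep the tank from flickering between states, which is a large part of what makes it look deliberate.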

AndrewStephens

21 days ago

Game design is filled with simple ideas that interact in fun ways. Every time I have tried to come up with complex AIs I ended up scrapping them in favor of "stupid" solutions that turned out to be more enjoyable and easier to tune.

zahlman

21 days ago

I can vouch from my experience of turn-based games that exploiting a dumb AI often makes the game more fun (and gives the developer license to throw more/tougher enemies at the player), and noticing the faults really doesn't degrade the experience like you'd expect.

Unless enemies have entirely non-functional pathing. Then it's just funny.

acomjean

20 days ago

I worked as an intern for the government many years ago. They were doing a study on "ice dams" in Maine. Ice flowing downriver gets stuck and causes flooding and property damage.

https://en.wikipedia.org/wiki/Ice_jam

They had about 5 options for mitigation with technical solutions (structures in the rivers), with a price analysis for each. The last option was to buy the very rural land that was flooding. It was deemed cheaper. I'm not sure what they ended up doing.

rented_mule

21 days ago

Nearly 20 years ago, I was working on indexing gigabytes of text on a mobile CPU, before smart phones caused massive investment in such CPUs. Word normalization logic (e.g., sky/skies/sky's -> sky) was very slow, so I used an in-memory cache, which sped it up immensely. Conceptually, the cache looked like {"sky": "sky", "skies": "sky", "sky's": "sky", "cats": "cat", ...}.

I needed cache eviction logic as there was only 1 MB of RAM available to the indexer, and most of that was used by the library that parsed the input format. The initial version of that logic emptied the entire cache when it hit a certain number of entries, just as a placeholder. When I got around to adding some LRU eviction logic, it became faster on our desktop simulator, but far slower on the embedded device (slower than with no cache at all). I tried several different "smart" eviction strategies. All of them were faster on the desktop and slower on the device. The disconnect came down to CPU cache (not word cache) size / strategy differences between the desktop and mobile CPUs — that was fun to diagnose!

We ended up shipping the "dumb" eviction logic because it was so much faster. The eviction function was only two lines of code, plus a large comment explaining all this and saying something to the effect of "yes, this looks dumb, but test speed on the target device when making it smarter."

peo1306

21 days ago

I redesigned the protocol by which the Mercurial DVCS discovers the common DAG subset between the client and the server.

Firstly, my approach ("set discovery") was simply to take relatively dumb samples of nodes from the leaves towards roots and ask the other party if they knew these nodes, and then iteratively refine with more roundtrips. In practice, this by far beat the previous sophisticated approach ("tree discovery") which tries to use the structure of the DAG to cleverly select "highly informative" nodes.

Secondly, I had a symmetric setup where the client sent samples to the server, and the server responded with information about those samples, and samples of its own. It worked great, sometimes saving hundreds of network roundtrips. However, computing the samples is relatively expensive. Another contributor suggested that it would work almost as well if the server was kept dumb and would just respond for each sample node whether it knew it or not. This massively reduced server load and kept the protocol much simpler.
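A heavily simplified Python sketch of the "dumb server" idea, not Mercurial's actual code. It assumes the server's history is closed under ancestry (if it knows a node, it knows all of that node's ancestors), and it samples uniformly at random rather than from the leaves toward the roots. The function names and the toy DAG are invented.

```python
import random


def discover_common(parents, local_nodes, server_knows, sample_size=2, seed=0):
    """Find which local nodes the server also has, using yes/no answers only."""
    rng = random.Random(seed)
    children = {n: set() for n in local_nodes}
    for n in local_nodes:
        for p in parents[n]:
            children[p].add(n)

    def closure(start, edges):
        """All nodes reachable from start (inclusive) along the given edges."""
        seen, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(edges[n])
        return seen

    undecided, common, roundtrips = set(local_nodes), set(), 0
    while undecided:
        sample = rng.sample(sorted(undecided), min(sample_size, len(undecided)))
        roundtrips += 1
        for node, known in zip(sample, server_knows(sample)):
            if known:
                # Server has this node, so it has every ancestor too.
                ancestors = closure(node, parents)
                common |= ancestors
                undecided -= ancestors
            else:
                # Server lacks this node, so it lacks every descendant too.
                undecided -= closure(node, children)
    return common, roundtrips


# Toy history: a <- b <- c <- d, plus a branch b <- e
parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"], "e": ["b"]}
server_nodes = {"a", "b", "e"}

common, roundtrips = discover_common(
    parents, set(parents), lambda sample: [n in server_nodes for n in sample]
)
print(sorted(common))  # ['a', 'b', 'e']
```

Each answer settles a whole ancestor or descendant set at once, which is why even unsophisticated samples converge in few roundtrips.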

https://repo.mercurial-scm.org/hg/file/tip/mercurial/setdisc...

https://repo.mercurial-scm.org/hg/rev/cb98fed52495

LowLevelBasket

21 days ago

I was once on a project where we couldn't use third party libs. We needed a substring search but the needle could be 1 of N letters. My teammate loves SIMD and wanted to write a solution. I took a look at all of our data and most strings were < 2 KB, with many being empty or < 40 letters. SIMD would have been overkill. So I wrote a simple dumb for loop checking each letter for the 3 interesting characters (`";\n`)
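The loop in question is about as small as code gets. Here is a Python rendering for illustration; the original was presumably in a lower-level language, and the function name is made up.

```python
SPECIAL = '";\n'  # the three interesting characters mentioned above


def find_first_special(text, start=0):
    """Return the index of the first special character at or after start, or -1."""
    for i in range(start, len(text)):
        if text[i] in SPECIAL:
            return i
    return -1


print(find_first_special("abc;def"))      # 3
print(find_first_special("no specials"))  # -1
```

On inputs of a few dozen characters, the setup cost of a vectorized search can easily outweigh a scan like this.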

helix90

a month ago

The common one I fought long ago was folks who always use regular expressions when what they want is a string match, or contains, or other string library function.
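A small Python illustration of the point, with an invented log line. For fixed strings, the plain string methods say the same thing as the regular expressions with less machinery.

```python
import re

line = "ERROR: disk full on /dev/sda1"  # made-up example input

# Fixed-string checks with plain string methods
is_error = line.startswith("ERROR:")
mentions_disk = "disk full" in line
device = line.rpartition(" ")[2]  # text after the last space

# The regex equivalents give the same answers for these fixed strings
assert is_error == bool(re.match(r"ERROR:", line))
assert mentions_disk == bool(re.search(r"disk full", line))
assert device == re.search(r"(\S+)$", line).group(1)

print(is_error, mentions_disk, device)  # True True /dev/sda1
```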

Minor49er

21 days ago

I've seen a lot of the opposite, especially having done a lot of string parsing in PHP: some developers would nest half a dozen string functions just to prepare and extract a line of data while a simple regular expression would have handled the operation much more concisely and accurately

khaledh

20 days ago

Several years ago we had a data processing framework that allowed teams to process data incrementally, since most datasets were in the range of terabytes/day. The drawback is that it's append-only; i.e. you can't update previously processed output; you can only append to it. One team had a pipeline that needed to update older records, and there was a long discussion of proposals and convoluted solutions. I took a look at the total size of the input dataset and it was in the range of a few gigabytes only. I dropped into the discussion and said "This dataset is only a few gigabytes, why don't you just read it in full and overwrite the output every time?" Suddenly the discussion went quiet for a minute, and someone said "That's brilliant!". They only needed to change a few lines of code to make it happen.

abhgh

21 days ago

I once modeled user journeys on a website using fancy ML models that honored sequence information, i.e., order of page visits, only to be beaten by a bag-of-words (i.e., page url becomes a vector dimension, but order is lost) decision tree model, which was supposed to be my baseline.

What I had overlooked was that journeys on that particular website were fairly constrained by design, i.e., if you landed on the home page, did a bunch of stuff, put product X in the cart - there was pretty much one sequence of pages (or in the worst case, a small handful) that you'd traverse for the journey. Which means the bag-of-words (BoW) representation was more or less as expressive as the sequence model; certain pages showing up in the BoW vector corresponded to a single sequence (mostly). But the DT could learn faster with less data.
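To make the representation concrete, here is a sketch of turning a journey into a bag-of-words vector in Python. The page URLs and the vocabulary are invented for the example.

```python
def bag_of_words(journey, vocabulary):
    """Count visits per page; the order of visits is deliberately discarded."""
    counts = {page: 0 for page in vocabulary}
    for page in journey:
        if page in counts:
            counts[page] += 1
    return [counts[page] for page in vocabulary]


# Hypothetical site with five pages; each page is one vector dimension.
vocabulary = ["/home", "/search", "/product/x", "/cart", "/checkout"]

journey_a = ["/home", "/search", "/product/x", "/cart", "/checkout"]
journey_b = ["/search", "/home", "/product/x", "/checkout", "/cart"]

print(bag_of_words(journey_a, vocabulary))  # [1, 1, 1, 1, 1]
print(bag_of_words(journey_b, vocabulary))  # [1, 1, 1, 1, 1]
```

The two journeys collapse to the same vector. That only costs information if the site actually allows both orders, and on a site with constrained navigation it mostly does not.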

oldnewthing

21 days ago

We were working on the data powering the Xbox console frontend for searches. For example, the metadata that powers a search like "romantic movie". The data was stored in Azure tables. We were all thinking about backup strategies for the data, serialization and deserialization etc. My suggestion was to simply create timestamped copies of the table. That is, if the table was X, the backup would be X_2026-01-18-14-25-00. This required no serialization and deserialization, could run entirely in memory and could shard the processing and was brutally fast. Also, by distributing the copies across multiple regions we could be more reliable. A simple and dumb solution vs a complex one :)

estimator7292

20 days ago

Pretty much any time I cross domains with another engineer. I'm an embedded engineer in a team of highly advanced RF and FPGA engineers. Every time I'm given a task it ends up something like "this board has 6 active ICs and I'm spinning up an FPGA to read an encoder knob, can you find an LED driver?" And then I replace the entire circuit with a single microcontroller.

Conversely, my uninformed suggestions on their work often wind up being incredibly overcomplicated because I don't understand the domain as well.

That's one of the benefits of being in a well balanced team. We can collectively converge on ideal solutions where individually we couldn't.

austin-cheney

a month ago

I occasionally see people complaining about long TypeScript compile times where a small code base can take multiple minutes (possibly 10 minutes). I think to myself WTF, because large code bases should take no more than 20 seconds on ancient hardware.

On another note, I recently wrote this large single page app that is just a collection of functions organized by page section according to a nearly flat TypeScript interface. It’s stupid simple to follow in the code and loads in as little as an eighth of a second. Of course that didn’t stop HN users from crying like children over my avoiding their favorite framework.

commandersaki

a month ago

I remember Scalyr, at least before they were bought by SentinelOne, basically did parallel / SIMD grep for each search query and consistently beat data that was continually indexed by the likes of Splunk and ElasticSearch.

imron

a month ago

They had a great article on this too.

yomismoaqui

21 days ago

As engineers we have to aspire to be like that old martial arts master that wins the fight with the simplest move.

When we are learning difficult techniques we want to show them (who doesn't like to show others that he can execute a perfect "Kick of the crescent Dragon from the West"?). But the old master knows that moving aside and sticking out a foot is enough to defeat that rival. More so, maybe that master knows that not fighting is the best solution for solving that problem.

As I'm getting old I want to be more like this.

hereonout2

21 days ago

I often favour low-maintenance, low-overhead solutions. Most recently I made a stupidly large static website with over 50k items (i.e. pages).

I think a lot of people would have used a database at this point, but the site didn't need to be updated once built so serving a load of static files via S3 makes ongoing maintenance very low.

Also feel a slight sense of superiority when I see colleagues write a load of pandas scripts to generate some basic summary stats vs my usual throwaway approach based around awk.

AndrewStephens

21 days ago

Great question, I could answer with many stories but here are two:

The (deliberately) very limited analytics software I wrote for my personal website[0] could have used a database, but I didn't want to add a dependency to what was a very simple project, so I hacked up an in-memory data structure that periodically dumps itself to disk as a JSON file. This gives persistence across reboots, and at a pinch I can just edit the file with a text editor.
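A sketch of that pattern in Python, assuming a simple per-page hit counter. The class name, file name, and the write-then-rename step are choices made for this example, not details of the actual site.

```python
import json
import os
import tempfile
from collections import Counter


class VisitStats:
    """In-memory page counters that can be dumped to, and reloaded from, JSON."""

    def __init__(self, path):
        self.path = path
        self.hits = Counter()
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.hits.update(json.load(f))

    def record(self, page):
        self.hits[page] += 1

    def dump(self):
        # Write to a temp file first, then rename, so a crash mid-write
        # does not leave a truncated JSON file behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.hits, f, indent=2, sort_keys=True)
        os.replace(tmp_path, self.path)


with tempfile.TemporaryDirectory() as workdir:
    stats_path = os.path.join(workdir, "stats.json")
    stats = VisitStats(stats_path)
    stats.record("/index.html")
    stats.record("/index.html")
    stats.dump()  # in a real service, call this from a timer

    restored = VisitStats(stats_path)  # simulates a restart
    print(restored.hits["/index.html"])  # 2
```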

Game design is filled with "stupid" ideas that work well. I wrote a text-based game[1] that includes Trek-style starship combat. I played around with a bunch of different ideas for enemy AI before just reverting to a simple action drawn off the top of a small deck. It's a very easy system to balance and expand, and just as fun for the player.
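One way such a deck could be sketched in Python. The card names and the reshuffle-when-empty rule are assumptions made for illustration, not the game's actual rules.

```python
import random


class ActionDeck:
    """Enemy 'AI' that just plays the top card of a small shuffled deck."""

    def __init__(self, cards, rng=None):
        self.cards = list(cards)
        self.rng = rng or random.Random()
        self.draw_pile = []

    def draw(self):
        # Reshuffle a fresh copy of the deck whenever the pile runs out.
        if not self.draw_pile:
            self.draw_pile = self.cards[:]
            self.rng.shuffle(self.draw_pile)
        return self.draw_pile.pop()


# Hypothetical enemy ship: attacks half the time, otherwise evades or repairs.
deck = ActionDeck(["attack", "attack", "evade", "repair"], rng=random.Random(1))
for turn in range(1, 5):
    print(turn, deck.draw())
```

Balancing becomes a matter of editing the card list: every pass through the deck plays each card exactly once, so the action mix is fixed even though the order is not.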

[0] https://sheep.horse/visitor_statistics.html

[1] https://sheep.horse/voyage_of_the_marigold/

compsciphd

18 days ago

Years ago at an ACM programming contest we had this problem: given a starting and ending position on a chess board, find the minimum number of moves it takes for a knight to go from one to the other.

The proper way to solve this is via breadth-first search. While my team was working on other problems, I basically solved all possibilities by hand (i.e., it was basically just a delta-x/delta-y matrix with a few exceptions if the knight would have to go off the board).

As you had (have?) fewer computers than team members, when I finally got a chance at a computer, instead of programming out a breadth-first search, I just entered my hand-calculated matrix with some scaffolding, submitted quickly, and quickly got a validation message back.
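For reference, the breadth-first search version is short too. This is a Python sketch with 0-indexed `(column, row)` squares on an 8x8 board, which is an assumption about the input format rather than the contest's actual specification.

```python
from collections import deque

KNIGHT_MOVES = [(1, 2), (2, 1), (2, -1), (1, -2),
                (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def knight_distance(start, goal, size=8):
    """Minimum number of knight moves from start to goal on a size x size board."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        (x, y), dist = queue.popleft()
        for dx, dy in KNIGHT_MOVES:
            nxt = (x + dx, y + dy)
            if not (0 <= nxt[0] < size and 0 <= nxt[1] < size) or nxt in seen:
                continue
            if nxt == goal:
                return dist + 1
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return -1  # unreachable (cannot happen on a standard 8x8 board)


print(knight_distance((0, 0), (1, 2)))  # 1
print(knight_distance((0, 0), (1, 1)))  # 4: a corner exception
print(knight_distance((0, 0), (7, 7)))  # 6
```

The middle case is the kind of exception mentioned above: one diagonal step away from a corner takes four moves because the shorter routes would leave the board.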

I still wonder if that was the fastest turnaround time for a single problem they have seen :)

gethly

21 days ago

Small pieces of code replacing long pieces of code is a daily routine.

But more on the topic, I would say calling ffmpeg as an external binary to handle some multimedia processing would be one of those cases where simple is better.

Generally, I would say that implementing your own solution over an external one (like a library, service or product) will always fall under this umbrella. Mostly, because you can implement only what you need and add other things that might be missing or adjust things that do not work exactly as you need them to, so you can avoid any patches, translation layers or external pipelines.

For example, right now I am implementing my own layouting library, because Clay bindings for Go were not working or manual translations were missing features or were otherwise incomplete or non-functional. So I've learnt about Clay and the principles on which it was built by Nic Barker, and wrote my own version in Go. It has a little over 2k lines of code right now and will take me about two weeks to finish, with all or most features I wanted. Now I have a fully native Go layouting library that does what I need and I can use and modify it ad infinitum.

So I would say that I equate a "dumb solution" with "my own" solution.

PS: looking back, when I used to work in an advertising/marketing/web agency, we used to make websites in CMSs (I did Drupal, a colleague did WordPress). Before my departure from the job in general, I came to the conclusion that if we had been using static website generators, we could have saved an unimaginable number of work hours and delivered the same products, as 99% of clients never managed their websites; by the nature of the job, we were doing presentational websites, not complicated functional ones. And when they did, they only needed such tiny changes that it would have made far more sense to do it for them for free upon request. For example, imagine you charge someone 5000€ for a website that takes you two months to ship, because you need to design it, build it functionally, fit the visual style and tweak whatever is needed. If you used a static website generator, the work would take two weeks: a week for the design and a week for coding the website itself. Now you've saved yourself 6 weeks of work while getting paid the same amount of money. Unfortunately, I did not have a chance to try this out and push a new direction at the company, as it was at the end of my career.

papanoah

21 days ago

I wrote a tiny game that was basically a dice war clone and needed to implement an enemy AI. I researched the probability formula for throwing a higher number with N dice versus M dice and spent days on the math. In the end I simulated every possible combination (i.e., fight) up to 12 dice (which was the max amount) with a simple Python script and stored the results in a key-value table. It was so much easier.
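A sketch of how such a lookup table can be built. It assumes six-sided dice, that the higher total wins, and that ties go to the defender; those rules are assumptions, not details from the game above. Instead of looping over every individual roll, it counts the number of ways to reach each total, which covers the same outcomes with far less work.

```python
from fractions import Fraction


def sum_distribution(num_dice, sides=6):
    """Map each possible total to the number of ways to roll it."""
    dist = {0: 1}
    for _ in range(num_dice):
        new = {}
        for total, ways in dist.items():
            for face in range(1, sides + 1):
                new[total + face] = new.get(total + face, 0) + ways
        dist = new
    return dist


def win_probability(attack_dice, defend_dice, sides=6):
    """Exact chance that the attacker's total is strictly higher."""
    attack = sum_distribution(attack_dice, sides)
    defend = sum_distribution(defend_dice, sides)
    wins = sum(wa * wd
               for ta, wa in attack.items()
               for td, wd in defend.items()
               if ta > td)
    return Fraction(wins, sides ** (attack_dice + defend_dice))


MAX_DICE = 12
WIN_TABLE = {(n, m): win_probability(n, m)
             for n in range(1, MAX_DICE + 1)
             for m in range(1, MAX_DICE + 1)}

print(WIN_TABLE[(1, 1)])         # 5/12
print(float(WIN_TABLE[(2, 1)]))  # about 0.838
```

The enemy AI can then look up `WIN_TABLE[(attacker_dice, defender_dice)]` at decision time instead of evaluating a formula.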

gdulli

20 days ago

I was asked for a web app that would let two business users create arbitrary, flexible, data-driven rule sets through a custom UI. I quickly gave them a "temporary" Django admin app where they could upload Excel spreadsheets representing the actual data use cases they had. They were ecstatic and never needed the fuller system they had specced.

saimiam

20 days ago

I made an email mailbox using S3 object versioning.

Every email address was an S3 object, so every new email sent to that address was saved as a new version of that object.

Presenting those emails as a mailbox was just a matter of reading all the versions of that object.

It worked!

I used this contraption as a domain-level catch-all inbox for a while, until Cloudflare started supporting email forwarding.
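A rough sketch of how that could look, not the original code. It assumes a boto3 S3 client is passed in as `s3`, that the bucket has versioning enabled, and that the email address itself is used as the object key. The helper names `deliver` and `read_mailbox` are hypothetical, and pagination of `list_object_versions` is ignored for brevity.

```python
def deliver(s3, bucket, address, raw_email):
    """Store one incoming message as a new version of the address's object."""
    s3.put_object(Bucket=bucket, Key=address, Body=raw_email)


def read_mailbox(s3, bucket, address):
    """Return every stored message for an address, oldest first."""
    listing = s3.list_object_versions(Bucket=bucket, Prefix=address)
    # Prefix also matches longer keys, so keep only the exact key.
    versions = [v for v in listing.get("Versions", []) if v["Key"] == address]
    versions.sort(key=lambda v: v["LastModified"])
    messages = []
    for version in versions:
        obj = s3.get_object(Bucket=bucket, Key=address,
                            VersionId=version["VersionId"])
        messages.append(obj["Body"].read())
    return messages
```

With a real client this would be called as `read_mailbox(boto3.client("s3"), "my-mail-bucket", "someone@example.com")`, where the bucket name and address are placeholders.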

hahahahhaah

a month ago

I've seen people tripped up by DynamoDB-like stores, especially when they have a misleading SQL interface like Azure Tables.

You can't be "agile" with them; you need to design your data storage upfront. Like a system design interview :).

Just use Postgres (or friends) until you are webscale. Unless you really have a problem amenable to key/value storage.

doix

21 days ago

That's fun to read. I remember when NoSQL was getting cargo-culted, and it was specifically because it was more "agile". The reasoning was that you don't need to worry about a schema. Just stick your data in there and figure it out later.

Interesting to hear now that the opinion is the opposite.

efortis

20 days ago

Renamed the "Sign In" button on the website to "Launch App". That way there’s no need to check if the user is authenticated to show the username.

IOW, I can serve the website statically. So no iframe is needed for dynamic parts, or allowing the cookie from a subdomain on the www.

dulakian

21 days ago

I recently needed AI memory and instead of setting up a vector db and RAG, I just used git as a history graph and a knowledge graph in one.

https://github.com/michaelwhitford/mementum

avidiax

20 days ago

I am surprised how terse this prompt is.

> [phi fractal euler tao pi mu] | [Δ λ ∞/0 | ε/φ Σ/μ c/h] | OODA > Human ⊗ AI

Is this some kind of priming incantation?

dulakian

20 days ago

They're math equations used to guide AI behavior. They're quite useful for reducing tokens, as well as for being precise in telling the AI what you want from it. I have it fully documented in its GitHub repository.

https://github.com/michaelwhitford/nucleus

politelemon

21 days ago

It's not glamorous or punchy, but I've often seen teams spin up k8s infrastructure to run a few containers and spend more time maintaining and patching the infrastructure than getting useful work done. We then moved to Lambdas and... everything got better.

ramon156

20 days ago

Most optimisations I make that I think are improvements just end up slowing down the app. Eventually we get there, but the initial "oh this is easy" is never an improvement, just my ego thinking it's better than past me.

addaon

20 days ago

This was 20+ years ago, so the "sophisticated" baseline wasn't ML or AI.

I was looking into an initial implementation and use of order files for a major platform. Quick recap: C (and similar languages) define that every function must have a unique address, but place no constraints on the relative order of those addresses. Choosing the order in which functions appear in memory can have significant performance impact. For example, suppose that you access 1,000 functions over a run of a program, each of which is 100 bytes in size. If each of those functions is mixed in with the 100,000 functions you don't call, you touch (and have to read from disk) 1000 pages; if they're all directly adjacent, you touch 25 pages. (This is a superficial description -- the thousand "but can't you" and "but also"s in your mind right now are very much the point.)
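The page arithmetic above can be checked with a few lines. This is only a sketch under stated assumptions: 4 KiB pages, every function exactly 100 bytes, and the "mixed in" case modelled as one hot function in every 101. Real binaries will be messier, and `pages_touched` is just an illustrative helper.

```python
PAGE_SIZE = 4096   # assumption: 4 KiB pages
FUNC_SIZE = 100    # bytes per function, as in the example above
HOT_FUNCS = 1000   # functions actually called during the run


def pages_touched(offsets, size=FUNC_SIZE, page=PAGE_SIZE):
    """Count the distinct pages covered by functions starting at `offsets`."""
    touched = set()
    for start in offsets:
        first_page = start // page
        last_page = (start + size - 1) // page
        touched.update(range(first_page, last_page + 1))
    return len(touched)


# Hot functions packed back to back.
adjacent = [i * FUNC_SIZE for i in range(HOT_FUNCS)]

# Hot functions spread evenly among 100,000 cold ones (one in every 101).
scattered = [i * 101 * FUNC_SIZE for i in range(HOT_FUNCS)]

print(pages_touched(adjacent))   # 25
print(pages_touched(scattered))  # at least 1000; some functions straddle two pages
```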

I went into this with moderately high confidence that runtime analysis was going to be the "best" answer, but figured I'd start by seeing how much of an improvement static analysis could give -- this would provide a lower bound for the possible improvement to motivate more investment in the project, and would give immediate improvements as well.

So, what are all the ways you can use static analysis of a (large!) C code base to figure out order? Well, if you can generate a call graph, you can do depth first or breadth first, both of which have theoretical arguments for them -- or you can factor in the function call size, page size, page read lookahead size, etc, and do a mixture based on chunking to those sizes... and then you can do something like an annealing pass since a 4097 byte sequence is awful and you're better off swapping something out for a slightly-less-optimal-but-single-page sequence, etc.

And to test the tool chain, you might as well do a trivial one. How about we just alphabetize the symbols?

Guess which static approach performed best? Alphabetization, by a large margin. This was entirely due to the fact that (a) the platform in question used symbol name prefixes as namespaces; (b) callers that used part of a namespace tended to use significant chunks of it; and (c) call graph generation across multiple libraries wasn't accurate so some of these patterns from the namespaces weren't visible to other approaches.
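To illustrate why alphabetizing helps when prefixes act as namespaces, here is a toy sketch. The symbol names are invented, and the order-file format (one symbol per line) is an assumption that depends on the linker in use.

```python
# Hypothetical symbols; the prefix before the first "_" plays the namespace role.
symbols = [
    "net_send", "gfx_draw_line", "str_copy", "gfx_clear",
    "net_recv", "gfx_draw_rect", "str_len",
]


def alphabetical_order(names):
    """The 'trivial' ordering: just sort the symbol names."""
    return sorted(names)


def prefix(symbol):
    return symbol.split("_", 1)[0]


def is_clustered(order):
    """True if every prefix appears as one contiguous run in `order`."""
    seen = set()
    previous = None
    for name in order:
        current = prefix(name)
        if current != previous:
            if current in seen:
                return False  # this prefix already appeared in an earlier run
            seen.add(current)
            previous = current
    return True


order = alphabetical_order(symbols)
order_file_text = "\n".join(order) + "\n"  # assumed format: one symbol per line
```

After sorting, all `gfx_` functions sit next to each other, so a caller that uses a big chunk of one namespace touches a small number of neighbouring pages.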

The results were amazingly good. I felt amazingly silly.

(Runtime analysis did indeed exceed this performance, significantly.)

Havoc

20 days ago

Not sure about dumb but sometimes brute force is the way if the problem space is small

efortis

20 days ago

Prefetching critical API data on the index.html of an SPA instead of using SSR.

https://github.com/ericfortis/aot-fetch-demo

This doesn't mean an LLM can't build such things, however.

tom

20 days ago

There are tons of proven, tested libraries for this.

The dumb, successful approach would be to use one of them.
