Ask HN: Thoughts on /etc/hosts instead of DNS for production applications?

10 points, posted 21 hours ago
by notepad0x90

Item id: 45751579

11 Comments

tacostakohashi

4 hours ago

Try it. It probably works pretty well, up to a certain point.

At a big enough scale, though, with thousands of people making changes many times a day, the file becomes a single point of contention, and the logistics of updating it without losing or clobbering other people's changes become a problem.

SurceBeats

16 hours ago

/etc/hosts works until you need to change an IP across 10,000 servers in under a minute. Then you understand why DNS exists.

DNS isn't just name resolution; I'd say it's load balancing, service discovery, caching, and dynamic configuration "all in one".

The FAANGs do minimize external DNS calls, but they run massive internal DNS infrastructures because the alternatives (config management pushing files) are actually slower and more fragile at scale.

notepad0x90

15 hours ago

I never thought someone would want DNS record updates that fast, to be honest; that's a good insight into the scale of things. I'm only left wondering if DNS is the right solution for those use cases you mentioned. Is it being abused because it's easy to set up? Aren't there more specialized protocols for those use cases?

ActorNightly

11 hours ago

The thing is, when it comes to AWS, it's not like everyone is going to suddenly migrate off because of a DNS issue. If a company that runs on AWS is not making money, it's very likely that its competitors aren't either. So more optimized solutions are not really worth it.

In a perfect world, everything would have a unique IPv6 address and we wouldn't need DNS. Instead of NAT, any computer/VM that wants to be connected to the internet would just be tacked onto the existing address space, and then instead of DNS records being synced, you would use the address directly and routing would take care of everything.

JohnFen

21 hours ago

That's how it was done on the internet before DNS was developed. It's also how I still do it for a lot of the machines on my home network. As you note, it's faster and reduces network traffic.

You do give up some good stuff, though. Load-balancing can be more tricky, for instance. And if any of the machines change their IP addresses, or you add new machines to the network, then you have to distribute a new hosts file to all of the machines that aren't using DNS.

notepad0x90

20 hours ago

> And if any of the machines change their IP addresses, or you add new machines to the network

That should (TM) only happen as part of your IaC process anyway; the code/task that changes the IP should also change the hosts files everywhere.
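A minimal sketch of that idea (Python standing in for whatever config management you use; all names and IPs here are hypothetical): the service map is the single source of truth, so the same change that moves an IP re-renders the managed hosts block that gets pushed everywhere.

```python
# Hypothetical IaC helper: render a managed /etc/hosts block from one
# service map, so the task that changes an IP also regenerates the
# block that config management pushes to every host.

MARKER_BEGIN = "# BEGIN managed-hosts"
MARKER_END = "# END managed-hosts"

def render_hosts_block(services):
    """services maps IP -> list of names; sorted output keeps diffs stable."""
    lines = [MARKER_BEGIN]
    for ip in sorted(services):
        lines.append(f"{ip}\t{' '.join(services[ip])}")
    lines.append(MARKER_END)
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Hypothetical service map, e.g. checked into the IaC repo.
    print(render_hosts_block({
        "10.0.0.5": ["db.internal", "cache.internal"],
        "10.0.0.6": ["queue.internal"],
    }))
```

The begin/end markers let the tooling own just its block of /etc/hosts while leaving any hand-maintained entries alone.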

Bender

19 hours ago

In my opinion, if there are no overlapping networks, and the Infrastructure as Code understands pods, k8s, and such, then /etc/hosts can speed up resolution while leaving things outside the data center to use DNS. It makes sense, but it requires some critical thinking about how all the inter-dependencies in the data center play together and how fail-overs are handled.

> Why aren't cloud providers and FAANGs doing this already?

This probably requires that everyone touching the Infrastructure as Code is a critical thinker and fully understands the implications of mapping applications to hosts, including but not limited to applications having their own load-balancing mechanisms, fail-over IP addresses, application state and ARP timeouts, and broadcast and multicast discovery. It can be done, but I would expect large companies to avoid this potential complexity trap. It might work fine in smaller companies that have only senior/principal engineers. Using /etc/hosts for bootstrapping the critical infrastructure nodes required for dynamic DNS updates could still make sense in some cases.

Point being, this gets really complex, and whatever is managing the Infrastructure as Code would have to be fully aware of every level of abstraction: NATs, SNATs, hairpin routes, load-balanced virtual servers, and origin nodes. Some companies are so big and complex that no one human can know the whole thing, so everyone's siloed knowledge has to be merged into this Infrastructure as Code beast. Recursive DNS, on the other hand, only has to know the correct upstream resolvers to use, or whether it is supposed to talk directly to the root DNS servers. That simplifies the layers upon layers of abstraction that manage their own application mapping and DNS.

Another trap people get lured into is split-view DNS, which should be avoided because it grows into a complexity trap over time and breaks sites when one dependency starts to interfere with another. Everyone has to learn this one the hard way for themselves.

My preference would be to instead make DNS more resilient. Running Unbound [1] on every node, pointing at a group of edge DNS resolvers for external addresses, with settings customized to retry and keep state on the fastest upstream resolving nodes, caching infrastructure addresses and their state, and setting realistic min/max DNS TTLs, is a small step in the right direction. Dev/QA environments should also enable query logging to a tmpfs mount to help debug application misconfigurations and spot less-than-optimal uses of DNS in infrastructure and application settings before anything reaches staging or production. Grab statistical data from Unbound on every node and ingest it into some form of big-data/AI web interface so questions about resolution, timing, and errors can be analyzed.
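An illustrative unbound.conf fragment along those lines (resolver addresses and TTL values are made up; tune per environment):

```
server:
    prefetch: yes           # refresh popular records before they expire
    cache-min-ttl: 60       # assumed floor for infrastructure records
    cache-max-ttl: 86400
    # Dev/QA only: query logging to a tmpfs-backed path
    # log-queries: yes
    # logfile: "/run/unbound/query.log"

forward-zone:
    name: "."
    forward-addr: 10.0.0.53   # hypothetical edge resolver pool
    forward-addr: 10.0.1.53
```

All of these are standard Unbound options; the commented-out query logging lines match the dev/QA suggestion above.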

This is just my two cents based on my experience. If it seems like I was just spewing words I was watching Shane Gillis and did not want to turn it off.

[1] - https://unbound.docs.nlnetlabs.nl/en/latest/manpages/unbound...

notepad0x90

16 hours ago

Thanks for the well-thought-out response, friend :)

You made some really good points. But here is my follow-up: with /etc/hosts, there is no need to complicate things. For example:

10.0.0.1 sql.app.local storage.local lb.corp.net

This line could be present on every host on every network, everywhere. The only thing that should matter, in my opinion, is that the name portion needs to be very specific. Even if you have NAT, SNAT, etc., /etc/hosts is only relevant to the host attempting to resolve a name; it already knows what name to use.

So long as you have one big-and-flat /etc/hosts everywhere, you just have to make sure that whenever you change an IP for a service, the global /etc/hosts reflects that change. And of course the whole DevOps apparatus of tests, reviews, etc. ensures you don't screw that up.
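That review step could include a sanity check like this hypothetical one (Python, names made up), which fails the pipeline if two entries in the proposed global file claim the same name with different IPs:

```python
# Hypothetical CI check for the "one big flat hosts file" approach:
# reject the change if any hostname is claimed by more than one IP.

def find_conflicts(hosts_text):
    seen = {}       # hostname -> first IP that claimed it
    conflicts = []  # (hostname, first_ip, conflicting_ip)
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            if name in seen and seen[name] != ip:
                conflicts.append((name, seen[name], ip))
            else:
                seen.setdefault(name, ip)
    return conflicts
```

An empty result means the file is internally consistent; anything else blocks the merge and points at the two IPs fighting over a name.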

Back in the day this was a really bad idea, because the problem of managing /etc/hosts at scale wasn't solved. But it is just a configuration file, and that is exactly what IaC is best suited for.

DNS, on the other hand, is a complex system with hierarchies, zones, different record types, aliases, TTLs, caches, and more. In a closed private network, is DNS really worth it when you have already invested in IaC?

Bender

15 hours ago

> So long as you have one big-and-flat /etc/hosts everywhere

I get where you are coming from, and in a small to almost-medium company that might work, but at some point networks and environments managed by many different teams will start to conflict, or won't be able to resolve things until someone opens a ticket to update another department's Infrastructure as Code. In my experience, teams and orgs want control over their own thing, and while they could logically all share commit access to one big flat file, it will start to introduce artificial problems.

I could be wrong; perhaps in your company it will work out just fine. Nobody here on HN knows the logical and physical structure of your company, so maybe pull together a meeting of leaders from each team/org that currently influences DNS records and ask them to pick apart your idea, after documenting it in a way everyone can visually understand: how the code repositories and multi-department git permissions would be laid out, how each team would be able to independently add, change, and delete records whenever they need to, and how audit logs would be reviewed both in the repositories and possibly on each node. My views could be skewed by all the politics that naturally occur as organizations grow. For what it's worth, I was at a company that had a multi-data-center-wide /etc/hosts, and it was just dandy while the company was small. We outgrew it by the second iteration of our data centers.

notepad0x90

15 hours ago

You make a good point. I'm still a bit stuck on the conflict part, since you can have multiple names, but I can envision a case where multiple teams want to use db.local or something, and if you're providing services internally, that could be hard to scale for sure. I'd like to think that the people avoiding pesky tickets and all that would just end up causing outages by moving their conflicts into DNS instead? But what do I know.

In the end, I trust your experience over my opinion. Thank you.

Bender

15 hours ago

> but i can envision where multiple teams want to use db.local or something

They could just use service1.region1.db.local, but the trick is getting all the teams to agree to this, or having a top-down decision from leadership in a new greenfield data-center design. Only you and your coworkers can really decide if this works. I hope it works out.