N_Lens
4 days ago
OTEL as a set of standards is admirable and ambitious, though in my experience actual implementation differs significantly between different vendors and they all seem to overcomplicate it.
eurekin
4 days ago
Plus the tens of terabytes of data you have to store for a week's worth of traces
c2h5oh
4 days ago
That's why you sample just enough instead of storing everything
voidfunc
4 days ago
That sounds great until you have a massive issue that costs the company real money and leadership asks why you weren't logging everything in full fidelity?
We run with Debug logging on in prod for that reason too. We also ingest insane amounts of data but it does seem to be worth it for a sufficiently complex and important enough system to really have it all.
majormajor
4 days ago
> That sounds great until you have a massive issue that costs the company real money and leadership asks why you weren't logging everything in full fidelity?
You should have an answer, right? Like, in your case, you run a lot of logging, and you know why. So if it's off, you say "because it would cost X million dollars a year and we decided not to do it."
Course, if you're the one who set it up, you should have the receipts on when that decision was made. This can be tricky sometimes because a lot of software dev ICs are strangely insulated from direct budgets, but if you're presented with an option that would be helpful but would cost a ton of money, it's generally a good thing to at least quickly run by someone higher up to confirm the desired direction.
TYPE_FASTER
3 days ago
I’ve used feature flags to manage logging verbosity and sample rate. It’s really nice to be able to go from logging very little to incrementally pumping up the volume when there’s an incident.
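For anyone curious what that can look like, here is a minimal Python sketch. It assumes the flags arrive as a plain dict; the flag names `log_level` and `trace_sample_rate` and the helper `apply_flags` are made up for illustration and do not come from any particular feature-flag product.

```python
import logging

# Map of flag values to stdlib logging levels.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}

def apply_flags(logger, flags):
    """Apply hypothetical feature flags to a logger and return the sample rate.

    `flags` is assumed to be a dict fetched from whatever flag service is in use.
    Unknown log levels fall back to WARNING; the sample rate is clamped to [0, 1].
    """
    name = str(flags.get("log_level", "WARNING")).upper()
    logger.setLevel(LEVELS.get(name, logging.WARNING))
    rate = float(flags.get("trace_sample_rate", 0.01))
    return min(max(rate, 0.0), 1.0)

# Normal operation: quiet logs, low sample rate.
log = logging.getLogger("checkout")
quiet_rate = apply_flags(log, {"log_level": "warning", "trace_sample_rate": 0.01})

# During an incident: flip the flags, no redeploy needed.
incident_rate = apply_flags(log, {"log_level": "debug", "trace_sample_rate": 1.0})
```

In practice you would call something like `apply_flags` whenever the flag service pushes or you poll a change, so verbosity follows the flag without a restart.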
evidencetamper
4 days ago
> and leadership asks why you weren't logging everything in full fidelity?
I haven't been asked this question ever. In a way, I wish I was. I wish leadership was engaged in the details of the capabilities of the systems they lead.
But I don't see anyone asking me this question any time soon either.
no_wizard
4 days ago
Have you ever been asked “why didn’t we catch this sooner?” I feel like it’s the same question worded differently.
voidfunc
4 days ago
It's really two questions:
1. Why didn't we catch this sooner
2. Why did it take so long to mitigate
Without the debug logging, #2 can be really tricky sometimes as well, since you can be flying blind when some deep internal conditional branch fires.
vlovich123
4 days ago
Sampling unconditionally at the start of the request is worth less than sampling at the end (so that you sample 1% of successful traces and 100% of traces with issues).
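A rough Python sketch of that end-of-request decision, assuming the whole trace is buffered until the request finishes. The `Trace` class and `keep_trace` function are hypothetical names for illustration, not part of any OTEL SDK or collector.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Trace:
    """Hypothetical buffered trace; spans are collected until the request ends."""
    trace_id: str
    spans: list = field(default_factory=list)
    has_error: bool = False

def keep_trace(trace, success_rate=0.01, rng=random.random):
    """Decide at the END of the request whether to export the buffered trace.

    Traces with an error are always kept; successful ones are kept with
    probability `success_rate` (1% by default).
    """
    if trace.has_error:
        return True
    return rng() < success_rate

# An errored trace is always kept, regardless of the random draw.
failed = Trace("abc123", has_error=True)
kept_failed = keep_trace(failed)

# A successful trace is kept only if the draw lands under the rate.
ok = Trace("def456")
kept_ok = keep_trace(ok, rng=lambda: 0.5)  # 0.5 >= 0.01, so dropped
```

The trade-off is that every trace has to be held in memory (or somewhere) until the request completes, which is the cost you pay for being able to look at the outcome before deciding.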
eurekin
4 days ago
We do. 0.5%