nh2
3 days ago
Can you clarify: Does the full-text search for logs linearly search all logs like Loki does, or can it speed it up with an index?
The docs at https://www.hyperdx.io/docs/search don't seem to talk about this key design decision.
I have a few hundred GB to a few TB of logs (all from `journald` or JSON lines), and I just want to store them forever and find results fast when searching for arbitrary substrings.
Loki does not use an index, so it's pretty slow at finding results in TB-sized logs (does not return results within a few seconds, so it's not interactive).
https://quickwit.io is one thing I'm looking at integrating; it can cover much of the index-based log search.
(Note I'm not super familiar with the capabilities of ClickHouse itself regarding indexed full-text search.)
mikeshi42
3 days ago
You'd generally add an index to your logs in ClickHouse to do searching (typically via ngram or token bloom filters: https://clickhouse.com/docs/en/optimize/skipping-indexes#blo...). There are other ways of indexing as well, but that's generally the best for full-text search. We use token bloom filter indexes today and find them quite effective (they can skip whole chunks of logs, because the bloom filter can say that a word did not appear in a given chunk).
Indeed, Loki is incredibly slow, whereas ClickHouse is deployed for logging at scale (e.g. trip.com is running a 50 PB logging solution that allowed them to 4x their old ES cluster volume while also running queries 4-30x faster).
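To make the chunk-skipping idea above concrete, here is a minimal Python sketch of a per-chunk token bloom filter. It is only an illustration of the general technique, not ClickHouse's or HyperDX's actual implementation: the `BloomFilter`, `Chunk`, `tokenize`, and `search` names, the filter size, and the hash scheme are all assumptions made up for this example.

```python
import hashlib
import re


class BloomFilter:
    """Tiny bloom filter; sizes are illustrative assumptions, not tuned values."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, token):
        # Derive several bit positions per token by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{token}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, token):
        for pos in self._positions(token):
            self.bits |= 1 << pos

    def might_contain(self, token):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(token))


def tokenize(line):
    # Split on anything that is not a letter or digit (hypothetical tokenizer).
    return [t for t in re.split(r"[^a-z0-9]+", line.lower()) if t]


class Chunk:
    """A block of log lines plus a bloom filter over all tokens in the block."""

    def __init__(self, lines):
        self.lines = lines
        self.bloom = BloomFilter()
        for line in lines:
            for token in tokenize(line):
                self.bloom.add(token)


def search(chunks, word):
    """Return (matching lines, number of chunks actually scanned)."""
    word = word.lower()
    hits, scanned = [], 0
    for chunk in chunks:
        if not chunk.bloom.might_contain(word):
            continue  # the filter rules this chunk out, so skip reading it
        scanned += 1
        hits.extend(line for line in chunk.lines if word in tokenize(line))
    return hits, scanned
```

For example, with `chunks = [Chunk(["sshd: accepted password for alice"]), Chunk(["nginx: GET /index.html 200"])]`, calling `search(chunks, "alice")` only needs to scan chunks whose filter reports a possible match. Note that this sketch matches whole tokens only; searching for arbitrary substrings would need a filter built over n-grams instead, and a bloom filter can return false positives (a chunk is scanned needlessly) but never false negatives.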
nh2
3 days ago
Thanks! When using the full open-source HyperDX (beyond the Kibana part), inclusive of your choices of ingestion and controlling ClickHouse, does it set up the recommended indexes automatically?
That is, is it a full drop-in for a typical Grafana + Loki deployment?
For context, I'm currently following the approach described in https://xeiaso.net/blog/prometheus-grafana-loki-nixos-2020-1... where with ~40 lines of NixOS config it pushes my entire server cluster's systemd journald logs into Grafana.
Roughly how much effort would one have to put in to achieve the same with HyperDX? If it's not too hard, I might get around to packaging it as a NixOS service.
mikeshi42
3 days ago
Yes! The full stack includes our recommended schema, which has the indexes set up. It's a drop-in replacement for anything that would ingest OTel-based telemetry! If you already have Promtail set up, you might want to set up a collector or tweak the existing collector to take in Promtail via the OTel Loki Receiver: https://github.com/open-telemetry/opentelemetry-collector-co...
Overall it doesn't sound very hard to me!
valyala
2 days ago
If you want fast searching for some unique word or phrase across terabytes of logs (aka "needle in the haystack" type of search), then take a look at VictoriaLogs [1] (I'm the core developer). It uses bloom filters to quickly skip data blocks that do not contain the given word or phrase. Contrary to other open-source solutions for log storage and analysis, VictoriaLogs works efficiently with any type of logs containing any set of fields, without the need for any configuration or tuning.