Show HN: Vince – A self hosted alternative to Google Analytics

174 points, posted 14 hours ago
by gernest

53 Comments

skeptrune

13 minutes ago

Cool that there are so many of these now. Currently self hosting plausible and it does seem quite barebones. Will have to give this a shot!

zoidb

6 hours ago

My go-to self-hosted GA alternative is goatcounter https://www.goatcounter.com. It would be interesting to know what advantages Vince has over it.

huhtenberg

38 minutes ago

Does it allow filtering the visited-page list by a specific referrer, and vice versa?

written-beyond

an hour ago

Code quality is pristine, really great job! I see that you've used protocol buffers, can you expand on why? I am aware of the benefits it offers but I think it adds a bit of mental overhead initially due to it being an additional type system you have to understand.

Also, why are you using Pebble exactly? I was interested in seeing how you're managing your geo databases, because that's usually the most mind-numbing part of handling analytics if your cloud provider doesn't already add that information to the request headers. However, I can't understand why you'd use Pebble over something like SQLite.

pdyc

6 hours ago

Looks exactly like Plausible; maybe change the UI a bit to avoid legal issues.

carlosjobim

5 hours ago

I was going to say that it looks exactly like BeamAnalytics, and now I'm confused as to who's copying whom...

serial_dev

2 hours ago

I'm wondering when copying becomes just following industry best practices...

Twitter, Threads, Mastodon, Bluesky all look the same. Project management apps all reuse the same UI patterns. The "AI" logo looked pretty much the same for all companies for a while. Video sharing websites all use YouTube's layout. Forums like Reddit and HN share quite a lot in their looks.

If you want to display website analytics, you will want to show the most important metrics at a glance, you'll need graphs showing visitors over time, top sources and pages... There is only so much you can do to display those and have users understand what's going on on your website.

paradite

an hour ago

Not sure why I would use this over Plausible CE on docker. Does it consume less memory/CPU?

Also I am pretty sure Plausible CE doesn't limit number of sites / events, unlike what's listed in "Comparison with Plausible Analytics".

just-tom

9 hours ago

The screenshot on your homepage looks very similar to plausible's https://plausible.io/ which is also open-source analytics software. Is it based on it? What are the differences?

Edit: Just noticed the feature comparison in the readme.

dewey

3 hours ago

Also, Plausible is almost entirely stock TailwindUI elements, including the default color, so many sites look like that.

brokegrammer

9 hours ago

This is amazing! I self host Plausible but don't like depending on Clickhouse and Postgres because they're annoying to upgrade.

What kind of database is this using though? I don't know enough Go to figure it out from the source.

akshayshah

9 hours ago

It uses Pebble, the key-value store that backs CockroachDB.

colesantiago

8 hours ago

Just saw this notice:

> WARNING: Pebble may silently corrupt data or behave incorrectly if used with a RocksDB database that uses a feature Pebble doesn't support. Caveat emptor!

Slightly worrying for running this in prod for now, if there is a risk of silent data corruption, but hopefully in a few years Vince will have drivers for Postgres / ClickHouse.

rickette

8 hours ago

This just warns about using Pebble with an existing RocksDB which isn't the case here. Pebble powers CockroachDB which is a Serious Database.

dangoodmanUT

4 hours ago

Reread the sentence, it says if you mix it with RocksDB (another database that has compatible file formats)

t0mas88

9 hours ago

It says GDPR compliant and no cookies on the project page. How are unique visitors calculated? And I'm assuming it can't link conversions to campaigns without some cookie-alternative?

withinboredom

7 hours ago

No idea, but generally, a bloom filter would get you there without any identifying information being stored. The counts would merely be estimates at that point, not exact values.

beeb

5 hours ago

At least for Plausible, they state this (https://plausible.io/blog/google-analytics-cookies):

> Instead of tagging users with cookies, we count the number of unique IP addresses that accessed your website. Counting IP addresses is an old-school method that was used before the modern age of JavaScript snippets and tracking cookies.

> Since IP addresses are considered personal data under GDPR, we anonymize them using a one-way cryptographic hash function. This generates a random string of letters and numbers that is used to calculate unique visitor numbers for the day. Old salts are deleted to avoid the possibility of linking visitor information from one day to the next. We never store IP addresses in our database or logs.

chrismorgan

4 hours ago

> Since IP addresses are considered personal data under GDPR, we anonymize them using a one-way cryptographic hash function.

Um... hashing IPv4 addresses, even with salt, does literally nothing to anonymise (assuming the output space is at least ~32 bits, which I think is safe to assume): they’ll still be PII. IPv6 addresses I’m not so confident about; maybe it would be sufficient for some parts, but it’s definitely inadequate for some concerns.

(For IPv4, enumerating all four billion inputs is so completely practical that “one-way” is nonsense.)

I’m almost certain this is legal theatre.

Semaphor

4 hours ago

One way if you have a salt? Enumerating won’t help, you need to know the salt, which gets deleted.

That said, the whole IP thing is weird to me. Not only are we allowed to log IPs directly for security reasons, we even *have* to log IPs in certain cases (newsletter subscriptions).

kadoban

2 hours ago

> That said, the whole IP thing is weird to me. Not only are we allowed to log IPs directly for security reasons, we even have to log IPs in certain cases (newsletter subscriptions).

The point of designating something as PII isn't that we then _never_ store or use it, it's to carefully consider if we actually need it or not (and what protections we can add for the values we do need to store/use).

We're meant to stop the practice of just collecting and storing all data, without consideration for the harms that causes.

jszymborski

2 hours ago

What matomo does is mask parts of the IP address (you choose how much).

kadoban

4 hours ago

If what they're doing is using a secure salt and then throwing the salt away once a day that _might_ be doing something.

chrismorgan

3 hours ago

What I understand they’re doing is storing the salt in one place, a set of hashed IP addresses in another place, then daily trashing the lot after counting the number of elements in the set and storing that.

Information-theory-wise, this is no different to just storing the actual IP addresses (and deleting them daily after tallying, as before). It does mean that you need to obtain two things instead of just one, but if you get access to it all, it’s straightforward to reverse the lot (though computationally a little expensive), and easy to check a single value for a match.

The technique may be considered reasonable effort at protecting against casual abuse, but it’s not technically effective of itself, and it doesn’t stop the data from being PII. The important aspect is that the PII is deleted within 24 hours. My personal opinion is that the hashing part should probably be considered snake oil and whitewash, at least for what they’re claiming—I don’t say it’s useless, but it definitely doesn’t do what they’re touting it for.

Unless they’re actually keeping the hashed values for some reason after one day, and associating them with other records? In which case, disregard part of what I say, it’s obviously better than persisting IP addresses long-term! But also it’s extremely dubious to call that anonymisation as they do, because you can so often tie things together, behavioural patterns and such, to deanonymise. It’s frighteningly effective.

tingletech

2 hours ago

If you throw away the daily random salt (but keep the obscured IP address), how can you check a single value for a match the next day?

gizzlon

4 hours ago

hm.. are you saying they need scrypt or something similar?

chrismorgan

4 hours ago

The “PII” label is taint that is probably impossible to dispel completely/perfectly, and difficult to dispel sufficiently (and deanonymising is an arms race).

Lossless techniques do nothing to dilute that taint.

Lossy techniques are necessary to get anywhere, such as disregarding certain bits of the address, or Bloom filters.

kadoban

4 hours ago

The problem in general with hashing IP addresses (especially IPv4) is that there aren't that many of them.

If I tell you the value is either 1 or 2, but I hashed it with sha256 to make it secure, that's bullshit, right? You can just hash both and see which it is.

The same concept applies regardless of the hash algorithm, and it still applies if you have more than 2 possible values: 4 billion or so possible IPv4 addresses is _not_ that many values to a computer.

Other common places this problem occurs are other restricted sets of values, e.g. phone numbers and email addresses (most are at like 5 domains and are easy to guess/know).
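A back-of-envelope version of the enumeration cost, assuming illustrative (not measured) throughputs R of 10^8 hashes/s for a fast hash and 10^4 hashes/s for a deliberately slow one such as scrypt:

```latex
T = \frac{2^{32}}{R}, \qquad
\frac{4.29\times10^{9}}{10^{8}\,\mathrm{s}^{-1}} \approx 43\,\mathrm{s}, \qquad
\frac{4.29\times10^{9}}{10^{4}\,\mathrm{s}^{-1}} \approx 4.3\times10^{5}\,\mathrm{s} \approx 5\ \text{days}
```

A slow hash only scales the cost linearly: the space stays enumerable, and checking one known address is still a single hash.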

pdyc

6 hours ago

Most likely through one-way IP hashing bounded by a time window. If you have UTM parameters in your URL then it can track campaigns; otherwise probably not.

aaronbrethorst

11 hours ago

Looks interesting. What sort of memory requirements does it have and how does it persist data?

cebert

14 hours ago

If you haven’t checked it out yet, Serverless Website Analytics is a great solution for this too. It’s easy to deploy and very inexpensive to run. I’ve been using it and am quite happy with it. https://github.com/rehanvdm/serverless-website-analytics

gernest

14 hours ago

Interesting, I just checked the readme. Very similar but looks like it only works with AWS and has a lot of moving pieces.

How do you deal with location data? Do you purchase a MaxMind DB license or use their free versions?

The free versions of both MaxMind's and DB-IP's city data lack city geo ID values, rendering the city data useless for many cases.

With Vince, I had to index and embed the whole city dataset from the GeoNames database to work around this.

samdung

8 hours ago

This is great. I'm def going to use it.

Minor bug: "See Live Demo Dashboard" url is wrongly pointed.

rasso

4 hours ago

Does this work on your average 10,-/month shared hosting server? If so, it might really be „for everyone“. Otherwise, we are stuck with matomo.

diggan

4 hours ago

> Does this work on your average 10,-/month shared hosting server?

Since they usually offer software via cPanel and the like, that seems unlikely unless you give the project lots of time: first to get popular enough to get on the admin panels' radar, and then for them to integrate it.

Besides, do people really pay 10 USD/month for shared hosting? That sounds really expensive when you can grab VPSes for half that price and run whatever software you want, not just what they've packaged for you. I guess ongoing maintenance is included in that price, but it still sounds kind of expensive for what you get.

rasso

2 hours ago

I don‘t know… around here (Germany), that‘s pretty common. No need to manage anything, no usage-based cost, … my favourite is https://all-inkl.com. OG no-bs hosting for boring tech.

cpursley

5 hours ago

How would y’all go about building analytics into a professional marketplace type of app where you can provide the professional with their own profile page stats (in a reliable way)?

notRobot

6 hours ago

The dashboard demo isn't working :(

manishsharan

2 hours ago

I think the reason some of us continue using Google Analytics is its demographic data. That information is not available elsewhere as far as I know, which I admit is not a lot.

colesantiago

10 hours ago

Great project keep it up it's good to see competition in this space.

Plausible gets crazy expensive on their hosted option and is complex to set up (it needs Elixir and has high memory requirements).

If Vince gets 1:1 parity with plausible and has the option to use clickhouse, I'll consider moving a few servers and people I know over.

Love that Vince is a single binary as well.

Oras

5 hours ago

If you don't have plans to offer saas, what are you trying to achieve from it?

I mean, it is quite nice to have binary installation hosted on a single VPS, but will you support it?

drchaim

8 hours ago

this is great, congrats!