wongarsu
5 hours ago
If anything, this tool tracks with my general opinion on sentiment analysis: it would be awesome if it actually worked, but most algorithms just predict everything as neutral.
For example, if you search for Bitwarden it ranks three comments as negative and all others as neutral. But if I as a human look at the actual comments about Bitwarden [1], there are lots of comments from people using it and recommending it. As a human I would rate the overall sentiment as very positive, with some "negative" comments mixed in (which are really about specific situations where it's the wrong tool).
I've had some success using LLMs for sentiment analysis. An LLM can understand context and determine that in the given context "Bitwarden is the answer" is a glowing recommendation, not a neutral statement. But doing sentiment analysis that way eats a lot of resources, so I can't fault this tool for going with the more established approach that is incapable of making that leap.
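To make the LLM approach concrete, here's a minimal sketch of prompt-based classification. This is a hypothetical illustration, not the tool's actual method: `call_llm` is a placeholder for whatever completion API you'd use, and the prompt wording is just one plausible choice. The point is that the prompt carries the surrounding context, which is what lets the model read "Bitwarden is the answer" as a recommendation rather than a neutral statement.

```python
# Hypothetical sketch: prompt-based sentiment classification with an LLM.
# `call_llm` is a stand-in for any completion API (not a real library call).

def build_sentiment_prompt(context: str, comment: str) -> str:
    """Embed the surrounding discussion so the model can resolve
    context-dependent statements like "Bitwarden is the answer"."""
    return (
        "You are rating the sentiment of a forum comment.\n"
        f"Discussion topic: {context}\n"
        f"Comment: {comment}\n"
        "Answer with exactly one word: positive, negative, or neutral."
    )

def classify(context: str, comment: str, call_llm) -> str:
    reply = call_llm(build_sentiment_prompt(context, comment))
    label = reply.strip().lower()
    # Fall back to neutral if the model returns anything unexpected.
    return label if label in {"positive", "negative", "neutral"} else "neutral"
```

The downside is exactly the resource cost mentioned above: one model call per comment, versus a single cheap forward pass for a classical classifier.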
1: https://hn.algolia.com/?dateRange=pastMonth&page=0&prefix=tr...
team-o
2 hours ago
I haven't looked into the specific classifications of this particular model, but what your comment shows is the importance (IMO) of having a "no sentiment" class when classifying sentiment. E.g. if someone says "John Doe is an average guy", the sentiment toward John is neutral. But if someone says "John Doe is my uncle" there's no sentiment at all, and it should be classified as such. Perhaps the classifier here already takes this into account, but I thought it was worth mentioning the importance of having this extra class, or a separate pre-filter classifier.

In your example I also see many comments that could be filtered out. E.g. "I store them in Bitwarden not in dotfiles" doesn't contain negative/neutral/positive sentiment, or at least you can't tell from that sentence alone. I appreciate it's a fine line between neutral and no sentiment though.
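The neutral-vs-no-sentiment split can be sketched as a two-stage scheme: a cheap pre-filter decides whether the text expresses an opinion at all, and only opinionated text reaches the positive/negative/neutral classifier. This is a toy illustration, assuming a hand-rolled marker list; a real system would use a trained subjectivity model rather than keyword matching.

```python
# Toy sketch of a four-way scheme: a pre-filter separates "no sentiment"
# from sentiment-bearing text before the usual 3-class classifier runs.
# The word list is illustrative, not a real opinion lexicon.

OPINION_MARKERS = {"love", "hate", "great", "terrible", "recommend",
                   "awful", "average", "fine", "bad", "good"}

def has_sentiment(text: str) -> bool:
    """Pre-filter: does the text express an opinion at all?"""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return bool(words & OPINION_MARKERS)

def classify(text: str) -> str:
    if not has_sentiment(text):
        return "no sentiment"   # e.g. "John Doe is my uncle"
    # ...hand off to the real positive/negative/neutral classifier here;
    # stubbed as neutral for the sketch ("John Doe is an average guy").
    return "neutral"
```

The design point is that "no sentiment" examples never get forced into the neutral bucket, which keeps the neutral class meaningful.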
Mockapapella
4 hours ago
I think this is fair criticism of where it's at, and it mirrors my experience while building the tool. For generative AI at least, the smartest models plus a good prompt will waffle stomp our tool in terms of quality.
For example, while testing it on "Founder Mode" there were a couple of comments that said something like "I hate founder mode but I really really like this other thing that is the opposite of founder mode..." and then continued for a couple of paragraphs. It classified the comment as positive. While that's _technically_ defensible, it wasn't quite the commenter's intent.
We think there are some ways around this that can increase the fidelity of these models that won't involve using generative AI. Like you said, doing it that way eats a ton of resources.
wongarsu
an hour ago
Just spitballing, but maybe a good tradeoff is to use NLP to find good candidate comments that are likely to contain sentiment, and then analyse that small set with a more expensive model (say a quantized 3B or 7B LLM with a good prompt). The quality-over-quantity approach.
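That hybrid could look something like the sketch below: a cheap heuristic scores how likely each comment is to carry sentiment, and only the top few within a budget ever reach the expensive model. Everything here is illustrative, the scoring especially so; it stands in for a real subjectivity score from an NLP library.

```python
# Sketch of the quality-over-quantity pipeline: cheap candidate scoring,
# then only `budget` comments go to the expensive (3B/7B LLM) stage.
# The hint list is a stand-in for a real subjectivity model.

SUBJECTIVE_HINTS = ("love", "hate", "recommend", "terrible", "great", "!")

def candidate_score(comment: str) -> int:
    """Crude proxy for 'how likely is this to contain sentiment'."""
    text = comment.lower()
    return sum(text.count(hint) for hint in SUBJECTIVE_HINTS)

def pick_candidates(comments, budget=2):
    """Return the `budget` most sentiment-likely comments; only these
    would be sent to the expensive LLM classifier."""
    ranked = sorted(comments, key=candidate_score, reverse=True)
    return ranked[:budget]
```

The cost then scales with the budget rather than with the total comment count, which is what makes the expensive model affordable.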
zzleeper
4 hours ago
BTW, which algorithm did you use to classify sentiment? BERT or something related?