Show HN: Nomadic – Minimize RAG Hallucinations with 1 Hyperparameter Experiment

95 points, posted 2 months ago
by mustafabal

Item id: 41459121

37 Comments

add-sub-mul-div

2 months ago

Lots of grassroots interest in this from a flood of new accounts created in the last few hours.

========

baileyw6 2 hours ago [flagged] [dead] Excellent work!

r0sh 3 hours ago [flagged] [dead] Cracked team!

mlw14 3 hours ago [flagged] [dead] Interesting library, is it like unit testing for RAGs? Can't wait to try it out!

lncheine 2 hours ago [flagged] [dead] Interesting library, can't wait to try it out!

Linda_ll 2 hours ago [flagged] [dead] Congrats on the launch! Excited for what's to come :)

bmountain17 3 hours ago [flagged] [dead] Great new platform to boost AI performance, can't wait to try the Python library!

jjBailey 1 hour ago [flagged] [dead] Cool library, I'll test it out

sidkapoor39 3 hours ago [flagged] [dead] Congrats on the launch! Excited to see how this streamlines hyperparameter optimization. Keep up the great work!

brucetry 1 hour ago [flagged] [dead] Very interesting, similar to unit tests for RAGs? Would love to try it out

jjBailey 1 hour ago [flagged] [dead] Very interesting library!! Can't wait to try it!

luxxxxx 1 hour ago [flagged] [dead] Interesting library! Is it like unit testing for RAGs? Can't wait to try it out!

kangjl888 2 hours ago [flagged] [dead] Huge congratulations to the NomadicML team on the launch of Nomadic! The platform looks like a game-changer for optimizing AI systems; excited to see how it transforms hyperparameter search for the community.

nishsinha2345 21 minutes ago [flagged] [dead] Excited to try out this library! Would this help make unit testing easier, or be used instead of unit testing?

greysongy5 19 minutes ago [flagged] [dead] Wow, this seems like it would really help automated RAG testing. What are the top use cases today?

sidvijay10 5 minutes ago [flagged] [dead] We're looking for a RAG testing framework for searching UGC. So far we've just been running evals manually w/o a library. Will try out Nomadic and see if it's more convenient.

mutant

2 months ago

LLM answer bros. I was wondering when we'd start seeing this more.

_eric_z_lin

2 months ago

This looks like a really useful tool for keeping AI systems optimized, especially as models and data evolve over time. I'm curious, have you considered how Nomadic might integrate into CI/CD pipelines? It seems like it could be valuable for automatically re-tuning parameters and ensuring performance doesn't degrade with new model versions or data updates. Any plans for features that would support this kind of continuous optimization workflow?

simbasdad

2 months ago

Thank you so much. Yes, we believe CI/CD pipelines are a treasure trove of data for continuous ML system optimization: these are non-deterministic systems run repeatedly, with new evaluation results at each run, so you get to learn about your own ML systems. Nomadic integrates well here by continuously collecting that data, which it can then use to better identify optimal HP configs for the same systems. We envision it as: every time you run your CI/CD pipeline, you get more data with which to understand your ML system, and Nomadic is your engine for realizing this.

elizabethhu

2 months ago

To add on: if you're more interested in real-time optimization (where the best configs are automatically set and iterated on in your system), Nomadic can integrate directly at the application level within your production code. You can then make queries like nomadic.get_optimal_value(experiment_id="...", default=...) to fetch the most recent optimal hyperparameters for your system. This approach lets you continuously refine and deploy the best versions of your production system using both your CI/CD pipeline and historical production data.
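A minimal sketch of that application-level lookup, assuming a top-level nomadic module exposing get_optimal_value as described above and a dict of hyperparameters as the return value (the import path, return shape, and default values here are illustrative assumptions, not the documented API):

    # Illustrative sketch only: assumes nomadic.get_optimal_value returns a dict of
    # hyperparameters for the given experiment, falling back to `default` if no
    # experiment results exist yet.
    import nomadic

    DEFAULT_PARAMS = {"temperature": 0.2, "similarity_top_k": 5, "chunk_size": 512}

    def load_rag_params(experiment_id: str) -> dict:
        # Fetch the most recent optimal hyperparameters found by the experiment,
        # or fall back to hand-picked defaults.
        return nomadic.get_optimal_value(experiment_id=experiment_id, default=DEFAULT_PARAMS)

    params = load_rag_params(experiment_id="rag-prod-tuning")  # hypothetical experiment id
    # ...feed params["temperature"], params["similarity_top_k"], etc. into the RAG pipeline.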

mustafabal

2 months ago

Hi all, we've received a few questions offline on what type of developers can benefit the most from Nomadic's offerings. We believe the Nomadic SDK & Workspace can benefit a wide range of developers:

1. Solo ML practitioners looking to streamline their workflows
2. MLEs in small to mid-size companies wanting FAANG-level capabilities
3. Data science teams aiming to productionize models more efficiently
4. Startups needing to quickly deploy ML features without a large engineering team

Our goal is to provide tools that let any team serve high-quality ML features, regardless of size or resources. We're trying to bridge the gap between cutting-edge ML research and optimized, deployable solutions.

If you want to dig deeper, please peruse our Nomadic Docs (https://docs.nomadicml.com) and Workspace Demo (https://demo.nomadicml.com), or contact us at info@nomadicml.com.

elizabethhu

2 months ago

Hey HN! I'm Lizzie, one of the cofounders of NomadicML - excited to get your thoughts on our demo and repo.

We started working on Nomadic because we saw that people wanted to ship powerful and reliable systems but very often didn't have a map for getting there:

Which embedding model works best for my RAG? What temperature to set? What threshold for similarity search?

We wanted a tool that makes answering these kinds of questions systematic and affordable, instead of relying on intuition or a single expensive grid search that you set and forget. Give us your most honest feedback!

bonnet_clement

2 months ago

This is a really cool library built in such a short time! I'm very excited to try it out! Small feature suggestion: I could be wrong, but having the standard deviation or some statistical significance alongside the score (whether it's retrieval, inference, or overall) would strengthen the decision-making around parameter optimization. Easier to know the confidence around a hyper-parameter choice. Great work!!


Jadiker

2 months ago

It looks like the hallucination score is somewhat related to perplexity in the sense that it relies on specific tokens. This could cause issues because rephrasing or using slightly different terms could lead to a higher hallucination score. E.g. if the correct answer is "John Smith is the world's best baker" then "Mary Kay is the world's best baker" would have a better score (lower hallucination) than "Leading maker of baked items across all the continents: John Smith" according to your metric.

Are there any plans to make updates to this score or add in different metrics for more accurately detecting hallucinations that don't penalize rephrasing?

varunkrishnan17

2 months ago

Thanks for the well-thought out question Jadiker!

This is a potential limitation of N-gram precision with context matching, which we were using in the RAG demo for simplicity (though even with this, I don't think it would be so extreme :-) )

We already offer two other different hallucination detection approaches which should mitigate this problem - an LLM-as-a-judge model for evaluation, and semantic similarity matching. We've also considered, for example, using metrics such as BertScore. Do you have other ideas? :-)
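To make the failure mode above concrete, here is a small, self-contained sketch of plain token-overlap precision (not Nomadic's actual metric) applied to the baker example; it shows how a wrong answer that reuses the reference wording can outscore a correct but rephrased one:

    # Illustrative unigram-precision scoring; not Nomadic's implementation.
    import re

    def tokens(text: str) -> list[str]:
        return re.findall(r"[a-z']+", text.lower())

    def unigram_precision(candidate: str, reference: str) -> float:
        # Fraction of candidate tokens that also appear in the reference.
        ref = set(tokens(reference))
        cand = tokens(candidate)
        return sum(t in ref for t in cand) / len(cand)

    reference = "John Smith is the world's best baker"
    wrong = "Mary Kay is the world's best baker"
    rephrased = "Leading maker of baked items across all the continents: John Smith"

    print(unigram_precision(wrong, reference))      # ~0.71: scores well despite the wrong name
    print(unigram_precision(rephrased, reference))  # ~0.27: scores poorly despite being correct

Semantic-similarity and LLM-as-a-judge evaluators, as mentioned above, are less prone to this because they compare meaning rather than surface tokens.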

piinuma

2 months ago

Interesting! This looks great. Would want to see it develop more

wwang4768

2 months ago

Can't wait to try out the library and hear more about the use cases/updates!

Jadiker

2 months ago

Looks interesting! How does it compare to things like Optuna or RayTune or Weights and Biases?

varunkrishnan17

2 months ago

Really appreciate it Jadiker! We're obviously in a similar space, but we think we offer some strong differentiators: (1) Functionality that is more specific to your LLM use cases (for example, being able to easily kick off a RAG Retrieval / Inferencing Experiment). (2) Ability to easily customize and visualize your results - for example, through custom evaluators, and carefully curated heatmaps - both through our SDK and our managed service.


altairmn

2 months ago

Our customers use our platform to build low-latency voice and video pipelines. They utilize RAG in voice bots to improve response accuracy.

Is it possible to programmatically interface with Nomadic’s hyperparameter search through an authenticated endpoint, with the ability to generate user-specific tokens for secure access?

mustafabal

2 months ago

Certainly!

The Nomadic SDK supports 1st-party integrations with various open & closed-source ML/LLM providers. These are done through authenticated endpoints for interfacing securely with your models. Also, as noted in the Custom Evaluation section of our docs (https://docs.nomadicml.com/features/evaluations), you can provide your custom objective_functions and detail your model access logic, which may include custom authentication & access rules. A sample of this is present in our "Basic RAG" cookbook (link: https://colab.research.google.com/drive/1rv2f-qxgoN_eVDFu6Um...).

When integrated with the upcoming Nomadic Workspace, you can obtain your Nomadic API key and sync your local Nomadic models, experiments & experiment results with our managed service. The demo of this model/experiment/result visualization is live at https://demo.nomadicml.com, please check it out and let us know your thoughts!
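As a rough illustration of the custom objective_functions hook mentioned above (the function signature and score format here are assumptions, not Nomadic's documented interface; see the docs and cookbook links for the real one), a custom objective could wrap an authenticated call to your own model endpoint and return a single score to optimize:

    # Rough sketch only: the objective-function signature is an assumption, not
    # Nomadic's documented API; adapt it to the interface shown in the docs above.
    import os
    import requests

    def authenticated_objective(param_dict: dict) -> float:
        # Call a (hypothetical) model endpoint with a per-user token and the candidate
        # hyperparameters, then return a score in [0, 1] for the optimizer to maximize.
        resp = requests.post(
            "https://models.example.com/v1/generate",  # hypothetical endpoint
            headers={"Authorization": f"Bearer {os.environ['MODEL_API_TOKEN']}"},
            json={"params": param_dict, "prompt": "Who is the world's best baker?"},
            timeout=30,
        )
        resp.raise_for_status()
        answer = resp.json()["text"].lower()
        # Toy eval: fraction of expected answer words present in the response.
        expected = set("john smith is the world's best baker".split())
        return len(expected & set(answer.split())) / len(expected)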

varunkrishnan17

2 months ago

Hi, I'm Varun - one of the cofounders of Nomadic!

Been a pleasure to work with Mustafa and Lizzie on this! Hopefully we can solve a pain point I've personally had for so long: how can you easily verify that your model continues to perform well?

rnvarma

2 months ago

For my company, we don't have complex chains, but generally are giving a large context and looking to get structured outputs. Curious how this could help with that? We don't currently use any eval frameworks.

varunkrishnan17

2 months ago

That's a great use case for Nomadic! We support many eval frameworks in the optimization; one is an LLM-as-a-Judge model, where you can input custom weights based on your metrics of interest. Adhering to a proper structure could be one of them :-)
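As a hedged sketch of how a weighted evaluator for structured outputs might look (the function names, schema, and weights below are illustrative assumptions, not Nomadic's API), you could blend a structure-validity check with a judge score:

    # Illustrative only: the evaluator shape and weights are assumptions, not Nomadic's API.
    import json

    def structure_score(output: str) -> float:
        # 1.0 if the output parses as JSON with the required keys, else 0.0.
        try:
            data = json.loads(output)
        except json.JSONDecodeError:
            return 0.0
        required = {"title", "summary", "tags"}  # hypothetical schema
        return 1.0 if required.issubset(data) else 0.0

    def combined_score(output: str, judge_score: float,
                       w_structure: float = 0.5, w_judge: float = 0.5) -> float:
        # Weighted blend of structural validity and an LLM-as-a-judge quality score in [0, 1].
        return w_structure * structure_score(output) + w_judge * judge_score

    print(combined_score('{"title": "t", "summary": "s", "tags": []}', judge_score=0.8))  # 0.9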
