paddy_m
7 days ago
This looks like many LLM assisted data projects which help and are flexible, but aren't repeatable and aren't fast enough to be interactive. Nao is a good execution of the concept.
I built Buckaroo as a data table UI for Jupyter and Pandas/Polars, that first lets you look at the data in a modern performant table with histograms, formatting, and summary stats.
Yesterday I released autocleaning for Buckaroo. This looks at data and heuristically chooses cleaning methods with definite code. This is fast (less than 500ms). Multiple cleaning strategies can be cycled through and you can choose the best approach for your data. For the simple problems we shoudn't need to consult an LLM to do the obvious things.
All of this is open source and extensible.
[1] https://youtube.com/shorts/4Jz-Wgf3YDc
[2] https://github.com/paddymul/buckaroo
[3] https://marimo.io/p/@paddy-mullen/buckaroo-auto-cleaning Live WASM notebook that you can play with - no downloads or installs required
accurrent
6 days ago
Whelps! Ive been building something really similar. Mines nowhere as complete as Buckaroo I really think embedded apps in notebooks can be very useful.
ClaireGz
7 days ago
Thanks for sharing. I like the view you built to visualize the profiling of your data, I think that's indeed key to understand your data.