Show HN: Sourcetable – AI Spreadsheet and Data Platform

130 points, posted a day ago
by mceoin

Item id: 41590682

63 Comments

primitivesuave

15 hours ago

This is incredible. I uploaded a CSV with ~6000 rows containing campaign finance data for a particularly corrupt local politician and asked "what was the total contributed amount in [year]". Not only did it produce the correct answer (in around the same amount of time it took me to calculate it on my end) but it also seemed to understand that the spreadsheet was related to campaign finance in the "summary" portion of the response.

The most useful aspect was that I could ask "what was the total contributed amount between January and June of 2020" and get an accurate answer for that as well. Since the date column is provided as an "MM/DD/YYYY" string, I would normally have to do some boilerplate work to sanitize this.
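For comparison, the manual boilerplate for that date-range question might look like the sketch below. It assumes a CSV with hypothetical `date` and `amount` columns (the real file's column names are not given in this thread) and uses made-up sample rows.

```python
import csv
import io
from datetime import date, datetime
from decimal import Decimal

# Hypothetical sample rows; the column names "date" and "amount" are assumptions.
SAMPLE_CSV = """date,contributor,amount
01/15/2020,A. Donor,250.00
06/30/2020,B. Donor,1000.00
07/01/2020,C. Donor,500.00
03/02/2019,A. Donor,75.50
"""


def total_between(csv_text, start, end):
    """Sum the amount column for rows whose date falls in [start, end]."""
    total = Decimal("0")
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Parse the "MM/DD/YYYY" string into a real date before comparing.
        when = datetime.strptime(row["date"], "%m/%d/%Y").date()
        if start <= when <= end:
            total += Decimal(row["amount"])
    return total


january_to_june = total_between(SAMPLE_CSV, date(2020, 1, 1), date(2020, 6, 30))
print(january_to_june)
```

Comparing parsed dates rather than raw strings matters here, because "MM/DD/YYYY" strings do not sort chronologically.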

For my particular use case, the charting aspect left a few things to be desired - once I grouped campaign donations by contributor, I could only see the first 10 rows in the AI response, with no option to expand the output. But overall I was truly blown away that something like this is even possible for a small team to build.

mceoin

15 hours ago

> For my particular use case, the charting aspect left a few things to be desired - once I grouped campaign donations by contributor, I could only see the first 10 rows in the AI response, with no option to expand the output.

Insert it as a table on the page (you should see a button), it will then print the whole table result from that query into the spreadsheet. Also, you can check the SQL first and validate it, then print to table after that.

Try a few million rows and see what happens!

dioptre

15 hours ago

Also keep an eye on the limit - we default to 10,000 to keep it snappy, but if you want to make it larger it's a click away. The "summarize table" button should auto limit to 1B+ rows.

jadbox

3 hours ago

Can Office 365 Copilot do this too?

mmckelvy

17 hours ago

Interesting. I think you're on to something here. I fully agree that a combination of spreadsheets and SQL is the ideal toolset for data analysis -- not a SaaS GUI.

> Niching down, if you work in operations at a <50 person startup or SMB and your company relies on a Postgres or MySQL database, Sourcetable is an affordable reporting tool with turnkey data infrastructure that doesn’t require code or engineers to set up.

With the rise of AI, companies like Tembo that help you set up all in one databases, and tools like this, I'm increasingly of the mind that many companies should start bringing things like analytics and observability in-house. I don't see the need to pay Mixpanel or Datadog thousands of dollars per month when a self-serve solution that relies on tried and true tech is more or less at your fingertips.

threeseed

17 hours ago

Minus the AI part, tools like this have existed for decades.

And companies are not dumping their SaaS tools and switching to them en masse.

Because (a) data silos have dramatically increased, pushing dreams of a unified data schema out of reach, (b) technology stacks have become far more complex, necessitating tools like Datadog, and (c) competition is stronger than ever, meaning that skimping on paying for tools like Mixpanel is often short-sighted and counterproductive.

Companies like this will do fine and there will always be demand for them, especially in the SMB space. But there simply isn't the business value in bringing a lot of analytics and observability in-house in almost all cases.

mmckelvy

15 hours ago

Not yet. But in the analytics case, suppose you could build a tool that collected data on your own infrastructure, allowed you to write plain SQL against a PostgreSQL database to get whatever analytics data you need, had an AI-driven text-to-SQL option so non-technical users could get whatever analytics data _they_ need, and output everything to a universal interface, i.e. a spreadsheet? No vendor flavored DSL, GUI, or workflows to learn. That product would be tough to beat. It wasn't built in the past because it was hard. But with AI and something like Tembo or Timescale, is it actually hard anymore?

theGnuMe

an hour ago

Managed services are useful.

mceoin

17 hours ago

Agree. A general thesis I have is that the API-ification of the web fragmented business information, and with every new SaaS tool we fragment our company's data further. The trend at all company sizes is to be increasingly analytical, but for SMBs it's too hard to get access to your data (mainly due to technical limitations). So it makes sense to centralize data somewhere, and we think that somewhere is inside the data tool that everyone actually uses: the spreadsheet.

Many other advantages of this data centralization too. Data + spreadsheets + compute is a nice application base for agents.

threeseed

16 hours ago

> So it makes sense to centralize data somewhere

Modelling and integrating datasets that you don't own is extremely hard.

Shopify for example updates their API every 3 months.

How much time and money do you think an SMB can afford to spend on this before the ROI becomes so poor that they abandon it entirely?

mceoin

16 hours ago

There is a separate answer here, which is that many (most?) SMBs can't afford technical folk, so the ability to integrate data at all, talk to it and model it (using SQL or AI), is already a big step forward for them.

My personal use case tends to involve a lot of Postgres data and transaction events for my reporting. We see "simple" businesses like parts manufacturers, print shops, vineyards, etc. all doing something similar.

mceoin

16 hours ago

Yes some integrations are excellent (hey Stripe : ), some are terrible (no comment on who). We're finding that LLMs are increasingly able to fill the gap around organizing data schemas for that initial data prep piece, where someone has to build the data tables that others consume. To your specific question/problem set, when a schema updates you end up with a "fuzzy schema matching problem"; we are solving that anyway for a separate product feature requirement.

Strong note here that the current state of technology is much better for SMB scale data and not enterprise scale data with messy schemas.
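To make the "fuzzy schema matching problem" concrete, here is a minimal sketch using plain string similarity from Python's standard library. It is only an illustration of the problem, not Sourcetable's approach (which is not described in this thread); the column names and the `match_columns` helper are hypothetical.

```python
from difflib import get_close_matches


def match_columns(old_columns, new_columns, cutoff=0.6):
    """Map each old column name to its closest new name, or None if nothing is close."""
    # Normalize case and underscores so "created_at" and "createdAt" compare equal.
    normalized = {name.lower().replace("_", ""): name for name in new_columns}
    mapping = {}
    for old in old_columns:
        key = old.lower().replace("_", "")
        hits = get_close_matches(key, list(normalized), n=1, cutoff=cutoff)
        mapping[old] = normalized[hits[0]] if hits else None
    return mapping


# Hypothetical column names before and after an upstream API change.
old = ["customer_email", "order_total", "created_at"]
new = ["customerEmail", "total_price", "createdAt", "currency"]
print(match_columns(old, new))
```

Renamed-but-similar columns are matched automatically, while a column with no sufficiently close counterpart (here `order_total`) comes back as `None` and would need a human or a smarter matcher to resolve.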

jacobjellyfish

7 hours ago

This looks great! Well done. My concern is that there's not a single mention of data privacy, which is a red flag for anyone coming from the enterprise world. Get that sorted and I'd consider using your tool for actual work.

aerosmile

18 hours ago

It’s amazing that Microsoft - given their focus on AI and decades of experience in spreadsheets - doesn’t offer this type of functionality. Corporate bureaucracy vs startup agility!

mceoin

18 hours ago

At risk of poking the bear, they should have done this decades ago. Except for LLMs they have had everything they needed to bundle this stack into a single product solution; this would be much better for users.

And yes! We're definitely of the opinion that as a startup we can outcompete the two trillion-dollar death stars when it comes to product experience. AI is a platform shift!

luke-stanley

7 hours ago

Actually, Microsoft does now have Copilot and Python in Excel, released just last week. Maybe a bit slow.

temac

2 hours ago

I don't know if the Python in Excel architecture has changed, but last time I saw it, it was insane and unusable for me (data is sent to MS servers where a Linux container executes Python: you need both a subscription and for the data in question not to be regulated).

mceoin

2 hours ago

Platform wise, the equivalent would be if they combined Excel, PowerBI, Data Factory and Azure into a single tool.

Technically you can combine these, but it’s a cumbersome experience and difficult for most people. Vertically integrating their equivalents simplifies things a lot.

(Small note: we don’t currently offer Python to users but likely will at some point)

zeptian

2 hours ago

very nice app. just the front-end browser component alone is super-slick. but expecting users to bring their data to your platform is a barrier to adoption.

sim_123

a day ago

This is amazing. I’ve been scouting for such a solution as we’ve outgrown Excel. Giving it a spin.

mceoin

a day ago

A very common use case we see is SMBs having outgrown their spreadsheet but not wanting to move to a full-blown BI tool. They want the power, but not the change in interface/medium.

I didn't go into details above but a nice thing is that we leverage cloud compute and storage, so you can query billion-row data in sub-second time. (Courtesy of Duck!)

Brajeshwar

16 hours ago

You might want to check who is blacklisting you and request to unblock. AdGuard blocked sourcetable.com as "Scam".

https://www.dropbox.com/scl/fi/np92pyo0eb0zphysc9wwz/screens...

bschmidt1

4 hours ago

It's likely Sourcetable's CTO who has a bad reputation online (I've met him he's a dick).

SoulAuctioneer

15 hours ago

Thanks for reporting! Taking a look now.

dioptre

15 hours ago

Hey do you mind removing this comment? Seems it might have caused us to be blacklisted?

Brajeshwar

14 hours ago

I'm sorry, I've missed the "delete" window. But may I know how a comment here (after it being blacklisted) about it being blacklisted will be the reason to be blacklisted?

mceoin

11 hours ago

¯\_(ツ)_/¯ deciphering magic algorithms.

Very much appreciate the bug report. Thank you!

longstaff2009

15 hours ago

That's a spicy example dataset!

I like that it's able to infer information from the context of the cells, e.g. being able to run a query across continents when the data only contains the country.

Being able to ask it to interpret the results is helpful. It would be cool if it automatically told you whether there was enough data for the conclusions it presents to be statistically significant.

mceoin

15 hours ago

You may see that we try to suggest follow-up questions or question improvements where we think better context-in will result in a better result-out.

Curious what will happen if you modify the question to be more explicit?

I have seen that PMs and data-trained folk tend to be very articulate in asking for exactly what they want and that tends to lead to significantly better LLM responses.

yawnxyz

a day ago

> Niching down, if you work in operations at a <50 person startup or SMB and your company relies on a Postgres or MySQL database, Sourcetable is an affordable reporting tool with turnkey data infrastructure that doesn’t require code or engineers to set up.

I'm already using Retool for these kinds of tasks. What does Sourcetable do that I can't already do with Retool?

edit: also, did you build your own spreadsheet engine, or use an off-the-shelf one? (also will it be open source ;P)

mceoin

a day ago

Category Comparison (table-based solutions): "How are you different than Retool/Airtable/Coda/Notion/Zapier Tables, etc."

The primary difference vs table-based solutions is that Sourcetable is a spreadsheet in the common sense of the word, similar to Excel and Sheets. We have A1 notation and cell-based referencing. This is what most users expect, and this flexibility/familiarity has a big impact on the breadth of users and use cases within a team.

The formula referencing system of these table-based solutions is usually limited to columns/rows (not cells), and is a set of SQL-based queries that are much more limited than the 500+ formulas and functions spreadsheet users commonly expect.

Retool specifically: I tend to think of Retool as a lightweight custom-ERP software system, whereas Sourcetable is more like Excel + PowerBI + Data Warehouse, so we will generally be much stronger for reporting and analysis. We definitely have some overlap in potential users since technical operators should like us both. FWIW - Retool is an excellent product.

dioptre

a day ago

Hi I'm Andy, Cofounder & CTO @ Sourcetable.

We use a heavily modified licensed engine that prevents us from open sourcing everything (for now). We have plans to open source our agentic/plugin framework, and other parts of the system. We also have a strong ethos of contributing back to open source where we can (contributed back to Arrow, DuckDB etc.).

I'd also add that while everyone knows how to use and work with spreadsheets, we also provide a SQL layer on top that you can use to query data sources as an advanced user (we developed a nomenclature to work within sheets/across sheets/files/our data-warehouse). This allows more technical users to work side-by-side in the same environment as non-technical users without crossing pythonic or reporting boundaries.

On top of this, the AI assistant can answer most of the questions you might have of all this data.

I think as ML gets more sophisticated, we will in general need to be less technical. The "tooling" might even disappear, but we will still need something to communicate important data centric decisions. Whether you like it or not spreadsheets are the foundation of human research and operations and have been for thousands of years, and I feel humanity will need less complicated "tools" and we will keep to our roots.

topicseed

8 hours ago

An improved and more interactive version of Google Sheets' explore tab. Looks good!

sammysidhu

16 hours ago

Congrats on the launch! It's been great working with you from the Daft side

alooPotato

18 hours ago

Cool.

How did you build so many integrations so fast?

Selfishly, would love to see Streak (CRM) integration as well.

mceoin

18 hours ago

Mostly Fivetran, a little Airbyte, and a few custom integrations. Would love to add Streak (can you get it into Fivetran? We can usually crank those integrations out within an hour.)

mceoin

18 hours ago

p.s. I was a massive Streak user at a previous (sales-driven) startup. Big fan!

samymov

15 hours ago

Huge congrats on the launch! You guys crushed it with all the thought and hustle behind creating such a valuable tool. Wishing you nothing but success on the ride ahead!

djbiggs

18 hours ago

Awesome, have you got any mining-specific worked examples or spatial examples? Thinking about lidar point clouds and running deltas for stockpile management. Looking at building a new mine, and typically at any mine site there are Excel macros embedded in the operations which might take an hour to run. They are often developed by older engineers, who will default to Excel. Any suggestions on how best to drive technical user adoption (aside from dropping it on the kids in the engineering departments, can't wait that long)?

dioptre

17 hours ago

The underlying datatypes in our data-warehouse support 3D and 4D data, so we can do vector queries on these and do transformations over different spaces. I think given what you need we can put your data in our data-warehouse, and then present it to the older engineers in an Excel format with 3D plotting. We might want to chat about the details though, give me a holler at andrew@sourcetable.com

mceoin

17 hours ago

Yes actually! My cousin is a mining engineer so I spent a bunch of time playing around with mining data during testing. Turns out all New South Wales government data is public. Right now you can talk to any CSV or database using LLMs. I've also played around with a bunch of marine biology datasets too!

(p.s. I think Andrew, CTO, is going to jump in here as he has more experience in this space.)

mceoin

17 hours ago

Can you email me -- eoin@sourcetable.com -- more about the Excel macros? This might be easy to help you out with agents. A lot of compute-intensive stuff that takes ages in Excel is nearly instant in Sourcetable because we are leveraging cloud compute, but it really depends on your use case.

mg1973

16 hours ago

Brilliant work team, great to see this being launched.

HeralFacker

17 hours ago

What external checks are included to verify the chatbot output?

SoulAuctioneer

17 hours ago

Wherever possible, the chatbot output is deterministic, in that to answer a query we generate and run code or SQL against your data in real time. Our LLM orchestrates that, and finally evaluates whether the output correctly and adequately answers the question.

We also extensively use synthetic data and examples to guide and constrain our models.

Another way we're ensuring good-quality output is to ensure good-quality _input_ -- by enriching the detail and specificity of the user's question, and asking the user to disambiguate when we determine the question is too broad.
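As an illustration of the "generate SQL, then run it against the data" pattern described above, here is a minimal sketch using SQLite. It is not Sourcetable's implementation: the `run_checked` helper, the table, and the hard-coded query standing in for model output are all hypothetical.

```python
import sqlite3


def run_checked(conn, sql):
    """Run a single read-only query; raise ValueError for anything else."""
    statement = sql.strip().rstrip(";")
    # Naive guard: rejects any inner semicolon, even inside a string literal.
    if ";" in statement or not statement.lower().startswith("select"):
        raise ValueError("only a single SELECT statement is allowed")
    # EXPLAIN compiles the statement without running it, so unknown tables
    # or columns surface here as sqlite3.OperationalError.
    conn.execute("EXPLAIN " + statement)
    return conn.execute(statement).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE donations (contributor TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO donations VALUES (?, ?)",
    [("A. Donor", 250.0), ("B. Donor", 1000.0), ("A. Donor", 75.5)],
)

# Stand-in for model output; a real system would receive this string from an LLM.
generated_sql = (
    "SELECT contributor, SUM(amount) FROM donations "
    "GROUP BY contributor ORDER BY contributor"
)
print(run_checked(conn, generated_sql))
```

Because the answer comes from executing the query rather than from the model's text, the same SQL over the same data always gives the same numbers, and the SQL itself can be inspected before it is trusted.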

smcleod

16 hours ago

Are you open sourcing the product for non-commercial use?

mceoin

16 hours ago

Would love to but unfortunately there are pieces we can't open-source for various reasons. We'll open source bits and pieces over time, and generally are excited to start blogging about AI & technical learnings now that the product is out of stealth mode.

Small plug for the analytics tracker we are using which Andrew (CTO) built and is open source: https://github.com/sfproductlabs/tracker

escot

a day ago

Very cool. It would be great to have auto complete across cells.

mceoin

a day ago

Yes we don't yet have the full auto-suggest magic that Sheets offers, but you can click-drag for auto-complete the same way Excel offers.

We released Sourcetable today with the AI chatbot & AI data analysis features, but a very limited cell-based AI (only "summarize" and "fix formula"). We'll be releasing a big AI-based magic-autofill solution in the coming weeks.

SMAAART

17 hours ago

Looks interesting, commenting so that I can remember.

_hfqa

a day ago

Congrats on the launch! It’s wild to see AI stepping into spreadsheets like this. Pretty soon there won’t be a part of our workflow AI hasn’t touched.

mceoin

a day ago

Thanks _hfqa! We think there's massive potential here. It's a big platform shift, and spreadsheets weren't really impacted by the mobile or cloud compute waves, so it's a space long-overdue for disruption. (The last shift was back when Google Sheets took spreadsheets to the browser 17 years ago!!)

petergreen

18 hours ago

great product. congrats on the launch

halfcat

15 hours ago

I always wonder where these spreadsheet/database apps will land. Usually it falls flat for one of a few reasons I’ve observed:

- Fundamental gap in skillset, in that if you want to have ultimate flexibility to slice and dice the data and report on whatever you’re seeking, you’ve ultimately needed SQL skills in the past (which isn’t rocket science, but also isn’t something most accounting users can run with on their own).

- Fundamental desire of users to work with unstructured data. This goes back at least as far as Excel vs Lotus Improv in the early ’90s. Joel Spolsky talked about this, how they were terrified that Lotus Improv was going to kill Excel, because Improv was built to work with structured data, which users could then query and ask questions of to get any answer they want. But it turned out, as they observed people using both apps, there were zero users who used 100% normalized, structured data.

- Imperfect translation between spreadsheet and database. I’ve seen these work well 99.9% of the time, but at some point a column gets added or something that throws off formulas. And 0.1% error is basically catastrophic in accounting.

Maybe LLMs help overcome these challenges. Wish you luck.

SoulAuctioneer

15 hours ago

Agree with you, and we're definitely trying to thread the needle!

We're generating the SQL to answer natural language questions, so folks can just get answers and results tables if that's all they need, with the option for power users to fiddle with the SQL either directly or via a query editor GUI.

There's a ton of use cases for working with unstructured and semi-structured data and that's coming down the pipe!

mceoin

15 hours ago

This is 100% the correct insight in my experience.

TL;DR, most technical people massively overestimate the technical / data abilities of regular spreadsheet users. We find simple use cases are best, and with each new LLM release the UX around more complex data improves significantly.

The reason we chose to build as a full-blown spreadsheet instead of just a table-based solution was that we saw that most people want the flexibility of a regular spreadsheet, but access to their (structured) business data. Table-based solutions wedge you into AI and you can never get out of that.