IRS open sources its fact graph

218 points posted 5 hours ago
by ronbenton

56 Comments

vineyardmike

4 hours ago

Am I being dumb or does this not actually contain the facts about the tax code? Is the /demo/all-facts file supposed to be the “real” facts? Are the XML fact files provided in another location?

It’s pretty cool to see the way that the IRS handles defining and maintaining its tax calculations, but also a machine-readable tax code seems cool too.

ronbenton

4 hours ago

I believe the actual IRS tax code implementation is in a separate repo here: https://github.com/IRS-Public/direct-file while the originally linked repo is the fact graph tooling decoupled from the tax implementation.

MangoToupe

3 hours ago

As far as I am aware, fact just means shared assumption. This seems entirely reasonable for a tax code.

hedayet

3 hours ago

I’ve had frustrating experiences with TurboTax due to its overly complex interface, aggressive data collection under the guise of saving money (which it doesn’t deliver), and a convoluted pricing structure that rivals the IRS’s own complexity.

I hope this initiative is good enough to enable domain experts and good people to build transparent, user-friendly alternatives to challenge TurboTax’s market grip.

Has anyone encountered promising tools or approaches that tackle these pain points?

willis936

3 hours ago

DirectFile was quite good for the one year I was able to use it and addressed your concerns. Don't worry, that's since been taken care of.

https://apnews.com/article/irs-direct-file-tax-returns-free-...

hedayet

12 minutes ago

I can totally see the minds running TurboTax spending a lot of money to make this happen.

j_bum

3 hours ago

Just a heads up, your URL 404s

willis936

3 hours ago

Thanks. Fixed. I stripped what I thought was a tracker without testing.

somehnguy

2 hours ago

TurboTax’s advertising is borderline fraudulent in my opinion.

Freetaxusa.com (no affiliation) is just as good and legitimately free.

eclipticplane

24 minutes ago

Note: Freetaxusa.com has not done a good job with Form 3921 (ISO grant exercises) and AMT carryover. If you have exercised ISO grants or later sold stock purchased in years in which you paid AMT, do not use freetaxusa.com. You will lose more in tax costs vs. finding a real CPA willing to go through your nuanced math.

babelfish

2 hours ago

FreeTaxUSA is legitimately fantastic!

chamomeal

an hour ago

I love em, I don’t know why anybody uses TurboTax. Good product, generous freemium model with transparent pricing

plun9

an hour ago

Cash App Taxes

Spooky23

2 hours ago

The H&R Block software is better imo.

aliljet

4 hours ago

I wonder how this can be used with an LLM to provide interesting tax advice? I'd love to regularly ask questions of the tax code...

Jach

3 hours ago

patio11's already saved over $2k apparently, maybe he'll do a more formal write-up at some point. (A couple threads here https://x.com/patio11/status/1977425626584711668 and here https://x.com/patio11/status/1978168404793037087 )

kccqzy

an hour ago

I've also saved a bit of money on taxes just by thinking about possible deductions and asking LLMs whether they exist. Of course, to actually claim such deductions I need to follow instructions from the IRS/state tax agencies, so it's hallucination-proof: I'm still manually reading the instructions from the tax agencies to understand how to claim them.

koolba

3 hours ago

Any idea what the actual deduction it supposedly found for private school was?

You can pay for K-12 with 529 or Coverdell ESA funds. But neither allows deductions for contributions. Only growth in either is tax free (assuming it’s spent on education expenses).

wrigby

23 minutes ago

Many states allow a state tax deduction for 529 contributions, which could net you up to an 8ish% discount if you’re in a high tax locality (e.g. NYC).
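
As a rough sketch of that arithmetic: the contribution amount, the 8% combined marginal rate, and the optional cap below are all made-up illustrative numbers, not any particular state's rules.

```python
def state_529_savings(contribution, marginal_state_rate, deduction_cap=None):
    """Estimate state income tax saved by deducting a 529 contribution.

    All inputs are hypothetical; real rates and deduction caps vary by state.
    """
    # Only the part of the contribution under the cap (if any) is deductible.
    deductible = contribution if deduction_cap is None else min(contribution, deduction_cap)
    return deductible * marginal_state_rate


# Hypothetical: $10,000 contributed, 8% combined state + local marginal rate, no cap.
contribution = 10_000
savings = state_529_savings(contribution, 0.08)
effective_discount = savings / contribution

# Same contribution, but with a hypothetical $5,000 deduction cap.
capped_savings = state_529_savings(contribution, 0.08, deduction_cap=5_000)
```

With no cap the "discount" equals the marginal rate; a cap below the contribution shrinks it proportionally.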

ryandrake

4 hours ago

I guess as long as it's for entertainment purposes only. I'm going to file "actually following tax/legal advice from a potentially hallucinating LLM" under NOPE.

hahahacorn

4 hours ago

The super obvious workflow is to query for an idea in natural English and then verify or ask the LLM to provide the paths it was following.

It raises the question of why you assume the parent comment was going to blindly follow the LLM's output.

ronbenton

4 hours ago

Makes me wonder if someone has already trained a model on the tax code. Would be interesting for sure.

astrange

4 hours ago

Model training data already contains all the text there is[0], so they can already answer questions like this (especially with web search), but they aren't good at tax calculations.

https://arxiv.org/abs/2507.16126v1

[0] but it's quite possible the conversion from HTML to text is bad

kevin_thibedeau

3 hours ago

The problem is that the text of US tax code isn't enough to know the correct action to take. The IRS has semi-formal policies based on how it has chosen to interpret the statutes. There are areas of gray that they don't clearly specify. Some of this is in supplementary publications but it still has subjective elements. One example is that settlements for "serious injuries" are regarded as non-taxable income. What constitutes serious is a squishy concept.

cco

3 hours ago

Yeah you'd have to pull in a lot of case law and perform a lot of fine tuning on expert tax advice (you'd probably have to create this training data).

Would be neat (and still legally fraught!).

TZubiri

3 hours ago

You can technically use the language model as a data model. That was the quick hack that started it all: autocomplete on a question produces the answer, yes.

However, it's clear that we are moving towards separating the data and the language model. Even base ChatGPT is given search tools and Python tools rather than generating those results as text, though the tool call itself may be generated by the model.

You can for sure use a pure LLM to ask it questions about tax code, but we'll probably see specific tools that only contain canon law and kosher case law, and source it properly. Y'know, instead of hallucinating.

tallowen

5 hours ago

It's nice to see an open-source implementation of the US tax code! This was part of the IRS Direct File codebase that allowed people to file their taxes for free, directly with the IRS. It was canceled earlier this year by the Trump administration. It looks like the Fact Graph was already open-sourced a couple of months ago, and that version of the Fact Graph lives here: https://github.com/IRS-Public/direct-file/tree/main/direct-f...

I'm curious why a second repository was created for this.

infotainment

4 hours ago

I'm still disappointed that they got rid of Direct File, such a promising start...

ronbenton

4 hours ago

Big W for the tax lobby, big L for the rest of us

astrange

4 hours ago

It's still there. They like saying things and not doing them.

https://directfile.irs.gov

So it's always possible they'll just forget to shut it off.

shrinks99

2 hours ago

Having talked at length with one of the developers from 18F at a conference who was fired along with many of the other folks that worked on Direct File, I can assure you that it's no longer being worked on.

The 2024 site remains up so people can file their taxes for that year, but it will no longer be updated.

hk1337

3 hours ago

My eyes read Scala but my brain was thinking Clojure, so I was a bit confused on why there weren’t any parentheses for the first couple of seconds looking at the source.

alberth

4 hours ago

> As a work of the United States Government, this project is in the public domain within the United States.

What does it mean for the license to say "within the US"?

Does this mean this software cannot be used outside the US?

dragonwriter

4 hours ago

> What does it mean for the license to say "within the US"?

It means exactly what it says; you have to read the whole thing (or at least the two sentences before the CC0 1.0 Universal text, which is the operative mechanism by which the second sentence is effected), not a fraction of the first sentence.

> Does this mean this software cannot be used outside the US?

No. The license explains two things:

(1) Without any license, this is automatically public domain in the US because it is a federal government work.

(2) The federal government (as the owner of the copyright at creation outside the United States, at least anywhere that applies the common rules underlying the Berne Convention) waives copyright worldwide, and does so via the CC0 1.0 Universal declaration (the text of which is then included).

So, it is, to the extent that this is legally possible, copyright-free globally.

jandrewrogers

3 hours ago

Some countries don't recognize the concept of Public Domain works. In the US, many government works are Public Domain as a matter of law. This creates complications internationally in those countries that don't recognize the legitimacy of Public Domain as a legal concept. Nonetheless, the US still wants to make it available internationally.

To satisfy these conflicting requirements, the US government places it in the Public Domain in the US to satisfy US law. Additionally, they make it available internationally under a license that approximates the intent of Public Domain while still being recognized as a legally valid thing.

ronbenton

4 hours ago

Good question. Copyright laws are country-specific, right? So perhaps it is just trying to be clear that there is no license being asserted outside of the US.

dragonwriter

3 hours ago

Licenses are offered or granted (they are permissions from the copyright holder), not asserted.

bickfordb

4 hours ago

Surprised to learn we still have an IRS

ok123456

5 hours ago

Why would I want to use this over Prolog/Datalog?

NoahZuniga

4 hours ago

Because prolog/datalog don't offer a list of questions that you can ask based on context to calculate someone's US taxes.

ok123456

4 hours ago

That's the database you consult(). Doing income taxes is well-suited to traditional logic programming.

akerl_

4 hours ago

This is a bit like asking "why would I use my car's schematics instead of a wrench".

This is the rules engine's details. You could use it to build the logic and traversal in whatever language you like.

nerdponx

2 hours ago

I think they're asking why you would build a rules engine and fact graph instead of "just" encoding it in Datalog.
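
To make the comparison concrete, here is a minimal sketch of the general idea of a fact graph: facts are either supplied by the user or derived from other facts, and the graph can report which user answers are still missing for a given result. This is not the IRS Fact Graph's actual API; the class names, fact paths, and the flat 10% rate are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Fact:
    """One node: either answered by the user or derived from other facts."""
    path: str
    depends_on: List[str] = field(default_factory=list)
    derive: Optional[Callable[..., float]] = None  # None means the user must supply it


class FactGraph:
    def __init__(self, facts: List[Fact]):
        self.facts: Dict[str, Fact] = {f.path: f for f in facts}
        self.answers: Dict[str, float] = {}

    def set(self, path: str, value: float) -> None:
        self.answers[path] = value

    def missing(self, path: str) -> Set[str]:
        """Return the user-supplied facts still needed before `path` can be computed."""
        fact = self.facts[path]
        if fact.derive is None:
            return set() if path in self.answers else {path}
        needed: Set[str] = set()
        for dep in fact.depends_on:
            needed |= self.missing(dep)
        return needed

    def get(self, path: str) -> Optional[float]:
        """Compute `path`, or return None while any input is unanswered."""
        if self.missing(path):
            return None
        fact = self.facts[path]
        if fact.derive is None:
            return self.answers[path]
        return fact.derive(*(self.get(dep) for dep in fact.depends_on))


# Hypothetical facts and a made-up flat 10% rate, purely for illustration.
graph = FactGraph([
    Fact("/wages"),
    Fact("/deduction"),
    Fact("/taxableIncome", ["/wages", "/deduction"], lambda w, d: max(w - d, 0)),
    Fact("/taxOwed", ["/taxableIncome"], lambda t: t * 0.10),
])
graph.set("/wages", 50_000)
incomplete = graph.get("/taxOwed")             # None: an input is still unanswered
questions = sorted(graph.missing("/taxOwed"))  # what still has to be asked
graph.set("/deduction", 15_000)
tax = graph.get("/taxOwed")
```

Here `questions` contains only "/deduction" after wages are entered, which is the "list of questions you can ask based on context" behavior mentioned upthread. A Datalog program could encode the same derivations; whether it would also give you the incomplete-state tracking for free is the open question in this subthread.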