zX41ZdbW
4 days ago
I host a publicly open database with Hacker News data at https://play.clickhouse.com/play?user=play#U0VMRUNUICogRlJPT...
So you can create any sort of similar service with a single SQL query and an HTML page.
I also host it as a publicly accessible data lake, which you can query from anywhere: https://github.com/ClickHouse/ClickHouse/issues/29693#issuec...
It is also updated in real-time.
jstrieb
3 days ago
This is awesome!
I do want to point out that the data in that ClickHouse playground only seems to go as far back as April 6, 2024 according to the query below:
SELECT * FROM hackernews_history ORDER BY update_time ASC LIMIT 10
This is of course still extremely useful, and generous! It just wasn't obvious from the comment that this isn't querying against all Hacker News data.
linmer
4 days ago
Thank you for providing this, you are a hero!!! I'm gonna try to do cool stuff with it!
tgv
4 days ago
It probably also got swamped in real-time...
linmer
4 days ago
Do you mean it's not updated? You gotta sort by the update_time column. The output may look sorted, but you need an explicit ORDER BY, with a query like:
SELECT * FROM hackernews_history
ORDER BY update_time DESC
LIMIT 100;
And yeah, I got that from deepseek because I don't have a brain.
GeoAtreides
4 days ago
oh hey, per HN terms and conditions I license my HN data only to HN. Can you please remove my data from the set? Thank you!
snowwrestler
4 days ago
Not sure if joking, but if this product is not republishing the text of your contributions (to which you hold copyright), you’re probably not going to convince a court to do anything here.
Generally speaking it is not a violation to scrape, index, and analyze web content as long as you don’t republish copyrighted content without a license, or violate access controls. For example: search engine indexes.
moralestapia
4 days ago
By uploading any User Content you hereby grant and will grant Y Combinator and its affiliated companies a nonexclusive, worldwide, royalty free, fully paid up, transferable, sublicensable, perpetual, irrevocable license to copy, display, upload, perform, distribute, store, modify and otherwise use your User Content for any Y Combinator-related purpose in any form, medium or technology now known or later developed.
@zX41ZdbW, you can safely ignore this guy.
@GeoAtreides, next time read the actual terms of service before hallucinating.
codingdave
4 days ago
> for any Y Combinator-related purpose
That is actually the key phrase. HN can provide the API, no problem. People can consume the API, no problem. But I'd ask an attorney if API consumers can then re-release the data for purposes not related to YC. By my reading, they cannot.
moralestapia
4 days ago
You might want to read it again, then:
codingdave
4 days ago
That is about the software, not the data.
moralestapia
4 days ago
While a literal reading of the MIT license refers to "software", many datasets have been released under it.
In particular, if someone releases something that is only a dataset along with an MIT license file, the most reasonable interpretation is that the rights holder intended to release the data under the terms of that license.
I looked for copyright cases involving this specific distinction, whether "data" versus "software" makes a legal difference, but didn’t find anything.
So the question remains open (for you, for me it's pretty clear the dataset is released under MIT).
You might want to sue and find out. It sounds like an interesting experiment.
nairboon
4 days ago
What exactly is released under MIT license?
GeoAtreides
4 days ago
>Y Combinator and its affiliated companies
is zX41ZdbW either?
moralestapia
4 days ago
Oh, now I see my comment might be a bit harsh.
I didn't consider you might not know about:
GeoAtreides
4 days ago
yes, and per HN terms and conditions only YC and YC affiliated (as you quoted) can use the API legally. I don't license my content to anyone else and so it shouldn't be used by anyone else, even if it's available on a free-for-all API (nice move HN, btw).
moralestapia
4 days ago
https://github.com/HackerNews/API/blob/master/LICENSE
It's right there, you just have to click the link I shared ...
GeoAtreides
4 days ago
that's the license for the API, not the content/data the API serves
jupr
4 days ago
>including without limitation the rights to use
'use'...arguably the sole purpose of the API is to fetch the data.
You are grasping at straws.
jrflowers
4 days ago
Steve Carell yelling “I DECLARE BANKRUPTCY!!” in The Office dot gif
rvba
2 days ago
Is this GDPR territory with fines up to EUR 10 million or 2% of a company’s global annual turnover? Not sure what the fines are for some random person, though.
pelagicAustral
4 days ago
You must be fun at parties
linmer
4 days ago
Wait, so I have to ask for every single person's permission to use this data?
uhhhhhhhhhhhhhhhhhhhhhh