Show HN: Yikes imagine trusting Google's documentation

11 points, posted 3 months ago
by aa_y_ush

19 Comments

yash_5339

3 months ago

This definitely solves a good problem. Companies generally don't keep good Confluence docs and documentation, and having a common source of truth helps the entire org. But I was wondering whether it will be helpful for the external world, because I feel companies usually double-check any information before releasing it publicly, especially anything related to the code base. (Just a thought.)

Portoaj

3 months ago

Totally agree - we've found a handful of conflicts in the public docs of every large company we've looked at, but a lot of the value is definitely in internal docs, where there's no technical writer double-checking everything that goes out.

notyawn

3 months ago

Love the idea. Detecting when company truths differ between tools is a tricky technical problem, especially when the wording can differ from tool to tool.

mtyagi

3 months ago

Makes sense now why integrations sometimes break unexpectedly. Conflicting info in official docs is a real problem

aa_y_ush

3 months ago

have you seen a similar situation before?

mtyagi

3 months ago

Seen it a lot in enterprise environments. Teams maintain parallel Confluence spaces and internal API docs. They drift constantly. The newer page is correct, but search still surfaces the old one first

ayushman_gupta_

3 months ago

Love the concept. Just curious if and how you guys determine which source is correct in case of a conflict

aa_y_ush

3 months ago

We do! An org can define precedence rules, but the engine also looks at things like recency, authority, and majority voting. We also flag conflicts by criticality and raise manual reviews when needed.
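To be concrete, here's a rough hypothetical sketch of that kind of precedence logic (made-up source names and fields, not our actual engine):

  from collections import Counter
  from dataclasses import dataclass
  from datetime import datetime

  @dataclass
  class Claim:
      source: str        # e.g. "api-docs", "confluence", "readme" (made-up names)
      text: str          # the claim being made
      updated_at: datetime

  # Org-defined precedence: lower rank wins; unlisted sources get a large default.
  PRECEDENCE = {"api-docs": 0, "confluence": 1, "readme": 2}

  def resolve(claims: list[Claim]) -> Claim:
      # 1. Authority: keep only claims from the highest-precedence source present.
      best = min(PRECEDENCE.get(c.source, 99) for c in claims)
      pool = [c for c in claims if PRECEDENCE.get(c.source, 99) == best]

      # 2. Majority vote: prefer the wording most of the remaining claims agree on.
      winner_text, _ = Counter(c.text for c in pool).most_common(1)[0]
      pool = [c for c in pool if c.text == winner_text]

      # 3. Recency: break any remaining tie with the most recently updated claim.
      return max(pool, key=lambda c: c.updated_at)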

wibbily

3 months ago

Huh? The first "conflict" you list isn't a conflict.

> The snippet from "search docs crawling indexing pause online business" states that adding a Disallow: / rule for Googlebot in robots.txt will keep Googlebot away permanently as long as the rule remains. "search help office hours 2023 june", however, advises against disallowing all crawling via robots.txt, warning that such a file "may remove the website's content, and potentially its URLs, from Google Search." This directly contradicts the claim that a full-disallow rule safely blocks Googlebot without negative consequences, creating a true conflict about the effect and advisability of using a disallow rule to block Googlebot.

If you want to block Googlebot "permanently", why would you expect to stay listed in Search? The first page actually agrees with the second - if you only want to temporarily block crawling, it recommends not blocking Googlebot.
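For what it's worth, a quick way to see what that full-disallow rule does mechanically is Python's stdlib robotparser (illustrative example.com URL, not anything from the demo):

  from urllib.robotparser import RobotFileParser

  # The rule the first doc describes: disallow everything for Googlebot.
  rules = """User-agent: Googlebot
  Disallow: /
  """

  rp = RobotFileParser()
  rp.parse(rules.splitlines())

  # Googlebot is blocked from every path for as long as the rule stays in place...
  print(rp.can_fetch("Googlebot", "https://example.com/any/page"))     # False

  # ...while crawlers with no matching group are unaffected.
  print(rp.can_fetch("SomeOtherBot", "https://example.com/any/page"))  # True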

Actually, your last "conflict" is bad too. A 503 fetching robots.txt does stop crawling the site, for at least twelve hours and possibly forever (if other pages return errors). The only crawling Google will continue to do is to keep trying to fetch robots.txt.

I appreciate what you're trying to set up here but 2/4 is a pretty bad record for a demo.

Portoaj

3 months ago

Hi wibbily, your point on the first conflict is fair enough; we'll update that.

I somewhat disagree with you on the last conflict: you have one document stating pretty clearly that returning a 503 for the robots.txt "blocks all crawling". The other document states there's a 12-hour block, after which Google may decide to crawl the other pages (not just the robots.txt, as you said).

Thanks for the feedback though, definitely some work to be done on validating the conflicts we surface.

rathinshah

3 months ago

This is really good. I wonder whether these "truths" propagate anywhere or not.

aa_y_ush

3 months ago

thanks! yes - we enable auto-updating documentation, automatic conflict resolution, and accurate search indexing.

SchmitzAndrew

3 months ago

would be cool to extend this to enable auto-creating a pr to update a docs repo!

aa_y_ush

3 months ago

Hey Andrew, that is 100% in play.