Britannica11.org – a structured edition of the 1911 Encyclopædia Britannica

352 points, posted 7 days ago

by ahaspel

(britannica11.org)

134 Comments

ahaspel

7 days ago

I rebuilt the 1911 Encyclopædia Britannica into a clean, structured, navigable site:

https://britannica11.org/

What it does:

– ~37k articles reconstructed from the original volumes
– section-level structure (contents are clickable within articles)
– cross-references extracted and linked
– contributors indexed and searchable
– original volume + page references preserved and shown while reading
– links to the original scans for each page
– ancillary material included (prefaces, abbreviations, etc.)
– topic index reproduced and cross-linked
– full-text search with article metadata (length, volume, etc.)

Most of the work was in parsing and reconstruction: headings, multi-page articles, tables, math, languages, footnotes, plates, and all the small edge cases that come up in a work like this.

The goal was to make something that feels like the original, but is actually usable.

I’d especially appreciate feedback on:
– search quality
– navigation (sections, cross-references)
– anything that looks structurally off

Happy to answer questions about the pipeline or data model.

zozbot234

7 days ago

You might want to add The Reader's Guide to the Encyclopaedia Britannica, PD text available at https://www.gutenberg.org/ebooks/74039 and scans at https://archive.org/details/readersguidetoen00londuoft - It would fit naturally with the Ancillary material that includes the topic-based index.

ahaspel

7 days ago

It would indeed. I will see about working this in, it's highly pertinent.

user

4 days ago

[deleted]

ahaspel

4 days ago

The Reader's Guide has been added to the ancillary material. Thanks for the excellent suggestion.

zozbot234

4 days ago

Thanks for adding this! Do you plan to add back-links in the article pages (and perhaps in contributors pages) pointing to the chapters in the Reader's Guide that mention them, similar to what's done for the subject-based index?

ahaspel

4 days ago

Not a bad idea. I'll see what I can work out on that score. But I imagine the far more common path is from the Guide to the encyclopedia than the reverse.

logicallee

7 days ago

Thanks so much for sharing this. It looks fantastic. A couple of questions, if you don't mind: what license are you releasing this under, if any? Is there any way to download it? The reason someone might want to download it is for use as training data.

zozbot234

7 days ago

Wikisource has the original scans available in the public domain, and their enriched text under CC-BY-SA: https://en.wikisource.org/wiki/EB1911

ahaspel

7 days ago

Thanks!

The underlying text (1911 edition) is public domain, but the structured version here — the parsing, reconstruction, and linking — is something I put together for this site. Right now there isn’t a bulk download available. I’m considering exposing structured access (API or dataset) in some form, but haven’t decided exactly how that will work yet.

If you have a specific use case in mind (especially for training), I’d be interested to hear more.

logicallee

7 days ago

Regarding the specific use case, I was thinking this: I had Gemma 4 (a small but highly capable offline model released by Google) make a public domain cc0 encyclopedia of some core science and technology concepts[1]. I thought it was pretty good.

Separately, I've fine-tuned the Gemma 4 model[2], it was very quick (just 90 seconds), so I think it could be interesting to train it to talk like 1911 Encyclopedia Britannica.

I would use the entries as training data and train it to talk in the same style. There isn't a specific use case for why, I just think it would be interesting. For example, I could see how it writes about modern concepts in the style of 1911 Britannica.

[1] https://stateofutopia.com/encyclopedia/

[2] To talk like a pirate! https://www.youtube.com/live/WuCxWJhrkIM

ahaspel

7 days ago

That’s a fun idea — I can see the appeal of that style.

The underlying text is public domain, but the structured version here is something I put together for the site. I haven’t released a bulk dataset yet.

If you end up experimenting with it, I’d love to hear how it turns out — and I’m still figuring out what structured access might look like.

hallole

7 days ago

I've wanted to do something like this for The Encyclopédie, a hugely relevant text to the Enlightenment. If you ever get around to adding a rough "How I (generally) Made This" section, that'd be appreciated! Site looks great :)

ahaspel

4 days ago

Thanks for the kind words. I've had a few requests for a technical appendix (i.e., "how I built this") and it is in the works.

realityfactchex

7 days ago

> Is there any way to download it? The reason someone might want to download it is for use as training data.

Another reason would be to be able to keep running/using it even if the main site were eventually to go down for whatever reason; or to operate a mirror of it, for redundancy (linking back to the original, of course).

bentley

6 days ago

There’s an escaping issue in tables of contents. See, e.g., “Roosevelt's” in the “United States” article. https://britannica11.org/article/27-0635-united-states-the/u...

ahaspel

4 days ago

This is now fixed, along with several more serious rendering errors in "United States". Thanks a lot for pointing it out.

huijzer

6 days ago

Really nice. Well done.

As a feature request, would it be possible for your pipeline to also create an EPUB? Then people could easily access and search through the document even if your site went down. EPUB uses compression by default, so the file size might not even be too bad for the full encyclopedia.

nyc_pizzadev

7 days ago

Very nice. I actually spent a bit of time browsing a few topics, which is something I rarely do these days!

A few things... when I click an article and try to jump to a new topic, the top search box (labeled "Search titles and full text...") doesn't work. Second, when I first came to the site, I was a bit stuck. It took a bit of time to realize I need to click on "Articles" or even "Topics" to start browsing. Not sure why, maybe I expected the image to let me enter the site somehow...?

gnerd00

7 days ago

legal terms question here also -- several major world economies are operating under very different rules regarding datasets and publication rights. I am in the USA / California.. will there be terms for me, given that I am not a giant deep-pockets FAANG, just a book person ? commercial use terms for "small business" scale ?

ahaspel

7 days ago

The 1911 text itself is public domain, so anyone is free to use it.

What I’ve built here is a structured edition — the parsing, reconstruction, linking, indexing, etc. I haven’t published a formal license for that yet.

For casual or small-scale use there’s no issue at all. For bulk use (e.g. dataset / training / redistribution), I’d prefer people get in touch so I can figure out a sensible way to support that.

Kerrick

6 days ago

> What I’ve built here is a structured edition — the parsing, reconstruction, linking, indexing, etc. I haven’t published a formal license for that yet.

If you live in the U.S. I recommend you read No Sweat of the Brow Copyright: https://www.gutenberg.org/help/no_sweat_copyright.html

dessimus

7 days ago

It's been on Project Gutenberg for over 20 years: https://www.gutenberg.org/ebooks/13600

They only release books that are in the public domain.

bentley

6 days ago

> They only release books that are in the public domain.

Not necessarily. Project Gutenberg does provide some works still under US copyright, such as F. P. Walter’s 1999 translation of Twenty Thousand Leagues Under the Seas: https://gutenberg.org/ebooks/2488

gnerd00

6 days ago

better link here https://www.gutenberg.org/ebooks/search/?query=Encyclopaedia...

TremendousJudge

7 days ago

I guess such an old edition is in the public domain

ks2048

6 days ago

Nice job. How about wikipedia-style links to other articles for topics mentioned within another article?

Soluod

7 days ago

[dead]

realityfactchex

7 days ago

Very, very cool. Hats off. I've considered attempting a more limited form of this for years.

For those who don't know, the 1911 Britannica is heralded for several reasons (and rightly criticized for regrettable others), but the most well-known is that it was the last encyclopedia before The Great War, and hence had a good amount of steam/optimism coming from the first and second industrial revolutions and the "Progressive Era", not sullied yet by thoughts of "the war to end all wars".

Trying https://britannica11.org specifically, it quickly found and displayed the article I searched for, chosen (to search for) at random: Portuguese East Africa, at https://britannica11.org/article/22-0177-portuguese-east-afr...

A question/idea for nice-to-haves, most respectfully. I don't know if it would be feasible. It's probably perfect as it is, simply linking to the image-page in unobtrusive text for each section. But I would love an option (emphasis on option) to see the text side by side with the page images. That parallel view would load all of the page images on the same page as the full article text. That way, I could "confirm" or "fact check" the faithfulness of the OCR, and also see the beautiful printing, at once, without opening each page separately and managing the images/windows myself. Most likely, I would use the site to jump to the articles, and read them mainly as images, only switching to the text form to verify what something said, or to copy-paste cleanly, etc. (As it is, initially, I thought I read the original images were available, but had to visit the page three (3!) times before finding where the side-links to them were.) Maybe thumbnails could be a middle-ground option (again, optional) for salience.

Very, very well done. And it's fast!

ahaspel

7 days ago

Thanks — really appreciate that, and glad it worked well for a random article.

That’s a great suggestion. A side-by-side text + page view would be very nice for exactly the reasons you mention (verifying the text and seeing the original layout). I haven’t built that yet, but I’ve considered it.

Also helpful to hear that the links to the scans weren’t immediately obvious — I should probably make them a bit clearer. This may also not be obvious, but you can click the vol:page links in the left margin and go directly to the scan of whatever page you're reading.

Thanks again.

aragonite

7 days ago

> But I would love an option (emphasis on option) to see the text side by side with the page images. ... That way, I could "confirm" or "fact check" the faithfulness of the OCR.

You can already do that on Wikisource. For example, here's p. 658 from the entry on "Molecule":

https://en.wikisource.org/wiki/Page:EB1911_-_Volume_18.djvu/...

Also OP: I noticed some fidelity issues in your version (at https://britannica11.org/article/18-0684-s2/molecule). For example parts of the math formula under the line that ends with "the molecules of other kinds" ([1]) are missing (compare [2]). Also, in your version fn. 1 of this article is attached to "as they have always done" ([3]) but it should actually be attached to "Atom" on p. 654 ([4]):

[1] https://britannica11.org/article/18-0684-s2/molecule#:~:text...

[2] https://en.wikisource.org/wiki/Page:EB1911_-_Volume_18.djvu/...

[3] https://britannica11.org/article/18-0684-s2/molecule#:~:text...

[4] https://en.wikisource.org/wiki/Page:EB1911_-_Volume_18.djvu/...

realityfactchex

7 days ago

That's cool about the WikiSource parallel text+image page view, TIL. Thanks!

As an example flow (since it took a minute to figure out): we can start at https://en.wikisource.org/wiki/1911_Encyclopædia_Britannica then click to navigate/browse volume > section > topic to get to a text page, then click Source tab, then click a Page Number (maybe hunt around for the correct page number), and see the parallel view, text + image. With previous and next page buttons available, retaining the parallel text + image view.

neonscribe

7 days ago

You can discover beliefs that are shocking today, such as this excerpt from the article "Adolescence":

"In the case of girls, let them run, leap and climb with their brothers for the first twelve years or so of life. But as puberty approaches, with all the change, stress and strain dependent thereon, their lives should be appropriately modified. Rest should be enforced during the menstrual periods of these earlier years, and milder, more graduated exercise taken at other times. In the same way all mental strain should be diminished. Instead of pressure being put on a girl’s intellectual education at about this time, as is too often the case, the time devoted to school and books should be diminished. Education should be on broader, more fundamental lines, and much time should be passed in the open air."

yieldcrv

7 days ago

It’s only shocking to write it, or declare it as sacrosanct

Many people practice it, and women’s movements that put most energy on doing the opposite have since dialed back to pointing out that they were fighting for choice, including that choice of not being in a workforce. An option of a “soft life” that is wildly popular, and timeless. People just needed a new way to say it.

If it was culturally supported for men to be subsidized by another, a large percentage of men would immediately take that graduated and intellectually diminished role too. This is not a reliable option and is rare.

If common, it would unironically solve representation imbalances in other fields, since it would no longer be about shoehorning women into them, because enough men would leave on their own. A level of enlightenment still missing from Women in <field> fireside chats at every industry conference worldwide

ahaspel

7 days ago

No doubt. That’s one of the reasons I find the 1911 edition interesting — the authors have more license to express their own opinions, which naturally reflect those current at the time.

genghisjahn

7 days ago

"Not, of course, that there is any magic about the past. People were no cleverer then than they are now; they made as many mistakes as we. But not the same mistakes. They will not flatter us in the errors we are already committing; and their own errors, being now open and palpable, will not endanger us."

On Reading Old Books C. S. Lewis

https://bradleyggreen.com/attachments/article/97/Lewis.On-Re...

zozbot234

7 days ago

You can nowadays paste the text from pretty much anything that's in the public domain into a near-SOTA LLM such as Kimi or GLM and it will give you a pretty nice summary of what it's about in modern language (Extremely useful: the LLM tendency to go overboard on formatting nicely balances out the wall-of-text format from historical publications, which was aimed at saving paper and minimizing manual layout effort), and then gladly tell you about all the things in the historical text that would be absolutely beyond the pale today. (Sometimes you have to nudge it by prompting "How would this text be received today?" or something like it after it has put its nice summary in context, but once you do that it tends to be quite thorough.)

doubletwoyou

7 days ago

I beg of thee, use that brain of yours and read a text that was made scarcely more than a century ago, a blink of an eye in the grand scale of the changes of the linguistic features of English, and interpret it for yourself.

mapontosevenths

7 days ago

I'm in favor of using all the tools available to better yourself, including LLMs. However, for things like this I would argue that one should first try to understand it on their own.

Sometimes the work is the POINT. We read things like this not just to learn about the past, but for novelty and to exercise our critical thinking powers. To outsource that labor before even trying is like going to the gym and having your butler lift the weights. The weights got lifted, but what was really accomplished?

zozbot234

7 days ago

Historically, these texts were often consumed (especially in formal or semi-formal settings) by either having them read aloud for you or reading them aloud yourself. They were more like a written-down formal speech to be slowly pondered upon than something to be read smoothly and silently on one's own, which is how we now regard almost all texts. There was "labor" involved but that labor was not really about being more literate or exercising more critical thinking: it was simply about slowly recreating in one's mind the kind of broad structural scaffold we now expect to see in a text as a matter of course. It's in fact easier to think critically about a text when its sections and structure are clearly laid out, and having a LLM do this for you is a nice way of avoiding personal tendencies and biases that might lead one to misinterpret what the text is really about.

mapontosevenths

6 days ago

>Historically, these texts were often consumed (especially in formal or semi-formal settings) by either having them read aloud for you or reading them aloud yourself.

In the middle ages this was true, mostly because few people were literate at all and the words didn't have spaces between them. The ability to read silently was regarded as impressive.

By 1911 reading silently to yourself was the expectation of a normal literate adult. Only hillbillies and their ilk could not.

This is a simple text, intended to be legible even to school children of the era. It's also very structured already.

Their contemporary English was a bit different, but not so far removed that you should need assistance.

zozbot234

5 days ago

It was very much the norm in formal and semi-formal gatherings. They didn't have conference talks with PowerPoint slide decks, their own equivalent was to read out articles or papers. This often extended to university-level lectures, in a practice that was arguably carried over from the middle ages as you mention, but was very much still in use.

> It's also very structured already.

It's definitely not very structured by modern standards. The length of paragraphs alone would be described as "wall of text". Again, this was an ordinary practice back in the day, aimed at saving costly paper and reducing the manual effort involved in physically laying out the work on the page. It was far from exceptional: to a first approximation, most texts from the early 20th c. or before will look like that.

simonklitj

7 days ago

Yes, let the LLM bias and misinterpret it instead.

stereolambda

6 days ago

Entertaining to think that "that's too difficult to read for us nowadays" and "look at these unacceptable things" already sound pretty much like some poor Medieval literates who got their hands on Ovid or Lucretius, while under the rule of king Theodoric or something.

I don't have to say I don't question that we are very civilized and powerful.

quamserena

7 days ago

You can also read the text yourself and draw your own conclusions...

BigTTYGothGF

7 days ago

How is that not "modern language"?

smallerize

7 days ago

You didn't really explain what that does for you. Why do you paste it into an LLM?

zozbot234

7 days ago

I'm not sure if you're familiar with public domain texts from around the 19th or early 20th century, but they were not intended to be skimmed or speed-read the way we'd skim a modern text prior to getting into a more attentive close-reading. Even their short magazine articles were actually the near-equivalent to our scholarly papers, and were often read aloud at length in parlor gatherings. So having a LLM split the text into manageable sections for you and provide a hint of what each lengthy wall-of-text paragraph will be about is actually a huge gain in readability.

smallerize

7 days ago

Oh well that was the whole point to me. If I wanted to read something that's not from 1911 I could just do that lol

BigTTYGothGF

7 days ago

The trick is to have a basic level of literacy and then you don't need the machine to chew it up for you like a mother bird.

keane

7 days ago

Mostly from a bit further back but you might enjoy https://earlymoderntexts.com/texts

Dylan16807

6 days ago

So before you were talking about summarizing whole articles and asking the LLM to find the things that would be "beyond the pale", but now you're just suggesting using it to insert paragraph breaks and section headings?

zozbot234

6 days ago

The LLM will easily do both for you. Particularly the thinking it does when constructing the summary generally involves a structured close reading of your text, and you can easily think of it as providing "paragraph breaks and section headings".

Dylan16807

6 days ago

Sure it could do both, but the question is what are you suggesting?

If you're suggesting it alter the text beyond organizing it, people are going to be upset. And your first suggestion sounded like that.

smallerize

6 days ago

I think the word "summarization" might be throwing people off. This is like an expansion.

qmr

6 days ago

> So having a LLM split the text into manageable sections for you and provide a hint of what each lengthy wall-of-text paragraph will be about is actually a huge gain in readability.

Perhaps your attention span needs improvement.

spudlyo

7 days ago

I'm curious how the information is structured under the hood. I just recently learned about how folks in the digital humanities use the XML-TEI format for semantic markup of works like this. I've recently been exploring the Latin-English Lewis & Short dictionary encoded in XML-TEI.

I've had a ton of fun learning about BaseX and XQuery to ask questions like "Which classical authors are responsible for writing words that appear only once in the entire corpus (hapax legomena)" or "what are the longest hapax words" (usually the funniest ones) and that kind of thing. Shout out to Tufts University for making this available!

I would love to be able to load the 1911 Britannica into BaseX and see what interesting things I could learn about it via XQuery!
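The hapax question above can also be asked without an XML database. What follows is a minimal Python sketch, not XQuery and not anything from the BaseX setup described here: it assumes the corpus is already available as plain text keyed by author or document name, and `hapax_legomena` is a hypothetical helper name.

```python
import re
from collections import Counter


def hapax_legomena(documents: dict[str, str]) -> dict[str, str]:
    """Map each word that occurs exactly once in the whole corpus
    to the name of the document (or author) it occurs in."""
    counts = Counter()
    first_seen: dict[str, str] = {}
    for name, text in documents.items():
        # [^\W\d_]+ matches runs of Unicode letters only (no digits, no underscore).
        for word in re.findall(r"[^\W\d_]+", text.lower()):
            counts[word] += 1
            first_seen.setdefault(word, name)
    return {word: first_seen[word] for word, n in counts.items() if n == 1}


# Tiny illustrative corpus; the keys are placeholder document names.
corpus = {"doc_a": "arma virumque cano", "doc_b": "arma cano"}
hapaxes = hapax_legomena(corpus)
longest = max(hapaxes, key=len)
```

A real corpus would need lemmatization first, since inflected forms of the same word would otherwise each count as a separate hapax.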

ahaspel

7 days ago

Under the hood it’s not XML-TEI — it’s a relational/data-pipeline approach, with article boundaries, sections, contributors, cross-references, and source-page provenance all reconstructed into structured records. The text itself is public domain, but I haven’t released a bulk structured export yet.

People asking for dataset access has definitely been one of the themes of this thread. I’m taking that seriously. If I do expose it, I’d want to do it in a form that preserves the structure and doesn't just dump plain text.
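For readers wondering what "structured records" might mean in practice, here is a rough Python sketch of the kinds of records listed above. It is an illustration only: every class and field name is an assumption, not the site's actual schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PageRef:
    """Source-page provenance: a volume and page in the printed edition."""
    volume: int
    page: int


@dataclass
class Section:
    heading: str
    text: str
    start: PageRef  # page on which the section begins


@dataclass
class CrossReference:
    source_article_id: str
    target_article_id: str
    anchor_text: str  # the words in the source article that carry the link


@dataclass
class Article:
    article_id: str  # hypothetical stable identifier
    title: str
    contributors: list[str] = field(default_factory=list)
    sections: list[Section] = field(default_factory=list)

    def pages(self) -> list[PageRef]:
        """Distinct starting pages of the sections, in reading order."""
        seen: list[PageRef] = []
        for section in self.sections:
            if section.start not in seen:
                seen.append(section.start)
        return seen


# Placeholder data; headings and texts are invented for the example.
article = Article("example-molecule", "Molecule")
article.sections.append(Section("First", "...", PageRef(18, 654)))
article.sections.append(Section("Second", "...", PageRef(18, 654)))
article.sections.append(Section("Third", "...", PageRef(18, 658)))
xref = CrossReference("example-molecule", "example-atom", "Atom")
```

Keeping provenance as its own small record is what would let an export preserve structure instead of dumping plain text.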

shantara

7 days ago

Interesting how different both the tone and the structure of the articles are compared to the modern texts.

Take the article about Copenhagen as an example: https://britannica11.org/article/07-0111-copenhagen/copenhag... The geography and key points of interest are described very accurately, but the authors aren’t shy about inserting emotionally charged adjectives and personal opinions on what they consider interesting or curious. Also, the huge portion about the Battle of Copenhagen at the bottom is a complete departure and shifts the genre from a geographical description to the shot-by-shot narration of a naval battle.

ahaspel

7 days ago

Yes, that’s one of the things I like most about it. The articles have a personal tone and are less homogenized.

You get that mix of geography, history, and sometimes quite opinionated description all in one place, which makes them much more readable, in my view. My introduction to this version discusses this and other related matters: https://britannica11.org/about.html

krige

6 days ago

Looking at Victor Hugo's entry I immediately spotted this

> After yet another three years’ space the author of La Légende des siècles reappeared as the author of Les Misérables, the greatest epic and dramatic work of fiction ever created or conceived: the epic of a soul transfigured and redeemed, purified by heroism and glorified through suffering; the tragedy and the comedy of life at its darkest and its brightest, of humanity at its best and at its worst.

Sure sounds like someone was a (fellow) fan.

robin_reala

7 days ago

A seriously trivial bug report, but the font you’ve chosen doesn’t support ℔, making articles like https://britannica11.org/article/22-0688-s2/putting_the_shot look odd. Potentially might be worth rewriting ℔ to a more normal (these days) lb?

ahaspel

7 days ago

Good catch — thanks. That’s a font coverage issue. I’ll either swap in a fallback font for missing glyphs or normalize those cases. This only sounds trivial, this project is full of items like that.
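The "normalize those cases" option could look roughly like the Python sketch below. The `GLYPH_FALLBACKS` table and the `normalize_glyphs` name are made up for illustration and are not the site's code. The only mapping shown is the one raised in this thread.

```python
# Hypothetical table of glyphs that many fonts lack, with plain replacements.
GLYPH_FALLBACKS = {
    "\u2114": "lb",  # the barred "lb" pound sign (℔) from the bug report above
}


def normalize_glyphs(text: str, fallbacks: dict[str, str] = GLYPH_FALLBACKS) -> str:
    """Replace each listed glyph with its fallback string; leave everything else untouched."""
    # str.maketrans accepts single-character keys mapped to replacement strings.
    return text.translate(str.maketrans(fallbacks))


sample = normalize_glyphs("a weight of 16 \u2114")
```

The trade-off is fidelity: rewriting the character changes the transcription, while a fallback font keeps the original glyph and only changes how it is drawn.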

rustyhancock

7 days ago

I spent ages trying to work out if it would be possible to find a copy of the 2021 Encarta or Britannica.

Pre-LLM and post-COVID, it is perhaps the best we can hope for before AI taints all the info.

One of my prized possessions as a child was a CDROM based encyclopedia (well before the internet was common). I don't know why I liked it so much but on a rainy afternoon I'd kick up some of my favourite articles and read and learn more of them.

tezza

7 days ago

2004: https://archive.org/details/britannica-2004

2009: https://archive.org/details/britannica-multimedia-dvd-2009-d...

2012: https://archive.org/details/britannica-dvd_20230709

2013: https://archive.org/details/encyclopedia-britannica-dvd-2013

ahaspel

7 days ago

I know exactly what you mean — I had the same experience with CD-ROM encyclopedias. There’s something about just browsing and falling into articles that’s hard to replicate.

Part of the motivation here was to bring that kind of exploration back, but with the original 1911 text and structure.

pawsocks

7 days ago

Do you happen to use a language model to translate or format your comments?

ahaspel

7 days ago

Just me. I spent a lot of time thinking about this, so I like talking about it.

hoppyhoppy2

7 days ago

The final release of Encarta was in 2009.

entrepy123

7 days ago

Bravo. People who like the 1911 Encyclopedia Britannica might like https://OldEncyc.com to dig into the volumes (by letter range) of 22 editions of old encyclopedias dated 1728-1926 (though not searchable like the OP).

ahaspel

7 days ago

I hadn’t seen that before, it’s a great collection. I like the breadth across editions.

yodon

7 days ago

The most important entry I found in my physical copy of the 1911 Britannica is for Eavesdropping[0], detailing the historical origins of the term and how it was thought about just before our modern era.

> Though the offence of eavesdropping still exists at common law, there is no modern instance of a prosecution or indictment.

Thanks for posting this resource, I've often wanted to share a link to this and other entries.

[0]https://britannica11.org/article/08-0867-eavesdrip/eavesdrip...

ahaspel

7 days ago

That’s exactly the use case I had in mind. The 11th is full of gems like that, but they’ve never been easy to point people to.

doctor_blood

7 days ago

Small world - I'm currently cleaning up scans of the EB 9th edition to put it online as a mediawiki site; I'm including all the illustrations and plates so I'm only a third of the way through.

I've been testing different OCR tools and so far I've been the most impressed with paddleOCR - it correctly split the text columns, labeled the illustrations, and noted the margin text.

Still, it's not perfect, so I'm having to hand-edit some tables. I plan to put the source pages online as well so you can switch between the scanned page and the electronic text.

doctor_blood

7 days ago

For those unfamiliar, the 1875 9th ed. was known as the scholar's edition due to how many eminent persons had contributed; it's a fascinating snapshot of the late 1800s.

Other material that would be fun to put online in a hyperlinked and indexed format include geographic and medical atlases and the Baedeker travel guides.

ahaspel

7 days ago

I'm looking forward to it. The 9th is great in its own right and a lot of it is in the 11th. Alfred Newton's nearly 200 articles on bird species and a few classic essays by Macaulay come to mind offhand.

xnobodyx

6 days ago

re: OCR of tables, would the work done on https://github.com/tabulapdf/tabula / https://tabula.technology/ be relevant?

8bitsrule

6 days ago

Some parts are ... amusing to read. For example the article on stars [0]...

"anything approaching a uniform distribution of the stars cannot extend [Limits of the Universe.] indefinitely. It can be shown that, if the density of distribution of the stars through infinite space is nowhere less than a certain limit (which may be as small as we please), the total amount of light received from them (assuming that there is no absorption of light in space) would be infinitely great, so that the background of the sky would shine with a dazzling brilliancy ...."

[0] https://britannica11.org/article/25-0806-star/star#section-1...

Aransentin

6 days ago

The article about the Sun was quite fun; even though they didn't know about fusion, the article dismisses most theories about how it could generate such a large amount of energy (like chemical combustion or gravitational contraction).

It says the most likely cause is some sort of "rearrangement of the structure of the elements' atoms" and "supposing a gaseous nebula is destined to condense into a sun, the elementary matter of which it is composed will develop in the process into our known terrestrial and solar elements, parting with energy as it does so". Pretty much as bang on as one could reasonably be given what they knew.

tim333

6 days ago

Searching for "computer", the only hit was one Chauncey Wright, American philosopher and mathematician, who became computer to the American Ephemeris and Nautical Almanac.

https://britannica11.org/article/28-0872-wright-chauncey/wri...

Times change.

8bitsrule

5 days ago

When I thought I recalled 'computer' used to be a job title, I found this on Wikipedia:

https://en.wikipedia.org/wiki/Computer_(occupation)

which says that "(the first known written reference dates from 1613)... often women from the late nineteenth century onwards, were used to undertake long and often tedious calculations; the work was divided so that this could be done in parallel."

failingforward

6 days ago

Sounds like https://en.wikipedia.org/wiki/Olbers%27_paradox

Aardwolf

7 days ago

Very neat!

Some bugs I noticed:

Searching for Zurich allows you to go to the article for the canton of Zurich, not the city. Clicking the link "Zürich (city)" inside this article opens the same article about the canton again, rather than opening the actual article for the city.

When viewing an article, the search for articles (leftmost search box) doesn't seem to work at all for me (in Firefox). On the main page, it does work.

There's a small clickable 'home' button on the right, but muscle memory from how other websites work makes me expect that clicking the big title "Encyclopædia Britannica, 11th Edition" on the top left also goes to home

ahaspel

7 days ago

Excellent points. There are indeed two Zurich articles. One way to get to the city is to search for Zurich and open the second one, which goes to the city directly. The xref in Zurich (canton) is indeed a disambiguation bug (identically named articles); thanks for catching that.

I haven't tested the article search box on the article viewer in Firefox. I'll look into that as well.

Making the title linkable is a great idea and it will be implemented shortly. Thanks for catching all of this.
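For anyone curious how a disambiguation bug like the Zurich one can arise and be avoided, here is a small Python sketch. It is a guess at the general shape of the problem, not the site's resolver: the index layout, the article ids, and the names `fold` and `resolve_xref` are all hypothetical.

```python
import re
import unicodedata
from typing import Optional


def fold(s: str) -> str:
    """Case-fold and strip diacritics, so "Zürich" and "Zurich" compare equal."""
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold().strip()


# A label is a title optionally followed by a parenthesized qualifier, e.g. "Zürich (city)".
LABEL = re.compile(r"^(?P<title>.*?)\s*(?:\((?P<qualifier>[^)]*)\))?$")


def resolve_xref(label: str, index: dict[str, list[tuple[str, str]]]) -> Optional[str]:
    """Return the target article id for a cross-reference label, or None if ambiguous.

    `index` maps a folded title to a list of (article_id, qualifier) pairs.
    """
    match = LABEL.match(label)
    if match is None:
        return None
    candidates = index.get(fold(match["title"]), [])
    if len(candidates) == 1:
        return candidates[0][0]
    qualifier = match["qualifier"]
    if qualifier is not None:
        for article_id, candidate_qualifier in candidates:
            if fold(candidate_qualifier) == fold(qualifier):
                return article_id
    return None  # refuse to guess instead of silently linking to the first match


# Hypothetical index with two identically titled articles.
index = {"zurich": [("zurich-canton", "canton"), ("zurich-city", "city")]}
target = resolve_xref("Zürich (city)", index)
```

Returning None for an unqualified, ambiguous title makes the failure visible in the pipeline instead of producing a link that points back at the wrong article.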

keane

7 days ago

Beautiful work! This is an amazing resource to have online. Reminds me a little of greensdictofslang.com or of Webster’s 1913, a perennial HN favorite: https://news.ycombinator.com/item?id=29733648

timciep

7 days ago

These projects came to mind for me, as well.

I actually took a recent crack at making a more modern website for Websters 1913: https://websters1913.timcieplowski.com/

masfuerte

7 days ago

That's lovely. I do like a site that just works without a pile of unnecessary js guff.

There's a bit of funkiness with "<?/" appearing here:

https://websters1913.timcieplowski.com/word/mathematic/

ahaspel

7 days ago

That’s high praise. Those are both great projects and this one is definitely in the same spirit.

peterldowns

7 days ago

I've been meaning to build ~exactly this experience, but for the 1952 Encyclopædia Britannica Great Books of the Western World collection and its experimental index, the Syntopicon [0]. Would love to know more about how you OCR'd or otherwise ingested and parsed the raw material. I have a physical copy of the books, and I found some samizdat raw-image scans and started working on a custom OCR pipeline, but I'm wondering if maybe I could learn from your approach...

[0] https://en.wikipedia.org/wiki/A_Syntopicon

ahaspel

7 days ago

I'm familiar with the Syntopicon, which would be fun to structure.

I didn’t do OCR myself, except for the topic index and to fill in a few gaps. I started from existing Wikisource text and then built a pipeline around that: cleaning (headers, hyphenation, etc.), detecting article boundaries, reconstructing sections, and linking things back to the original page images. Most of the effort went into rendering the complex layouts, and handling the cross-linking, not the initial ingestion.

Glad to go into more detail if you’re interested, but that’s the gist of it.
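
To make the cleaning step concrete, here is a minimal sketch of header stripping and de-hyphenation. It is an illustration, not the actual pipeline: the `HEADER_RE` pattern and the function names are assumptions, and it assumes each page arrives as a list of text lines.

```python
import re

# Hypothetical running-header shape: an all-caps title with an optional
# page number on either side, e.g. "ZURICH 1057". Real headers would need
# a pattern fitted to the data.
HEADER_RE = re.compile(r"^(\d+\s+)?[A-Z][A-Z .,'-]*(\s+\d+)?$")


def strip_running_header(page_lines):
    """Drop the first line of a page when it looks like a running header."""
    if page_lines and HEADER_RE.match(page_lines[0].strip()):
        return page_lines[1:]
    return page_lines


def join_hyphenated(lines):
    """Merge a word split across a line break: 'recon-' + 'structed' -> 'reconstructed'."""
    out = []
    for line in lines:
        line = line.strip()
        # Only merge when the next line starts in lowercase, which skips
        # most proper nouns and sentence starts.
        if out and out[-1].endswith("-") and line[:1].islower():
            out[-1] = out[-1][:-1] + line
        else:
            out.append(line)
    return out


def clean_page(page_lines):
    """Strip the running header, rejoin split words, and return one text string."""
    return " ".join(join_hyphenated(strip_running_header(page_lines)))
```

One known limitation of this sketch: a genuine compound such as "well-known" that happens to break at its hyphen would be joined into "wellknown", so a real pipeline would likely need a word list to decide when to keep the hyphen.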

xnobodyx

6 days ago

would love to hear more details. are you familiar with the semantic lab at pratt's work - https://semlab.io/projects (also see https://tools.semlab.io )?

peterldowns

7 days ago

Ah ok thanks very much!

zozbot234

7 days ago

That collection is not in the public domain, AIUI? You might be able to do it for the Harvard Classics, which has a nice collection-wide index of terms. https://en.wikisource.org/wiki/The_Harvard_Classics has links to the scans.

peterldowns

7 days ago

Oh no not in the public domain, I better not build something cool!

indigodaddy

7 days ago

Just as a random data point, I searched for Genghis and nothing came up. Was there not much knowledge on Genghis Khan in 1911 I wonder?

ahaspel

7 days ago

Try Jenghiz Khan. That's how they used to spell it then. Or just plain Khan and scroll the results.

ks2048

7 days ago

Yes. Here is the article,

https://britannica11.org/article/15-0341-jenghiz-khan/jenghi...

indigodaddy

7 days ago

Interesting! Thanks

ternaryoperator

6 days ago

I have the hard copy of this edition and it does contain some curious things.

For example, if you look up "boiling," you might expect to read about what happens to a liquid when it's heated to a certain temperature, or perhaps a way of cooking foods, or sterilizing equipment. But the entry covers none of those. Instead, the only entry for boiling describes a punishment for persons convicted of poisoning who were, in England, dipped into a large cauldron of boiling water.

And, in the ways that violence and torture were wantonly reveled in centuries ago, they wouldn't just submerge the criminal and let him die there. Instead, they would lower him into the boiling water for a while and then pull him out. They'd repeat the process until eventually they finally killed him. That is the EB 11 ed entry for boiling. Yow!

arichard123

7 days ago

This is good. I picked up a copy of the Encyclopædia Britannica from 1973 and quite enjoy browsing that rather than the internet. The articles seem well written, and as mentioned here, you have the facts and the history and everything all mixed into some articles, and it's super interesting.

I highly recommend getting an old set of volumes.

bentley

6 days ago

The first article I looked up was New Mexico, because I knew, as does anyone familiar with New Mexico history, that it became a state in January 1912 (before which it was a territory). Arizona also became a state, in February. I was surprised to find both described as states of the United States in this 1911 encyclopedia. I suppose the editors just made a confident guess? The last sentence of both articles is, “In June 1910 the President approved an enabling act providing for the admission of Arizona and New Mexico as separate states.”

romperstomper

5 days ago

Sometimes cross-references don't have links. Is this expected? For example, here https://britannica11.org/article/26-0828-theosophy/theosophy there are no links for BOEHME and SWEDENBORG, but all the other references have links.

merryocha

6 days ago

A book in my collection that I love is The Treasury of the Encyclopaedia Britannica, a greatest hits collection from Britannica over the years. It has articles written by many famous names like James Maxwell, T.E. Lawrence, Einstein, JFK, Arthur Koestler, and many more. I love how Britannica articles are written by a single expert, giving the articles a bit of bias, humor, and character.

hax0ron3

6 days ago

Nice. Reading old books is a great way to be exposed to ways of thinking that have fallen out of fashion - some for (in my opinion) good reason, such as having been discovered to be incorrect or genuinely immoral, some for (in my opinion) bad reason, such as having become "politically incorrect", and some simply because they were forgotten.

But whatever the reason is why the ideas have fallen out of fashion, it can broaden the mind to encounter them.

orsenthil

6 days ago

Love this. I couldn't have imagined the quality of this Encyclopædia until I saw it in the form you have presented. Plus, the contributors! I love the human race.

lobster45

7 days ago

Reading medical texts from 1911 is a great way to see how far psychiatry has advanced. There was a widespread medical and societal belief that masturbation was harmful to physical and mental health. https://britannica11.org/article/14-0628-insanity/insanity?q...

Mr_Minderbinder

6 days ago

In those days circumcision was the cure.

stephen_g

21 hours ago

I don’t get how anyone thought that would work… From (medically necessitated) experience it doesn’t really make much difference in that respect!

Mr_Minderbinder

9 hours ago

> I don’t get how anyone thought that would work…

The original reasoning can be found in medical texts from the mid to late 19th century when it was first discussed. My recollection is not strong enough to repeat it with confidence but it was to the effect of: the removal of the foreskin should restrict the ease of movement* and therefore restrict the ease of “self-abuse” as it was termed then.

* the foreskin functions as a sleeve which eases movement and reduces friction during sexual acts.

romperstomper

5 days ago

Something is wrong with images here https://britannica11.org/article/22-0043-s2/polymethylenes

kiproping

7 days ago

Excellent resource. Small bug to report: the table here is broken (BANTU NEGROIDS section) https://britannica11.org/article/01-0358-africa/africa#secti.... It's quite fascinating to read what they thought about Africans as an African.

ahaspel

6 days ago

Thanks, nice catch. The tables can be tricky and I appreciate the heads-up on this markup leak. It will be corrected shortly.

golem14

6 days ago

It's very insightful to look up fission, fusion, atom and find yourself ... definitely before the great war.

As a time travel machine for the mind, this is great!

It would also be an invaluable resource for any Dungeon Master aspiring to lead a campaign at the end of the 19th century (Sherlock Holmes, or PG Wodehouse style, as it were), as doubtless many here are ...

lkm0

6 days ago

A very simple addition that makes casual browsing much more fun is a menu with adjacent articles, as is done in this reconstruction of Littré's 19th century French dictionary: https://www.littre.org/ (see mots voisins)

ahaspel

6 days ago

I wanted to let everyone know that article search from articles is now working properly again. A path problem. Apologies.

zeckalpha

6 days ago

Note that the subsequent 12th edition (1922) may be in public domain in your jurisdiction.

SilentM68

6 days ago

Do you have access to the original 1958 Edition of The Encyclopedia Americana Volume 2?

Just to confirm if this is real or Memorex or just another hoax?

https://imgbox.com/f7MDjbKs

throw253245235

7 days ago

Interesting that the articles on Euler and Gauss are so much shorter than the ones on Kant and Schopenhauer. I guess authors of Britannica were not very interested in mathematics.

romperstomper

5 days ago

Great project, thanks! Is there a chance to add a "random article" feature similar to the one Wikipedia has? :)

shevy-java

7 days ago

Already better than all AI wikipedias.

Quitschquat

7 days ago

Read the sections on nebulae, since this book predates the discovery of galaxies.

bronlund

7 days ago

Interesting article about aether in there :)

ahmedfromtunis

7 days ago

No entry on the Great War? Really?!!!

Just kidding, of course. This is incredible and surprisingly nostalgic. Reading some of the entries took me right back to being a kid huddled in my room for hours poring over an encyclopedia or even the dictionary.

And I still vividly remember the rush of installing Encarta for the first time on the family PC.

I couldn't believe that I, a mere kid, now had access to iconic historical footage that I could watch anytime I felt like it. I can't describe how amazingly cool that felt at the time! It still gives me a hit of endorphins when I remember it today.

ahaspel

7 days ago

I feel exactly the same way about encyclopedias and dictionaries. And Encarta really was amazing. You'd be surprised how much modern criticism of the 11th amounts to "no entry on the Great War", except in earnest.

ahmedfromtunis

7 days ago

Thanks a lot for this incredible gem!

By the way, it looks like there's a bug where I can't search for articles when already inside one. To do so, I need to go back to home > articles and then search.

ahaspel

7 days ago

If you're reading an article, just go to the top and type in the left-hand search box. That will search for articles as well as text within articles. The right-hand box searches the text of the article you're reading.

ChrisArchitect

7 days ago

Please with the beige serif-font vibecoded sites......

est

6 days ago

Now someone please revive Microsoft Encarta ...

sammy2255

6 days ago

This encyclopedia is racist:

Mentally the negro is inferior to the white. The remark of F. Manetta, made after a long study of the negro in America, may be taken as generally true of the whole race: “the negro children were sharp, intelligent and full of vivacity, but on approaching the adult period a gradual change set in. The intellect seemed to become clouded, animation giving place to a sort of lethargy, briskness yielding to indolence.”