sundarurfriend
5 days ago
> Indo-European languages: a language family native to the overwhelming majority of Europe, the Iranian plateau, and the northern Indian subcontinent. Widely spoken indo-european languages includes English, French, Portuguese, Russian, Dutch, and Spanish, etc.
> Indo-European languages typically use the Latin alphabet
After the first sentence, "Indo-European" seems to have transformed to just "European" in the author's mind. Hindi and Bengali, languages more widely spoken than half the languages in that list, seem to have been forgotten, along with their Devanagari and Bengali scripts.
(Over the course of the article, it seems like the author just wanted to say European languages, or languages using Latin script, and for some reason chose to use Indo-European instead, despite clearly stating the definition themself.)
xiaohanyu
5 days ago
Thanks for pointing this out.
Yes, you are right. I am not a linguist, so I have little knowledge of "Indo" languages.
Originally I used the term "Germanic languages", then I found that Spanish is not a Germanic language, so I switched to "Indo-European" instead.
This needs a fix for sure.
messe
4 days ago
> Originally I used the term "Germanic languages", then I found that Spanish is not a Germanic language
Also worth noting that out of those you listed, Russian is also not a Germanic language (it's Slavic), and does not use the Latin alphabet.
cafard
5 days ago
Not just "European", Western European. A whole lot of people read and write with the Cyrillic alphabet.
Tainnor
5 days ago
And then there's Greek.
xiaohanyu
5 days ago
roger that, thanks!
chrismorgan
5 days ago
My understanding, as one very familiar with Indic scripts, very familiar with Unicode in general, but not a CJK user, so please correct me if I’ve blundered:
• Indic scripts need the renderer to support complex text shaping, or else the text will generally be illegible, as though you were drawing your letters wrong, stacking some vowels on top of each other, and other nonsense things like that. As an example, if you’re not familiar with Indic scripts, see the code points used to write my name in Telugu, and how they contribute to the rendering: https://temp.chrismorgan.info/%E0%B0%95%E0%B1%8D%E0%B0%B0%E0.... It’s basically “letter ka, delete the vowel, letter ra, delete the vowel, add vowel i, letter sa, delete the vowel”, but the “kri” will normally be joined together into a conjunct, with the vowel sign drawn on the first consonant, and the second consonant being drawn in a completely different way from normal, which may even affect layout by font—the r conjunct can be a semicircle below, as in that font, but it can also be a curve beginning on the left, shifting the k to the right. (Me, I like the curve style for no particular reason, but the semicircle seems more popular these days. If this concept seems weird to you, reflect that English has allographs too <https://en.wikipedia.org/wiki/Allograph>, though mostly not particularly affecting layout.)
• But as regards line breaking, Indic scripts are much the same as English.
• CJK shaping/rendering can have a bit of complexity because of Han unification <https://en.wikipedia.org/wiki/Han_unification>, and definitely has a lot more nuanced stuff like mixing horizontal and vertical writing modes, and what to do when you mix scripts (which happens much more than with Indic scripts), especially digits, and especially when combining vertical and horizontal. But if your engine doesn’t support any of this, your document should still at least be fully intelligible—just uglier.
• CJK line breaking is awful: where most languages have settled on using spaces to separate words, most CJK languages mostly don’t (Korean does, I believe), and so you pretty much need to know the language to avoid breaking in the middle of words. So you end up with things a bit like hyphenation dictionaries to try to do a good-enough job of it. Again, if your engine doesn’t support this, your document should still be intelligible, just uglier.
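To make the first bullet a bit more concrete, here is a minimal Python sketch that lists the code points behind a short Telugu string. The string is an assumption: it is built by hand to match the "ka, virama, ra, vowel i, sa, virama" description and is not copied from the truncated link. In the encoded form the vowel sign simply follows ra, which plays the role of "delete the vowel, add vowel i".

```python
import unicodedata

# Assumed example string, built by hand from the description above
# (ka, virama, ra, vowel sign i, sa, virama); not taken from the link.
text = "\u0c15\u0c4d\u0c30\u0c3f\u0c38\u0c4d"

def describe(s):
    """Return (code point, Unicode name) pairs for each character in s."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in s]

for code_point, name in describe(text):
    print(code_point, name)
```

Six code points go in, but a shaping engine is expected to draw them as far fewer visual clusters, which is the work a renderer without complex text shaping skips.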
nicoburns
5 days ago
That graphic of the Indic glyph is very interesting! Definitely explains why shaping is so complex for those scripts!
Regarding CJK line-breaking, my understanding was that it was only Thai and closely related languages that required dictionary-based line breaking, and that Chinese/Japanese had simpler rules mostly concerning punctuation. But I'm not certain about that.
doabell
5 days ago
Yes, for Chinese & Japanese, not breaking words is nice, but not always practical. Maybe if you’re writing a speech, so as not to mispronounce the word in the 5% of cases when that happens. The CSS line-break property pretty much sums up the actual rules. Some apps do ship a dictionary to allow for double-click selection of words. They don’t always get it right, though.
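As a rough illustration of what "rules mostly concerning punctuation" can look like, here is a heavily simplified Python sketch. It is not the CSS line-break algorithm or the Unicode line breaking algorithm; the two character sets and the `wrap` function are hypothetical and cover only a handful of punctuation marks. The idea: a break is allowed between any two characters, except right before closing punctuation or right after opening punctuation.

```python
# Hypothetical, tiny character sets; real rule sets cover far more characters.
NO_BREAK_BEFORE = set("、。，．！？）」』】")  # must not start a line
NO_BREAK_AFTER = set("（「『【")               # must not end a line

def wrap(text, width):
    """Split text into lines of at most `width` characters, moving each
    break earlier while it would violate the punctuation rules."""
    lines = []
    start = 0
    while len(text) - start > width:
        end = start + width
        # Back off while breaking at `end` is forbidden. If we run out of
        # room (one character left on the line), break there anyway.
        while end > start + 1 and (
            text[end] in NO_BREAK_BEFORE or text[end - 1] in NO_BREAK_AFTER
        ):
            end -= 1
        lines.append(text[start:end])
        start = end
    lines.append(text[start:])
    return lines

print(wrap("你好，世界。再见", 2))
```

With a width of 2, the comma and the full stop are pulled onto the line of the character they follow instead of starting a new line, at the cost of some shorter lines.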
BeFlatXIII
4 days ago
> That graphic of the Indic glyph is very interesting! Definitely explains why shaping is so complex for those scripts!
So _this_ must be why the Affinity suite doesn't properly render Devanagari, yet Inkscape can.
hsfzxjy
4 days ago
> CJK line breaking is awful
That's not true for Chinese. Chinese allows line breaks after any character.
chrismorgan
4 days ago
My impression (again, open to correction) was that, although that's true, there are many places where breaking is not preferable, like how you can hyphenate in English but should prefer not to. Many in Japanese, basically needing a dictionary, and fewer in Chinese but still some.
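A minimal Python sketch of that "allowed anywhere, but prefer word boundaries" idea, assuming a toy dictionary and greedy longest-match segmentation. The word list, `segment`, and `wrap_words` are all hypothetical; real engines use much larger dictionaries and more careful algorithms.

```python
# Hypothetical toy dictionary; a real one would hold many thousands of entries.
WORDS = {"東京", "大学", "図書館"}

def segment(text, words=WORDS):
    """Greedy longest-match segmentation. Characters that start no known
    word become single-character tokens."""
    max_len = max(map(len, words))
    tokens = []
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            if n == 1 or text[i:i + n] in words:
                tokens.append(text[i:i + n])
                i += n
                break
    return tokens

def wrap_words(text, width):
    """Fill lines up to `width` characters, breaking only between tokens.
    A single token longer than `width` is left to overflow its line."""
    lines = []
    current = ""
    for token in segment(text):
        if current and len(current) + len(token) > width:
            lines.append(current)
            current = ""
        current += token
    if current:
        lines.append(current)
    return lines

print(wrap_words("東京大学の図書館", 3))
```

Breaking this string blindly every three characters would split 大学 across two lines; the dictionary-guided version keeps each known word intact.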