rikroots
a year ago
When I was building out a new text layout engine for my canvas library, handling soft hyphens was one of the most annoying parts of the work ... until I moved beyond western fonts and met the line layout issues that other languages impose on their scripts. Not helped, of course, by my severe lack of knowledge (beyond Google search) about how those scripts work.
For instance, CJK languages require punctuation marks to stay associated with the preceding character so they never start a line. I managed to implement some functionality to recognise the word joiner character (U+2060), which can be placed between the punctuation and its preceding character, but only got it working for text laid out in horizontal lines. Things currently break down when the text is arranged in columns - I think there's an extra requirement in Japanese for the punctuation mark to also rotate 90 degrees relative to its preceding character? I've not yet recovered sufficient will or strength to investigate further and fix.
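A minimal sketch of the word joiner idea for horizontal text, assuming every character boundary is otherwise a legal break (a simplification of real CJK rules). The function name `splitIntoUnbreakableUnits` is hypothetical and is not taken from any library:

```typescript
// U+2060 WORD JOINER: forbids a line break between its two neighbours.
const WORD_JOINER = "\u2060";

// Hypothetical helper: splits text into units that must not be broken apart.
// Assumption: any boundary between two characters is a legal break unless
// a word joiner sits between them. Real CJK line breaking has more rules.
function splitIntoUnbreakableUnits(text: string): string[] {
  const units: string[] = [];
  let current = "";
  let glued = false; // true when the previous code point was a word joiner

  for (const ch of text) {
    if (ch === WORD_JOINER) {
      glued = true; // drop the joiner itself, remember to glue the next char
      continue;
    }
    if (current !== "" && !glued) {
      units.push(current); // a break is allowed here, so close the unit
      current = "";
    }
    current += ch;
    glued = false;
  }
  if (current !== "") {
    units.push(current);
  }
  return units;
}
```

A line filler can then place whole units on a line, so a punctuation mark glued to its preceding character can never end up at the start of the next line. Vertical layout and glyph rotation are not covered by this sketch.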
As for Thai ... why does that culture not like adding spaces between written words? There is a zero-width space character (U+200B), which I've added into my layout engine's line layout calculations, but the dev/user has to add those zero-width spaces into the text themselves. Compare that to modern browsers, which seem to include functionality to automatically parse Thai text to correctly break the stream of glyphs into lines - but there's no way for me to access that functionality so I can replicate it in the canvas. Interestingly, the Thai language has its own dedicated W3C standard[1] so I expect there are lots of other Thai-related layout issues that I'm missing/ignoring.
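A rough sketch of the zero-width space approach, assuming the text already contains U+200B at every permitted break. The names `wrapAtZeroWidthSpaces` and `Measure` are hypothetical; in a canvas setting the measure callback would probably wrap something like `ctx.measureText(s).width`:

```typescript
// U+200B ZERO WIDTH SPACE: an invisible break opportunity.
const ZERO_WIDTH_SPACE = "\u200B";

// Hypothetical callback type: returns the rendered width of a string.
type Measure = (s: string) => number;

// Greedy line filling: keep appending segments until the next one would
// overflow maxWidth, then start a new line. A single segment wider than
// maxWidth is placed on its own line rather than being split.
function wrapAtZeroWidthSpaces(
  text: string,
  maxWidth: number,
  measure: Measure
): string[] {
  const segments = text.split(ZERO_WIDTH_SPACE).filter((s) => s.length > 0);
  const lines: string[] = [];
  let line = "";

  for (const seg of segments) {
    const candidate = line + seg;
    if (line !== "" && measure(candidate) > maxWidth) {
      lines.push(line);
      line = seg;
    } else {
      line = candidate;
    }
  }
  if (line !== "") {
    lines.push(line);
  }
  return lines;
}
```

Where the runtime provides `Intl.Segmenter`, its word granularity may be able to supply Thai word boundaries without hand-inserted zero-width spaces, though that is untested here and the quality of the results would need checking against real text.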