Well, I vote for Han unification of #Unicode, and I rather think that more Chinese characters should have been unified (e.g., 高 & 髙, 產 & 産, 內 & 内).

Riley S. Faelan

@xarvos Why do you suppose the 五十音 table doesn't look at all like the 注音符號 table, but does markedly resemble common renderings of the वर्णमाला table?

@hongminhee

Riley S. Faelan

@xarvos All the more reason to unify the Fraktur letters with the ones that may not be in Fraktur!

Although, on second thought, it would be kind of neat to have Mathematical Fraktur Katakana in Unicode ...

@hongminhee

Riley S. Faelan

@hongminhee So, why aren't these two characters unified?

@xarvos

洪民憙 (Hong Minhee)

@riley Are you asking because you genuinely don't know? It's because those two characters mean different things.

@xarvos

洪民憙 (Hong Minhee)

@riley The reason bopomofo and Japanese kana look so different from each other is that each letter comes from a different Chinese character.

The resemblance between Japanese Kana and Devanagari is, well, either a coincidence or you have weird eyes, because they don't look anything alike to me.

If you don't know anything about Chinese characters or East Asian scripts, please don't make any more unreasonable claims.

@xarvos

Riley S. Faelan

@hongminhee I am pretty miffed by your apparent wilful ignorance of the fascinating histories of non-Chinese writing systems, not to mention the fact that just because some other languages borrow a number of Chinese-derived characters for their writing systems doesn't mean they use them in the same way, with the same shapes and the same meanings, as the official guardians of the Chinese language in the imperial republic's capital of Taipei insist they should.

@xarvos

ポット🫖

@hongminhee @riley @xarvos The individual hiragana characters have no Indic origin. Their usual arrangement あ-か-さ-た-... ([vowel only]-k-s-t-...) resembles how Sanskrit characters are conventionally arranged in a table. This resemblance is believed to derive from works written by Japanese Buddhists who studied texts originally from India.
https://www.u-tokyo.ac.jp/focus/ja/features/z1304_00195.html

Riley S. Faelan

@hongminhee No, I'm making explicit how absurd the concept of Han Unification is in light of other commonly accepted policies of Unicode.

Insofar as code space is a problem, they should have specified Korean letterforms to be canonically encoded fully in jamo, not as precomposed syllables, and possibly placed the precomposed syllables somewhere outside the BMP.
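Some background on the jamo point: Unicode already defines each of the 11,172 precomposed Hangul syllables (U+AC00–U+D7A3) as an arithmetic combination of conjoining jamo, and canonical decomposition (NFD) applies exactly that arithmetic. Below is a minimal Python sketch of the decomposition and its inverse, using the constants from the Unicode Standard's Hangul syllable algorithm; the function names are hypothetical, chosen only for illustration.

```python
import unicodedata

# Constants from the Hangul syllable algorithm in the Unicode Standard.
S_BASE = 0xAC00  # first precomposed syllable
L_BASE = 0x1100  # first leading consonant (choseong)
V_BASE = 0x1161  # first vowel (jungseong)
T_BASE = 0x11A7  # one before the first trailing consonant (jongseong)
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables


def decompose_syllable(ch: str) -> str:
    """Split one precomposed Hangul syllable into conjoining jamo."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch  # not a precomposed syllable; leave it unchanged
    l = L_BASE + s_index // N_COUNT
    v = V_BASE + (s_index % N_COUNT) // T_COUNT
    t = T_BASE + s_index % T_COUNT
    jamo = chr(l) + chr(v)
    if t != T_BASE:  # T_BASE itself means "no trailing consonant"
        jamo += chr(t)
    return jamo


def compose_syllable(l: str, v: str, t: str = "") -> str:
    """Inverse: rebuild the precomposed syllable from its jamo."""
    l_index = ord(l) - L_BASE
    v_index = ord(v) - V_BASE
    t_index = ord(t) - T_BASE if t else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


if __name__ == "__main__":
    for syllable in ["한", "글", "가"]:
        jamo = decompose_syllable(syllable)
        print(syllable, [f"U+{ord(c):04X}" for c in jamo])
        # The arithmetic agrees with the standard library's NFD normalization.
        assert jamo == unicodedata.normalize("NFD", syllable)
        assert compose_syllable(*jamo) == syllable
```

Because the mapping is pure arithmetic, storing only jamo would lose no information; whether that would have been a better canonical encoding is the opinion being argued in the thread, not something the code settles.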

Unifying Japanese characters with similar-looking Chinese characters is as absurd as unifying Greek 'Α' with Coptic 'Ⲁ' just because the latter is a reshaped form of the former.

Spoiler: the Unicode Consortium actually tried that, which is why the base Greek letters are in a block called "Greek and Coptic", U+0370–U+03FF. It turned out to suck so badly that most of these were retroactively reconceptualised as Greek-only letters, and a separate block of "Coptic", U+2C80–U+2CFF, was established for mostly the same letters, except in Coptic-only forms.

There are still a few letters in the old "Greek and Coptic" block that only make sense in Coptic texts, such as 'Ϣ', which nobody in their right mind would want to unify with the Cyrillic 'Ш' merely because that's where it came from when the Cyrillic alphabet was composed by a man who might or might not have been called Kirillos, and who had trouble finding a Greek letter that might correspond to the Bulgarian 'ш' sound.

There isn't one, because this sound is not used in the Greek language. But Egyptian used to have this sound, and had a series of signs depicting reeds in a pond to represent it, until eventually the Demotic form of the latest version was adopted into the mostly Greek-based writing system of Coptic, the final stage of the Egyptian language, and through it into the mostly Greek-based Cyrillic.
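The split described above is visible in the character names themselves. A small Python sketch using the standard library's `unicodedata` module; the block ranges are the ones quoted in the thread plus the Cyrillic block (U+0400–U+04FF), and `block_of` is a hypothetical helper written for this illustration, not a standard function.

```python
import unicodedata

# Block ranges: the two quoted above, plus the Cyrillic block.
BLOCKS = [
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x2C80, 0x2CFF, "Coptic"),
]


def block_of(ch: str) -> str:
    """Return the name of the block (from BLOCKS) containing ch."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "other"


# Greek alpha, Coptic shei, Coptic alfa, Cyrillic sha (escaped to avoid look-alikes).
for ch in ["\u0391", "\u03e2", "\u2c80", "\u0428"]:
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)} [{block_of(ch)}]")
```

'Ϣ' still sits in the shared block but is named COPTIC CAPITAL LETTER SHEI, while 'Ⲁ' (COPTIC CAPITAL LETTER ALFA) has its own code point in the later Coptic block instead of being unified with Greek 'Α'.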

Riley S. Faelan

@hongminhee The reason bopomofo and kana look different is that they were created separately. The Japanese syllables, as a concept, are not derived from the Chinese script. They're derived from a Brahmic/Sanskrit idea of how syllables are supposed to work, and that's why the gojûon looks so much like varṇamālā. The letterforms might have been re-rebused from common Japanese Kanji characters in use at the time, possibly because the Brahmic letterforms aren't exactly optimal for being written with a Japanese-style writing brush; possibly because whoever brought a Brahmic script to Japan had travelled around enough to see that there were many different and not-always-mutually-intelligible forms of Brahmic scripts around already.

@xarvos

Riley S. Faelan

@pot Pay careful attention to the 51st sound of the 50-sound table: the venerable terminal 'ん'. Separating the terminal sound of a syllable into a distinct letter while keeping the preceding consonant and vowel together doesn't make sense in any of the major Chinese approaches to analysing syllable composition, with the possible exception of the new linguistic ideas brought in by the Yuan dynasty, which I'm not particularly familiar with. Neither bopomofo nor fanqie does that. But it's a relatively natural outcome of applying the common Brahmic way of building an abugida, rather than a syllabary, to Japanese syllable composition.

@hongminhee @xarvos

Riley S. Faelan

@xarvos Well, if you unify alpha with alef and beta with beth (and beis), then it will make perfect sense to also unify the concept of an alphabet with the concepts of both alefbet and alefbeis.

And why not throw alif, ba, abjadi, and alefba into the mix while we're at it?

@hongminhee

Riley S. Faelan

@pot Most of the character shapes in kana are probably redrawn, not borrowed from a Brahmic script. Somebody probably picked a bunch of kanji and pulled on them the standard rebus trick that people have been pulling since time immemorial when building a phonetic script out of the concepts of a logographic script, in order to get characters that made sense in the Japanese context. But the character inventory is based on something closely related to Devanagari.

@hongminhee @xarvos

⏣ (hexed)

@riley what greek letter would you map o to? omicron or omega? what about hebrew? what about c, x, j, or w?

i hope we both find this absurd and you’re just using that as a point against unification rather than actually advocating for it

@hongminhee

Riley S. Faelan

@xarvos

Who says you have to align by the narrower alphabet?

If the omicron/omega split is a thing in some alphabets, those letters could be used as the base, with one of them mapped to the single round vowel of alphabets that have only one.

In summary, Han Unification is an absurd idea and should never have been done at the code point level. A similar project for defining (probably context-specific) mapping tables would have been much more sensible.

@hongminhee

Kagami is they/them 🏳️‍⚧️

@hongminhee I guess we can merge them and do <span lang="ja">高</span> <span lang="zh-CN">高</span> in case it should be differentiated, as it will affect the font selection.

Riley S. Faelan

@xarvos Also consider the problem of unifying an English alphabet that has the letters 'ƿ' and 'Ȝ' in it with an English alphabet that has eschewed them, or possibly created new letters with the old phonetic values.

@hongminhee

洪民憙 (Hong Minhee)

@riley Using your logic, shouldn't there also be separate sets of Latin characters for English, French, Italian, and German?

洪民憙 (Hong Minhee)

@riley You are trying to convince me by deliberately muddying the waters between the shapes of the characters and the syllabic system. Even if the Japanese kana borrowed the idea of a syllabic system from Brahmic scripts, each hiragana's shape is derived from the cursive script of Chinese characters, and each katakana's shape is derived from a fragment of a Chinese character (hence 片仮名), and both in turn derive from man'yōgana (万葉仮名).

@xarvos

Riley S. Faelan

@hongminhee It might actually make sense.

It happens, though, that most of the old European languages have centuries of history of being used together and being thought of as local variations of a common script. This may also apply to the use of the Arabic script for a number of languages, but it arguably breaks down at Farsi, which has an Arabic-derived script that differs significantly from Arabic.

But there's also a bunch of Latin-script-using languages that don't neatly fit into the old European script-space, such as Esperanto. It might make sense to allocate a fully differentiated Unicode block to Esperanto, even if most of the West European languages are kept unified.

Speaking of East Asian languages, Vietnamese uses the Latin script in a form so heavily modified that the case for mapping it distinctly from English, German, and French is even stronger than for Esperanto.

Distinct mapping has numerous handy uses, such as facilitating simple and automatic mapping between the Latin and Cyrillic renderings of Serbian, which is widely written in both scripts, and a similar mapping for Hindi/Urdu, which is widely written in both the Devanagari script and the Farsi version of the Arabic script. Did I already mention that the latter should be considered distinct from the classic Arabic version of the Arabic script?
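To make the Serbian example concrete: Cyrillic-to-Latin conversion is close to a one-to-one letter table, with three letters becoming digraphs. A minimal Python sketch, assuming the standard Serbian Latin (Gaj's) alphabet; the table and the `cyrillic_to_latin` name are illustrative, not taken from any library.

```python
# Serbian Cyrillic -> Serbian Latin, lowercase; capitals are handled below.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def cyrillic_to_latin(text: str) -> str:
    """Transliterate Serbian Cyrillic text letter by letter."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower not in CYR_TO_LAT:
            out.append(ch)  # spaces, digits, punctuation pass through
            continue
        lat = CYR_TO_LAT[lower]
        # A capital letter becomes a capitalized form; Љ/Њ/Џ give Lj/Nj/Dž.
        out.append(lat.capitalize() if ch != lower else lat)
    return "".join(out)


print(cyrillic_to_latin("Београд"))
```

Only the Cyrillic-to-Latin direction is shown. Going the other way needs more care, because a Latin "nj", "lj", or "dž" is occasionally two separate letters rather than a digraph, and all-caps text would want "LJ" rather than the "Lj" this sketch produces.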

Carlana Johnson :v_trans:

@hongminhee @riley @xarvos I have read in books that the *order* of the 50 sounds is derived from Siddhaṃ (not Devanagari!) but the shapes are unrelated, and that up until the 20th century, iroha order was more commonly used than the 50-sound order.