Unification of the ellipsis in Unicode

Why the Unicode Consortium screwed it all up by unifying the CJK and Latin ellipsis characters

What a horror to see an ellipsis with its dots at the x-height instead of on the baseline! What went wrong? Shouldn’t the dots of an ellipsis always sit on the baseline? Of course, you know the answer: a CJK font is the culprit. And this scenario nicely illustrates why the Unicode standard is wrong about the ellipsis.

In fact, in both Chinese and Japanese, a three-dot ellipsis simply does not exist: in proper orthography, a CJK ellipsis consists of six equally spaced dots spanning the width of two fullwidth characters. The three-dot symbol you find in CJK fonts is therefore not the ellipsis but what some call the CJK three-dot leader: one half of a CJK ellipsis, and a holdover from DBCS (double-byte character set) days.
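To make the "one half of an ellipsis" point concrete, here is a minimal Python sketch. It assumes the unified code point in question is U+2026 HORIZONTAL ELLIPSIS, and that a six-dot CJK ellipsis is encoded as two of them in a row; the helper names `describe` and `cjk_ellipsis` are made up for illustration. It only inspects character properties through the standard `unicodedata` module and says nothing about how a given font draws the glyph.

```python
import unicodedata

# Assumption: this is the unified code point the article is complaining about.
ELLIPSIS = "\u2026"


def describe(ch):
    """Return a few Unicode properties of a single character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        # East Asian Width: "A" (ambiguous) means the character may be
        # rendered narrow or wide depending on context and font.
        "east_asian_width": unicodedata.east_asian_width(ch),
    }


def cjk_ellipsis():
    """Build a six-dot ellipsis from two three-dot characters (assumed encoding)."""
    return ELLIPSIS * 2


info = describe(ELLIPSIS)
print(info)
print([f"U+{ord(c):04X}" for c in cjk_ellipsis()])
```

Whether the dots come out centred or on the baseline is decided entirely by the font, since the text itself carries the same code point either way.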

The CJK ellipsis, like the CJK comma, the fullwidth comma, and the CJK period, can be either centred or anchored to the baseline. The Latin ellipsis, by contrast, can only be anchored to the baseline. In linguistics, we call this a minimal pair: two things with a single semantically significant difference that is large enough for us to tell them apart.

The fact that the CJK three-dot leader and the Latin ellipsis constitute a minimal pair shows conclusively that they are two completely different characters that should never have been unified. Unifying them into a single code point in Unicode was a serious error, and it is now too late to fix.