Typesetting flush-left CJK prose properly

Background

Software developed outside China, Japan, and Korea used to mistake CJK phrases, sentences, or even whole paragraphs for single words, because CJK text has no spaces between words. In perhaps the last couple of years, this problem has all but disappeared.

Now we face the opposite problem: software that assumes a line can break between any pair of CJK characters (optionally subject to kinsoku shori, i.e. line-breaking prohibition, rules).

The goal

Some words are best treated as “unbreakable”. For example, a line break in the middle of a personal name looks disrespectful, and such breaks are avoided even in office correspondence.

When text is set flush left, this aversion to breaking words extends to all compound words. The rule is not hard-and-fast (I suspect partly because typesetting software does not support it), but the way sentences in Japanese manga are typeset makes a convincing case that this seldom-stated rule does exist, at least at a subconscious level. For example, the book Japanese the Manga Way contains the following sentence in panel 236 on p. 127:

人数多くで
行った方が
安く
すむんだ
からね

No line break falls inside any of the words or word-like combinations identified by the book: 人数, 多く, 行った, 安く, すむ, んだ, and から. In fact, the way the sentence is typeset indicates that the manga artist treated the sequence すむんだ as a single unbreakable unit.

If you look through the whole book, you will easily find that the sentences in almost every manga strip in it are typeset this way.
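The panel's layout can be approximated with a minimal greedy line breaker that is only allowed to break between words. This is a sketch under stated assumptions, not a real CJK line-breaking algorithm: it assumes every character is one em wide, a maximum line length of five characters, and a hand segmentation of our own (the particles で, が, ね and the noun 方 are split off, and すむんだ is kept whole as the artist did). It ignores kinsoku shori, justification, and proportional widths. The names `break_lines`, `SEGMENTS`, and `TEXT` are hypothetical.

```python
# Greedy flush-left line breaking, compared under two break policies:
# breaks allowed only between words vs. between any two characters.

def break_lines(units: list[str], width: int) -> list[str]:
    """Greedily pack unbreakable units into lines of at most `width` chars.

    Assumes every character occupies one em, so a unit's width is its
    length. A unit wider than `width` gets a line of its own, unsplit.
    """
    lines: list[str] = []
    current = ""
    for unit in units:
        if current and len(current) + len(unit) > width:
            lines.append(current)  # this unit would overflow: start a new line
            current = unit
        else:
            current += unit
    if current:
        lines.append(current)
    return lines


# Hypothetical hand segmentation (an assumption, not taken from the book).
SEGMENTS = ["人数", "多く", "で", "行った", "方", "が", "安く", "すむんだ", "から", "ね"]
TEXT = "".join(SEGMENTS)

if __name__ == "__main__":
    print("Breaking only between words:")
    for line in break_lines(SEGMENTS, width=5):
        print(" ", line)
    print("Breaking between any two characters:")
    for line in break_lines(list(TEXT), width=5):
        print(" ", line)
```

With word-boundary breaking, this happens to reproduce the panel's five lines exactly (人数多くで / 行った方が / 安く / すむんだ / からね). With character-boundary breaking at the same width, it produces 人数多くで / 行った方が / 安くすむん / だからね, splitting すむんだ in the middle.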

The idea that a line can break between any two CJK characters is a myth. It was created by Western-produced typesetting software that does not fully understand the nuances of proper CJK typesetting, and, very unfortunately, reinforced by Chinese users who, after years of using such defective software, have forgotten how to typeset their own language correctly.

Problem with Adobe software

In Adobe’s “Creative Suite” (both CS and CS2), it is possible to mark several characters as an unbreakable unit. Unfortunately, it is not possible to mark adjoining unbreakable units as separate units. So in this example it is impossible to indicate that a line break may occur between 人数 and 多く but not within either word.

This deficiency not only makes it impossible to typeset flush-left text correctly; it also makes it impossible to use the proper name mark (an underline) correctly in Chinese text, where adjoining proper names must carry visibly separate underlines.
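Why adjoining units must stay distinguishable can be shown by comparing two hypothetical ways of recording “do not break here”. Neither is claimed to be how Adobe’s software works internally; the function names `breaks_from_flags` and `breaks_from_units` are illustrative. Model A stores a no-break flag per character, so two flagged words that touch fuse into one run. Model B stores the text as a sequence of unbreakable units, so the boundary between them survives.

```python
# Two hypothetical models of "unbreakable unit" markup.
# Neither is claimed to be Adobe's actual implementation.

def breaks_from_flags(text: str, no_break: list[bool]) -> list[int]:
    """Model A: one no-break flag per character.

    A break is allowed before position i unless characters i-1 and i
    are both flagged, so touching flagged words lose their boundary.
    """
    return [i for i in range(1, len(text))
            if not (no_break[i - 1] and no_break[i])]


def breaks_from_units(units: list[str]) -> list[int]:
    """Model B: text stored as a list of unbreakable units.

    A break is allowed exactly at each boundary between units.
    """
    positions: list[int] = []
    offset = 0
    for unit in units[:-1]:
        offset += len(unit)
        positions.append(offset)
    return positions


if __name__ == "__main__":
    units = ["人数", "多く"]
    text = "".join(units)
    # Marking both words as no-break sets every character's flag.
    flags = [True] * len(text)
    print(breaks_from_flags(text, flags))  # [] : 人数 and 多く are fused
    print(breaks_from_units(units))        # [2]: a break is allowed between them
```

The same contrast applies to the proper name mark: if two adjoining names are only recorded as “underlined characters”, there is nothing left to say where one underline should end and the next begin.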

$Id: y.html,v 1.4 2007/03/01 05:40:18 ambrose Exp $