Beyond kinsokushori

There are more rules than kinsokushori tells you

Most software once had no idea how to handle long strings of CJK text that contain no spaces. Now that software knows it can break such text between pairs of kanji, we face the opposite problem: line breaks occur in inappropriate places.

You are probably thinking that following kinsokushori rules solves this. It does not: kinsokushori deals only with punctuation, yet there are cases where a sequence of kanji should not be broken up, and no known kinsokushori standard describes these cases.

In Chinese, a common case for this is personal names. Most people consider personal names to be unbreakable units, and breaking them can be interpreted as a lack of respect.

In both Chinese and Japanese, another case is keeping together a string of kanji that forms a word (a compound word, in the case of Chinese), especially when a paragraph is set flush left (not justified). This tradition has virtually disappeared in Chinese, but it is still practised in Japanese.

One demonstration that this unstated rule in fact exists is the book Japanese the Manga Way, which contains many examples, drawn from many different manga, where line breaks are obviously positioned to avoid breaking up words. For example, in panel 236 on page 127 we find:

人数多くで
行った方が
安く
すむんだ
からね

No line wrap occurs inside any of the words or word-like combinations identified by the book: 人数, 多く, 行った, 安く, すむ, んだ, and から. In fact, the way the sentence is typeset indicates that the manga artist considers the sequence すむんだ an unbreakable unit. If you look through the whole book, you will easily see that the sentences in almost every manga strip in it are typeset to avoid breaking up words.
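This observation can be checked mechanically. The following Python sketch compares the line breaks in the panel against the unit boundaries. The segmentation uses the units the book identifies; the leftover single characters で, 方, が and ね are treated as units of their own purely as an assumption, and the function names are hypothetical.

```python
from itertools import accumulate

def boundaries(pieces):
    """Offsets at which each piece ends, excluding the end of the whole text."""
    total = sum(len(p) for p in pieces)
    return set(accumulate(len(p) for p in pieces)) - {total}

def broken_units(units, lines):
    """Return the units that some line break falls inside (empty if none)."""
    if "".join(units) != "".join(lines):
        raise ValueError("units and lines must spell the same text")
    # A line break is a problem only if it is not also a unit boundary.
    bad_offsets = boundaries(lines) - boundaries(units)
    broken, start = [], 0
    for unit in units:
        end = start + len(unit)
        if any(start < off < end for off in bad_offsets):
            broken.append(unit)
        start = end
    return broken

# Units from the book, plus single-character leftovers (an assumption).
UNITS = ["人数", "多く", "で", "行った", "方", "が",
         "安く", "すむ", "んだ", "から", "ね"]
PANEL = ["人数多くで", "行った方が", "安く", "すむんだ", "からね"]

print(broken_units(UNITS, PANEL))  # → []
# A careless layout that splits 多く across two lines:
print(broken_units(UNITS, ["人数多", "くで行った方が", "安くすむんだからね"]))  # → ['多く']
```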

We can summarize our observations as follows:

Strictness | Scenarios | Line breaking behaviour

Strict | Careful flush-left (top) typesetting
  • Kinsokushori rules followed
  • Personal names never broken across lines
  • Words not broken across lines
Office correspondence
  • Kinsokushori rules not necessarily followed
  • Personal names never broken across lines
Normal typesetting in Japan/PRC
  • Kinsokushori rules followed
Relaxed | Newspapers in Hong Kong
  • Kinsokushori rules not followed
  • Hyphenation rules not necessarily followed

The idea that line breaks can occur between any pair of kanji is a myth. It was created by Western-produced typesetting software that does not fully understand the nuances of proper CJK typesetting and, very unfortunately, reinforced by Chinese people who, after years of using such defective software, have forgotten how to typeset their own language correctly.

Current software not equipped to do this

Currently, to make sure that line breaks can only occur between words, the only thing we can do in most software is to manually insert line breaks. There is simply no easy way to mark up these unbreakable units of kanji as unbreakable.
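To make concrete what a line breaker that respected these units would do, here is a minimal Python sketch of greedy line filling over pre-segmented text. It rests on several assumptions: the text has already been split into unbreakable units by some other means, every character is one em wide, and kinsokushori is ignored entirely. The function name `wrap_units` is hypothetical.

```python
def wrap_units(units, width):
    """Greedily fill lines of at most `width` characters, breaking only
    between units. A unit longer than `width` gets a line of its own and
    overflows rather than being split."""
    lines = []
    current = ""
    for unit in units:
        if current and len(current) + len(unit) > width:
            # The unit does not fit: close the line and start a new one.
            lines.append(current)
            current = unit
        else:
            current += unit
    if current:
        lines.append(current)
    return lines

# Hypothetical segmentation of a short sentence into unbreakable units.
units = ["人数", "多く", "で", "行った", "方", "が",
         "安く", "すむ", "んだ", "から", "ね"]
for line in wrap_units(units, 5):
    print(line)
# 人数多くで
# 行った方が
# 安くすむ
# んだからね
```

The hard part, deciding where the units are, is deliberately left out; this only shows that once the units are known, restricting breaks to their boundaries is simple.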

Adobe InDesign is a concrete example, chosen because it is a piece of “professional quality” typesetting software. InDesign lets you mark a sequence of characters as unbreakable (No Break). However, since Chinese is typeset without spaces between words (and Japanese is not normally typeset with spaces between all words either), it is impossible to mark two adjacent words as each being unbreakable. Trying to do so joins the two words into one single unbreakable unit, which is the wrong result.
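The underlying problem can be shown with a simplified model. The Python sketch below assumes that a no-break attribute is a per-character flag and that a break is forbidden between two characters when both carry the flag; this is an illustration of the idea, not a description of how InDesign is actually implemented. It contrasts that with a model that stores the units themselves. All names are hypothetical.

```python
def breaks_from_char_flags(no_break):
    """Per-character model: offset i is a break opportunity unless the
    characters on both sides of it are flagged as no-break."""
    return {i for i in range(1, len(no_break))
            if not (no_break[i - 1] and no_break[i])}

def breaks_from_units(units):
    """Per-unit model: the only break opportunities are unit boundaries."""
    offsets, pos = set(), 0
    for unit in units[:-1]:
        pos += len(unit)
        offsets.add(pos)
    return offsets

units = ["安く", "すむ"]            # two adjacent unbreakable words
flags = [True] * len("".join(units))  # both words flagged, character by character

print(breaks_from_char_flags(flags))  # → set()  (the two words have fused)
print(breaks_from_units(units))       # → {2}    (a break between them survives)
```

With no space between the words, per-character flags have nowhere to record the boundary, so the information has to live between characters or on the units.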

The only systems equipped to do the correct thing are text-based ones, TeX and LaTeX in particular. HTML/XML + CSS also have the potential to accomplish the feat, but the World Wide Web Consortium seems to want to relegate HTML and XML to the web only, not realizing that CSS is at the cutting edge of CJK typography support.
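As a rough illustration of the HTML + CSS route, the Python sketch below wraps each unit in its own span and pairs it with a one-line stylesheet. `white-space: nowrap` is a standard CSS declaration; the class name `unit`, the function name, and the assumption that the text is already segmented are all hypothetical. Whether a particular browser then permits a break between two adjacent spans should be tested rather than taken for granted.

```python
from html import escape

# Stylesheet to pair with the generated markup (class name is hypothetical).
CSS = ".unit { white-space: nowrap; }"

def mark_up_units(units):
    """Wrap each unbreakable unit in its own span, with nothing in between,
    so that the unit boundaries survive in the markup."""
    return "".join(f'<span class="unit">{escape(u)}</span>' for u in units)

print(mark_up_units(["安く", "すむ"]))
# <span class="unit">安く</span><span class="unit">すむ</span>
```

Unlike a per-character attribute, the element boundaries keep two adjacent units distinct even though no space separates them.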

Note

According to http://chcsdl.open2u.com.tw/old_course/f/fc/download/fc05.pdf (see note in honorific outdents), there is actually a rule for never splitting personal names. This rule is called 名不置兩行 (míng bù zhì liǎng háng / ming4 bat1 zi3 leong5 hong4, “never position a name [so that it breaks] into two lines”) and is one of the traditional honorific devices. Although many of the old honorific devices have become obsolete, this one seems to be still honoured, if only at a subconscious level as an unwritten rule.

 

This article was first written in January 2007. You can find the original article in the archives.