The removal of the U element in HTML5

An HTML element that should never have been deprecated

This article was written in 2011 and last updated on October 3, 2011. Since that time things have changed, and U is no longer a deprecated element in HTML5. The article is kept here as a record of the state of things around that time.

There was some discussion of the removal of the U element on the www-style list around December 2010, and of course there is bug 10838 at W3C, the master bug for the reinstatement of U, which is now closed as “won’t fix”. Aside from the incredible bureaucracy, two points in particular have been offered as “reasons” for U’s removal. I will try to show why these are not really reasons, and why the decision to remove U will foster disrespect for both orthography and typographic conventions.

Before the so-called reasons are addressed, let me reiterate that to Chinese people, the U element represents a punctuation mark (and by extension the default underlining in A represents an incorrectly used punctuation mark), and therefore its removal represents an attack on orthography, which should raise a red flag no matter which typographical tradition you happen to be following. Despite W3C’s position that they should “move forward” without regard to backward compatibility, U remains the only possible way for the punctuation mark in question to be universally represented on web pages.

“Reason” number 1: It is visual formatting

The most-cited reason for U’s removal is that it is “visual formatting”, and there are two flaws in this: first, it is not purely visual formatting; and second, punctuation marks by their very nature are visual formatting.

To start, I will simply restate that the underline carries semantics in the Chinese language. The U element is not simply visual formatting, and its removal will affect the expression of certain well-defined semantics. The fact that the punctuation mark expressed by U is one of two that the West has apparently never heard of is unfortunate, but it seems incredible that W3C does not even want to acknowledge this.

From the opposite perspective, let me also state that all punctuation marks are, by their very nature, just “visual formatting with semantics”. There is no inherent meaning in the shape of any punctuation mark; the meanings are ascribed only by convention (and, seen from far enough back, only by per-language conventions). In the West, punctuation marks like the hyphen, comma, period, and slash all descended from a single scribal mark that could take all these forms and express all the meanings now expressed by these modern marks. Even today, when grammar books and style guides regulate the use of punctuation marks, we still see variance in what these marks actually mean. In actual use (say, how hyphens and dashes are actually used in English, or how commas are actually used in French), punctuation marks do not accurately describe sentence structure or even, in the case of hyphenation in English, word structure.

That punctuation marks are by nature visual markup is well illustrated by a rule that is now seldom stated, seldom followed, and often even discouraged (for example by Robert Bringhurst): if a word that does not usually take a plural form (e.g., an abbreviation or a number) is pluralized, an apostrophe should be used before the plural s, unless the pluralized word is visually separated from the plural ending by italicization or other typographic means. This use of the apostrophe is exactly how the colon is used in Finnish. Omitting the apostrophe in such cases, a disrespect for traditional English orthography and typographical conventions, is now wreaking havoc with computerized text processing, giving us spelling mistakes like FAQS (for FAQ’s, the plural of FAQ). The old rule was in fact years ahead of its time; had it not been abandoned, we would not have to deal with a class of easily preventable spelling mistakes.

If a punctuation mark in Chinese must be removed to give way to structural markup, should English punctuation marks also be removed to give way to new structural markup that describes sentence structure? This is in fact not without precedent (the Q element); but why would anyone believe new structural markup will be any better, given the spectacular failure of the P element as structural markup?

“Reason” number 2: It will be misused and abused

On the question of misuse and abuse, I am not going to dispute that it will be misused and abused, because this is totally irrelevant — anything can be misused and abused.

But first let us look at misuse and abuse from a different perspective: by claiming visual formatting as an excuse for the removal of U, the W3C is encouraging incorrect use of this “visual formatting” in inappropriate contexts — the use of default underlining in the A element.

Remember the meaning of underlining in Chinese: that the underlined word is a proper name. This is not some random scribal tradition but the standard meaning in standardized orthography. To a Chinese reader, underlined links look like random text that should be proper names but aren’t. The fact that Chinese people don’t complain about this suggests that the West has forced its meaning of underlining (emphasis) onto the Chinese language and damaged the integrity of the Chinese punctuation system.
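To make the convention concrete, here is a minimal sketch of how the proper name mark might be written with U. The sample sentence (“Confucius was a native of the state of Lu”) and the `lang` value are my own illustration, not taken from any specification, and how well a browser’s plain underline approximates the printed mark is an assumption.

```html
<!-- Illustrative only: the two proper names, 孔子 (Confucius) and
     魯國 (the state of Lu), each carry the proper name mark via U. -->
<p lang="zh-Hant"><u>孔子</u>是<u>魯國</u>人。</p>
```

Read this way, the underline is part of how the sentence is spelled, much as a capital letter marks a proper name in English.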

In removing U and condoning the underlining in A, the W3C is actively telling Chinese people that underlining can no longer be regarded as a punctuation mark in its own right, and that the Chinese rules are wrong, that the West, and only the West, is right.

And in marking the master bug as “won’t fix”, the W3C is sending a subliminal message that such cultural precedence is incontestable.

Viewed this way, the “rationale” that underlining is visual formatting because this is how the A element presents itself is totally unjustifiable. Instead of removing U, the W3C should have acknowledged that underlining in A is culturally inappropriate and proceeded to recommend, or even require, that user agents not underline links by default.
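Until user agents change their defaults, a page author can only approximate this from the page’s side. The following is a hedged sketch using an ordinary author stylesheet; it is not a user-agent default, and the choice of selectors is my assumption about what a minimal version would cover.

```html
<style>
  /* Author-level approximation: suppress the default underline on
     links so that underlining stays reserved for the U element. */
  a:link,
  a:visited {
    text-decoration: none;
  }
</style>
```

With the underline removed, links need some other visual cue, such as color, to remain recognizable; which cue to use is left to the author.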

Summary

The removal of U sends a clear message to Chinese and non-Chinese alike. To the Chinese, it says that you don’t need to follow Chinese punctuation rules, that Westernized rules always take precedence over Chinese rules, and that this precedence is incontestable. To the non-Chinese, it says that respect for local orthography and typographical rules is unimportant, and that localization can be just skin-deep, without a deep understanding of local culture.

The instilling of such attitudes is the real misuse — or rather abuse — in W3C’s decision to remove U from HTML.