Failure of the HTML P element as structural markup

P was never structural markup

In HTML 4.01, the standard states that “the P element represents a paragraph”. However, it is easy to see that P cannot in fact represent true paragraphs: it is visual formatting disguised as structural markup. And, as we will see, the W3C’s double standard in its treatment of P and U shows a certain hard-to-understand hypocrisy.

Let us set aside the question of whether P is visual and examine why P is clearly not structural.

There are exactly two reasons: First, that P cannot enclose quotations; and second, that P cannot enclose lists. Let us examine each of these in turn.

In fact, the previous paragraph is a good demonstration of the second problem. If we were to mark it up structurally, we would write:

<p>There are exactly two reasons:<ol><li>that P cannot enclose quotations; and <li>that P cannot enclose lists.</ol>Let us examine each of these in turn.</p>

Of course, we cannot mark it up like this, because the OL start tag implicitly closes the P, so the last sentence ends up outside the P element. P simply cannot describe the structure of this paragraph, or of many other paragraphs like it.
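To make the mechanism concrete, here is a minimal Python sketch of the implied-end-tag rule, built on the standard library's html.parser. It is a simplified model, not a conforming HTML 4.01 or HTML5 parser: the names CLOSES_P, ParagraphTracker and split_paragraphs are hypothetical, introduced only for illustration, and CLOSES_P covers just the four elements discussed in this article rather than the full list in either specification.

```python
from html.parser import HTMLParser

# Hypothetical, simplified set of elements whose start tag implicitly
# closes an open <p>. HTML 4.01 forbids these inside P, and HTML5 makes
# </p> optional before them (along with many other elements not listed).
CLOSES_P = {"blockquote", "dl", "ol", "ul"}


class ParagraphTracker(HTMLParser):
    """Records which text lands inside a <p> under the implied-end-tag rule."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []  # text content of each P element
        self.outside = []     # non-blank text runs that fall outside any P

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")
        elif tag in CLOSES_P:
            # The block element's start tag ends any open P.
            self.in_p = False

    def handle_endtag(self, tag):
        if tag == "p":
            # A stray </p> closes nothing in this model (a real HTML5
            # parser would create an empty P element instead).
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.paragraphs[-1] += data
        elif data.strip():
            self.outside.append(data.strip())


def split_paragraphs(markup):
    """Return (text inside P elements, text outside them) for markup."""
    tracker = ParagraphTracker()
    tracker.feed(markup)
    tracker.close()
    return tracker.paragraphs, tracker.outside


list_markup = (
    "<p>There are exactly two reasons:<ol><li>that P cannot enclose "
    "quotations; and <li>that P cannot enclose lists.</ol>Let us examine "
    "each of these in turn.</p>"
)
inside, outside = split_paragraphs(list_markup)
print(inside)   # ['There are exactly two reasons:']
print(outside)  # ['that P cannot enclose quotations; and', 'that P cannot enclose lists.', 'Let us examine each of these in turn.']
```

Under this model, only the opening clause stays inside the P; both list items and the closing sentence land outside it. The same happens if OL is swapped for BLOCKQUOTE, UL or DL, since all four are in CLOSES_P.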

Similarly, P fails when we have a block quotation within a paragraph. Consider this:

I like what Somerset Maugham said about writing — “A good style should show no sign of effort; what is written should look like a happy accident.” — and I believe this applies not only to writing poetry and prose, but also to writing code.

To mark it up correctly and structurally, with the quotation set off as a block quotation, we ought to be able to write:

<p>I like what Somerset Maugham said about writing — <blockquote>A good style should show no sign of effort; what is written should look like a happy accident.</blockquote> — and I believe this applies not only to writing poetry and prose, but also to writing <em>code</em>.</p>

As with lists, we simply cannot do this: the BLOCKQUOTE start tag closes the P, so the last part of the sentence falls outside the P. Note that what falls outside the P element, which supposedly represents the whole paragraph, is not even a whole sentence. In this case the P element cannot even preserve sentence structure, let alone describe paragraph structure.

As the TeX manual shows, block quotations within paragraphs are far from uncommon. They are in fact very common, since displayed mathematical expressions are often set within a paragraph much like block quotations. Since TeX appeared before HTML, I find it very puzzling that the W3C could claim P is structural when it cannot even handle a quotation within a paragraph.

It is thus obvious that P cannot properly express the semantics of a paragraph. Following the W3C’s own logic, since P does not have a clear semantic meaning, it must be “visual formatting”.

The W3C complains about authors “misusing” P for visual markup. In fact, due to the way the P element is defined in the DTD, it can serve no function other than as visual markup. P, as currently defined, simply does not have the ability to structurally represent a paragraph.

P still isn’t structural markup — even in HTML5

If you have read section 8.1.2.4 of the HTML5 draft spec, you will know that P is still not structural. You still cannot put a quotation or a list inside a paragraph, because P’s end tag is still optional before BLOCKQUOTE, OL, UL, and DL. The draft spec still claims that P “represents a paragraph”, but in many cases it simply cannot represent a whole paragraph.

As currently defined, P is pure visual markup without the ability to truly and adequately represent paragraphs. It is pure hypocrisy for the W3C to call P structural while citing U’s lack of structural meaning as grounds for removing it. In defending U’s removal, the W3C gives the impression that it does not consider backward compatibility very important. If backward compatibility really is not important, it should have redefined P so that P can enclose BLOCKQUOTE, OL, UL, and DL, finally allowing it to truly represent the paragraphs commonly seen in real life.