On articles and excerpts

I happen to love WordPress, as many front-end developers do. It gives you almost full control over your markup. When I first started using it there was almost no HTML5 around, and a blog post was just a <div> with some classes. Not anymore: a new tag has emerged, and now a blog post is an <article>. It is an <article> both in a listing of posts (say, on an “Our blog” page, where the content of each post is shortened and usually ends with a “Read more” link) and on its single page. The fact that a shortened version of the content gets its own <article> got me interested, so I decided to find out whether millions of users really get semantic post-listing pages, and to refresh my knowledge of this tag along the way.
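
To make the question concrete, here is a rough sketch of the kind of listing markup I mean. It is an illustration only, not the output of any particular WordPress theme; the class name and the URL are made up:

<ol reversed>
    <li>
        <article class="post">
            <header>
                <h2><a href="http://blog-post-url">Blog post title</a></h2>
                <p>by Alex Bondarev</p>
            </header>
            <p>lorem ipsum</p>
            <footer>
                <a href="http://blog-post-url">Read more</a>
            </footer>
        </article>
    </li>
</ol>

The wrapper is an <article>, although the content inside it is only the beginning of the post.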

What is an <article>?

The first place I go when I need to find out anything about a tag is the W3C spec, where we see that:

“The article element represents a complete, or self-contained, composition in a document, page, application, or site and that is, in principle, independently distributable or reusable, e.g. in syndication. This could be a forum post, a magazine or newspaper article, a blog entry, a user-submitted comment, an interactive widget or gadget, or any other independent item of content.”

The first words chosen to describe this tag are “complete, or self-contained”. Can we really call a few introductory sentences of a blog post “complete” or “self-contained”? Nope, an excerpt is not complete. The score (for : against treating an excerpt as an <article>) is 0 : 1.

Reading further, we see that an <article> is “in principle, independently distributable or reusable, e.g. in syndication”. Can we reuse those short excerpts or distribute them via RSS, for instance? Sure! 1 : 1. Will those short excerpts be useful without the “Read more” link? It depends on the content, but maybe.

And closer to the end of the definition we see examples of a true <article>: a forum post, a magazine or newspaper article, a blog entry, a user-submitted comment, an interactive widget or gadget, or any other independent item of content. The key example is of course a blog entry. Is a short version of a blog entry also a blog entry? oO hm, not sure, it’s still 1 : 1.

What is a blog entry and what is not?

To settle the score we must decide what a blog entry is. I have to admit here that I’m not a native speaker; I’m just a person who knows some English :)

A blog entry is a blog post. But what exactly is a blog post, what does it consist of? When we say “Have you read the latest post by Paul Irish?” do we mean that short paragraph with a read-more link? Of course not!

We mean a post that consists of a title, some meta info, great content (with links or images) that makes up the body of the post and, optionally, the ability to leave a comment. In fact, you are reading one right now. You are not reading a short version of it. It is complete, independently distributable and reusable :) It is definitely an <article>, at least in terms of HTML semantics. So it looks like it’s 1 : 2.
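
As a rough sketch, such a complete post could be marked up like this. The comment author and the link URL are placeholders of mine, and wrapping the comment in a nested <article> is my reading of the spec’s own list of examples (“a user-submitted comment”):

<article>
    <header>
        <h1>Blog post title</h1>
        <p>by Alex Bondarev</p>
    </header>
    <p>lorem ipsum</p>
    <p>lorem ipsum with <a href="http://some-url">a link</a></p>
    <section>
        <h2>Comments</h2>
        <article>
            <footer>
                <p>by some reader</p>
            </footer>
            <p>lorem ipsum</p>
        </article>
    </section>
</article>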

A short excerpt is not an <article>.

What do mentors say about <article>?

And so are those of 99% of the people who use WordPress as their CMS (millions of people, really), provided they don’t dig too deeply.

On the other hand, some sites may have asked the same question I did and did not wrap their excerpts in <article>s. html5rocks.com has a similar listing of posts (without the “Read more” link, but with an excerpt) in which the posts are simply list items. The same goes for MDN’s Latest hacks, both on the MDN page and on the Hacks site. On all of these sites <article> is used on the post’s single page only.

What are the alternatives?

When it comes to choosing one tag or another, I always ask myself: what does this piece of content signify? What does it represent? Excerpts are essentially quotations, parts of something bigger, so a crazy thought may be to mark them up as <blockquote>s, each with a <cite> element as its heading:


<ol reversed>
    <li>
        <blockquote>
            <header>
                <cite><a href="http://blog-post-url">Blog post title</a></cite>
                <p>by Alex Bondarev</p>
            </header>
            <p>lorem ipsum</p>
            <footer>
                <a href="http://blog-post-url">Read more</a>
            </footer>
        </blockquote>
    </li>
</ol>

Unfortunately, a <blockquote> represents content quoted from another source, so it can hardly be used to mark up our own posts. (Excerpts from someone else’s blog or someone’s tweets? Why not!)

A more pragmatic version would be as follows:


<ol reversed>
    <li>
        <div>
            <header>
                <h2><a href="http://blog-post-url">Blog post title</a></h2>
                <p>by Alex Bondarev</p>
            </header>
            <p>lorem ipsum</p>
            <footer>
                <a href="http://blog-post-url">Read more</a>
            </footer>
        </div>
    </li>
</ol>

A plain old unsemantic <div> is better than a wrongly-used <article> or <blockquote>.

Well, why not <excerpt> this?

As another alternative we may think of a new <excerpt> tag.


<ol reversed>
    <li>
        <excerpt>
            <header>
                <h2><a href="http://blog-post-url">Blog post title</a></h2>
                <p>by Alex Bondarev</p>
            </header>
            <p>lorem ipsum</p>
            <footer>
                <a href="http://blog-post-url">Read more</a>
            </footer>
        </excerpt>
    </li>
</ol>

Benefits? Well, it is definitely more semantic than a <div>. It conveys the meaning of the content well. So why not?
Drawbacks? It may be too narrow, since it represents one specific kind of content… or is that not a drawback?
Will someone else’s excerpts fall under <excerpt> instead of <blockquote>? It depends: if we define <excerpt> properly, they won’t :)

A note on listings of full posts.

If we use full posts on our listing page, <article> is definitely the tag to use, just as Jeremy Keith and Bruce Lawson do.
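
A minimal sketch of such a listing, assuming each item carries the whole post rather than a shortened version. The structure simply mirrors the examples above and is not taken from either author’s site:

<ol reversed>
    <li>
        <article>
            <header>
                <h2><a href="http://blog-post-url">Blog post title</a></h2>
                <p>by Alex Bondarev</p>
            </header>
            <p>lorem ipsum</p>
            <p>the rest of the post, in full</p>
        </article>
    </li>
</ol>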