Draining the swamp

It’s best to imagine WordPress’s plugin ecosystem as a swamp. Swamps are terrible. You don’t want to be there. You run a constant risk of disease and/or drowning. Anything that sinks into the swamp–it’s not coming back.

I’ve been debugging an odd problem on our WordPress installations involving categories. On some sites, posts that have multiple categories display only one of them. That would be strange enough, but the category permalinks are also coming out in the format SITE_URL/category/foo with the link text baz, where foo is one category and baz is a different category:

<a href="http://my.wordpress.site/category/category1">Category2</a>

Strange, seemingly non-deterministic behavior? The usual suspects would be database corruption or a theme bug. Yet neither seemed likely in this case. Database corruption usually isn’t so… predictable …and we quickly verified that this error was occurring in both our custom themes and stock TwentyTwelve. That would leave a core bug (unlikely with something so fundamental, but still possible) or a bad plugin.

After several patient hours of tracing execution I’d narrowed the problem to the function WordPress uses when building up the category list: the_category(). The category link string was correct before going in for formatting and it came out mangled. WordPress uses filters to allow plugins to “hook in” and modify output. A search of our plugin code revealed the culprit: Remove Title Attributes.
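For anyone who hasn't met the filter mechanism before, here is a toy model of the idea in Python. This is not WordPress's implementation (the real thing is PHP and also handles priorities and extra arguments). The function names simply mirror WordPress's add_filter and apply_filters, and the shout callback is a made-up stand-in for a plugin.

```python
from collections import defaultdict

# Toy registry: hook name -> list of callbacks, run in registration order.
_registry = defaultdict(list)


def add_filter(hook, callback):
    """Register a callback that gets a chance to rewrite a value."""
    _registry[hook].append(callback)


def apply_filters(hook, value):
    """Pass the value through every callback registered on the hook."""
    for callback in _registry[hook]:
        value = callback(value)
    return value


def shout(markup):
    # Hypothetical plugin callback: any function that takes markup and returns markup.
    return markup.upper()


add_filter("the_category", shout)

filtered = apply_filters("the_category", "<a>news</a>")
untouched = apply_filters("the_title", "hello")  # nothing registered on this hook
```

The point is that core code never knows what a callback will do to the string. Whatever the last callback returns is what gets printed, which is how one bad plugin can mangle output that was correct going in.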

WordPress adds title attributes to links by default, a behavior which apparently annoys the hell out of many people, including at least one person at Lafayette in the past. This plugin simply removes them with a regex (I would be remiss if I didn’t link to the famous StackExchange thread about why you should never, ever, parse HTML with a regex). To accomplish this the plugin added a filter which washed the generated category markup through its regex.

Unfortunately, the regex is improperly written. In jargon, it’s greedy. This is the expression evaluated:

title='(.+)'

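If you pass it a string containing more than one link, the match starts at the title attribute of the first link and, because .+ grabs as much as it can, runs all the way to the closing quote of the last title attribute in the string. Everything in between gets removed, including the first link’s text and the second link’s href, which is exactly how you end up with one category’s URL wrapped around another category’s name. A more properly focused regex would be this: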

title=\"([^"]*)\"
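To see the difference, here is a minimal sketch in Python rather than the plugin's PHP. The sample markup is my own approximation of a two-category list (the title text is an assumption), and both patterns use double quotes so that they match it. The greedy one keeps the same (.+) shape as the plugin's expression.

```python
import re

# Assumed sample: roughly what a two-category list looks like before filtering.
html = (
    '<a href="http://my.wordpress.site/category/category1" '
    'title="View all posts in Category1">Category1</a>, '
    '<a href="http://my.wordpress.site/category/category2" '
    'title="View all posts in Category2">Category2</a>'
)

# Same shape as the plugin's expression: (.+) runs to the LAST quote in the string.
GREEDY = re.compile(r'title="(.+)"')

# Bounded version: [^"]* cannot cross a quote, so each match stays inside one attribute.
BOUNDED = re.compile(r'title="([^"]*)"')

greedy_result = GREEDY.sub("", html)
bounded_result = BOUNDED.sub("", html)

print(greedy_result)
print(bounded_result)
```

With this input the greedy pattern leaves a single link pointing at category1 but labelled Category2, which is the symptom described above. The bounded pattern strips each title attribute on its own and leaves both links intact.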

That’s it. Mystery solved.

Unresolved, however, is the larger problem with the WordPress plugin ecosystem. This plugin was added to the plugin repository in August 2009. It has not been updated since. It has been broken from the very beginning. The author has disappeared. The support forums are moribund. There’s no GitHub repository for me to fork, should I want to continue support, since WordPress in its infinite wisdom uses SVN for everything. Spend some time Googling and you’ll find people talking up this plugin, never realizing its inherent problems. It’s still being downloaded. This may be inexperience (I’m a Moodle veteran and new to WordPress) but I don’t see a good way to get the word out that this plugin has a serious bug. If WordPress allowed you to usurp a plugin then I could push out an updated version so at least you’d get notified in your Dashboard. All I can do is leave a review indicating that it’s broken in 3.5.1 (for this specific use case) and link back to this post.

Not that it matters overmuch in this case since we’re likely to deep-six it here, but the situation feels inadequate. There’s got to be a way to do better.