Marcel's blog

WordPress Race Condition with MySQL Replication

My employer runs WordPress to power the Healthy and Green Living section of our web site. The blog serves spikes of dozens of pageviews per second. We use HyperDB to send read queries to a few slave databases.

One day, I found that replication was falling further and further behind, mostly because of updates to wp_options. These writes contended with reads for the MyISAM table lock on the slave. Because MySQL prioritizes writes over reads, the contention reduced our read concurrency to roughly one thread.
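One way out of this kind of contention, sketched here as a one-statement DDL fragment rather than the fix we necessarily shipped (the host and schema names are made up), is to move the hot table off MyISAM, since InnoDB takes row locks instead of a single table lock:

```shell
# Illustrative only: convert the contended table to InnoDB on the master
# (the engine change replicates to the slaves). Host and schema are invented.
mysql -h db-master -e 'ALTER TABLE wordpress.wp_options ENGINE=InnoDB;'
```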


Captcha the Dog Exploit

Care2 has an interest in animal-themed captchas, so I evaluated Captcha the Dog.  I think I have found a vulnerability, at least in the image recognition component, which I believe is the meat of the puzzle.


Version Control Comparison for Large Repositories

At Care2, our main repository had 120,000 files and a 2.4 GB CVS checkout. CVS mostly worked, with some hacks to make it run faster on our huge repository. But I wanted more out of version control.

The biggest issue was that merging didn't work well. Sometimes adding or removing files on a branch would have an unexpected result after merging to trunk. And it was difficult to merge to and from trunk multiple times. I know, I know, you can tag branches at just the right place to track what's been merged already... but I'd rather not.

We also relied on file lists to speed up CVS operations.  File lists help by restricting CVS commands to a carefully maintained list of files that were actually touched on a feature branch. But we ran into intermittent problems when the file list database got moved or access was accidentally revoked. The file lists were good at speeding up CVS but greatly increased the complexity and fragility of our development process, so I was eager to leave them behind.
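As a sketch of that workflow (the list name and paths below are invented for illustration), a file list is just a text file of repository-relative paths, and each CVS command is restricted to it with xargs:

```shell
# feature.list holds the files actually touched on the feature branch,
# one repository-relative path per line, for example:
#   htdocs/greenliving/index.php
#   lib/Care2/Petitions.php

# Feeding the list to CVS keeps it from walking the whole checkout:
xargs cvs -q update -r my-feature-branch < feature.list
xargs cvs -q commit -m 'feature work' < feature.list
```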

Our two main reasons for ditching CVS were, in a nutshell:

  • Cumbersome merging
  • Slow performance on our large repository

With those reasons in mind, I set out to find a replacement.


Faster Feature Branching in Large CVS Repositories

At work we have a large CVS repository.  By large, I mean 120k files, 2.5GB checkout.  Most things work fine, and we've evolved some techniques to deal with operations that would otherwise be slow.

Things that work well:

  • Committing a small list of files
  • Updating your whole working copy, since we only expect to do so once daily
  • Updating a small list of files to get someone's recent changes

Things that don't work well:

  • Scanning your working copy for things you forgot to check in
  • Branching, because if you do the naive thing, you have to wait for CVS to branch the whole repository
  • Tagging the naive way, for example to mark a release for deployment, again because CVS has to walk the whole tree

While we never quite addressed the first problem, we do pretty well at making sure CVS never has to walk the whole tree.
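For instance (branch, tag, and list names below are invented), cvs tag accepts an explicit set of files, so a file list keeps branching and tagging from touching the whole tree:

```shell
# Branch only the files the feature will touch, not the whole repository:
xargs cvs tag -b my-feature-branch < feature.list

# Same idea for releases: tag just the files in the deploy list:
xargs cvs tag RELEASE_TAG < deploy.list
```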


Tracking CVS with Git using cvs2git

At work, we use CVS to manage code, but I want something better: Git. The git-cvsimport tool can do efficient incremental updates from CVS into Git, which is just what you need if you want to work in Git while your team's primary VCS is CVS. But git-cvsimport is based on cvsps, and cvsps is a dead project. Worse still, cvsps segfaults on my employer's repository. Enter cvs2git.
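Roughly, the cvs2git workflow looks like this (paths are placeholders; check the cvs2git documentation for the options your version supports): cvs2git writes a blob file and a dump file, which you then feed to git fast-import, blobs first:

```shell
# Generate fast-import data from the CVS repository (paths are placeholders):
cvs2git --blobfile=git-blob.dat --dumpfile=git-dump.dat \
        --username=cvs2git /path/to/cvsrepo/module

# Load it into a fresh Git repository; the blob data must precede the dump:
git init repo && cd repo
cat ../git-blob.dat ../git-dump.dat | git fast-import
```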


Nested Imenu for PHP

I wanted an easy way to navigate a PHP file full of object-oriented class definitions in Emacs. My search for such a tool turned up php-mode's integration with imenu. Imenu lets modes generate menus of the structural elements in a file, where selecting an element jumps to its location in the file.

But php-mode separates the list of functions from the list of classes. The list of functions is often way too long, and it's not clearly organized by class.


Plucene vs. Ferret

Switching from Plucene to Ferret for full-text search yielded huge performance improvements in both memory usage and execution time.

I set up search for an email list a year or two ago. The original search used Plucene, a Perl port of the well-known Apache Lucene search library. Performance was never great, about 15 seconds for the first search results when I set it up, and over time it degraded to more than 60 seconds.


Riding Rails with Typo

I upgraded from my home-brew XSLT-based static blog to Typo with the main goals of getting robust comment features and getting to tinker with a Rails app. Along the way, I also tried to support the original blog's URLs, article IDs, and look-and-feel. And I checked that it would deliver pages reasonably quickly.