Marcel's blog

DigitalOcean: Better Performance, Lower Price

I recently switched away from Rackspace to DigitalOcean and have been pleasantly surprised with the performance improvements.

I’ve gotten great support from Rackspace and would not rule out their services in the future, but so far my experience with DigitalOcean has also been good.


Willow Bow Drill Cordage

I had partial success with making a willow bark bow string for a bow drill.

At 4EEE’s family wilderness experience last week, I made a bow drill kit. The spindle and fireboard are made from cottonwood, and the bow string was parachute cord. After starting a couple fires with it, I was looking for ways to experiment. The parachute cord bow string seemed incongruous with the other natural, hand-made parts, so I set out to replace it.


Keep-Alives with Rails on Heroku

Red "F"

I just ran a Rails 4 app through WebPageTest and it walloped me with a big red “F” (fail) for my app’s use of keep-alives. I felt a bit miffed. I had done some basic work to ensure decent performance, like:

  • Recompressing images
  • Designing critical-path server code for consistently fast first-byte times
  • Using the asset pipeline to concatenate and minify scripts and CSS
  • Using heroku_rails_deflate to serve gzipped assets

But the waterfall confirmed that the browser was spending about 30 ms per request on establishing a new connection for each object on the page. I expected to see this per-connection overhead amortized over multiple requests.

While 30 milliseconds might not sound like much, a few reopened connections add up to the 100 ms of extra latency that Amazon famously found costs 1% of sales.


HTTP keep-alive is a mechanism, signaled with an HTTP header, for implementing persistent connections. Persistent connections reduce the overhead of setting up TCP connections when downloading a page over HTTP. Instead of opening a new connection for each file downloaded, the browser can open one connection and re-use it for as many files as necessary. In addition to saving the ~30 ms round trip for another TCP handshake, it avoids another TCP slow-start process for growing the congestion window to take full advantage of available bandwidth.

The server has to know to keep the connection “alive” — to not close the connection after sending the requested file. The client may explicitly request this behavior via the “Connection: keep-alive” header, which most browsers use. But with the end of the connection no longer delimiting the end of file, the client needs another way to determine when to stop waiting for bytes of data when reading the HTTP response.

There are at least two ways the server can communicate how many bytes of data the client must read. One way is with a Content-Length header, which simply tells the client how many bytes of response body to read. But sometimes the server doesn’t know when it sends the headers — at the beginning of the response — how many bytes of data it will eventually send. One example of this is when the response is generated by a CGI script. So another way is to use “chunked” Transfer-Encoding, where the server sends the data in chunks, each of which is preceded by the length of that chunk. That way, the server can buffer a bounded amount of data before sending it to the client and send another chunk later if the CGI script (or whatever source) turns out to produce more data.
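The chunked framing is simple enough to sketch in a few lines of Ruby (illustrative only; real servers also handle trailers and other details):

```ruby
# Illustrative sketch of "chunked" Transfer-Encoding framing:
# each chunk is its hex-encoded length, CRLF, the data, CRLF;
# a zero-length chunk terminates the body.
def encode_chunked(chunks)
  framed = chunks.map { |c| "#{c.bytesize.to_s(16)}\r\n#{c}\r\n" }.join
  framed + "0\r\n\r\n"
end
```

For example, encode_chunked(["Hello, ", "world!"]) frames a 7-byte chunk, then a 6-byte chunk, then the terminator, so the client knows to stop reading as soon as it sees the zero-length chunk.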


I expected Heroku to take care of all of these HTTP details for me. After all, that’s what they do — take care of stuff, so I don’t have to.

Heroku does support keep-alives, but only sometimes. Heroku’s router never uses keep-alives between itself and the application, because the latency of establishing a new connection is minimal. But between the browser and their router, their docs say they do support keep-alives with HTTP/1.1 by default. They’re just not completely clear about when they support it.

It makes sense for a high-performance router like Heroku’s to not wait for the application to produce an entire response before starting to forward it to the client. Starting to forward the response when it’s only partially received reduces buffering requirements on the router and reduces overall latency between browser and application. But it also means that Heroku cannot provide the browser with the file length when it sends response headers unless the length is given by the application, either up-front via Content-Length or incrementally via Transfer-Encoding. (Actually they could theoretically do that, but buffering partial responses to convert them to “chunked” Transfer-Encoding might add complexity or undesirable latency… I’m not sure what their rationale is.)

The heroku_rails_deflate Gem

I expected Rails to take care of sending the right HTTP headers for me. Shouldn’t this “Just Work”?

After some fiddling and comparing response headers of different apps, I discovered that heroku_rails_deflate was interfering with the process. It installed a HerokuRailsDeflate::ServeZippedAssets middleware that was removing the Content-Length header when serving gzipped assets, probably because of ambiguity about whether the Content-Length should be that of the gzipped data or the uncompressed data. But it turns out that the file handler to which it delegates already sets the correct Content-Length. (Rack::File is eventually called by ActionDispatch::Static.)
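To make the Content-Length mechanics concrete, here is a hypothetical Rack middleware sketch (the class name is made up, and this is not the gem’s actual code) that restores the header when the body is fully materialized:

```ruby
# Hypothetical sketch: set Content-Length when the response body is a
# fully materialized Array of strings; streaming bodies of unknown
# length pass through untouched.
class EnsureContentLength
  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    if body.is_a?(Array) && !headers.key?("Content-Length")
      headers["Content-Length"] = body.sum(&:bytesize).to_s
    end
    [status, headers, body]
  end
end
```

A middleware like this can only help when the body is already in memory; it cannot invent a length for a response that is still being generated.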

The issue was identified during a refactor and fixed in a later version, so I upgraded from 0.2.0 to 1.0.3 to enable keep-alives. But this only fixes gzipped assets. The base HTML doesn’t support keep-alives.


It turns out that, in addition to serving the .gz version of CSS and JavaScript files, heroku_rails_deflate also installs Rack::Deflater to compress other files.

Compressing HTML is a potentially big win, but unfortunately, Rack::Deflater was indiscriminately gzipping images as well. This is a waste, because they are already compressed, but it is a small waste. The real problem is that, in the process, it removes their Content-Lengths, causing Heroku to disable keep-alives.

When serving files from disk, the file handler middleware provides a Content-Length field by reading the file size from the filesystem. Rack::Deflater then compresses every response as a stream of unknown length, so it does not know the compressed content length before sending headers.
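The underlying constraint is easy to see with Ruby’s Zlib, which Rack::Deflater uses: when compressing a stream piece by piece, the total compressed size is only known after the final flush (a standalone sketch, not tied to any real response):

```ruby
require "zlib"

# Compress a body in parts, as a streaming middleware would.
deflater = Zlib::Deflate.new
parts = ["<html>", "hello " * 100, "</html>"]
compressed = ""
parts.each do |part|
  compressed << deflater.deflate(part) # size so far is not the final size
end
compressed << deflater.finish          # only now is the total length known
puts compressed.bytesize
```

So a streaming compressor must either buffer everything to compute a Content-Length, or drop the header and fall back to chunked encoding or connection close.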

Removing Rack::Deflater

I decided to turn off Rack::Deflater, at least for images. It turns out that Rack 1.6.0 provides a version of that middleware that allows selective compression, but that would require upgrading Rails, which is more than I wanted to do at the time.
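For reference, the selective form in Rack 1.6+ looks roughly like this (a config sketch; check your Rack version’s documentation for the exact options):

```ruby
# Config sketch: only compress text-like responses, leaving
# already-compressed images alone so they keep their Content-Length.
use Rack::Deflater,
    include: %w[text/html text/css application/javascript]
```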

So I just disabled the middleware entirely. It means my HTML won’t be gzipped, but it’s generally small enough on my site that enabling keep-alives is a bigger win.

It’s not as easy to remove as config.middleware.delete because the middleware is not yet in place when config/application.rb is loaded. An initializer in config/initializers is likewise too soon. But doing so in an “after_initialize” block is too late; the middleware chain has already been frozen by that time. It turns out that a good time to remove it is in an “initializer” block inside of config/application.rb:

module MyApp
  class Application < Rails::Application
    # ... other configs here ...

    initializer "app.hack" do |app|
      app.config.middleware.delete 'Rack::Deflater'
    end
  end
end


Green "A"

So in summary, to make Heroku serve my app with keep-alives I had to:

  • Upgrade heroku_rails_deflate to ~> 1.0.3
  • Remove Rack::Deflater at just the right time

And with that, WebPageTest gives its seal of approval: a big green “A”.

WordPress Race Condition with MySQL Replication

My employer runs WordPress to power the Healthy and Green Living section of our web site. The blog serves spikes of dozens of pageviews per second. We use HyperDB to send read queries to a few slave databases.

One day, I found that replication was falling further and further behind, mostly because of updates to wp_options. These writes were contending with reads for the MyISAM table lock on the slave. Because MySQL prioritizes writes over reads, it reduced our read concurrency to roughly one thread.


Captcha the Dog Exploit

Care2 has an interest in animal-themed captchas, so I evaluated Captcha the Dog.  I think I have found a vulnerability, at least in the image recognition component, which I believe is the meat of the puzzle.


Version Control Comparison for Large Repositories

At Care2, our main repository had 120,000 files and a 2.4 GB CVS checkout. CVS was mostly working with some hacks to run faster on our huge repository. But I wanted more out of version control.

The biggest issue was that merging didn't work well. Sometimes adding or removing files on a branch would have an unexpected result after merging to trunk. And it was difficult to merge to and from trunk multiple times. I know, I know, you can tag branches at just the right place to track what's been merged already... but I'd rather not.

We also relied on file lists to speed up CVS operations.  File lists help by restricting CVS commands to a carefully maintained list of files that were actually touched on a feature branch. But we ran into intermittent problems when the file list database got moved or access was accidentally revoked. The file lists were good at speeding up CVS but greatly increased the complexity and fragility of our development process, so I was eager to leave them behind.

Our two main reasons for ditching CVS were, in a nutshell:

  • Cumbersome merging
  • Slow performance on our large repository

With those reasons in mind, I set out to find a replacement.


Faster Feature Branching in Large CVS Repositories

At work we have a large CVS repository.  By large, I mean 120k files, 2.5GB checkout.  Most things work fine, and we've evolved some techniques to deal with operations that would otherwise be slow.

Things that work well:

  • Committing a small list of files
  • Updating your whole working copy, since we only expect to do so once daily
  • Updating a small list of files to get someone's recent changes

Things that don't work well:

  • Scanning your working copy for things you forgot to checkin
  • Branching, because if you do the naive thing, you have to wait for CVS to branch the whole repository
  • Tagging the naive way, for example to mark a release for deployment, again because CVS has to walk the whole tree

While we never quite addressed the first problem, we do pretty well at making sure CVS never has to walk the whole tree.


Making Money Online while Homeschooling

Speaker Ann Zeise shares her own success with web publishing as a way to earn money while homeschooling. In short, she recommends finding a topic you are passionate about, researching it, and publishing what you find on the Web. She then details how to monetize this valuable content using advertising services such as Google AdSense.


Dismantling the Inner School

Speaker David Albert builds a metaphor for the journey to setting ourselves free in homeschooling: we must dismantle our inner school, brick by brick, smashing with a sledgehammer the assumptions we have about how learning must work.



Speaker John Young describes how his family and mentors, and all our ancestors, used storytelling as an essential household activity to engage people. Storytelling teaches youngsters how to soak up details in their surroundings that were essential to survival, but it has appeal to all ages.  And the format of the talk is, of course, a quilt of stories.


Tracking CVS with Git using cvs2git

At work, we use CVS to manage code, but I want something better: Git.  The git-cvsimport tool can do efficient incremental updates from CVS into Git — just what you need if you want to work in Git while your team's primary VCS is CVS.  But git-cvsimport is based on cvsps, and cvsps is a dead project.  And worse still, cvsps segfaults on my employer's repository.  Enter cvs2git.


TD Ameritrade leaks your data. Your compensation?

TD Ameritrade sticks out in my inbox as one of the biggest spammers I do business with. Well, it turns out they are not actually spammers themselves; they just leaked my email address. Not much more, I hope. And what do I get in return? Not much.


Short URL Klipper Script

As mentioned in my last post, Klipper opens the door to a world of useful little mini-applets. Here's another script that's been handy lately: klipper_ln-s. It posts the clipboard contents to a URL-shortening service. Read more...

Pastebin Klipper Script

Klipper is the KDE Desktop's clipboard management tool. It keeps track of your last few copy-and-pastes. But, in allowing you to configure actions that are selectable when the clipboard contents matches a regular expression, it opens the door to a world of useful little mini-applets. Read more...

Nested Imenu for PHP

I wanted an easy way to navigate a PHP file full of object-oriented class definitions in Emacs. My search for such a tool turned up php-mode integration with imenu. Imenu allows modes to generate menus of structural elements in a file, where selecting an element jumps to its location in the file.

But php-mode separates the list of functions from the list of classes. The list of functions is often way too long, and it's not clearly organized by class.