It is a tale as old as time (well, not really): you add a new feature to your app, you run the tests and they all come back green. You git push master, the build automatically runs, and comes back green. You deploy to production and… it all breaks. Because there was some underlying system package missing, or a version mismatch, or any of a hundred other things.
The solution is obviously to make the dev and test environments more similar to the production environment. Keeping that gap as small as possible is one of the tenets of the 12 factor app: dev/prod parity.
Historically that's been easier said than done on Heroku.
With our product Voltos, we need a very high degree of trust and confidence in our test infrastructure. We need to mitigate or reduce any incidental risk that is introduced in our test infrastructure to ensure that what we've tested is definitely the same as what we have in production. And with relatively high (security related) version churn in underlying system dependencies like OpenSSL over the past few years, the risk that our test infrastructure is running a different version of such a library to our production system isn't acceptable.
The problem with PaaS CI
So much of the value of running on a PaaS like Heroku is that someone else is helping take care of things for you. When there's a critical OpenSSL issue Heroku have usually started rolling out patches across the fleet before most people would have had time to respond.
This freedom to focus on your own product comes with an obvious cost: you're not the one that controls the base image.
Compound that with the fact that, for the most part, you've not been able to run your own CI setup on Heroku, and have instead had to pick a vendor from the Add-on/Elements Marketplace. That has been great for convenience, but it widens the gap from the dev/prod parity ideal. You've now got one third-party vendor managing your production infrastructure, and a different third party where you're trying to run some approximation of what you've got in production.
Not to mention that the CI vendor almost certainly doesn't use any custom buildpacks or other Heroku extensions that affect exactly how your application runs.
The dream Heroku CI setup
I spent over 3 years trying to politely cajole various CI providers into supporting a different approach. Something that more accurately replicated a production Heroku environment. Something that actually ran within a dyno. That way it would use all the same buildpacks. It could talk to Heroku Postgres just like my app did. I would be free to test against any other add-ons in an integration test, if I felt it appropriate, as opposed to mocking literally everything. Nobody was able to fit it into the way they ran their CI services though.
Well… that is until I met Buildkite.
Let's go fly a kite
The Buildkite approach to CI is beautiful in its simplicity: you run an agent that sits waiting for work. When it gets some, it runs the various scripts you define in the Buildkite UI and returns the status of those commands back to Buildkite. All of the handling of webhooks, fanning out work to parallel workers (if you need it), reporting status back, etc. is taken care of. The one thing they don't do is run your infrastructure for you.
Which is perfect. Heroku do that for me.
Some minor tweaking of the default agent behaviour, wrap it up in a buildpack, and it's ready to run our tests on Heroku for us whenever we deploy.
Here's how we're using it with parts of Voltos:
Setting up Heroku
- Set up a Heroku Pipeline so that you have a staging app, and link it to your production app.
- Add the Heroku GitHub Integration to your staging app so that Heroku will check out any changes to your master branch automatically. Make sure you do not check the "Wait for CI to pass before deploy" checkbox, otherwise your code will never get deployed to this app.
- If you have some test-specific external dependencies, such as PhantomJS, you'll need to add them via additional buildpacks (e.g., heroku buildpacks:add https://github.com/stomita/heroku-buildpack-phantomjs).
- You'll need to tell bundler to ignore only the development dependencies now (so that test dependencies will be installed), and that you want to run the app in test mode: heroku config:set BUNDLE_WITHOUT=development RACK_ENV=test RAILS_ENV=test
- I've enforced an environment/config variable of APP_NAME as a requirement to make targeting individual Buildkite agents/apps on Heroku easier (a subject for another time). For now just make sure you set the app name: heroku config:set APP_NAME=your-app-name
- Install the Buildkite Agent Buildpack as the very last buildpack on your app: heroku buildpacks:add https://github.com/gluio/heroku-buildkite-agent
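The command-line parts of that list can be collected into one script. This is only a sketch: the APP variable, its placeholder value, and the DRY_RUN switch are assumptions added for illustration, and the PhantomJS buildpack line only applies if your tests need it. The pipeline and GitHub integration steps are still done separately.

```shell
#!/usr/bin/env bash
# Sketch: configure a staging app to run CI inside its own build.
# APP and DRY_RUN are illustrative assumptions, not part of the buildpack.
APP="${APP:-your-staging-app}"

# Print each command in dry-run mode (the default); otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

configure_ci_app() {
  # Test-only dependency (optional, shown because the list mentions it).
  run heroku buildpacks:add https://github.com/stomita/heroku-buildpack-phantomjs --app "$APP"
  # Install test gems and run the app in test mode.
  run heroku config:set BUNDLE_WITHOUT=development RACK_ENV=test RAILS_ENV=test --app "$APP"
  # The agent buildpack expects APP_NAME to be set.
  run heroku config:set APP_NAME="$APP" --app "$APP"
  # The Buildkite agent buildpack must be added last.
  run heroku buildpacks:add https://github.com/gluio/heroku-buildkite-agent --app "$APP"
}

configure_ci_app
```

Run it as-is to review the commands it would issue, then again with DRY_RUN=0 once they look right.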
Setting up your local app
There's some minor customisation required in your app to make the transition as seamless as possible. You'll need to create scripts for each step you want Buildkite to run (e.g., setting up the database, running the tests). This is because Buildkite normally assumes it will be a long-running agent that will continue to get work, run it, return status, and then wait for new work.
We don't want to do that on Heroku.
Instead we want to fetch work, run it, and if it fails we want to kill the agent and then return an error to the Heroku build process so that it aborts too. That means two minor hacks, one for any regular step we run and then a slightly different approach for whatever we know to be the last step in a pipeline.
The script I use for setting up the database just runs the regular bundle exec rake db:migrate from Rails and then checks the return code of that command. If it's not equal to zero, it kills the agent and exits with a non-zero status to abort the build process. Otherwise it exits 0 and the build continues.
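A minimal sketch of that pattern, generalised into a small wrapper that takes the step's command as its arguments. How the agent is actually stopped depends on the buildpack; the pkill call below is an assumption, as are the function names.

```shell
#!/usr/bin/env bash
# Sketch of a regular (non-final) pipeline step.

# Assumption: the agent can be stopped by matching its process name.
kill_agent() {
  pkill -f buildkite-agent || true
}

# Run the given command; if it fails, stop the agent and report failure.
run_step() {
  "$@"
  local status=$?
  if [ "$status" -ne 0 ]; then
    kill_agent
    return 1
  fi
  return 0
}

# When called with a command, run it as the step and pass the result on.
if [ "$#" -gt 0 ]; then
  run_step "$@"
  exit $?
fi
```

Saved under a hypothetical name such as bin/ci_step, the database step would then be bin/ci_step bundle exec rake db:migrate.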
The script to run our tests (bin/run_tests) is the last step in our pipeline. The only substantial difference here is that the agent will always be killed, irrespective of the return status of the previous command, because in this instance we're done with the agent and need it to shut down rather than try to receive any more work.
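A sketch of what a final step could look like under the same assumptions: the pkill call and the function names are illustrative, and passing the test command's own exit status back is an assumption about the intended behaviour.

```shell
#!/usr/bin/env bash
# Sketch of the final pipeline step.

# Assumption: the agent can be stopped by matching its process name.
kill_agent() {
  pkill -f buildkite-agent || true
}

# Run the given command, always stop the agent afterwards,
# and hand back the command's own exit status.
run_final_step() {
  "$@"
  local status=$?
  kill_agent
  return "$status"
}

# When called with a command, run it as the final step.
if [ "$#" -gt 0 ]; then
  run_final_step "$@"
  exit $?
fi
```

The only change from a regular step is that kill_agent is unconditional, so a green test run shuts the agent down just as a red one does.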
Setting up Buildkite
- Create a Buildkite account.
- Create a new pipeline.
- Add the steps for your process (e.g., bin/run_tests) and save the pipeline.
- Go to the Agents page, reveal and copy your agent token, and add it to your Heroku app: heroku config:set BUILDKITE_AGENT_TOKEN=8b8b9ccad
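Since the setup relies on two config vars, BUILDKITE_AGENT_TOKEN and APP_NAME, a small preflight check can catch a missing one before the agent starts. This helper is hypothetical and not part of the buildpack; it is a sketch of something you could call at the top of your first step script.

```shell
#!/usr/bin/env bash
# Sketch: list any required config vars that are unset or empty.
missing_ci_config() {
  local name value missing=""
  for name in BUILDKITE_AGENT_TOKEN APP_NAME; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  # Strip the leading space before printing the list.
  echo "${missing# }"
}

missing=$(missing_ci_config)
if [ -n "$missing" ]; then
  echo "Missing config vars: $missing" >&2
fi
```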
Push a change to master and watch GitHub notify both Heroku and Buildkite of the change. Buildkite will queue up the test work while Heroku is busy checking out the code, resolving dependencies, and building the app. Once that's done the Buildkite Agent will start, fetch the instructions on what it needs to run, and submit the results back.
Really fast production deployments
From here you can use the Heroku Pipelines command heroku pipelines:promote to move the release on your staging app directly onto your production one in a fraction of a second, giving you really fast manual deployments to production at a time you control.
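As a sketch, a manual promotion could be wrapped like this. The app name is a placeholder and the DRY_RUN switch is an assumption added so the command can be reviewed before it runs.

```shell
#!/usr/bin/env bash
# Sketch: promote the current staging release to the next pipeline stage.
STAGING_APP="${STAGING_APP:-your-staging-app}"

promote_release() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: only show the command that would be run.
    echo "+ heroku pipelines:promote --app $1"
  else
    heroku pipelines:promote --app "$1"
  fi
}

promote_release "$STAGING_APP"
```

Because the promotion reuses the release that was already built and tested on staging, nothing is rebuilt on the way to production.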
Auto-deployment of green builds
The alternative approach we've taken is to enable the GitHub integration on our production Heroku app, but check the "Wait for CI to pass before deploy" checkbox for that app. Now the production app will wait for the staging app to run the tests and make sure everything is green, before re-doing the build on production, this time excluding the test dependencies.