Our Jekyll CMS migration from Drupal
Why did we migrate from Drupal to Jekyll?
Four years ago we migrated our website and blog to Wordpress. I'll never forget the reaction when we announced the news to our team. Silence, quickly followed by irritation, resentment, and eventually a lot of cynical jokes. Two years after the move to Wordpress, we migrated to Drupal. There were many reasons why Drupal was a better fit for us, and this time I was sure the team would be lining up to build content. Unfortunately, the result was the same. Both attempts succeeded from a technical standpoint, but adoption and engagement never got off the ground. Our problem? BHW doesn't align with the audience that CMS solutions target. Our team is made up entirely of technical staff comfortable with HTML and Markdown. Our staff feels hindered by traditional CMS solutions. After two years on Drupal we migrated to Jekyll, a static site generator. This time the reaction has been different.
To be clear, Wordpress and Drupal are mature CMS systems that, when designed, deployed, and cared for properly, offer easy authoring solutions for almost anyone. If your team isn't comfortable with HTML, text editors, and some basic source control, then a static site generator probably isn't for you. While static site generators aren't poised to take over from traditional CMS solutions, their rising popularity and numbers can't be overlooked. Most businesses making a content authoring decision today will select a CMS, but what many don't consider is that they are committing to the platform and all it entails. It's too easy to believe the selection of an open source CMS is a low-cost, low-risk decision. Businesses incorrectly assume the "free" open source CMS should be easy to switch away from later, because after all it's "free." Carefully consider what you're signing up for:
Many of these efforts require good advice and guidance from consultants to be safe and effective. The cost of a bad decision is higher than you would expect, simply because CMS migrations are complicated and your valuable content is stored in a content database and deeply nested directories, spread into many fragments. Putting your content back together can feel like an episode of a TV show with a team of people taping shredded documents together. A small mistake can mean that the SEO your team has worked so hard to build is lost due to duplicate content penalties, broken links, or missing metadata.
Consider security for a moment. The popular open-source CMS systems are based on PHP and use many plug-ins for even the most basic website. There's been a lot of discussion lately about 75% of PHP sites having known vulnerabilities. New sites are less likely to be exposed to these issues, which leaves you to focus your concern on the CMS software itself. While Drupal and Wordpress are carefully vetted for security flaws, vulnerabilities always slip through, and staying ahead of them requires constant vigilance in the form of patching and updates. Most CMS best practices recommend using as few plug-ins as possible to reduce the surface area for attacks. Plug-ins are usually written by much smaller teams, are patched less frequently, and in most CMS systems are the principal entry point for attacks. Even if you invest the time to stay ahead of these issues, you still have to deal with human concerns like weak passwords, shared passwords, etc. An issue in any one of these areas opens your site to attack.
Compare this experience to a Jekyll-generated static site. Jekyll only runs on a workstation and produces static files. Your server runs nothing but a web server with its config file, and hosts your static files. Instead of serving dynamic content from PHP-based CMS systems and their plug-ins, you serve static files through Nginx or Apache, two of the most hardened and thoroughly tested web servers on the planet. Supporting Nginx or Apache couldn't be easier, with most Linux distributions automatically updating these two packages to patch vulnerabilities. The overhead and worry of running a PHP CMS is reduced to nearly zero.
What about recovery? If your PHP CMS was hacked, you are probably facing a server reinstall and database restore, including a file system restore for your images and other assets. If your CMS has been running a while, be prepared to face issues installing exactly the versions of plug-ins and software you need, then patching the vulnerability. Without a recent backup you will lose content and data. Don't underestimate the time and complexity of a rebuild, and if you're concerned, ask your team or vendor to perform a restore test to make sure you're prepared in the event of an attack. Be ready to watch their faces fall as they consider the effort. Even if you're lucky and have a snapshot backup to restore, you still have to bring it up and hope you can find the vulnerability and patch it before getting hacked again. Recovery on Jekyll can be as simple as:
- Provision a new VM
- Install Nginx
- Deploy your files
Unlike a PHP CMS, there's no chance of content or data loss in Jekyll because your static files are deployed after being built on your local workstation. With distributed and hosted source control, your site will be safe even if you lose access to your workstation. Simply find another computer, pull down your source, install Jekyll (it's a one-line command-line install), and build your website.
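Once the site is built, the "deploy your files" step is essentially a directory copy. Here is a minimal sketch in Python, assuming the site has already been built with `jekyll build` (which writes to `_site` by default). The `deploy` helper and both paths are hypothetical, chosen for illustration, and are not part of Jekyll or of our actual setup:

```python
import shutil
from pathlib import Path


def deploy(site_dir: str, web_root: str) -> int:
    """Copy a built static site into the web server's document root.

    Both paths are illustrative assumptions, e.g. "_site" and a
    directory that Nginx is configured to serve.
    Returns the number of files found in web_root after the copy.
    """
    src = Path(site_dir)
    dst = Path(web_root)
    if not src.is_dir():
        raise FileNotFoundError(f"built site not found: {src}")
    # dirs_exist_ok allows redeploying over an existing web root (Python 3.8+).
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sum(1 for p in dst.rglob("*") if p.is_file())
```

A tool like rsync would do the same job. The point is only that a rebuild needs nothing beyond the source repository and a file copy, with no database to restore.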
Performance is another advantage Jekyll holds over a PHP CMS. A Google search for Drupal or Wordpress performance turns up many recommendations for plug-ins and proxy servers, two of the most popular being W3 Total Cache and Varnish. Even after using these solutions, carefully tuning our configuration, and upgrading our server, we still couldn't reliably deliver the performance we expected, and the same challenges are widely reported online. Caching and proxy systems are used in an effort to overcome the poor performance exhibited by Drupal or Wordpress. Both underlying CMS systems are very chatty, running many queries to render a single page. Plug-ins make this worse by running their own queries too. Viewing a typical page can generate hundreds of queries to the content database.
Caching is a great way to get around this problem, but it has its own downsides. Caching strategies vary, but you often need a larger server with more RAM to cache your website. Edits to single pages might be a quick cache update, but site-wide changes mean you have to rebuild the entire cache. On our Drupal site, rebuilding the entire cache took almost 10 minutes, and the site was basically unusable while this occurred. Proxy systems like Varnish may even cache based on the HTTP User-Agent header, resulting in multiple copies of the website for different devices and browsers. We took the approach of pre-caching our site, but if we simply let the site render pages that missed the cache, we would see page loads over four seconds.
Our move to Jekyll allowed us to retire all this infrastructure and simply use Nginx for serving static files, a task that it excels at. It's so fast, in fact, that we were able to reduce the size of our VM several times and save on hosting costs, all while providing better performance than before!
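To make the User-Agent caching problem mentioned above concrete, here is a toy model in Python. It is not Varnish configuration or Varnish's real algorithm. The function names and sample requests are invented for illustration, and the only assumption is that a cache stores one copy per distinct key:

```python
def cache_key(url: str, user_agent: str, vary_on_user_agent: bool) -> tuple:
    """Build the key under which a cache would store a rendered page."""
    if vary_on_user_agent:
        return (url, user_agent)
    return (url,)


def cached_copies(requests, vary_on_user_agent: bool) -> int:
    """Count the distinct cache entries created by (url, user_agent) requests."""
    return len({cache_key(url, ua, vary_on_user_agent) for url, ua in requests})


# Three visitors request the same page from different browsers (made-up values).
requests = [
    ("/blog/jekyll", "Mozilla/5.0 (Windows NT 10.0) Chrome"),
    ("/blog/jekyll", "Mozilla/5.0 (iPhone) Safari"),
    ("/blog/jekyll", "Mozilla/5.0 (X11; Linux) Firefox"),
]

print(cached_copies(requests, vary_on_user_agent=False))  # 1
print(cached_copies(requests, vary_on_user_agent=True))   # 3
```

With a static site there is a single file per URL, so this multiplication never comes up.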
CMS systems bury their configuration in multiple settings pages. Once you add modules to the mix with their own setup, you can easily have a situation where your configuration is spread across more than ten different screens. Couple that with the reality that your custom pages and blog content probably have their own settings and controls for their metadata and SEO information. When you need to make a configuration change, finding what to change and having confidence in its outcome can be a real challenge. Compare this to Jekyll's approach. Configuration is stored in one of two places: the _config.yml file for system-wide concerns, or the "front matter" at the top of the page or post's individual file, a YAML section listing a simple collection of name-value pairs useful to the page. Jekyll's approach is a welcome change from your average Drupal or Wordpress site. It's very empowering to review your configuration information all in one place and to edit it with ordinary text editor features like search and replace.
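For readers who haven't seen front matter, the sketch below shows what it looks like and how little is needed to read it. It is a simplified illustration, not Jekyll's own parser: it handles only flat `key: value` lines, whereas real front matter is full YAML. The sample values are made up, and `split_front_matter` is a hypothetical helper:

```python
def split_front_matter(text: str):
    """Split a Jekyll page into (front_matter_dict, body).

    Simplification: only flat `key: value` lines are understood.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front matter at all
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return {}, text  # unterminated block: treat as plain content
    meta = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return meta, body


# A sample post; the field values are illustrative.
page = """---
layout: post
title: Our Jekyll CMS migration from Drupal
description: Why we moved to a static site generator
---
Four years ago we migrated our website...
"""

meta, body = split_front_matter(page)
print(meta["title"])  # Our Jekyll CMS migration from Drupal
```

Everything that describes the page travels with the page, so a search across the source tree finds every setting.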
In a traditional CMS system you may have content versioning (with a plug-in), but that feature seldom extends to versioning system configuration and templates. Usually you would create a database backup before configuration changes, naming the backup something appropriate to reference later if needed. The backup is opaque. There's no way to know exactly what was changed; you simply have a before-and-after version of the system state. Jekyll's flat file approach serves the same purpose, but is fundamentally different in how it is applied. Unlike database backups, flat files are easily compared, and differences are easy to understand using command line tools or GUI products like "Beyond Compare". Jekyll may accomplish the same end result of storing a before-and-after picture of the configuration state, but it's done in a much more approachable fashion that provides amazing utility. When you couple this approach with source control, you now have a visual record of the changes, a note about the reason, and an audit of who performed the change and when.
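As a small illustration of why flat files compare so well, the sketch below diffs two versions of a config file using Python's standard `difflib` module. The file contents are invented for the example. `title`, `baseurl`, and `markdown` are ordinary Jekyll settings, but the values are not our real configuration:

```python
import difflib

# Two hypothetical versions of _config.yml, before and after a change.
before = """title: BHW Blog
baseurl: /blog
markdown: kramdown
""".splitlines(keepends=True)

after = """title: BHW Blog
baseurl: /articles
markdown: kramdown
""".splitlines(keepends=True)

diff = list(
    difflib.unified_diff(
        before,
        after,
        fromfile="_config.yml (before)",
        tofile="_config.yml (after)",
    )
)

# Lines starting with "-" were removed, lines starting with "+" were added.
print("".join(diff))
```

Source control tools such as `git diff` present the same kind of output, with the author, date, and commit message attached to each change.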
Jekyll's configuration management is so superior to a CMS that it's one of the main reasons we moved our website and blog away from a database-backed CMS. This one feature gives you easy fallback, branching support for testing new ideas, a contextual audit trail, comments, and more. The power of this approach to change management extends even into the content itself, allowing pages and posts to be edited in the same powerful comparison context. Don't underestimate the value of this key difference between these platforms. At BHW, it was the reason we started researching flat file CMS systems in the first place. Drupal and Wordpress have some plug-ins that promise a similar approach, but many of these plug-ins have drawbacks and are frequently incompatible with other plug-ins you might select for your website.
I started the article mentioning that many businesses underestimate the cost of a traditional open-source CMS. Free software doesn't translate to having a "free" website. Even if a business has a handle on the upfront cost, very few understand the upkeep cost. Long-term cost breaks down into several categories:
- Hosting
- Maintenance
- Content Authoring
Hosting cost can vary dramatically, but most analysis agrees that hosting Drupal or Wordpress can be expensive. There's no question it's more expensive than hosting a static site generated by Jekyll. Services like GitHub Pages can now host Jekyll static sites for free. Not every Jekyll site can make use of GitHub Pages though, because it prevents the installation of certain plug-ins. The difference in cost could be insignificant for smaller websites, but if your business has enough viewers and content, you will probably pay significantly more for hosting Drupal or Wordpress because you will be forced to pay for performance in the form of larger servers for caching.
Maintenance is likely the cost most businesses underestimate. Ignoring maintenance is the central reason most websites get hacked. Constant vigilance is required to monitor for vulnerabilities and install updates or patches. Most of the time the update or patch can be installed easily; however, some updates require code or configuration changes. Plug-in authors might not port a patch to the version your website uses, forcing you to upgrade to a new version of the plug-in. These code changes and forced upgrade events occur more often than you would think, and they're always unexpected. Usually you end up working in a staging environment to test the update before deploying to production. In order to test accurately, it's important to mirror production as closely as possible, which means synchronizing production content into staging. This can be a time-consuming effort on its own, but the truth is that few businesses do this. Most simply install to production and "hope for the best," then react if something goes wrong. Some likely consider it an educated gamble that saves the cost of running a second environment at the expense of some unexpected urgent website work if something goes wrong. Once your open source CMS has been running a while, you'll eventually have to make a major version upgrade, like going from Drupal 6 to 7 or 8. These are extremely challenging and time-consuming events, and ignoring them puts you at risk since support for older versions of the base CMS solution fades quickly as new major versions are released.
Removing any barriers during content authoring was the biggest factor in exciting our team. Few things can stop adoption faster than a poor experience in the authoring tools. Once staff start contributing, the next challenge is how to help them make their content compelling and engaging. Again, if the tools get in the way, our team is less likely to contribute. In both cases we found that authoring from the familiar environment of a text editor like Sublime was far superior to a browser-based CMS tool. A third barrier is the infrastructure required for authoring: data connections, logins, VPN, etc. We wanted to enable our team to author content wherever they were without jumping through hoops. A benefit we never expected from Jekyll is that it allows completely offline authoring. Because Jekyll ships with an integrated web server for the authoring workstation, our staff can write and preview content even without a data connection. The final engagement hurdle to clear for our team was confidence. CMS systems can present steep learning curves, and the specter of a configuration mistake that leaves your content ineffective is always present. Jekyll's publishing and compositing system is so simple that it doesn't trigger the same anxiety that a complex CMS evokes.
The last consideration is content flexibility. How easy is it to move your content to another authoring solution? We feel that Jekyll's static file approach is more "future-proof" than using a content database. When we first performed our migration from Drupal to Jekyll, we tried to use a script to pull content. Because we used plug-ins for metadata and images, the tool was unable to pull all of our content. We had to write more scripts on our own, dig around in the database, and use Ruby code to piece things back together. Even after the code efforts, we still had to do a manual scrub of many files. Now that we're on Jekyll, if we ever decide to switch to a different CMS, it's difficult to imagine a more portable format for our content than Jekyll's static files. Each file contains all the content and configuration settings needed for rendering on our website. This is far more useful than piecing together content from a database.
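To give a sense of what "piecing things back together" produces, here is a minimal sketch in Python of turning one exported record into a Jekyll post file. It is not the script we used (ours was Ruby, as noted above). The `render_post` helper, the record's field names, and the sample values are all hypothetical, and a real Drupal export would need its own field mapping:

```python
from datetime import date


def render_post(row: dict) -> tuple:
    """Turn one exported CMS record into (filename, file_text) for Jekyll.

    The keys of `row` (slug, title, published, tags, body) are assumptions
    made for this example. Titles containing double quotes would need
    escaping, which is omitted here.
    """
    published: date = row["published"]
    # Jekyll expects posts to be named YEAR-MONTH-DAY-title.MARKUP.
    filename = f"{published.isoformat()}-{row['slug']}.md"
    front_matter = [
        "---",
        "layout: post",
        f'title: "{row["title"]}"',
        f"date: {published.isoformat()}",
        "tags: [" + ", ".join(row["tags"]) + "]",
        "---",
    ]
    return filename, "\n".join(front_matter) + "\n\n" + row["body"] + "\n"


# Made-up sample record.
row = {
    "slug": "jekyll-migration",
    "title": "Our Jekyll CMS migration from Drupal",
    "published": date(2015, 6, 1),
    "tags": ["jekyll", "drupal"],
    "body": "Four years ago we migrated our website...",
}

name, text = render_post(row)
print(name)  # 2015-06-01-jekyll-migration.md
```

The resulting file carries its own metadata, so moving it to another system later means reading one text file rather than joining database tables.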
One thing we didn't discuss in this article is the initial website development. At BHW we seldom develop websites from a template, preferring to build responsive websites tailored to our designers' visions using the appropriate tool for each job. We routinely build integrations and websites on systems like Drupal and Wordpress for clients, and from our experience there's no question that Jekyll is easier, which translates into cheaper. If your business can accommodate the higher technical bar that Jekyll imposes, it's undoubtedly a better value than a traditional CMS. There are countless successful websites running on Wordpress and Drupal. I'm happy to say that our website is no longer one of them.