Moving Never Ending Voyage to https

Recently I secured our travel blog Never Ending Voyage, which is built on WordPress and runs on Apache on a Linode instance.


Setting Up

Initially, it was sharing a Linode with a few other sites that we own. When I first started looking into securing our sites last year, I experimented with self-signed certificates and found that sharing secure and non-secure sites on the same server can cause various issues to appear.

I used Let’s Encrypt, which allows you to secure as many sites on the same server as you’d like, but I decided that each of our major sites should have its own Linode instance. Step 1, then, was moving the entire site onto its own smaller instance.


Trial Run

Never Ending Voyage is the cornerstone of our business, and major outages or any kind of security warning would be very bad. It also has a lot of content: over 700 posts, over 50 pages, and over 14,300 comments, so I wanted to make sure that everything worked before attempting the move on our live site.

On this new Linode, I set up a new subdomain, an Apache virtual host pointing to it with a .htpasswd gate, and re-created our entire site on this subdomain.

In order to get this to work, I have a few MySQL commands that I run to convert everything to the new domain:

UPDATE `$databaseName`.`wp_posts` SET post_content = REPLACE(post_content, 'http://$remoteURL', 'http://$localURL');  
UPDATE `$databaseName`.`wp_options` SET option_value = REPLACE(option_value, 'http://$remoteURL', 'http://$localURL') WHERE option_name='siteurl';  
UPDATE `$databaseName`.`wp_options` SET option_value = REPLACE(option_value, 'http://$remoteURL', 'http://$localURL') WHERE option_name='home';

It’s important to note that, unlike the posts table, it’s a bad idea to blanket change all the data in the options table to the new domain. This can corrupt the serialisation that themes and plugins use to store options data (meaning that widget and plugin data can simply disappear).
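To see why a blanket replace is risky, here is a minimal Python sketch of how a PHP-serialised string stores its byte length next to its value. The domains and the helper names (php_serialize_string, length_is_consistent) are hypothetical and only mimic the string part of the format; this is not WordPress code.

```python
import re


def php_serialize_string(value: str) -> str:
    """Mimic PHP's serialised string form: s:<byte length>:"<value>";"""
    return f's:{len(value.encode("utf-8"))}:"{value}";'


def length_is_consistent(serialized: str) -> bool:
    """Check that the declared length still matches the stored value."""
    match = re.fullmatch(r's:(\d+):"(.*)";', serialized, re.DOTALL)
    if match is None:
        return False
    declared, value = int(match.group(1)), match.group(2)
    return declared == len(value.encode("utf-8"))


# Hypothetical option value as a plugin might store it.
old_url = "http://www.example.com/logo.png"
stored = php_serialize_string(old_url)

# Blanket find and replace on the raw text: the value gets longer,
# but the declared length prefix is left untouched.
blanket = stored.replace("http://www.example.com", "http://staging.example.com")

# Safer: change the value first, then serialise it again.
safe = php_serialize_string(
    old_url.replace("http://www.example.com", "http://staging.example.com")
)

print(length_is_consistent(stored))
print(length_is_consistent(blanket))
print(length_is_consistent(safe))
```

When the declared length no longer matches, PHP cannot unserialise the value, which is how widget and plugin settings end up disappearing.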


Preparation

One of the biggest issues when dealing with secure sites is mixed-content warnings, where assets requested over plain http are loaded into a page that is being served over https. How the browser handles these warnings depends on the kind of content being loaded. Passive content such as images will usually result in the padlock indicator in the address bar disappearing. Active content like iframes may not be loaded at all.
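As a rough illustration of the passive/active split, the following Python sketch walks an HTML fragment and labels every http:// src it finds. The tag groupings are a simplification chosen for this example (real browsers apply more detailed rules, and assets referenced through href are not covered), and the URLs are made up.

```python
from html.parser import HTMLParser

# Simplified groupings for this sketch only.
PASSIVE_TAGS = {"img", "audio", "video"}
ACTIVE_TAGS = {"iframe", "script"}


class MixedContentFinder(HTMLParser):
    """Collect (kind, tag, url) for every src that uses plain http."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        src = dict(attrs).get("src") or ""
        if not src.lower().startswith("http://"):
            return
        if tag in PASSIVE_TAGS:
            self.findings.append(("passive", tag, src))
        elif tag in ACTIVE_TAGS:
            self.findings.append(("active", tag, src))


def find_mixed_content(html: str):
    finder = MixedContentFinder()
    finder.feed(html)
    finder.close()
    return finder.findings


page = (
    '<img src="http://example.com/a.jpg">'
    '<iframe src="http://maps.example.com/embed"></iframe>'
    '<img src="https://example.com/b.jpg">'
)
for kind, tag, url in find_mixed_content(page):
    print(kind, tag, url)
```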


Images

Never Ending Voyage uses a lot of images. WordPress inserts these images into the posts as absolute URLs, so I thought this step would involve huge and potentially damaging database find and replace queries to swap them out.

Fortunately, this was not necessary. We use Amazon CloudFront as a CDN, and W3 Total Cache rewrites all of the image URLs to this CloudFront address automatically.
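Conceptually, this kind of on-the-fly rewriting looks something like the Python sketch below. It is only an illustration of the idea, not how W3 Total Cache is implemented; the CloudFront hostname is a placeholder and the wp-content/uploads path is assumed to be the WordPress default.

```python
# Assumed locations for this sketch; the CloudFront hostname is hypothetical.
SITE_UPLOADS = "http://www.neverendingvoyage.com/wp-content/uploads/"
CDN_UPLOADS = "https://d111111abcdef8.cloudfront.net/wp-content/uploads/"


def rewrite_upload_urls(html: str) -> str:
    """Point uploaded media at the CDN in the outgoing HTML only."""
    return html.replace(SITE_UPLOADS, CDN_UPLOADS)


stored_content = (
    '<a href="http://www.neverendingvoyage.com/about/">About</a>'
    '<img src="http://www.neverendingvoyage.com/wp-content/uploads/2016/01/beach.jpg">'
)
print(rewrite_upload_urls(stored_content))
```

The stored content is never modified: only the HTML sent to the browser changes, and ordinary links are left alone.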

One thing I did find was that I was using a CNAME entry in my DNS records to alias the original CloudFront domain to something a bit nicer (images.neverendingvoyage.com).

Unfortunately, https isn’t supported when doing this unless you pay Amazon a bunch of money to set up certificates on their end that recognise this URL. Switching off that setting in the W3 Total Cache CDN settings fixed it and, just like that, all the images were loading securely.

One other interesting thing I found was that the expires headers that I had set on my S3 bucket were not being forwarded when requesting over http but were when I started requesting https.

The one downside to relying on this on-the-fly rewriting of the image URLs is that all of the raw content in the database itself still references the non-secure http://www.neverendingvoyage.com address. While I could use the above MySQL queries to update the posts table, messing with the live database like that makes me nervous and, as everything is working OK now, I’m inclined to leave it as it is for the moment.

Even if I did turn off W3 Total Cache and CloudFront, these non-secure image URLs should not trigger mixed-content warnings, as their requests would be rewritten to https by Apache.


iFrames and Other Assets

I knew we used some external iFrames like Google Maps in our site but I didn’t have a complete list of what posts and pages they appeared on. I also wasn’t sure if there were other assets being pulled in that I had forgotten about, so I took a database dump of the posts table and threw some regex at it to pull out every single URL in every published post and page on the site.

This…was a little overboard. The text file was massive, the many regex queries I used to cut out the URLs took forever to complete, and most of the URLs were either straightforward links or images and could be ignored. I did, however, find a few Google Maps iframes that were still using old http addresses.
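In hindsight, a narrower search would have been enough. The Python sketch below is one way to do that, assuming the posts have been exported as (post ID, content) pairs: it reports only iframes whose src still uses http and ignores plain links and images. The function name and the sample rows are made up for illustration.

```python
import re

# Matches an iframe tag whose src attribute starts with plain http.
INSECURE_IFRAME = re.compile(
    r"""<iframe\b[^>]*?\bsrc=["'](http://[^"']+)["']""",
    re.IGNORECASE,
)


def insecure_iframes_by_post(posts):
    """Map post ID to the list of http:// iframe URLs found in its content."""
    report = {}
    for post_id, content in posts:
        urls = INSECURE_IFRAME.findall(content)
        if urls:
            report[post_id] = urls
    return report


# Hypothetical rows standing in for a dump of the posts table.
sample_posts = [
    (1, '<p>Map:</p><iframe src="http://maps.google.com/maps?q=x" width="400"></iframe>'),
    (2, '<a href="http://example.com">link</a><img src="http://example.com/a.jpg">'),
    (3, '<iframe width="400" src="https://www.google.com/maps/embed"></iframe>'),
]
print(insecure_iframes_by_post(sample_posts))
```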

There were also a few assets I was loading from some of our other servers; copying those assets into the Never Ending Voyage media library fixed those.

A more reasonable approach, and one that I’m using going forward, is just to have a list of around 40-50 posts and pages that include the most popular (according to Google Analytics), the most representative, and those that have any special formatting or additional features, like maps or book previews.


Widgets

None of these things caught the URLs used in widgets. W3 Total Cache doesn’t rewrite those URLs and their content isn’t stored in the posts table. Any URLs need to be changed by hand.

Don’t forget to check your widgets.


Let’s Encrypt and Certbot

Certbot and Let’s Encrypt make the actual process of installing certificates and setting up redirects very easy. Answer a few simple questions and it rewrites everything and it all works.


Performance

General

There is a performance penalty to switching to https in terms of server load, which I noticed almost immediately after switching it on. I thought that the basic Linode plan coupled with CloudFront would give us plenty of headroom but the CPU and memory usage has gone up significantly and stayed up, which is something to be aware of.


Redirects

Let’s Encrypt asks if you want to force redirects to the https version of the site, which I did. However, because I was using a .htaccess file to redirect non-www requests to www, this could trigger the following convoluted chain of redirects:

http://neverendingvoyage.com -> https://neverendingvoyage.com (from vhost configuration) -> http://www.neverendingvoyage.com (from .htaccess) -> https://www.neverendingvoyage.com (from vhost configuration)
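The chain above can be reproduced with a small Python simulation. The two functions are simplified stand-ins for the redirect in the vhost configuration and the old .htaccess rule, written for this example rather than taken from the real configuration.

```python
from urllib.parse import urlsplit, urlunsplit


def vhost_rule(url):
    """Stand-in for the vhost redirect: http goes to https on the same host."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        return urlunsplit(("https", parts.netloc, parts.path, parts.query, parts.fragment))
    return None


def htaccess_rule(url):
    """Stand-in for the old .htaccess rule: non-www goes to http://www."""
    parts = urlsplit(url)
    if not parts.netloc.startswith("www."):
        return urlunsplit(("http", "www." + parts.netloc, parts.path, parts.query, parts.fragment))
    return None


def follow(url, rules, max_hops=10):
    """Apply the first matching rule repeatedly and record every hop."""
    chain = [url]
    for _ in range(max_hops):
        for rule in rules:
            target = rule(chain[-1])
            if target is not None:
                chain.append(target)
                break
        else:
            return chain  # no rule matched, so the request is served
    return chain


for hop in follow("http://neverendingvoyage.com", [vhost_rule, htaccess_rule]):
    print(hop)
```

That is three redirects before the visitor receives any content.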

While I was researching all this, I discovered that .htaccess is not the recommended way to redirect things anyway and it’s much more performant to do this in the vhost configuration file if you have access to it.

Here’s the final ruleset I used (from Simone’s Blog, which includes an excellent walkthrough on how this works):

RewriteEngine on
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^ https://www.%1%{REQUEST_URI} [L,NE,R=301]

If the request is not using https, or the host doesn’t start with www., capture everything after the optional www. to the end of the host part of the URL (this also handles hosts like www.subdomain.example.com) and reconstruct the URL using https and with www. at the start.

The flags in the rewrite rule have the following meanings: L is the last flag (stop processing rules here), NE is the noescape flag (don’t escape special characters in the output), and R issues a redirect to the browser (in this case a 301, the permanent redirect, which tells search engines and the like to start updating their indexes to use the new https domain).
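For anyone who finds rewrite conditions hard to read, the same decision can be sketched in Python. This mirrors the logic of the ruleset as I understand it, not Apache’s actual implementation; canonical_redirect is a made-up name, and its arguments stand in for %{HTTPS}, %{HTTP_HOST} and %{REQUEST_URI}.

```python
import re

STARTS_WITH_WWW = re.compile(r"^www\.", re.IGNORECASE)
HOST_WITHOUT_WWW = re.compile(r"^(?:www\.)?(.+)$", re.IGNORECASE)


def canonical_redirect(https_on: bool, host: str, request_uri: str):
    """Return the redirect target, or None if the request is already canonical."""
    # First two conditions, joined by [OR]: not https, or host lacks the www. prefix.
    if https_on and STARTS_WITH_WWW.match(host):
        return None
    # Third condition: capture the host without any leading www. (this is %1).
    match = HOST_WITHOUT_WWW.match(host)
    if match is None:
        return None
    return f"https://www.{match.group(1)}{request_uri}"


print(canonical_redirect(False, "neverendingvoyage.com", "/about/"))
print(canonical_redirect(True, "neverendingvoyage.com", "/about/"))
print(canonical_redirect(True, "www.neverendingvoyage.com", "/about/"))
```

Every non-canonical request now reaches https://www. in a single hop, whichever combination of scheme and host it started with.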


Additional Things to Consider

SEO

Google sees http and https as two different entities. If you’re using Google’s Webmaster tools to manage your indexing, then you can either request to move a property from http to https or just simply set up a new one and start over. Re-submit your sitemaps and the re-indexing will begin again, then pay attention to any errors and get them fixed.


Traffic

Reports on the impact of moving from http to https vary—some get a bounce in traffic, others have seen their traffic drop temporarily. We’ve been secure for about a month now and have not experienced a significant change in traffic volume.

Moving to https is easier and cheaper now than it ever has been thanks to the efforts of organisations like Let’s Encrypt. Many managed and shared hosting platforms are beginning to offer it as part of their packages or as an inexpensive add-on and, even using WordPress with 7 years of content, it was a surprisingly painless process.
