Jekyll, Dropbox, AWS and IFTTT - easy blogging

(Note: This is going to be long, techy and of no interest whatsoever to anyone not actively looking for similar advice. It’s partly a follow-on to a similar post from years ago that still draws visitors looking for help, and partly an aid to my own memory if I ever do this sort of thing again. Most of you can ignore it as tl;dr.)

When I first switched to using Jekyll to power this site, I had a fairly slick setup relying on Dropbox to sync, add, and generally control things. Unfortunately, my host banned Dropbox: even if I stopped it after each sync, it counted as a daemon, and daemons are a no-no on shared hosting because they tend to eat system resources used by others. Fair enough.

And so things languished until the start of this year, when I happened on something talking about Amazon Web Services and using AWS’ EC2 instances to generate websites. For those of low-moderate techery, AWS is Amazon’s vast cloud server and storage arm on which half the internet is based. Users pay per hour of server uptime and for storage and traffic requests, rather than flat-rate monthly fees. (And there’s a free 12-month plan which gives you plenty of time to get to know the way the system works. Which is pretty straightforward - more complex than standard hosting, but clear and well-documented except for a few obscure oddities.) EC2 instances are temporary virtual servers that can be created, started, stopped, and deleted more or less at will. While up, they function exactly like your own personal server, free for you to install and run software on, with none of the use limitations of shared hosting. Keeping a T1 micro-instance (the smallest) running round-the-clock for a month costs about twice my existing hosting. But keeping one active a couple of times a day just to update and build the site, then pushing the results to Amazon’s S3 file storage and shutting down again, would cost peanuts.
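To put rough numbers on "peanuts": with per-hour billing, where every boot is charged as at least one full hour, a couple of short boots a day comes to about 60 billable hours a month against roughly 720 for an instance left running. A minimal sketch of that arithmetic, assuming a 30-day month and exactly one billed hour per boot (the function name is hypothetical):

```shell
#!/bin/bash
# Rough billable-hours comparison under per-hour billing.
# Assumptions: each boot is billed as one full hour; a month has 30 days.
billable_hours() {
  local boots_per_day=$1 days=${2:-30}
  echo $(( boots_per_day * days ))
}

always_on=$(( 24 * 30 ))          # instance never stopped: 720 hours
twice_daily=$(billable_hours 2)   # two short boots a day: 60 hours

echo "always on:   ${always_on} billed hours"
echo "twice daily: ${twice_daily} billed hours"
echo "ratio:       $(( always_on / twice_daily ))x"
```

So if round-the-clock running costs about twice the old hosting, two boots a day should come to roughly a sixth of it, before S3 storage and request charges.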

So that’s what I’ve done. Since that original post continues to draw traffic from inquisitive Jekyll users, I’ll outline everything I’ve done on this setup here, and it’ll serve to jog my own whisky-raddled memory.

The aims

The site’s posts, layout and everything else are held in Dropbox. This is vital, because I can save/edit text files in Dropbox on my phone, so I’m not tied to my laptop to upload anything here.
Dropbox syncs between my devices and an occasional EC2 instance.
Once synced, Jekyll runs on the instance and rebuilds the site.
The site is then pushed to an S3 bucket set up to act as a website.
The instance then shuts down. This should happen a couple of times a day. Other than writing the initial text files, I shouldn’t have to do a thing.
Cross-posting between services like Instagram, Twitter and Facebook, turning shared links into link posts, generating link roundups, and sharing new posts to FB/Twitter all happen automatically via IFTTT without my lifting a finger.
The setup will be fiddly to get right, but once it’s done, usage should be almost effortless and the resulting site bulletproof.

The setup

From the AWS EC2 console, I launched a T1 micro-instance using the default Amazon Linux image. Most of these steps would be much the same on any of the other images available (little differences like the default username on an Ubuntu instance being ‘ubuntu’ rather than ‘ec2-user’ aside). There’d be small differences in the errors you’ll encounter and the dependencies you need to meet to install stuff, but nothing a quick google won’t fix.

Dropbox is the first thing to install and set up, exactly the same as in the original post: follow the instructions on the Dropbox website to wget the command line program, create a secondary “blogger” Dropbox account and share your /Dropbox/myblog directory on your main account with it (so you don’t sync everything to your site), and link the command line installation to your “blogger” account. Your /home/ec2-user/Dropbox/myblog directory will become the source directory used by Jekyll to build your site each time.
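For reference, the headless install on Dropbox's site boils down to a download, an unpack, and a first run that prints a link for authorising the machine. A sketch wrapped in a function (the name is hypothetical), assuming a 64-bit instance; it's worth checking Dropbox's current instructions in case the download URL has moved.

```shell
#!/bin/bash
# Headless Dropbox install, following Dropbox's Linux command-line steps.
# Assumption: 64-bit instance. Defining the function runs nothing.
install_dropbox_headless() {
  cd "$HOME" || return 1
  # Download and unpack the daemon into ~/.dropbox-dist
  wget -O - "https://www.dropbox.com/download?plat=lnx.x86_64" | tar xzf - || return 1
  # The first run prints a URL: open it while signed in to the "blogger"
  # account to link this machine to that account.
  "$HOME/.dropbox-dist/dropboxd"
}
```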

Now install RVM, following the instructions on the site. (Don’t worry about setting a Ruby version; the default seems fine.) You may have to jump through a couple of hoops, but it should be easy enough. It should also properly add RVM to your $PATH; that seems to work fine. Add Rubygems via sudo rvm rubygems latest.

Now install Node (and npm). It should be in repos, so sudo yum install -y nodejs npm or similar may do the trick. You may have more hoops to jump through here, missing dependencies and such. It’s not too much of a slog, though; I survived.

Now you can finally gem install jekyll.
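The RVM, Node and Jekyll steps above can be strung together roughly like this. It's a sketch under two assumptions: a single-user RVM install under ec2-user (so rvm itself runs without sudo), and Node coming from the EPEL repo, which, as far as I know, Amazon Linux ships but leaves disabled. The function name is hypothetical.

```shell
#!/bin/bash
# RVM -> RubyGems -> Node -> Jekyll, as one function (nothing runs until called).
install_jekyll_toolchain() {
  # rvm.io's instructions begin with a gpg --recv-keys step; run that first,
  # copying the key ID from the site, or the installer's signature check fails.
  \curl -sSL https://get.rvm.io | bash -s stable || return 1
  # Load RVM into the current shell instead of logging out and back in
  source "$HOME/.rvm/scripts/rvm" || return 1
  rvm rubygems latest || return 1
  # Node and npm from EPEL, enabled just for this one command
  sudo yum install -y --enablerepo=epel nodejs npm || return 1
  gem install jekyll
}
```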

That gives you Dropbox and Jekyll. The other tool you’ll need is something to sync your /Dropbox/myblog/_site directory to an S3 bucket. While the base Amazon Linux image comes with the AWS command line tools installed, I installed s3cmd instead for its “sync” option, which only up/downloads files that have changed. Syncing saves time and bandwidth. (To be fair, the AWS CLI has its own aws s3 sync command that also skips unchanged files, so it may do the job without installing anything extra.) Installing s3cmd for me meant downloading the .tar.gz, uploading it to the instance, untarring it, and then discovering that the Amazon Linux image doesn’t include the gcc packages needed to make/install by hand. If they’re missing on your instance, some sudo yum install work will eventually, tediously, have it sorted and then you can install s3cmd manually.
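Since several of these installs can fail half-quietly, a quick check that everything actually landed on the PATH can save some head-scratching later. A minimal sketch; the tool list in the usage line is an assumption about what this setup needs, and dropboxd is left out because it lives in ~/.dropbox-dist rather than on the PATH.

```shell
#!/bin/bash
# Report which of the named tools can be found on PATH.
# Returns non-zero if any are missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok:      $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage on the instance:
#   check_tools ruby gem node jekyll s3cmd || echo "install the missing pieces first"
```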

Then all you need are a couple of S3 buckets, one called yoursite.com and one called www.yoursite.com. Google for instructions here, but it’s very easy (literally a couple of checkboxes). The latter redirects to the former, and until you’re ready to switch, the former will be reachable at yoursite.s3.amazonaws.somethingorother.com so you can view the results for testing.
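If you'd rather script the buckets than tick the console checkboxes, the AWS CLI can do the same job. A minimal sketch, assuming yoursite.com as a placeholder, index.html and 404.html as the index and error pages, and CLI credentials allowed to create and configure buckets. The main bucket also needs a policy letting the public read its files; on newer AWS accounts, S3's Block Public Access settings reject such policies by default, so those may need relaxing for that bucket first.

```shell
#!/bin/bash
# Bucket policy letting anyone read (but not list or write) the site's files
public_read_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "PublicReadGetObject",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::$1/*"
  }]
}
EOF
}

# Website configuration that redirects every request to the bare domain
redirect_config() {
  printf '{"RedirectAllRequestsTo":{"HostName":"%s"}}\n' "$1"
}

# Usage:
#   aws s3 mb s3://yoursite.com && aws s3 mb s3://www.yoursite.com
#   aws s3 website s3://yoursite.com/ --index-document index.html --error-document 404.html
#   aws s3api put-bucket-policy --bucket yoursite.com --policy "$(public_read_policy yoursite.com)"
#   aws s3api put-bucket-website --bucket www.yoursite.com \
#     --website-configuration "$(redirect_config yoursite.com)"
```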

Create an IAM user (via the IAM entry in the services menu; this is actually very easy) with permissions to push/pull etc. to S3. Get the access and secret keys for it.
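Rather than handing that user blanket S3 rights, an inline policy can limit it to the one bucket. A sketch, with the user and policy names (blog-sync, blog-sync-s3) hypothetical. The action list is a best guess at what s3cmd sync touches (listing the bucket, and reading, writing, deleting and setting ACLs on objects); s3cmd's optional connection test during setup may ask for more, so treat it as a starting point.

```shell
#!/bin/bash
# Print an IAM policy scoped to one bucket for the sync user.
sync_user_policy() {
  local bucket=$1
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::${bucket}"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:PutObjectAcl", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}
EOF
}

# Usage:
#   aws iam create-user --user-name blog-sync
#   aws iam put-user-policy --user-name blog-sync --policy-name blog-sync-s3 \
#     --policy-document "$(sync_user_policy yoursite.com)"
#   aws iam create-access-key --user-name blog-sync   # prints the key pair
```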

Run s3cmd --configure to set up your S3 connection using those keys. This’ll allow the system to talk to your S3 buckets.
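s3cmd's interactive setup writes its answers to ~/.s3cfg. If you'd rather skip the prompts, a minimal file holding just the two keys should be enough, since s3cmd falls back to its defaults for anything unset. A sketch, assuming the keys are in the hypothetical variables S3_ACCESS_KEY and S3_SECRET_KEY:

```shell
#!/bin/bash
# Write a minimal s3cmd config non-interactively (default: ~/.s3cfg).
write_s3cfg() {
  local cfg=${1:-$HOME/.s3cfg}
  if [ -z "${S3_ACCESS_KEY:-}" ] || [ -z "${S3_SECRET_KEY:-}" ]; then
    echo "set S3_ACCESS_KEY and S3_SECRET_KEY first" >&2
    return 1
  fi
  # Create the file with owner-only permissions before the secret goes in
  ( umask 077 && cat > "$cfg" <<EOF
[default]
access_key = ${S3_ACCESS_KEY}
secret_key = ${S3_SECRET_KEY}
EOF
  ) || return 1
  chmod 600 "$cfg"   # in case the file already existed with looser permissions
}
```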

Sync and build on instance boot

First, run Dropbox. You can either do this as a cron command, as part of a script, or (as I’ve done for reasons I now forget, though I seem to remember some difficulty involving cron) by adding Dropbox to /etc/init.d.

To have Jekyll build the site after sync and then push the resultant site to S3, I have a script, ec2jek.sh, that runs through cron.

#!/bin/bash

# give Dropbox time to sync
sleep 60

# load your RVM Ruby environment so the jekyll gem can be found

source /home/ec2-user/.rvm/environments/ruby-2.2.0

# build the site

jekyll build -q --source /home/ec2-user/Dropbox/myblog --destination /home/ec2-user/Dropbox/myblog/_site
# (-q just means do it quietly: it hides Jekyll's output, including the non-critical warnings Markdown conversion sometimes throws up; since the script doesn't use set -e, a failed build won't stop the rest of it running either)

# probably not needed, but let it sit for a moment

sleep 30

# sync the site with its version in your S3 bucket

/usr/bin/s3cmd sync --config /home/ec2-user/.s3cfg -rM --no-mime-magic --delete-removed /home/ec2-user/Dropbox/myblog/_site/ s3://yoursite.com/
# the --config switch tells it where to find your credentials as set by s3cmd --configure, while --no-mime-magic stops it screwing up .css and .js files by guessing their filetypes wrongly and making them non-functional
# the trailing slashes make it explicit that the contents of _site go to the bucket root; s3cmd follows rsync's convention here, so without the one on _site/ the files may end up under s3://yoursite.com/_site/

This script runs through cron. Cron is a pain to use with anything that isn’t root-installed, especially Ruby gems installed through RVM: cron jobs run with a minimal environment whose PATH is little more than /usr/bin:/bin, so they know nothing about RVM. You have to declare the full PATH right at the start, otherwise jekyll commands will fail silently. Enjoy the typing.

@reboot export PATH=/home/ec2-user/.rvm/gems/ruby-2.2.0/bin:/home/ec2-user/.rvm/gems/ruby-2.2.0@global/bin:/home/ec2-user/.rvm/rubies/ruby-2.2.0/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/aws/bin:/home/ec2-user/.rvm/bin:/home/ec2-user/bin && /bin/bash /home/ec2-user/ec2jek.sh > /dev/null 2>&1
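An alternative worth trying: cron (cronie, on Amazon Linux) accepts NAME=value lines at the top of a crontab, so PATH can be declared once for every job instead of being exported inline. Those lines aren't variable-expanded, which is why the hypothetical generator below writes the full paths out; they mirror the ones used in this post. Bear in mind that `crontab -` replaces the whole existing crontab with whatever it reads.

```shell
#!/bin/bash
# Print a crontab with PATH declared once; paths mirror the originals.
blog_crontab() {
  local rvm=/home/ec2-user/.rvm
  cat <<EOF
PATH=${rvm}/gems/ruby-2.2.0/bin:${rvm}/gems/ruby-2.2.0@global/bin:${rvm}/rubies/ruby-2.2.0/bin:${rvm}/bin:/usr/local/bin:/usr/bin:/bin:/opt/aws/bin
@reboot sleep 20 && /bin/bash /home/ec2-user/Dropbox/myblog/_posts/namefix.sh > /dev/null 2>&1
@reboot /bin/bash /home/ec2-user/ec2jek.sh > /dev/null 2>&1
EOF
}

# Review the output, then install it as ec2-user's crontab:
#   blog_crontab | crontab -
```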

There’s a further script, which I’ll come to in a minute in the IFTTT section, that also runs through cron, going off after sync (hopefully), and before site build:

@reboot sleep 20 && /bin/bash /home/ec2-user/Dropbox/myblog/_posts/namefix.sh > /dev/null 2>&1

At some point, I’ll roll the contents of this one into ec2jek.sh to avoid any funny timing issues.
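Rolling the two together might look something like the sketch below, which also swaps the fixed sleep for a poll of Dropbox's status. Assumptions, all unverified on this setup: Dropbox's separate dropbox.py helper is installed (the path is a guess), its status output says "Up to date" when idle (the wording may vary between versions), and the function names are hypothetical.

```shell
#!/bin/bash
# ec2jek.sh with namefix.sh folded in, so renaming always precedes the build.
SRC=${SRC:-/home/ec2-user/Dropbox/myblog}
BUCKET=${BUCKET:-s3://yoursite.com}
DROPBOX_CLI=${DROPBOX_CLI:-/home/ec2-user/bin/dropbox.py}

# Poll until Dropbox reports it's idle, giving up after TIMEOUT seconds
wait_for_dropbox() {
  local timeout=${1:-300} interval=${2:-10} waited=0
  until "$DROPBOX_CLI" status 2>/dev/null | grep -q '^Up to date'; do
    if (( waited >= timeout )); then
      echo "Dropbox not idle after ${timeout}s; carrying on anyway" >&2
      return 1
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
}

publish_site() {
  # 1. Rename IFTTT's .txt files; a failure here shouldn't block the build
  bash "$SRC/_posts/namefix.sh" || echo "namefix failed; building anyway" >&2
  # 2. Rebuild; if Jekyll fails, stop rather than push a half-built site
  jekyll build -q --source "$SRC" --destination "$SRC/_site" || return 1
  # 3. Push the contents of _site (note the trailing slash) to the bucket root
  s3cmd sync --config "$HOME/.s3cfg" -rM --no-mime-magic --delete-removed \
    "$SRC/_site/" "$BUCKET/"
}

# Body of ec2jek.sh, after the RVM "source" line:
#   wait_for_dropbox 300 10
#   publish_site
```

With this in place, the separate namefix.sh crontab line can go.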

But wait! We also want the instance to shut down. Why no shutdown -h now command here? Good question! The answer is: because it doesn’t work. Shutdown requires sudo, and that requires a password as standard. I’ve given everyone rights to use that command without password by editing my sudoers file. I’ve tried every other trick I can think of or search for. No joy at all.
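One possible culprit worth checking, offered as a guess rather than a diagnosis: older Amazon Linux images shipped /etc/sudoers with `Defaults requiretty`, which makes sudo refuse to run without a terminal, and a cron job has none. If that's the problem, a sudoers drop-in along these lines (file name hypothetical) might let an ordinary crontab call shutdown. Always validate with visudo before relying on it, since a broken sudoers file can lock you out of sudo.

```shell
#!/bin/bash
# Print a sudoers drop-in: lift requiretty for one user, and allow
# passwordless shutdown for that user only.
shutdown_sudoers() {
  local user=${1:-ec2-user}
  printf 'Defaults:%s !requiretty\n' "$user"
  printf '%s ALL=(root) NOPASSWD: /sbin/shutdown\n' "$user"
}

# Install and validate (as a user who can already sudo):
#   shutdown_sudoers | sudo tee /etc/sudoers.d/90-blog-shutdown > /dev/null
#   sudo chmod 0440 /etc/sudoers.d/90-blog-shutdown
#   sudo visudo -c -f /etc/sudoers.d/90-blog-shutdown
# The ec2-user crontab line would then be:
#   @reboot sleep 3300 && sudo /sbin/shutdown -h now
```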

I can only get it to work in a root crontab. So sudo crontab -e and add:

@reboot sleep 3300 && /sbin/shutdown -h now

Since Amazon bills (at least at the time of writing) a full hour for each instance start, even if it’s only on for five minutes, you might as well keep it running for, in this case, 55 minutes before shutting down. At some point I’ll add multiple build/S3 sync commands to the script above to take advantage of the extra uptime without running the risk of a shutdown mid-build, which is no more than a little copying and pasting.
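Instead of copying and pasting, a loop can repeat the build and sync while making sure no run is still going when the shutdown fires. A minimal sketch, with all three timings as assumptions to tune: it only starts a run if that run (assumed to take at most max_run seconds) can finish before the deadline. The usage line assumes the build-and-sync steps are wrapped in a function or script, called publish_site here purely for illustration.

```shell
#!/bin/bash
# Repeat a command every INTERVAL seconds, starting a run only if it can
# finish (taking at most MAX_RUN seconds) before DEADLINE seconds elapse.
run_until_shutdown() {
  local deadline=$1 interval=$2 max_run=$3
  shift 3
  local start=$SECONDS
  while (( SECONDS - start + max_run <= deadline )); do
    "$@" || echo "publish run failed; trying again next round" >&2
    # Stop if the next run couldn't both start and finish in time
    (( SECONDS - start + interval + max_run <= deadline )) || break
    sleep "$interval"
  done
}

# Usage at the end of ec2jek.sh; 3000s leaves headroom before the 3300s
# shutdown for the time spent syncing Dropbox before this starts:
#   run_until_shutdown 3000 900 300 publish_site
```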

What about scheduling?

Ah, yes. We’ve got an instance that will automatically sync, check, rebuild and push the site on boot. We just need it to start up on automatic schedule a couple of times a day. Only problem is: I haven’t cracked this yet.

In theory you can use an auto-scaling group to create little instances on a schedule, with user data (a script that runs on first boot, but not after) that uses the AWS CLI to start up the mothership instance. In theory. In practice, I can’t get it to work. The little instances have EC2 credentials, so their permissions should be fine for the aws ec2 start-instances command, but it just fails. And since you need an instance to start an instance, you’re doubling your billable instance hours.
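For what it's worth, two common reasons aws ec2 start-instances fails on a fresh instance are that the CLI there has no default region configured, and that an instance only gets AWS credentials if it was launched with an IAM role (instance profile) allowing ec2:StartInstances. Whether either applies here is a guess. A launcher that at least rules out the region problem might look like this; the instance ID and region are placeholders. The same function also works from any always-on machine with the CLI configured, which sidesteps the extra instance hours.

```shell
#!/bin/bash
# Start the "mothership" instance, naming the region explicitly so the CLI
# doesn't depend on a configured default. Both values are placeholders.
BLOG_INSTANCE_ID=${BLOG_INSTANCE_ID:-i-0123456789abcdef0}
BLOG_REGION=${BLOG_REGION:-us-east-1}

start_blog_instance() {
  aws ec2 start-instances \
    --region "$BLOG_REGION" \
    --instance-ids "$BLOG_INSTANCE_ID" \
    --output text
}

# Example crontab line on an always-on machine (times and path illustrative):
#   0 6,18 * * * /bin/bash -c '. /home/you/start_blog.sh && start_blog_instance'
```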

Second option is this script that creates a scheduled launcher through Heroku. The instructions on Github are detailed and helpful, the script looks fine, but I can’t get it to work. It looks like ruby’s fog gem (which deals with AWS amongst other cloud services) doesn’t interact with AWS properly, again even with solid credentials (without them, it fails with a permissions error, with them it fails saying that server.id is undefined).

So at the moment, I use the AWS app on my phone. Two or three taps and the mothership instance is started. No real hassle, and since it shuts down automatically I don’t need to touch it again until next time. Not ideal, but much less hassle than it might be. (And the app’s good, too, FWIW.)

Linking everything up with IFTTT

IFTTT is wonderfully versatile when you have a site generated from text files sitting in Dropbox. Key is the “create text file” action in the Dropbox channel. Using that, you can pipe just about anything into it from any trigger.

For example, here’s my Instagram post creator recipe.

If: new photo by me

Then: create text file in /myblog/_posts

File name: (Caption)

Content:

---<br>
layout: post<br>
type: photo<br>
tags:<br>
- instagram<br>
- photography<br>
---<br><br><a href="(Url)"><img src="(SourceUrl)" alt="(Caption)" class="photo"></a><br><br>
(Caption)

(For Jekyll newbies, the stuff between the triple dashes is the YAML front matter that sets post properties like type, tags, which layout to use, what the title (in this case none) is, etc. I want all my pics to have a similar format, and I commonly include type: something in my front matter so I can style different types of posts in different ways (like hiding the titles Jekyll now sets automatically, which it didn’t use to do; I’m working on that at time of writing). On both my phone and my laptop, I use text expansion snippets so that all I have to type is nhhead and a whole front matter block is generated for me.)

So if I upload a picture of a cat to Instagram, captioned “play him off, keyboard cat”, IFTTT creates a file called play_him_off_keyboard_cat.txt in _posts that’ll display the photo and “play him off, keyboard cat” underneath here on my site.

But Jekyll ignores .txt files when it builds posts. It reads .md files, Markdown, and only if they have a YYYY-MM-DD-title-with-hyphens-for-spaces.md format. That’s where that namefix.sh script I have in my crontab comes into play.

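#!/bin/bash
# Rename IFTTT's .txt files in _posts to Jekyll's YYYY-MM-DD-title.md format
POSTS=/home/ec2-user/Dropbox/myblog/_posts
shopt -s nullglob   # with no .txt files present, skip the loop instead of trying to rename a literal "*.txt"

for f in "$POSTS"/*.txt
do

tit=${f##*/}          # filename without the directory
new=${tit//_/-}       # IFTTT's underscores become hyphens
name=${new%.*}        # drop the .txt extension
today=$(date +%Y-%m-%d)
newname=$today-$name.md
echo "$newname"
mv -f "$f" "$POSTS/$newname"

done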

This script takes the name of any .txt file, replaces all underscores (which IFTTT adds for spaces) with hyphens, strips off the extension, adds the date in YYYY-MM-DD format in front of the name, and then gives it a .md extension and resaves it in _posts, erasing the original. Markdown is just a form of text file, and the two extensions are mutually swappable, so there’s no risk there. Any IFTTT-created text file is turned into Jekyll format without me needing to do a thing.

(Except when the title text, and hence the filename, contains a period or a hyphen. A title that ends up as YYYY-MM-DD-some-words--double-hyphenated.or-dotted.md after fixing breaks the naming convention and won’t generate a post. I need to add a couple of trivial extra lines to fix those rare cases.)
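Those extra lines might look something like the following minimal sketch, which takes the blunt approach of turning every run of characters that aren't ASCII letters or digits into a single hyphen, plus a check on the final name. It's an assumption that this is the right fix: it's stricter than Jekyll needs, and accented letters and other non-ASCII characters become hyphens too. The function names are hypothetical.

```shell
#!/bin/bash
# Turn an IFTTT filename into a safe title: strip .txt, replace runs of
# anything that isn't an ASCII letter or digit with one hyphen, trim the ends.
slugify() {
  local base=${1%.txt}
  printf '%s' "$base" | tr -cs 'A-Za-z0-9' '-' | sed 's/^-*//; s/-*$//'
}

# Check a name against YYYY-MM-DD-title.md, allowing only letters, digits
# and single hyphens in the title.
is_post_name() {
  local re='^[0-9]{4}-[0-9]{2}-[0-9]{2}-[A-Za-z0-9]+(-[A-Za-z0-9]+)*\.md$'
  [[ $1 =~ $re ]]
}

# In namefix.sh's loop, in place of the underscore and extension lines:
#   name=$(slugify "${f##*/}")
#   newname=$(date +%Y-%m-%d)-$name.md
#   is_post_name "$newname" || echo "still odd: $newname" >&2
```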

I have IFTTT recipes that turn links I share on Facebook tagged #snh, and items I send to Pocket (which Reeder and ReadKit, my RSS readers, both share to natively), into link posts here. Others turn tweets I tag #nh into quote posts, videos I like on YouTube into embedded video posts, and email I send to IFTTT into text posts, and one cross-posts Instagram pics. One appends any link I add to del.icio.us to a links roundup text file, and another sends bit.ly links, which Tweetbot on iOS shares to, on to del.icio.us to pipe down the same route. (In the old days, I had a script to automatically turn link roundups into posts on a regular basis, but I’m not sure I need it at the moment.) I also use IFTTT to autopost entries here to Facebook and Twitter through its RSS feed channel.

If I used other things - and I’ve thought about auto-sharing regular iOS photos, which should be possible - I could add those using similar recipes to the example Instagram one. The array of channels is frankly astonishing. (If IFTTT ever folded, I’m also aware that it’s possible to roll your own version of the same service.) IFTTT is so very, very easy to use, and having both that and Jekyll talking to Dropbox ties everything together.

Finally

Change the nameservers at your domain registrar to point at Amazon’s Route 53 DNS routing service (as per the easy-peasy instructions in their docs). When I did it, it took literally about five minutes for the DNS change to propagate. Blindingly fast.
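To check the switch has gone through without relying on a browser, something like this should do, assuming dig is installed (on Amazon Linux it comes in the bind-utils package). Route 53's nameservers all have names containing "awsdns", such as ns-123.awsdns-45.com. The function name is hypothetical.

```shell
#!/bin/bash
# Succeeds if the domain's NS records point at Route 53 nameservers.
uses_route53() {
  dig +short NS "$1" | grep -qi 'awsdns'
}

# Usage:
#   uses_route53 yoursite.com && echo "delegated to Route 53" || echo "not yet"
```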

Light a cigar/open a beer/sacrifice an octopus to the dread lord of the deep. Whichever takes your fancy. You’re done.

Conclusion

It takes some work to set up, but once the system’s good, you end up with a site stored as flat text files, immune to inbuilt security flaws and database errors (like the one that took down seancregan.com for two days last month without me knowing it), accessible through Dropbox (and thus always available, editable, checkable, on any device you have Dropbox on), that cross-posts and updates itself without any direct user input, served via the vast, fat, fast, always-on pipe that is S3. For next to nothing in outlay.

The Nameless Horror