I recently spent some time upgrading Feedbin’s hardware, and wanted to share the results.
Feedbin is a Ruby on Rails app, and for the most part Rails is constrained by the single-threaded performance of a CPU. For the application servers, that means favoring higher-clocked CPUs over a higher core count.
The application servers use Intel Xeon E3-1270 v6 CPUs, a four-core 3.8GHz part. Each application server is configured with 16GB of DDR4 RAM. The application servers primarily run Unicorn and Sidekiq, and this amount of RAM gives those processes plenty of room to spread out.
One measurement that is CPU constrained is view generation.
This chart shows the 95th percentile performance of all views generated by Feedbin. 95th percentile means that 95% of all views were generated in less time than the line in the chart. Michael Kopp wrote a good overview of how average performance can be misleading and the benefits of measuring in percentiles instead.
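To make the percentile idea concrete, here is a small sketch that computes a 95th percentile with the nearest-rank method next to a plain average. Nearest-rank is only one of several common percentile definitions, and whatever tool produced the chart may interpolate differently; the function names and the sample timings are hypothetical.

```shell
#!/bin/sh
# p95: read newline-separated numbers on stdin and print the 95th
# percentile using the nearest-rank method: sort ascending and take the
# value at rank ceil(0.95 * N).
p95() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((95 * NR + 99) / 100)   # integer ceil(95 * N / 100)
      print v[rank]
    }'
}

# mean: print the arithmetic average of newline-separated numbers on stdin.
mean() {
  awk '{ s += $1 } END { if (NR > 0) print s / NR }'
}

# Hypothetical sample data: nineteen 10 ms views and one 500 ms outlier.
samples() { awk 'BEGIN { for (i = 0; i < 19; i++) print 10; print 500 }'; }

samples | mean   # prints 34.5
samples | p95    # prints 10
```

The average of 34.5 ms describes none of the actual views, while the 95th percentile reports that 95% of them took 10 ms or less.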
Looks like the new hardware gives us a nice 20%+ boost in performance.
The database servers were also upgraded. These servers are configured with dual Intel Xeon Gold 5120 CPUs. Each CPU has 14 cores at 2.2GHz. With hyperthreading, that gives each server an embarrassing 56 usable threads. They also have 64GB of DDR4 RAM and use Intel S3710 Series SSDs for primary storage. Each server also has secondary SSDs dedicated to PostgreSQL’s write-ahead log, which leaves more disk I/O on the primary drives available for queries.
This upgrade gives us another nice boost. If you look at the graph, you’ll notice a daily spike. These are the times when the database server is running vacuumdb. This is an important maintenance process that is used to reclaim space and optimize query planning.
Before the upgrade, vacuum caused a significant delay. After the upgrade, the performance at its worst is the same or better than pre-upgrade performance at its best.
Overall I’m happy with the upgrades. The hardware is more expensive now, but it’s important to me that Feedbin always performs well. If I can achieve that just by spending a little more money, the decision is easy.
If you’re not familiar with Micro.blog, here’s how its creator, Manton Reece, describes it:
A network of independent microblogs. Short posts like tweets but on your own web site that you control.
Micro.blog is a safe community for microblogs. A timeline to follow friends and discover new posts. Hosting built on open standards.
The experience of using Micro.blog is like the early days of Twitter, in all the best ways.
Micro.blog is good for blogging, because it acts as a sort of gateway drug into the habit. Say you start off just using it for Twitter-like microposts, but then you realize you have more you want to say. Micro.blog detects the length of your post and prompts you to add a title, turning that post into a full-fledged blog post.
The closest service that I can think of is App.net. However, Micro.blog is different in important ways.
You can now subscribe to Twitter content in Feedbin.
Tweets have become media rich, with support for multiple photos, videos and links. However, traditional Twitter clients are limited to showing tiny thumbnails and plain links. They make it too easy to mindlessly scroll through endless inane thoughts.
Feedbin treats tweets differently. The idea of the feature is to fully unpack the tweet. If a tweet links to an article, Feedbin will attempt to load the full article and display it alongside the tweet. Feedbin will also include full-size images, videos and GIFs, with native YouTube, Vimeo and Instagram embeds.
You can start adding Twitter content to Feedbin the same way you would subscribe to a feed. Feedbin will recognize any Twitter URL that contains tweets. It also supports shortcuts for subscribing directly to Twitter @usernames as well as #hashtags. For example:
To achieve the best possible experience, I have a few recommendations:
The best stuff on Twitter exists in the form of media attached to tweets like links and images. Feedbin includes a built-in filter that will only show you these tweets. The filter is on by default, but when you subscribe you’ll be able to choose to see all tweets instead.
Follow fewer accounts in Feedbin. Rather than following your entire home timeline, try creating a Twitter list that only includes a few of your favorite accounts.
Twitter is deeply integrated with Feedbin and tweets include a number of new searchable fields. Using these fields you can easily find and filter tweets:
twitter_media:true|false (link or image)
I’d be interested to hear your feedback on this feature. Get in touch!
The experience of using Feedbin in Mobile Safari has been improved too. You can now swipe horizontally to navigate between the panels.
Feedbin used to support Add to Home Screen, but an update to home screen web apps prevented them from navigating between pages. This meant the feature really only worked for single-page sites, because any attempt to log in would kick you out to Safari.
Articles often link to other websites and blogs. I’ll usually open these links in a new tab as I go, to read what the links contain. However, I like to do all my reading in Feedbin because it’s a pleasant and consistent reading environment.
This feature adds the ability to view the contents of a link, all without leaving Feedbin. Only the article contents are displayed, so anything loaded this way is optimized for reading.
JSON Feed is an alternative to the RSS/Atom formats. The great thing about JSON Feed is that it encodes the content as JSON instead of XML. This is good because parsing and writing XML feeds is hard.
The specification has a small surface area and is a great piece of technical writing. You should check it out. If you publish a website, consider offering a JSON Feed alongside your RSS feed.
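For a sense of how small the format is, here is a sketch that writes a minimal JSON Feed (version 1) document with a shell heredoc. The site name, URLs, date, and item are placeholders. Per the version 1 spec, only version, title, and items are required at the top level, and each item needs an id plus either content_html or content_text.

```shell
#!/bin/sh
# Write a minimal JSON Feed (version 1) document to feed.json.
# All names, URLs, and dates below are placeholders.
cat > feed.json <<'EOF'
{
  "version": "https://jsonfeed.org/version/1",
  "title": "Example Blog",
  "home_page_url": "https://example.com/",
  "feed_url": "https://example.com/feed.json",
  "items": [
    {
      "id": "https://example.com/hello",
      "url": "https://example.com/hello",
      "title": "Hello, world",
      "content_html": "<p>First post.</p>",
      "date_published": "2017-05-17T10:00:00-07:00"
    }
  ]
}
EOF
```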
One of the criticisms I’ve seen of JSON Feed is that feed readers have no incentive to support it. This is not true. One of the most common support questions I get is along the lines of “Why does this random feed not work?” And 95% of the time, it’s because the feed is broken in some subtle way. JSON Feed will help alleviate these problems, because it’s easier to get right.
I also want JSON Feed to succeed because I remember how daunting RSS/Atom parsing was when building Feedbin. If JSON Feed had been the dominant format back then, it would have been a non-issue.
This command pushes the database to S3. It functions as a base backup that, combined with the continuously uploaded WAL archives, can be restored to any point in time after the base backup started.
WAL-E offers a counterpart command, backup-fetch, to restore the data from a backup-push. To test the backups, I needed an automated way to restore the database.
Feedbin already uses Digital Ocean for a few things, so my first thought was to use Digital Ocean for this. I wrote a script to provision a Digital Ocean server, restore the backup to it and then delete the server after the restore completed.
This turned out to be too expensive. Sending data to S3 is free, but reading it back out will cost you. For Feedbin’s database this worked out to about $40 every time I restored the database. I wanted to test backups daily, but the data transfer cost would quickly add up to about $1,200/month.
While I was looking at S3 pricing, I found out that reading from S3 is free when it is read by an EC2 instance in the same region as your S3 bucket. It was also possible to save money on the EC2 instance itself by using Spot Instances instead of on-demand. With Spot Instances you bid on your instance and AWS tells you if you can have it for that price or not.
Critically, no matter what your bid is, you never pay more than the spot price, which is “The current market price of a Spot instance per hour.” With this in mind, you don’t have to guess what to bid: if you bid the on-demand price, your instance will never be terminated early because the price exceeded your bid.
The instance I want costs $0.78/hr on demand, so I bid $0.78/hr, but I only end up paying the spot price of about $0.18/hr.
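The cost figures in the last few paragraphs are easy to sanity-check. This sketch uses only the numbers quoted in the post; the function names are hypothetical, and 30 restores per month stands in for testing daily.

```shell
#!/bin/sh
# Monthly S3 data-transfer cost: whole dollars per restore times restores
# per month.
monthly_transfer_cost() {
  echo $(( $1 * $2 ))
}

# Percentage saved by paying the spot price instead of the on-demand price,
# rounded to the nearest whole percent.
spot_savings_pct() {
  awk -v od="$1" -v spot="$2" 'BEGIN { printf "%.0f\n", (od - spot) / od * 100 }'
}

monthly_transfer_cost 40 30   # prints 1200
spot_savings_pct 0.78 0.18    # prints 77
```

Reading from S3 within the same region removes the first cost entirely, and the spot price cuts the hourly instance cost by roughly three quarters.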
I was new to the AWS CLI, but once I figured out the right data to send, it turned out to be a fairly simple script.
This puts in a request to launch a c4.4xlarge instance with 800GB of storage. It also specifies UserData, which the instance executes after it boots. That is a perfect fit for the script that configures the machine to run PostgreSQL and restore the database backup.
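As a rough illustration of the data such a request needs, here is a hypothetical helper that writes an EC2 launch specification with the instance type and storage size mentioned above, passing a restore script as base64-encoded UserData. The function name, the placeholder AMI ID, the gp2 volume type, and the /dev/sda1 root device name (typical of Ubuntu AMIs) are assumptions, not the actual request Feedbin uses.

```shell
#!/bin/bash
# write_launch_spec USER_DATA_FILE AMI_ID
# Hypothetical helper: print an EC2 launch specification (JSON) for a
# one-off restore instance. /dev/sda1 assumes an Ubuntu-style root device.
write_launch_spec() {
  local user_data_file=$1 ami_id=$2
  local user_data
  # UserData inside a launch specification must be base64-encoded.
  user_data=$(base64 < "$user_data_file" | tr -d '\n')
  cat <<EOF
{
  "ImageId": "$ami_id",
  "InstanceType": "c4.4xlarge",
  "UserData": "$user_data",
  "BlockDeviceMappings": [
    {
      "DeviceName": "/dev/sda1",
      "Ebs": { "VolumeSize": 800, "VolumeType": "gp2", "DeleteOnTermination": true }
    }
  ]
}
EOF
}

# Example usage (placeholder AMI ID), bidding the on-demand price:
#   write_launch_spec pg_restore.sh ami-xxxxxxxx > spec.json
#   aws ec2 request-spot-instances --spot-price "0.78" --instance-count 1 \
#     --type "one-time" --launch-specification file://spec.json
```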
#!/bin/bash
# pg_restore

# Add the official postgresql.org releases as an apt source
sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
apt-get install -y wget ca-certificates
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -

# Install dependencies
apt-get -y update
apt-get install -y postgresql-9.2 postgresql-contrib-9.2 postgresql-server-dev-9.2 \
  build-essential python3-dev python3-pip libevent-dev daemontools lzop pv ssl-cert

# Make the postgres user a sudoer so it can shut down the machine later
echo "postgres ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/postgres
chmod 440 /etc/sudoers.d/postgres

# Install and configure WAL-E
# /etc/wal-e.d/env must already hold the envdir files WAL-E reads
# (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, WALE_S3_PREFIX)
python3 -m pip install 'wal-e[aws]'
chown -R root:postgres /etc/wal-e.d

# Download the latest backup
service postgresql stop
envdir /etc/wal-e.d/env /usr/local/bin/wal-e backup-fetch --pool-size=16 /var/lib/postgresql/9.2/main LATEST

# Set the postgres recovery settings
sudo -u postgres bash -c "cat > /var/lib/postgresql/9.2/main/recovery.conf <<- _EOF_
restore_command = 'envdir /etc/wal-e.d/env /usr/local/bin/wal-e wal-fetch --prefetch=16 \"%f\" \"%p\"'
recovery_end_command = 'mail -s \"Database Restore Complete\" email@example.com && sudo shutdown -h now'
_EOF_
"

# Start postgres: it replays WAL from S3, then runs recovery_end_command
service postgresql start
This is all that is needed to stand up a fully functioning PostgreSQL server and restore the database. No Chef, Ansible or any other provisioning tools required.
The important part here is that PostgreSQL lets you specify a command to run once recovery is complete: the recovery_end_command.
Here I have it send me an email and shut down the server, which terminates the EC2 instance so it’s no longer incurring cost.
If the email goes missing, then I know the restore never completed and I can go figure out what went wrong. AWS helps you out here too: the output of the UserData script is automatically logged to /var/log/cloud-init-output.log, so you can see exactly where the restore went wrong.
I would be interested in hearing any questions or comments about this.