About two years into our startup, our database servers were running at over 97% CPU load all the time, and it was hurting our overall database read performance DRASTICALLY.
At this stage we had two options:
- Create a read replica for serving all read queries.
- Introduce a caching layer between the application and the database.
We went ahead with the first option, and it did reduce our data fetch latency. However, it introduced new problems (including the replication lag between the master and the replica), and we were still hitting a database to fetch data, just not the primary.
So as an added optimization, we decided to introduce a database caching layer in between, which would save the trip to the database altogether.
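To make that concrete, here is a minimal sketch of the cache-aside pattern we are describing, assuming the redis-py client and a hypothetical `fetch_user_from_db()` helper standing in for the real database query; names, TTL, and connection details are illustrative, not our production code:

```python
import json
import redis

# Connect to the cache layer (host/port are placeholders for your setup)
cache = redis.Redis(host="localhost", port=6379, db=0)

CACHE_TTL_SECONDS = 300  # keep entries fresh for 5 minutes

def get_user(user_id):
    """Cache-aside read: try Redis first, fall back to the database on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no database round trip

    user = fetch_user_from_db(user_id)  # hypothetical DB query helper
    # Populate the cache so the next read skips the database
    cache.set(key, json.dumps(user), ex=CACHE_TTL_SECONDS)
    return user
```

On a hit the database is never touched, which is exactly the trip we wanted to save; on a miss the result is written back with a TTL so it ages out on its own.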
After much deliberation, discussion, and disagreement within the team, we decided to use Redis over Memcache.
Reasons for using Redis:
- On-disk persistence: This might seem like an odd feature for a caching tool, but in the past, whenever Memcache restarted it came up empty and put a heavy load on the database until the cache warmed up again. With Redis there was no such issue: because the data was also persisted on disk, a restart could reload it, the number of cache misses stayed much lower, and the database load remained more or less moderate.
- Multi-cluster deployment: Memcache supports this too, but the sheer ease of running Redis in this kind of deployment scenario is what tipped the choice.
- Rich data structure support: This is by far Redis's most impressive feature. Its ability to store hashes, lists, sets, and sorted sets natively took away the overhead of serializing data into a flat format just so the caching layer could understand it. This cut our development time and got the system up and running faster (see the sketch after this list).
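As a small illustration of that last point, here is a hedged sketch using redis-py; the keys, field names, and scores are made up for the example:

```python
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

# With Memcache we had to serialize the whole profile into one opaque blob;
# with a Redis hash each field stays individually readable and writable.
r.hset("user:42", mapping={"name": "Asha", "plan": "pro", "logins": 17})

# Read or update a single field without fetching and re-parsing the rest
r.hincrby("user:42", "logins", 1)
plan = r.hget("user:42", "plan")

# Sorted sets give you rankings/leaderboards out of the box
r.zadd("top_customers", {"user:42": 1520.0, "user:7": 980.5})
top_two = r.zrevrange("top_customers", 0, 1, withscores=True)
```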
There’s a very nice tutorial about how to install Redis on Ubuntu here.
Things to avoid doing in Redis:
Redis is inherently fast, but there are a few things you should avoid if you want to squeeze maximum performance out of it:
- Keep key entropy as high as possible: You can think of Redis as a giant hash table, so it is worth your time to keep the overall entropy (read: randomness) of your keys high, so that there are as few clashes as possible.
- Inserting a large number of records into Redis: I often see developers carelessly writing Redis inserts in a loop. Redis can be blazingly fast, but that does not justify abusing it to the point of bad performance. Make sure all your large inserts happen as bulk operations (pipelines or MSET), as shown in the sketch after this list.
- Calculate your caching server's memory carefully: Redis primarily holds all of its data in RAM, but when persisting to disk it forks a child process that writes that data out. Therefore, make sure the total memory available on your cache server is roughly DATA x 2; for example, if your working set is 8 GB, plan for around 16 GB of RAM.
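To illustrate the bulk-insert point above, here is a small sketch with redis-py; the record count and key names are placeholders, and `transaction=False` is just one reasonable choice for fire-and-forget cache warming:

```python
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

records = {f"product:{i}": f"payload-{i}" for i in range(10_000)}

# Anti-pattern: one network round trip per key
# for key, value in records.items():
#     r.set(key, value)

# Better: batch the writes so thousands of commands share a few round trips
pipe = r.pipeline(transaction=False)
for key, value in records.items():
    pipe.set(key, value)
pipe.execute()

# For plain string values, a single MSET also does the whole batch in one command
r.mset(records)
```

The loop version pays the network latency once per key; the pipelined and MSET versions pay it once per batch, which is where the real speedup comes from.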
Redis is a great tool, but again, the best way to use it and extract maximum benefit from it is not to abuse it.