I am still testing this setup, so test it thoroughly before putting it in production, and please share your thoughts in the comments if you have a better idea for implementing the same thing. As we know, Redis is an open source, BSD-licensed, advanced key-value store with many features that make it a preferred choice. Persistence (disk dump) is one of them. On a busy, loaded Redis server, persistence can add noticeable overhead and hurt performance.
How can we get data reliability/recoverability without the overhead of persistence on a working Redis server?
We are talking about non-transient data here, data that is important to the application. There are several possible approaches. You can tune the disk dump frequency settings in redis.conf to strike a balance, and this will do the trick most of the time. You can also have a master-slave setup where the master does the work and only the slave has disk dumps enabled. This looks acceptable until the day the master server reboots or the Redis service is restarted and finds nothing in its datadir. The master then comes up empty, the slave resyncs with it and learns that no data exists, so the data is wiped from the slave as well. You can still recover data from old disk dumps on the slave, but this can be disastrous.
One way I can think of to avoid such an incident is to rsync the data (dump) from the slave to the master's datadir. There will be some latency and you can lose some data, but in our case this looks like a fine balance between performance and reliability. Here are the steps to implement this, along with a script to do the sync:
Step 1. Turn off persistence on the Redis master server
You can do this with config set commands, so there is no need to restart the Redis service. Do not forget to update redis.conf as well:
$ redis-cli config set save ""
OK
$ vim /etc/redis/redis.conf
save ""
#save 900 1
#save 300 10
#save 60 10000
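To double-check that no snapshot rule survived in the config file, a small check along these lines may help. This is only a sketch: the function name has_active_save_rules is made up for illustration, and it assumes save rules are written one per line in the form save <seconds> <changes>.

```shell
#!/bin/sh
# Hypothetical helper (not part of Redis): succeeds if the given redis.conf
# still contains an uncommented "save <seconds> <changes>" rule.
has_active_save_rules() {
    conf_file="$1"
    # Commented lines ("#save 900 1") and the disabling form (save "")
    # do not match this pattern.
    grep -Eq '^[[:space:]]*save[[:space:]]+[0-9]+[[:space:]]+[0-9]+' "$conf_file"
}

# Example use:
#   has_active_save_rules /etc/redis/redis.conf && echo "snapshots still enabled"
```

On the running server, redis-cli config get save should report an empty value once snapshots are off.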
Step 2. Make sure persistence (disk dump) is enabled on the slave server (it is enabled by default). Set up passwordless SSH access from the master to the slave server so rsync can work smoothly. You can follow this guide to set that up on Ubuntu.
Step 3. On the master server, save this script and update its variables to point to the correct datadir etc.
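Before relying on cron, it may be worth confirming that key-based login really works without any prompt. The sketch below assumes the ubuntu user used later in the sync script; the function name check_passwordless_ssh is hypothetical.

```shell
#!/bin/sh
# Typical key setup, run once on the master (shown as comments only):
#   ssh-keygen -t ed25519
#   ssh-copy-id ubuntu@10.0.0.xxx

# Hypothetical helper: succeeds only if non-interactive login works.
# BatchMode=yes makes ssh fail instead of asking for a password or for
# confirmation of an unknown host key.
check_passwordless_ssh() {
    remote="$1"   # e.g. ubuntu@10.0.0.xxx
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$remote" true
}

# Example use:
#   check_passwordless_ssh ubuntu@10.0.0.xxx || echo "passwordless ssh not ready"
```

Because BatchMode also refuses to prompt for an unknown host key, this check only passes once the slave's key is in known_hosts, which is the same condition a cron run depends on.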
$ cat redis-sync.sh
#!/bin/bash
#
# redis-sync.sh : sync redis db from slave to master since persistence in master is disabled.
#

LogFile="/var/log/redis/redis-sync.log"
LogRotate=3   # keep log for this number of days
LocalDataDir="/data/db/redis/"
RemoteDataFile="/data/db/redis/dump.rdb"

if [ $# -ne 1 ]; then
    echo 'Usage: redis-sync.sh <serverip> or logtrunc'
    exit 1
fi

# it's a good idea to truncate own log file to avoid disk space alerts due to it.
if [ "$1" = "logtrunc" ]; then
    # one log line per run, and cron runs the script every minute
    LinesToKeep=$((LogRotate * 24 * 60))
    tail -n "$LinesToKeep" "$LogFile" > "$LogFile.new"
    mv -f "$LogFile.new" "$LogFile"
    echo "$(date +"%b %d %R") Log file $LogFile truncated to have last $LogRotate days of logs only." >> "$LogFile"
    exit 0
fi

# the count includes this process and the command-substitution subshell, hence the threshold of 2
Instances=$(pgrep redis-sync | wc -l)
if [ "$Instances" -gt 2 ]; then
    echo "$(date +"%b %d %R") $Instances instance(s) of this script already running, exiting to let them complete." >> "$LogFile"
    exit 1
fi

# Generally redis db dump file is not readable by other user/group, hence give read permission to it
ssh "ubuntu@$1" "sudo chmod +r $RemoteDataFile"
if [ $? -ne 0 ]; then
    echo "$(date +"%b %d %R") Error: Communication failure or unable to make dump readable at $1." >> "$LogFile"
    exit 1
fi

# sync dump from slave, redirect output to $LogFile for debugging else trash it
rsync -avz "ubuntu@$1:$RemoteDataFile" "$LocalDataDir" > /dev/null 2>&1
if [ $? -ne 0 ]; then
    echo "$(date +"%b %d %R") Error: Unable to rsync $RemoteDataFile from $1 server!" >> "$LogFile"
    exit 1
fi

touch "$LocalDataDir"
echo "$(date +"%b %d %R") OK: redis dump $RemoteDataFile syncd from $1." >> "$LogFile"
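The script guards against overlapping runs by counting processes with pgrep. That count depends on how the script is invoked, so as an alternative here is a minimal sketch of a lock based on mkdir, which is atomic. This is not what the script above does; the lock path and the function names acquire_lock and release_lock are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical lock directory; any path writable by the cron user would do.
LOCK_DIR="${LOCK_DIR:-/tmp/redis-sync.lock}"

# mkdir succeeds for exactly one caller and fails if the directory already
# exists, so it can act as a simple mutual-exclusion lock.
acquire_lock() {
    mkdir "$LOCK_DIR" 2>/dev/null
}

release_lock() {
    rmdir "$LOCK_DIR" 2>/dev/null
}

# Example use inside the sync script:
#   if acquire_lock; then
#       trap release_lock EXIT
#       ... ssh / rsync steps ...
#   else
#       echo "another sync is still running"; exit 1
#   fi
```

One caveat: if the script is killed with SIGKILL the trap never runs, so a stale lock directory would have to be removed by hand.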
Pass the slave server's IP/hostname to the script to sync. Let's do a test run and check the log file for messages:
$ ./redis-sync.sh 10.0.0.xxx
The authenticity of host '10.0.0.xxx (10.0.0.xxx)' can't be established.
ECDSA key fingerprint is d0:e7:0a:a1:56:32:ca:ec:f1:e3:90:7d:1b:56:0b:5b.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.0.0.xxx' (ECDSA) to the list of known hosts.
$ tail -f /var/log/redis/redis-sync.log
Apr 22 05:44 OK: redis dump /data/db/redis/dump.rdb syncd from 10.0.0.xxx.
Looks fine. If the script fails to work properly, you will find an error message in the log file.
Step 4. Put the script in cron. To keep the script's log file from consuming a lot of disk space, also call it with the logtrunc option to truncate the log. You can set in the script the number of days of logs you want to keep, or use automatic rotation instead.
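An OK in the log only says that rsync exited cleanly. As an extra sanity check, one could verify that the copied file at least looks like a Redis dump: RDB files begin with the ASCII string REDIS followed by a version number. The helper below is a sketch under that assumption, and its name looks_like_rdb is hypothetical. It does not validate the rest of the file.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file is non-empty and starts with
# the RDB magic string "REDIS". It checks the header only, not the contents.
looks_like_rdb() {
    file="$1"
    [ -s "$file" ] || return 1    # missing or empty file
    magic=$(dd if="$file" bs=5 count=1 2>/dev/null)
    [ "$magic" = "REDIS" ]
}

# Example use:
#   looks_like_rdb /data/db/redis/dump.rdb || echo "dump looks wrong"
```

For a deeper check, recent Redis versions ship a redis-check-rdb tool that reads through the whole file.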
$ crontab -e
* * * * * /root/custom-scripts/redis-sync.sh 10.0.0.xxx
# truncate log file
0 4 * * * /root/custom-scripts/redis-sync.sh logtrunc
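The logtrunc option keeps LogRotate*24*60 lines, which equals that many days only because the script writes one line per run and cron runs it every minute. If you change the cron schedule, the line count has to change with it. A small sketch of the arithmetic, with a hypothetical function name lines_to_keep:

```shell
#!/bin/sh
# Hypothetical helper: number of log lines that cover `days` days, assuming
# one log line per run and `runs_per_hour` cron runs per hour.
lines_to_keep() {
    days="$1"
    runs_per_hour="${2:-60}"    # default matches the every-minute cron entry
    echo $(( $days * 24 * $runs_per_hour ))
}

lines_to_keep 3       # prints 4320 (every minute)
lines_to_keep 3 12    # prints 864  (every 5 minutes)
```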
Step 5. You can disable automatic start of the Redis service at boot, and instead first call the sync script to fetch the dump from the slave and then start the service. This usually has no specific benefit, since the slave does nothing while the master is down, but in the rare case where the datadir/dump on the master is corrupted or missing, it is a good idea to take a fresh copy from the slave.
$ update-rc.d redis-server disable
$ vim /etc/rc.local
/root/custom-scripts/redis-sync.sh 10.0.0.xxx
/etc/init.d/redis-server start
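At boot the network or the slave may not be reachable right away, so a single sync attempt in rc.local can fail. One option is to retry a few times before starting Redis. The retry function below is a generic sketch of my own, not part of the setup above, and the attempt count and delay in the example are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to `attempts` times, sleeping `delay`
# seconds between failed attempts. Returns 0 on the first success and 1 if
# every attempt fails.
retry() {
    attempts="$1"; delay="$2"; shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        "$@" && return 0
        n=$((n + 1))
        [ "$n" -le "$attempts" ] && sleep "$delay"
    done
    return 1
}

# Example use in /etc/rc.local:
#   retry 5 10 /root/custom-scripts/redis-sync.sh 10.0.0.xxx
#   /etc/init.d/redis-server start
```

Whether Redis should still start when every attempt fails is a judgment call: starting gives you whatever dump is already in the datadir, which may be stale or missing.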
As mentioned earlier, this is just a quick fix. Please share your thoughts on its shortcomings or on a better approach, if any.