Home Server Upgrade (Part 3)
I wanted to feel confident that my system backups were working as I expected so I decided to roll my own solution using rsync.
Rsync
rsync - a fast, versatile, remote (and local) file-copying tool
If you have never heard of rsync
, it is basically
scp
but better and more efficient. It uses a delta-transfer
algorithm that is able to copy only the parts of the files that have
changed between the source and existing files. This reduces the amount
of data that is sent over a network.
Shell script vs. Python
I saw some example incremental backup bash scripts online but to me
they looked way too messy. For my own peace of mind, I decided to write
a wrapper over rsync
in python. This allowed me to write
tests to verify that files were being backed up correctly and abstract
the logic from declaring which files were being backed up.
Config File
The config file is just a python script that imports the wrapper. I
have mine located at
/home/ethan/incremental-backups/run_backups.py
and it looks
like this.
from rsync.config import Config
from rsync.backup import backup
= "/home/ethan/.local/share/backup/rsync-backups.log"
LOG_FILE
# will need to be root to run rsync for a different user
= Config(
HOME_DIRECTORIES ="/home/",
source_dir="/media/backup_drive_linux/home_directory_backups/",
destination_dir=[
exclude_file_patterns"/home/*/.cache/", # no need for cached files
"/home/*/.local/", # nvim plugins and python packages are huge
"/home/*/.npm/", # what is the crap in here?
"**/node_modules/", # yikes
],=LOG_FILE
log_file
)
= Config(
LARGE_HARDDRIVE ="/mnt/hd2",
source_dir="/media/backup_drive_linux/hd2_backups/",
destination_dir=LOG_FILE
log_file
)
backup(HOME_DIRECTORIES) backup(LARGE_HARDDRIVE)
Here I declare two Config
objects, one for the home
directories on the system and one for a second hard drive on the
machine. These two Config
s are backed up separately.
Personally, I think this is much cleaner than a shell script with all of
the bizarre quoting rules.
I use Syncthing to sync the files I am currently working on from any machine into the home directory of the server. In this way I only need to back the server itself up. Files that I no longer need on every machine get archived to the second hard drive on the server and are shared over the network using Samba.
Cronjob
I set up a cron job to run the script every night at midnight. As I
am backing up multiple home directories where I am not necessarily the
owner of the files, the cronjob must be run as root. To edit the system
crontab, use sudo crontab -e
versus just
crontab -e
for your user crontab.
MAILTO=ewrietz@gmail.com
# min hour day(mo) month day(we) [user] command
0 0 * * * /usr/bin/python3 /home/ethan/incremental-backups/run_backups.py
One problem that I ran into was the location of the log file that is
defined in the config file. Initially, I had used
Path.home()/incremental-backups/run_backups.py
in the
config, however as the cronjob was being run as root, the log file was
ending up at /root/.local/share/backup/rsync-backups.log
.
As I wanted the log file to end up in my user’s home directory, simply
using an absolute path in the config file solved the issue.
Log file
The cronjob has been running without at hitch since 06/08/2022. This is what the tail of the log file looks like for the job that ran this morning.
INFO 12/13/2022 12:00:01 AM Starting incremental backup from /media/backup_drive_linux/home_directory_backups/2022-12-12-00-00-02
INFO 12/13/2022 12:00:01 AM Running ['rsync', '--delete', '--archive', '--acls', '--xattrs', '--verbose', '--link-dest=/media/backup_drive_linux/home_directory_backups/latest', '--exclude=/home/*/.cache/', '--exclude=/home/*/.local/', '--exclude=/home/*/.npm/', '--exclude=**/node_modules/', '/home', '/media/backup_drive_linux/home_directory_backups/2022-12-13-00-00-01']
INFO 12/13/2022 12:00:03 AM Time elapsed 1.1290129330009222 seconds
INFO 12/13/2022 12:00:03 AM Finished backup successfully
INFO 12/13/2022 12:00:03 AM Symlink created from /media/backup_drive_linux/home_directory_backups/latest to /media/backup_drive_linux/home_directory_backups/2022-12-13-00-00-01
INFO 12/13/2022 12:00:03 AM Starting incremental backup from /media/backup_drive_linux/hd2_backups/2022-12-12-00-00-03
INFO 12/13/2022 12:00:03 AM Running ['rsync', '--delete', '--archive', '--acls', '--xattrs', '--verbose', '--link-dest=/media/backup_drive_linux/hd2_backups/latest', '/mnt/hd2', '/media/backup_drive_linux/hd2_backups/2022-12-13-00-00-03']
INFO 12/13/2022 12:00:04 AM Time elapsed 1.6626670630066656 seconds
INFO 12/13/2022 12:00:04 AM Finished backup successfully
INFO 12/13/2022 12:00:04 AM Symlink created from /media/backup_drive_linux/hd2_backups/latest to /media/backup_drive_linux/hd2_backups/2022-12-13-00-00-03
Backup Directory Structure
The output directory contains a folder for each backup named using the date and time that the backup occurred.
[ethan@rietz /media/backup_drive_linux/home_directory_backups]
~ ls | tail
2022-12-05-00-00-01
2022-12-06-00-00-01
2022-12-07-00-00-01
2022-12-08-00-00-01
2022-12-09-00-00-01
2022-12-10-00-00-06
2022-12-11-00-00-01
2022-12-12-00-00-02
2022-12-13-00-00-01
latest
There is one additional directory called latest
which is
a symbolic link to the latest backup directory. The latest
directory is used by rsync
. If a file is not changed since
the last backup, a hard link is created to that file in the
latest
directory.
As of today, I have 165 backups.
[ethan@rietz /media/backup_drive_linux/home_directory_backups]
~ ls | wc -l
166
If we do an ls -l
, it displays the number of reference
counts, or hard links, to that inode.
[ethan@rietz /media/backup_drive_linux/home_directory_backups]
~ ls -al 2022-05-24-21-50-17/home/ethan
total 68
drwxr-x--- 8 ethan ethan 4096 May 14 2022 .
drwxr-xr-x 4 root root 4096 May 10 2022 ..
drwxrwxr-x 3 ethan ethan 4096 May 9 2022 .ansible
-rw------- 165 ethan ethan 846 May 9 2022 .bash_history
-rw-r--r-- 165 ethan ethan 220 Jan 6 2022 .bash_logout
-rw-r--r-- 165 ethan ethan 3771 Jan 6 2022 .bashrc
drwxrwx--x 7 ethan ethan 4096 May 14 2022 .config
drwxrwxr-x 6 ethan ethan 4096 May 9 2022 .dotfiles
drwxrwxr-x 5 ethan ethan 4096 May 24 2022 git
-rw------- 165 ethan ethan 20 May 9 2022 .lesshst
-rw-rw-r-- 9 ethan ethan 2769 May 13 2022 logs.txt
-rw-r--r-- 165 ethan ethan 807 Jan 6 2022 .profile
-rw-r--r-- 9 ethan ethan 787 May 10 2022 server.md
drwx------ 2 ethan ethan 4096 May 9 2022 .ssh
-rw-r--r-- 165 ethan ethan 0 May 9 2022 .sudo_as_admin_successful
drwxr-xr-x 6 ethan ethan 4096 May 15 2022 Sync
-rw------- 165 ethan ethan 2050 May 9 2022 .viminfo
-rw-rw-r-- 150 ethan ethan 165 May 10 2022 .wget-hsts
lrwxrwxrwx 165 ethan ethan 24 May 9 2022 .xinitrc -> .dotfiles/linux/.xinitrc
lrwxrwxrwx 165 ethan ethan 24 May 9 2022 .Xmodmap -> .dotfiles/linux/.Xmodmap
lrwxrwxrwx 165 ethan ethan 25 May 9 2022 .xprofile -> .dotfiles/linux/.xprofile
lrwxrwxrwx 165 ethan ethan 27 May 9 2022 .Xresources -> .dotfiles/linux/.Xresources
lrwxrwxrwx 165 ethan ethan 25 May 9 2022 .xsession -> .dotfiles/linux/.xsession
lrwxrwxrwx 165 ethan ethan 22 May 9 2022 .zshenv -> .dotfiles/unix/.zshenv
We can see that the files that were there in the first backup have 165 hard links. There is a file system dependent upper limit on the number of inode references that can exist. On an ext4 formatted hard drive, that upper limit is the max value for a 16 bit number (65535). As long as I only do one backup per day, I should be good for the next 179 years.