Server Sync: scp, rsync, Syncthing and more!
July 11, 2023
File Server Synchronization
I want to discuss a topic that has been of enormous importance in my life but wouldn’t usually be covered in a class. An auxiliary topic, like git or how to do your taxes.
In keeping with the theme of self-hosting your build servers and not letting Apple make obsolete the hardware you paid good money for, this article discusses some technologies that make it easier for you to self-host a file server and (importantly) make backups of the data on that server.
One popular use case for a self-hosted file server is a home media server. In this instance, a media server application such as Plex or Jellyfin also resides on the file server. Or, you may have no interest in paying Apple 99¢ per month to back up the images on your phone to their servers, but still want to keep those memories. In that case, you could upload the photos to your personal file server. A third use case would be to contribute to internet preservation. Archive.org is threatened with lawsuits and imminent closure on a yearly basis. Other knowledge repository sites have already been seized. Finally, you may need to maintain servers for your employer.
scp
Secure Copy Protocol (scp) isn’t a syncing protocol, but it is useful for data transfer, especially after you’ve established an SSH connection to the target device.
The most common way I use scp is to recursively transfer folders using the -r switch. To use the command, specify the local directory and the target machine, username, and destination directory.
Let’s say our local machine has a directory on it called ComputerChronicles, which contains MP4s of episodes of Computer Chronicles (currently available for free on Archive.org), and we want to move the directory to our file server. Assume we would connect to the file server with ssh pete@192.168.0.220.
Copy the entire directory using the -r switch.
scp -r ~/Movies/ComputerChronicles/ pete@192.168.0.220:~/Movies/
You can also transfer individual files:
scp ~/Movies/ComputerChronicles/CC1024_artificial_intelligence.mp4 pete@192.168.0.220:~/Movies/
It also works remote -> local:
scp -r pete@192.168.0.220:~/Movies/ComputerChronicles/ ~/Movies/
With any default scp or ssh command, you must supply a password to access the remote resource. To avoid this (and make the connection ready for scripting) consider setting up passwordless login.
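Here is a minimal sketch of one common way to set that up, using an SSH key pair. The function name setup_passwordless_login and the default key path are my own assumptions for illustration; ssh-keygen and ssh-copy-id are the standard OpenSSH tools. The snippet only defines a function, so nothing runs until you call it.

```shell
# Sketch: set up key-based (passwordless) SSH login for one remote host.
# The function name and the default key path are assumptions.
setup_passwordless_login() {
    remote="$1"                          # e.g. pete@192.168.0.220
    key="${2:-$HOME/.ssh/id_ed25519}"    # private key path (assumed default)

    # Create a key pair only if one does not already exist.
    # ssh-keygen asks for an optional passphrase.
    if [ ! -f "$key" ]; then
        ssh-keygen -t ed25519 -f "$key" || return 1
    fi

    # Append the public key to the remote account's authorized_keys.
    # This prompts for the account password one last time.
    ssh-copy-id -i "$key.pub" "$remote"
}

# Usage (commented out so the snippet is safe to paste):
# setup_passwordless_login pete@192.168.0.220
```

After this, scp, ssh, and rsync commands against that host should stop asking for the account password (you may still be asked for the key's passphrase if you set one).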
rsync
rsync is the best utility I’ve found for keeping files in sync for backup purposes. I started off using it to mirror data between two external hard drives plugged into the same machine. I have one drive, called NTFS, that I load media onto (like Computer Chronicles) and use to keep backups of my phone. But redundancy is crucial. So I bought a second drive, called NTFSBackup. When I started, I was manually copying folders between the two until I learned about the simple and elegant rsync command.
To achieve zen-like, “set it and forget it” functionality for two drives on the same machine, rsync works like this:
sudo rsync -av /path/to/NTFS/ /path/to/NTFSBackup/NTFS/
The default path for external drives will of course be different depending on your operating system. Also notice that the NTFSBackup disk contains a directory with the same name as the original drive (NTFS). This isn’t required, but I find it makes managing the data easier. The backup disk could hold backups of multiple disks, so each one gets its own labeled directory.
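Before trusting a command like this with a whole drive, it can help to preview it. Below is a small sketch that wraps the same rsync call and defaults to a dry run. The helper name backup_drive and its "run" argument are hypothetical; -n (--dry-run) is a standard rsync option. I left out sudo here, so add it back if your mount points need it.

```shell
# Sketch: preview an rsync backup before committing to it.
# backup_drive is a hypothetical helper name; the paths are placeholders.
backup_drive() {
    src="$1"              # a trailing slash means "the contents of this directory"
    dst="$2"
    mode="${3:-preview}"  # pass "run" as the third argument to really copy

    if [ "$mode" = "run" ]; then
        rsync -av "$src" "$dst"
    else
        # -n (--dry-run) lists what would be transferred without copying anything.
        rsync -avn "$src" "$dst"
    fi
}

# Usage (commented out):
# backup_drive /path/to/NTFS/ /path/to/NTFSBackup/NTFS/        # preview only
# backup_drive /path/to/NTFS/ /path/to/NTFSBackup/NTFS/ run    # real copy
```

The trailing slash on the source matters: /path/to/NTFS/ copies what is inside the directory, while /path/to/NTFS would create another NTFS directory inside the destination.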
You can also rsync between drives on a network (local -> remote or remote -> local, but not remote -> remote directly). The remote rsync syntax looks similar to scp.
sudo rsync -av pete@192.168.0.233:/media/NTFS/ /Volumes/NTFSBackup/NTFS/ --exclude "Sandbox" --exclude "School"
The above command also demonstrates how to exclude directories to cut down on the time it takes to make the backup.
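If the list of exclusions grows, one option is to keep the patterns in a file and point rsync at it with --exclude-from. This is only a sketch: the helper names write_excludes and backup_with_excludes and the file location in the usage comment are made up for illustration, and I dropped sudo for brevity.

```shell
# Sketch: keep exclusions in a file instead of repeating --exclude flags.
# write_excludes and backup_with_excludes are hypothetical helper names.
write_excludes() {
    # One pattern per line; these two come from the command above.
    printf '%s\n' "Sandbox" "School" > "$1"
}

backup_with_excludes() {
    src="$1"
    dst="$2"
    list="$3"
    # --exclude-from reads exclude patterns from the given file.
    rsync -av --exclude-from="$list" "$src" "$dst"
}

# Usage (commented out):
# write_excludes "$HOME/.backup-excludes"
# backup_with_excludes pete@192.168.0.233:/media/NTFS/ /Volumes/NTFSBackup/NTFS/ "$HOME/.backup-excludes"
```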
When I said that you cannot do remote -> remote backups, I mean that you can’t specify the following:
sudo rsync -av pete@192.168.0.233:/media/NTFS/ mark@192.168.0.220:/Volumes/NTFSBackup/NTFS/
However, using SSH, you can turn the backup into a local -> remote type, for example:
ssh pete@192.168.0.233
pete@192.168.0.233:~$ sudo rsync -av /media/NTFS/ mark@192.168.0.220:/Volumes/NTFSBackup/NTFS/
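The two steps above can also be collapsed into one, since ssh accepts a command to run on the remote host. Here is a sketch under that assumption; relay_sync is a hypothetical helper name, and it assumes the paths contain no spaces or shell metacharacters.

```shell
# Sketch: run the "local -> remote" rsync on the source host in one step.
# relay_sync is a hypothetical helper name.
relay_sync() {
    src_host="$1"   # host that holds the data, e.g. pete@192.168.0.233
    src_path="$2"   # path on that host, e.g. /media/NTFS/
    dest="$3"       # rsync destination as seen from src_host

    # ssh runs the quoted command on src_host; -t allocates a terminal so
    # sudo and the second host can prompt for passwords.
    # Assumes the paths contain no spaces or shell metacharacters.
    ssh -t "$src_host" "sudo rsync -av $src_path $dest"
}

# Usage (commented out):
# relay_sync pete@192.168.0.233 /media/NTFS/ mark@192.168.0.220:/Volumes/NTFSBackup/NTFS/
```

The data still flows directly from the first server to the second; your own machine only starts the command.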
Syncthing
Syncthing is a fantastic program for sharing and syncing network resources. I use it to synchronize documents across all the computers on my network. It’s basically a OneDrive replacement.
Syncthing is easy to configure on macOS and Windows, where official client applications make it simple to enable and use. On Linux, you need to tinker around a bit more and read the docs.
Once the Syncthing server is running, you simply install the client on your other machines, discover the server, authenticate, and share folders across the network.
I find it valuable to sync my PDF library and my writing (such as the writings for this site) across all devices. That way, if I download a new PDF or jot down some notes in Markdown, they go right to the server and are distributed among the clients any time they connect to Syncthing. I prefer a hub-and-spoke model for this program, where all clients connect to only one server. In the above image, I have two Remote Devices, where plexserver is the primary server and nia-and-compute is a backup that is not always online. Each client machine on the network would have a similar setup: i.e., one or two remote devices only.
I find Syncthing similar to keeping a remote Git repository and pulling the changes before starting work, although Syncthing (out of the box) does not offer much in the way of version control. That’s not as big a deal to me as it would be when working on code, because I primarily add rough notes and screenshots, with little regard for perfect grammar. I smooth that out when writing my Markdown in VS Code before building to my Hugo site (sometimes).
Google CDC File Transfer
One last project to mention, although I have not tried it out, is Google’s open source toolset called CDC File Transfer.
These tools, cdc_rsync and cdc_stream, are for syncing and streaming files from Windows to Windows or Linux. They use Content Defined Chunking (CDC), in particular FastCDC, to split files into chunks. The project was carved out of Stadia, Google’s discontinued video-game streaming service.
The CDC File Transfer tools include cdc_rsync, a tool to sync files from a Windows machine to a Linux device.
It is basically a copy tool, but optimized for the case where there is already an old version of the files available in the target directory.
- It quickly skips files if timestamp and file size match.
- It uses fast compression for all data transfer.
- If a file changed, it determines which parts changed and only transfers the differences.
They claim that in their tests, the CDC-based remote diffing algorithm is up to 30x faster than the one used in rsync (1500 MB/s vs 50 MB/s).
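To make the "skip files if timestamp and file size match" idea from the list above concrete, here is a rough sketch of that kind of quick check for two local files. This is my own illustration, not code from cdc_rsync or rsync; needs_transfer is a hypothetical helper, and the real tools compare a local file against a remote one rather than two local paths.

```shell
# Sketch of a "quick check": decide whether a file needs transferring by
# comparing size and modification time only. needs_transfer is a
# hypothetical helper, not part of rsync or cdc_rsync.
needs_transfer() {
    src="$1"
    dst="$2"

    # No copy at the destination yet: transfer.
    [ -e "$dst" ] || return 0

    # Sizes differ: transfer. The $(( )) strips any padding wc adds.
    src_size=$(($(wc -c < "$src")))
    dst_size=$(($(wc -c < "$dst")))
    [ "$src_size" -eq "$dst_size" ] || return 0

    # Modification times differ in either direction: transfer.
    # (-nt and -ot are widely supported but not strictly POSIX.)
    if [ "$src" -nt "$dst" ] || [ "$src" -ot "$dst" ]; then
        return 0
    fi

    # Same size and same timestamp: skip.
    return 1
}

# Usage (commented out):
# needs_transfer notes.pdf /backup/notes.pdf && echo "would copy notes.pdf"
```

The check never reads file contents, which is why it is fast. The trade-off is that it can miss a file that was edited without changing its size or timestamp.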
See the project GitHub for more information and happy copying, no matter which method you choose!