OPINION
FILESYSTEM MONITORING
Gabriel Krisman Bertazi is a senior software engineer at Collabora.
While filesystem developers do their best to avoid corruption, it’s impossible to completely protect a system from accidental issues. Whether the cause is a random bit flip, a disk crash or a software bug, users don’t enjoy losing their data. This is why filesystem developers put a huge effort not only into testing their code, but also into developing recovery tools. In fact, all persistent filesystems deployed in production are accompanied by some support infrastructure.
When an error happens, administrators and recovery daemons must be notified as soon as possible so they can begin emergency recovery procedures, such as restoring from backups, rebuilding RAID arrays, replacing disks or running fsck. When one needs to watch over a large number of machines, as a cloud provider with hundreds of them does, a reliable monitoring tool is essential.