syscheck
How do I get syscheck?
For downloads, use the following links:
What's syscheck?
I often find myself hopping from server to server, to see whether jobs
have crashed, to check that daemons are running, and whatnot. Being lazy
as I am, I'm always trying to make such systems "self-healing" in the
sense that possible errors should be auto-detected and fixed. Yes, I
admit - being called at night for support is something I avoid like
the plague.
There are many tools for this task, from tiny to full-blown; cfengine,
for example, is a truly great one. But in many situations I'll log onto
a server where such tooling isn't available, and I find myself needing
something small, portable, quick to set up, and just fit for the job.
In such cases I tend to use syscheck:
- It's small, just one script and one configuration file.
- It's portable - it'll run on all systems where simple basic Perl (5+)
is installed. That's basically everywhere.
- It's versatile - it'll run in single-shot mode or as a daemon.
- It does what I need it to do - it's flexible enough to provide
the "self-healing" mode that I need, but lean and mean enough not
to be bloated.
Well, it works for me. Feel free to grab it and try it out, feel free
to modify and extend it.
Syscheck is distributed under the GPLv3,
which basically means that you're free to use it without cost or
warranty, that you're free to re-distribute it as long as you don't
change the licensing, and that you're free to modify it as you see
fit, but if you distribute a modified version you're obliged to make
your changes available under the same license. If you modify
syscheck, I'd appreciate hearing about it - just drop me a mail.
How to use syscheck?
- Install syscheck in any
directory you like on your system. E.g., /usr/local/bin makes
sense.
- Copy syscheck.conf.sample to the same directory, and rename it
to syscheck.conf.
- For a quick overview of the 'usage' information, just type
syscheck
- Modify your syscheck.conf to suit your needs; the stock
file syscheck.conf.sample is pretty self-explanatory (more on
the configuration is described below).
- Test it out. Run
syscheck test
If you need more verbosity, run
syscheck -v test
- For a real run which would verify that all your daemons are
running (and which would restart them if necessary), run
syscheck go
- Once you're satisfied, add the following line to your crontab
(assuming you want syscheck to run every 5 minutes):
*/5 * * * * /usr/local/bin/syscheck go
- If you aren't allowed to use cron, then type
syscheck daemonize 300
(where 300 is the sleep-period between checks).
When syscheck runs in daemon mode, sending signal 1 (HUP) to it
instructs it to reload the configuration.
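The daemon-mode behaviour described above (check, sleep, reload on HUP) can be pictured with a small shell loop. This is only a sketch and not syscheck's own code: load_config is a hypothetical stand-in for re-reading syscheck.conf, the loop is capped at three passes so that it terminates, and the script sends HUP to itself to play the role of an operator running kill -HUP on the daemon's process ID.

```shell
#!/bin/sh
# Illustrative only: NOT syscheck's implementation. It mimics the
# documented behaviour: check, sleep, and reload the configuration on HUP.

reloads=0
load_config() {               # hypothetical stand-in for re-reading syscheck.conf
    reloads=$((reloads + 1))
}
trap 'load_config' HUP        # HUP is signal 1

load_config                   # initial load at startup
checks=0
while [ "$checks" -lt 3 ]; do # a real daemon would loop until stopped
    checks=$((checks + 1))
    # Simulate an operator sending HUP during the second pass.
    if [ "$checks" -eq 2 ]; then
        kill -HUP $$
    fi
    sleep 0                   # stands in for the sleep period (e.g. 300)
done
echo "checks=$checks reloads=$reloads"
```

The point of the pattern is that a configuration change does not require stopping and restarting the checker; the running process picks it up on the next signal.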
What's in the configuration?
Syscheck mainly does the following for you:
- It lets you gather the output of any given command into a
list (this is the populate command);
- It lets you match the list against something you'd expect when a
daemon is running (this is the expect command);
- It lets you run a corrective command when the expected output
wasn't matched in the list (this is the correct command).
Additionally there are these two things:
- The configuration can inject any Perl code into a
running syscheck process (this is the eval command).
This is used for very specific situations, or e.g. to set the
PATH.
- Unconditional commands can be run using system; i.e.,
such commands are always executed, whether some output is found in a
given list, or not.
The formal syntax of all commands is:
- populate
- Syntax: populate LISTNAME COMMAND. The output of the
command is stored in the named list.
- expect
- Syntax: expect LISTNAME REGEX. The regular expression is
searched for in the named list. Expect statements can be
repeated to match multiple regular expressions. The outcome is
"true" when all matches are made (so it's a logical 'and' match).
- correct
- Syntax: correct COMMAND. The command is executed when one
or more of the previous expects failed to match.
- eval
- Syntax: eval PERLCODE. This is used in special cases to
'inject' Perl code.
- system
- Syntax: system COMMAND. The command is run
unconditionally.
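To make the interplay of populate, expect and correct concrete, here is a minimal shell sketch of the semantics listed above. It is an illustration under assumptions, not syscheck's implementation: the list is a shell variable filled with sample text instead of real ps ax output, the expect function is a hypothetical helper, grep -E stands in for the regular expression matching, and mysqld is just a made-up second daemon. The corrective command is only recorded, not executed.

```shell
#!/bin/sh
# Illustrative stand-in for the documented semantics; not syscheck's own code.

# populate: the "list" is a shell variable holding sample text
# (a real run would capture the command output, e.g. pslist=$(ps ax)).
pslist=$(printf '%s\n' \
    '  101 ?  Ss  0:00 /usr/sbin/sshd' \
    '  202 ?  Ss  0:01 /usr/sbin/httpd -k start')

failed=0
# expect: every regex must match somewhere in the list (logical 'and');
# a single miss marks the whole group as failed.
expect() {
    # $1 = list contents, $2 = extended regular expression
    printf '%s\n' "$1" | grep -E -q -- "$2" || failed=1
}
expect "$pslist" 'httpd'
expect "$pslist" 'mysqld'     # hypothetical daemon, absent from the sample

# correct: chosen only if one or more of the previous expects failed.
if [ "$failed" -eq 1 ]; then
    action='correct: run the restart command'
else
    action='no correction needed'
fi
echo "$action"
```

Because the expects are combined with 'and', the one missing mysqld entry is enough to trigger the correction even though httpd was found.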
Examples
The usage of the commands is best illustrated by examples. Below is a
configuration that checks whether httpd is present in the process
list. If not, apachectl start is executed. This is a simple way to
make sure that Apache is up and running.
# Get the process list (output of ps ax)
populate pslist ps ax
# Check that httpd is in that list, if not run apachectl start
expect pslist httpd
correct apachectl start
Once a list is available, it can be re-used: subsequent
expect/correct combos can refer to the same list pslist.
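The idea of re-using one list for several checks can be sketched in shell as follows. This is a stand-in and not syscheck itself: the process list is sample text, check is a hypothetical helper that pairs one regex with one corrective command, and the crond restart command is an assumption made up for the example. Corrections are collected in a variable instead of being executed.

```shell
#!/bin/sh
# Illustrative stand-in, not syscheck itself: one captured list, two checks.

# Captured once (sample text here; a real check would use pslist=$(ps ax)).
pslist=$(printf '%s\n' \
    '  101 ?  Ss  0:00 /usr/sbin/sshd' \
    '  202 ?  Ss  0:01 /usr/sbin/httpd')

corrections=""

# check REGEX COMMAND: record COMMAND when REGEX is absent from the list.
check() {
    if ! printf '%s\n' "$pslist" | grep -E -q -- "$1"; then
        corrections="${corrections}${2};"
    fi
}

check 'httpd' 'apachectl start'
check 'crond' '/etc/init.d/crond start'   # hypothetical restart command

echo "corrections needed: ${corrections:-none}"
```

Capturing the output once keeps all checks consistent with the same snapshot and avoids running ps for every daemon.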
Also, lists can be constructed from any output; be creative. The
following example probes whether Apache is running by (a) examining
the process list and searching for httpd, and (b) fetching the
output of http://localhost/ and searching for the string <html>.
# Get the process list
populate pslist ps ax
# Get http://localhost/
populate httpoutput curl http://localhost/
# Match httpd in the processes and <html> in the http output
expect pslist httpd
expect httpoutput <html>
# If one or both are not found, correct the situation. Kill off
# any misbehaving Apache processes first.
correct killall -9 httpd; sleep 1; apachectl start
Below is an example of eval. Imagine a hypothetical program mydaemon
which won't run unless the environment variable MYHOME is set. Since
restarts of mydaemon would not work without that variable, you'd have
basically two options: (a) set MYHOME before calling syscheck, or
(b) use eval in the configuration. These options are equivalent.
Below is an example of using eval in the configuration file:
# Get the process list
populate pslist ps ax
# Set MYHOME
eval $ENV{MYHOME} = '/opt/mydaemon/etc';
# Scan for 'mydaemon' in the process list, if not found, start it
expect pslist mydaemon
correct /opt/mydaemon/bin/mydaemon
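For comparison, option (a) relies on ordinary environment inheritance: a variable exported before the checker starts is visible to every command it launches. The sketch below demonstrates just that mechanism; sh -c stands in for syscheck starting mydaemon, and seen_by_child is a name introduced only for this illustration.

```shell
#!/bin/sh
# Option (a): export the variable before the checker starts, so every
# command it launches inherits it. The path mirrors the example above.
export MYHOME='/opt/mydaemon/etc'

# Any child process sees the value; `sh -c` stands in for syscheck
# starting /opt/mydaemon/bin/mydaemon.
seen_by_child=$(sh -c 'printf "%s" "$MYHOME"')
echo "child sees MYHOME=$seen_by_child"
```

On an interactive shell the same effect can be had for a single run by prefixing the command, as in MYHOME=/opt/mydaemon/etc syscheck go.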