Most of the code is written in Perl, using the SNMP.pm module to interface to the NET-SNMP (formerly UCD-SNMP) package to gather SNMP data. (Originally Tcl scripts using the Scotty extensions for network management were used.) A MySQL database is used to store most of the data, with a little held in flat files. (Previously mSQL was used but proved unstable.)
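A minimal sketch of the kind of SNMP.pm query involved (the host name and community string here are assumptions, not taken from the actual scripts):

    use SNMP ;
    my $sess = new SNMP::Session (
        DestHost  => 'switch1.example.net',   # hypothetical device
        Community => 'public',
        Version   => 1,
    ) ;
    my $sysDescr = $sess->get ('sysDescr.0') ;  # fetch one system-group variable
    die $sess->{ErrorStr} if $sess->{ErrorNum} ;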
Some of the collection scripts run continuously, some several times an hour, some daily. Some involve collection of data from other machines using SSH: this includes the scripts which gather NIS information as the NMS machine itself does not run NIS (to avoid dependency on another machine which may not be available).
The NMS machine has an Apache web server to present information and interact with users, using a mix of pages generated periodically (for time-consuming reports such as those for all hosts and all users) and pages generated on the fly by CGI scripts (such as individual host, user and device SNMP queries). (The Apache server should be upgraded with the mod_perl module to allow faster execution of the CGI scripts.)
The module AppConfig, which "allows complex data structures in a 'standard' unix format", might be a better choice.
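A minimal sketch of how AppConfig might be used here (the option names and config file path are assumptions):

    use AppConfig ;
    my $config = AppConfig->new () ;
    $config->define ('logdir=s', { DEFAULT => '/usr/local/myNMS/var/log' }) ;
    $config->define ('vardir=s', { DEFAULT => '/usr/local/myNMS/var' }) ;
    $config->file ('/usr/local/myNMS/etc/myNMS.cf') ;   # hypothetical config file
    my $logdir = $config->logdir () ;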
Output of programs is directed to logfiles, constructed from the programs' names with datestamps in the form YYYYMMDD, located in the configured log directory (default /usr/local/myNMS/var/log unless specified otherwise in the configuration file).
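A sketch of the logfile naming just described (the exact separator and the absence of an extension are assumptions):

    use POSIX qw(strftime) ;
    use File::Basename ;
    my $logdir  = '/usr/local/myNMS/var/log' ;            # default, or from the config file
    my $stamp   = strftime ('%Y%m%d', localtime) ;        # YYYYMMDD datestamp
    my $logfile = "$logdir/" . basename ($0) . ".$stamp" ;
    open LOG, ">> $logfile" or die "cannot open $logfile: $!" ;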
KeepAlive traps signals INT and HUP: both cause it to close down all running programs; after HUP the programs are restarted, whereas after INT the program removes its own .pid file (in the configured var directory) and exits.
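A minimal sketch of that signal handling (the helper names and .pid file path are assumptions):

    $SIG{HUP} = sub { stop_all_programs () ; start_all_programs () } ;  # close down, then restart
    $SIG{INT} = sub {
        stop_all_programs () ;
        unlink "$vardir/KeepAlive.pid" ;   # remove own .pid file from the var directory
        exit 0 ;
    } ;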
If run with a command line parameter -HUP, -INT, -KILL or -SUSPEND, the program finds (from the .pid file) the PID of the running instance of itself and sends the appropriate signal to it. In the case of the -SUSPEND option it then remains running (in a loop) until it is itself terminated by an INT or KILL signal, so that it prevents the normal restarting of an instance by cron. This allows the programs it runs to be suspended without having to alter either the KeepAlive config file or the crontab.
KeepAlive itself is (re)started every minute from cron: on startup it checks whether a copy of itself is already running (by reading the .pid file and checking that a process with that PID is running) and exits if it is.
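The startup check might look like this sketch (the .pid file path is an assumption):

    my $pidfile = "$vardir/KeepAlive.pid" ;
    if ( open PID, "< $pidfile" ) {
        chomp (my $pid = <PID>) ;
        close PID ;
        exit 0 if $pid && kill 0, $pid ;   # kill 0 tests whether that PID is still running
    }
    open PID, "> $pidfile" or die "cannot write $pidfile: $!" ;
    print PID "$$\n" ;
    close PID ;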
Which commands are run is specified in the KeepAlive configuration file etc/KeepAlive.cf: this is re-read every 10 seconds (or whatever interval is configured in the KeepAlive script), and any changes are acted upon by killing any commands no longer found in the config file and starting any new commands.
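A sketch of that re-read loop (read_config and start_command are hypothetical helpers, not the actual routines):

    my %running ;    # command => PID of the instance KeepAlive started
    while (1) {
        my %wanted = read_config ('etc/KeepAlive.cf') ;   # hypothetical: command list from config
        foreach my $cmd ( keys %running ) {               # kill commands dropped from the config
            next if exists $wanted{$cmd} ;
            kill 'TERM', $running{$cmd} ;
            delete $running{$cmd} ;
        }
        foreach my $cmd ( keys %wanted ) {                # start commands newly added
            $running{$cmd} = start_command ($cmd) unless exists $running{$cmd} ;
        }
        sleep 10 ;                                        # the configured re-read interval
    }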
The module Proc::Simple would probably do the job better than the current implementation:
Proc::Simple helps controlling background processes in Perl. It provides "Process Objects" that mimic their real world counterparts. You don't have to deal with fork and wait and friends, Proc::Simple is very easy to use, you just start processes in background, poll their status once in a while and kill them if necessary.

However, Proc::Simple requires Perl 5.6, which was not available on the development system.
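A sketch of what KeepAlive's core could look like using Proc::Simple (the command path is an assumption):

    use Proc::Simple ;
    my $proc = Proc::Simple->new () ;
    $proc->start ('/usr/local/myNMS/bin/Query_Hosts') ;      # run in the background
    if ( ! $proc->poll () ) {                                # poll() is true while it runs
        $proc->start ('/usr/local/myNMS/bin/Query_Hosts') ;  # restart if it has died
    }
    $proc->kill () ;                                         # sends SIGTERM by default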
This is currently invoked from cron by the wrapper script Query_Hosts.
This is currently invoked from cron by the wrapper script Query_Users.
The routines getting/updating devices' SNMP info use a data structure, usually called %SNMP, which is structured thus:
$SNMP {deviceID}             deviceID, assigned as the unix time when the device was discovered
      {mtime}                time (in unix seconds) when device info updated from live SNMP query (of system group)
      {sysName}              }
      {sysDescr}             }
      {sysContact}           }  usual system group variables
      {sysLocation}          }
      {sysObjectID}          }
      {sysObjIDtxt}          version of sysObjectID translated to a text string (not currently implemented - used to get this from Scotty)
      {UpSinceTime}          sysUpTime translated to an absolute time in unix seconds
      {ifNumber}             number of records in ifTable
      {ifTable}              a set of data structures, indexed on ifIndex values, i.e.:
      {ifTable}{$index}      each comprising the elements:
            {ifDescr}        }
            {ifType}         }
            {ifSpeed}        }  values from the SNMP ifTable group
            {ifPhysAddress}  }
            {ifAdminStatus}  }
            {ifOperStatus}   }
            {ifLastChangeAt} ifLastChange translated to an absolute time in unix seconds
            {ifName}         from the ifName table
      {ipAddrTable}          a set of data structures, indexed on IP address, i.e.:
      {ipAddrTable}{$Addr}   each comprising the elements:
            {ipAdEntIfIndex} ifIndex with which this address is associated
            {ipAdEntNetMask} netmask associated with this address
In addition, when information for this device is retrieved from the DB, it is added to the structure thus:
      {DB}
            {deviceID}       }
            {sysName}        }
            {sysDescr}       }
            {sysContact}     }
            {sysLocation}    }
            {sysObjectID}    }  values from the DB corresponding to the 'live' values (above)
            {sysObjIDtxt}    }
            {sysServices}    }
            {ifNumber}       }
            {UpSinceTime}    }
            {mtime}          }
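As an illustration, a loop over the interface table in this structure might look like the following sketch (it assumes %SNMP has been populated as above):

    foreach my $index ( sort { $a <=> $b } keys %{ $SNMP{ifTable} } ) {
        my $if = $SNMP{ifTable}{$index} ;
        printf "%-4d %-20s admin=%s oper=%s\n",
            $index, $if->{ifDescr}, $if->{ifAdminStatus}, $if->{ifOperStatus} ;
    }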
    u u u u u u u u u u  /.../  u u u        <- user from logs
    I I I I I I I I I I  \...\  I I I        <- IP from logs
    |------------ IP-MAC ---/.../----------| <- IP & MAC from IP_MAC

These sightings will persist over periods of minutes or hours in the case of user activity, and indefinitely (hours to years) for IP-MAC records. Where a host is used sequentially by many users (e.g. a shared PC) we will see:
    u1 u1 u1 u1 u1 u2 u2 u2 u2 u2 u2
    |------------ MAC ----------------|

or on a multi-user (timesharing) host:
    u1 u2 u1 u3 u4 u1 u4 u2 u2 u3 u4 u1
    |------------ MAC ----------------|

ANOMALIES:
When we have duplicate IP addresses we will see this:
u1 u1 u1 u1 u1 ... u2 u2 u2 u2 u2
|------- MAC1 --------| |------ MAC2 ------|
or with multiple users:
u1 u2 u4 u3 u1 u3 ... u4 u4 u4 u5 u5 u5
|------- MAC1 --------| |------ MAC2 ------|
We can also check for accounts possibly shared amongst multiple users (solely from web logs):
u1 u1 u1 u1 u1 u1
I1 I1 I1 I1 I1 I1
u2 u2 u2 u2 u2
I2 I2 I2 I2 I2
|----------|
Depending on the duration of the overlap and the type and location of the machines, this may be a fairly innocent case of a user moving to a new PC in the same lab having failed to log out correctly from their first machine (or using two machines simultaneously), or a user innocently using web browsers on more than one machine simultaneously (e.g. on a PC and, via X, on a unix host to which they connect via the PC).
Where the same user is logged into single-user machines in physically separate parts of the campus, however, an account is being shared, and this may have security implications (e.g. a compromised user account).
The sources of data we have are the web cache logs (giving userID, IP address and time) and the IP_MAC records built from the ARP caches (giving IP address, MAC address and the times first and last seen).
The actual correlation of the two data sets looks as if it ought to be simple, but I have not found it to be so without either holding impracticably large data sets in memory (for example, reading the entire web cache logs and ARP cache data into structures such as hashes indexed on IP address, and correlating one with the other for each address seen in either) or being impracticably slow (e.g. traversing the web cache files once for each host and matching against ARP records).
However, a variation of the first approach outlined above seems a possibility: the ARP table itself is small enough to reside in memory, so we could break the cache logs into chunks covering relatively short periods, read these into data structures, and correlate them with the ARP table.
Thus we might have a table of IP-MAC-time of
IPadd (index),
time-first-seen (index),
time-last-seen (index),
MAC address
And for the web logs a table of IP-user-time of
IPadd (index),
time-seen (index),
userID
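In MySQL these two tables might be created as in the following minimal sketch (the table and column names, types and credentials are assumptions based on the lists above):

    use DBI ;
    my $dbh = DBI->connect ('DBI:mysql:myNMS', 'nms', 'secret') ;  # hypothetical credentials
    $dbh->do (q{
        CREATE TABLE IP_MAC (
            IPadd            CHAR(15),
            time_first_seen  INT UNSIGNED,
            time_last_seen   INT UNSIGNED,
            MAC              CHAR(17),
            INDEX (IPadd), INDEX (time_first_seen), INDEX (time_last_seen)
        )
    }) ;
    $dbh->do (q{
        CREATE TABLE IP_user (
            IPadd      CHAR(15),
            time_seen  INT UNSIGNED,
            userID     CHAR(16),
            INDEX (IPadd), INDEX (time_seen)
        )
    }) ;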
These would be easier to implement as SQL tables than as Perl data structures (e.g. hash tables), as we would want to retrieve records for a given IPadd combined with a time >= t1 and <= t2; however, writing these tables out and then retrieving them would be slow. The alternative would be to search through a hash indexed on IPadd and time, looking for records whose time range matched that of the web records.
We could have:

    my %IP_MAC ;
    $IP_MAC{$IPadd} = {
        time1 => $t1,    # time-first-seen
        time2 => $t2,    # time-last-seen
        MAC   => $MAC,
    } ;
    my %host_user ;
    $host_user{$IPadd}{$time} = $user ;
and then, for each host of %host_user, correlate:

    foreach my $IPadd ( keys %host_user ) {
        my $rec = $IP_MAC{$IPadd} or next ;                   # the IP_MAC record with the same IP
        foreach my $time ( keys %{ $host_user{$IPadd} } ) {
            if ( $time >= $rec->{time1} && $time <= $rec->{time2} ) {
                # times match: user $host_user{$IPadd}{$time} was on MAC $rec->{MAC}
            }
        }
    }
As mentioned, the granularity of the ARP records is of the order of 10 minutes, and in this period the web cache logs can contain many hundreds or even thousands of records, many of which differ in timing by only milliseconds, so a great deal of compaction is possible by amalgamating web log records for the same host and user occurring within a certain period of time, such as 10 or 100 seconds.
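A sketch of such compaction (the window size, sort order and record layout are assumptions):

    my $window = 100 ;                  # seconds: merge same host+user records closer than this
    my (%last_kept, @compacted) ;
    foreach my $rec ( @web_records ) {  # assumes each $rec = [ $IPadd, $time, $userID ], sorted by time
        my ($IPadd, $time, $userID) = @$rec ;
        my $key = "$IPadd $userID" ;
        next if defined $last_kept{$key} && $time - $last_kept{$key} < $window ;
        $last_kept{$key} = $time ;
        push @compacted, $rec ;
    }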