Big Brother - Help

A Web-based Systems and Network Monitoring and Notification System

Order of severity

Serious Trouble
No report lately
May need attention
All is well
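
The ordering above decides which color the whole screen takes. A minimal sketch of that rule follows. The color names red, purple, and yellow come from the sections below, while "green" for "All is well" and the function name are assumptions made for illustration only.

```python
# Severity order from the list above, most severe first.
# "green" for "All is well" is an assumption; the page does not name it.
SEVERITY_ORDER = ["red", "purple", "yellow", "green"]


def screen_color(dot_colors):
    """Return the most severe color present among the dots."""
    for color in SEVERITY_ORDER:
        if color in dot_colors:
            return color
    # No dots at all: nothing to report, so treat it as "All is well".
    return "green"
```

For example, a display with one purple dot among yellow and green dots turns purple.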

All connections are checked every 5 minutes


BB Installation and Configuration Manual


Big Brother FAQ


Pager Codes

The administrator will be notified when conditions merit. The numeric message is formatted as follows: [3 DIGIT CODE] [IP-ADDRESS]


Severe Conditions

Most severe conditions result in the administrator being notified. These include loss of network connectivity, loss of HTTP access, and disk conditions over 95% full, since these can result in a system hang. Furthermore, any "NOTICE" message in the messages file causes a notification, since this may signal a disk fault.

Under these circumstances, the screen should turn red. Click on the corresponding red dot for additional information about the condition.

If a severe situation is occurring that is not being noticed by Big Brother, use the PAGE/ACK button on the main screen to notify the administrator manually.


Warning Conditions

These include HTTP server errors, disks 90-94% full, the death of important processes, and "WARNING" messages in the system logs.

The screen should turn yellow if this is the most severe situation at the time. Click on the corresponding yellow dot for additional information, and notify the administrator manually if necessary.


No Report Warnings

Each report is checked for freshness. If any report is more than 30 minutes old, it is marked with a purple dot, and the screen turns purple, assuming that it is the most serious situation at the time.

These may be the result of heavily loaded systems, but may also indicate a more serious loss of communication within the Big Brother system itself.
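
The freshness rule can be sketched as follows. This is an illustration only, not Big Brother's own code. The 30 minute limit is from the text, and the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta

# Limit stated in the text: a report older than this gets a purple dot.
STALE_AFTER = timedelta(minutes=30)


def is_stale(report_time, now):
    """Return True if the report is more than 30 minutes old."""
    return now - report_time > STALE_AFTER
```

A report that is exactly 30 minutes old is still considered fresh in this sketch, since the text says "more than 30 minutes".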


System Information

Click on any server name for additional details about the machine. Information about all components is available, including serial numbers, partition sizes, SCSI addresses, and the physical locations of the devices. This information lives in the www/notes directory.


General Information

The current status of any individual component is always available by clicking the appropriate dot in the display matrix. You may have to hit Reload to get the most recent entry.

Occasionally the screen changes color for CPU or HTTP warnings. These can usually be disregarded since Big Brother has been instructed to be very sensitive during this initial test. Similarly, internet connections may turn yellow when the network is heavily loaded. Although it should be checked out, this is usually not a problem unless the whole Internet section goes yellow.


Big Brother Column Information

conn

The conn column denotes the ping check performed periodically. This code is located in bb-network.sh.


nntp

The nntp column denotes the nntp check performed periodically. This code is located in bb-network.sh. It makes sure the news server is alive and well.


cpu

The cpu column denotes the cpu check performed periodically. This figure is based on the 5 minute load average as reported by the 'uptime' command, in the second column. The code for this test is located in bb-local.sh.
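
To show which figure is meant, here is a small sketch that pulls the 5 minute load average (the second of the three figures) out of a line of 'uptime' output. It is not the code in bb-local.sh. The function name and the regular expression are assumptions, and 'uptime' output differs slightly between platforms.

```python
import re

# Matches "load average: 0.52, 1.48, 0.95" and the "load averages:" variant.
LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def five_minute_load(uptime_line):
    """Return the second load figure (5 minute average) as a float."""
    match = LOAD_RE.search(uptime_line)
    if match is None:
        raise ValueError("no load average found in: %r" % uptime_line)
    return float(match.group(2))
```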


disk

The disk column denotes the disk check performed periodically. This test is just the 'df' command with the disk most full being reported. The warning amount is 90% by default, and the system is set to panic at 95%. These values are set in $BBHOME/etc/bbdef.sh and may be changed. The code for the disk test lives in bb-local.sh. You may also set warning/panic level individually in the etc/bb-dftab file. See the etc/bb-dftab.INFO.
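
The logic described above can be sketched as follows: find the fullest filesystem in 'df' output, then compare it with the warning and panic levels. This is an illustration, not the code in bb-local.sh. The function names are hypothetical, the column layout of 'df' is assumed to end with a percentage followed by the mount point, and treating exactly 95% as panic is an assumption.

```python
WARN_PERCENT = 90   # default warning level from the text
PANIC_PERCENT = 95  # default panic level from the text


def fullest_disk(df_output):
    """Return (mount_point, percent_used) for the fullest filesystem, or None."""
    worst = None
    for line in df_output.splitlines():
        fields = line.split()
        for field in fields:
            # The capacity column looks like "92%"; the header "Use%" is skipped.
            if field.endswith("%") and field[:-1].isdigit():
                percent = int(field[:-1])
                mount = fields[-1]  # assumed: mount point is the last column
                if worst is None or percent > worst[1]:
                    worst = (mount, percent)
                break
    return worst


def disk_color(percent):
    """Map a percentage to a dot color using the default levels."""
    if percent >= PANIC_PERCENT:
        return "red"
    if percent >= WARN_PERCENT:
        return "yellow"
    return "green"
```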


dns

The dns column verifies the status of the DNS server on that machine. The test is basically an nslookup with the server name and IP address as arguments.


ftp

The ftp column denotes the ftp check performed periodically. This code is located in bb-network.sh. It is part of the new group of generic server tests performed. To test this service on a given machine, just include 'ftp' on the line in the bb-hosts file.


http

The http column denotes the http check performed periodically. This code is located in bb-network.sh. It returns OK if the server is there and does not return a string containing the word 'Error'. The test should be more rigorous. Note that password-protected pages return an error when they shouldn't.
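
The rule is simple enough to state as a one-line function. The sketch below only illustrates the rule as described here, not the code in bb-network.sh. The function name and its two arguments are hypothetical, and the match is assumed to be case-sensitive.

```python
def http_check_ok(reachable, response_text):
    """OK only if the server answered and the reply lacks the word 'Error'."""
    return reachable and "Error" not in response_text
```

Any page whose text happens to contain 'Error' is reported as a failure under this rule, which is one reason the test is described as not rigorous enough.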


msgs

The msgs column denotes the msgs check performed periodically. This code is located in bb-local.sh. Only NOTICE and WARNING conditions are considered. Note that a NOTICE condition will cause a notification (code red) whereas a WARNING just turns the screen yellow. There is no way to turn these messages off, short of clearing out the messages file manually or modifying the tags from WARNING to wARNING and NOTICE to nOTICE. You may also introduce tags in the etc/bbdef.sh file in the PAGEMSG and MSGS variables.
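
The following sketch shows why changing the case of a tag silences it, assuming the tags are matched as case-sensitive substrings. It is not the code in bb-local.sh, the function name is hypothetical, and any extra tags set through PAGEMSG and MSGS are left out.

```python
def msgs_color(messages_text):
    """NOTICE means red (with notification), WARNING means yellow."""
    if "NOTICE" in messages_text:
        return "red"
    if "WARNING" in messages_text:
        return "yellow"
    return "green"
```

A line rewritten to read "wARNING" or "nOTICE" no longer matches either tag, so the column stays green.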


oracle

The oracle column shows the status of the Oracle database instances on the servers, their process status, and a visual indication of what is up and what is not. It checks and lists all processes used by the Oracle server(s), and also checks the network listener(s). It gives the full status response from the Oracle listener control program's status command. In addition, it now also lists all users currently connected, including Oracle's own system logins. Code is run from the oracle extension script provided in the new $BBHOME/ext directory, which is only in release 1.2b and later. If you want a copy, check out the Big Brother Info Page or email me at paul@pluzzi.com.


cpu2

CPU2 is a column for the status of the processors. No other column is really devoted to this, and it is a big enough concern. It is currently tailored to SUN specifically, with the mpstat and psrinfo commands. It checks for any off-line processors, and will go red on that condition. It also checks swap information. Code is run from the cpu2 extension script provided in the new $BBHOME/ext directory, which is only in release 1.2b and later.


dmp

DMP is a column for the status of the dynamic multipathing features provided by EMC, Veritas, and Sun. It runs the format command for an O/S-level check and then queries the Veritas setup to see that it matches. It will go red and page on any missing paths, whether reported by format or by the Veritas check. It is currently tailored to SUN specifically, but should work on any supported Veritas platform, as the commands are standard. Code is run from the dmp extension script provided in the new $BBHOME/ext directory, which is only in release 1.2b and later.


ha

This is the High Availability column. It is currently set up for Veritas First Watch but will definitely be Veritas VCS aware in the very near future. Once again, it is specifically tailored to the SUN platform right now, but as the need arises, other platforms will be added. Code is run from the ha extension script provided in the new $BBHOME/ext directory, which is only in release 1.2b and later.


iostat

This column is for the iostat command, so that its output can be made available via the web. It currently polls twice at 5 second intervals. The output can be pretty long and may get truncated in shared disk environments. In fact, I am a victim of this already on the dsgoddb01 host. This column also runs a vmstat command for 5 intervals of 5 seconds. Code is run from the iostat extension script provided in the new $BBHOME/ext directory, which is only in release 1.2b and later.


ipcs

IPCS is a column for inter-process communication status. Code is run from the cpu2 extension script provided in the new $BBHOME/ext directory, which is only in release 1.2b and later.


logs

The logs column is dedicated to showing log files regardless of condition. I don't currently set any negative color for this, because that is what the "msgs" column is for. However, I can see the need to do some other parsing of these files in the future, so look for that. Another difference between this and the "msgs" column is that I currently look at the syslog, sulog, and messages files, whereas the "msgs" column only checks the messages file. Code is run from the logs extension script provided in the new $BBHOME/ext directory, which is only in release 1.2b and later.


network

The network column is designed to report any negative conditions for a downed interface. It also reports information about the routing tables, and goes red on any collisions or other errors from netstat -i. This may need to be modified on a network that is slow, or is still shared. I happen to be on switched networks exclusively, so I don't have any collisions. If that is not the case for you, you will need to add the line that checks for a 10% ratio of collisions to total traffic. Code is run from the network extension script provided in the new $BBHOME/ext directory, which is only in release 1.2b and later.
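
For shared networks, the 10% check mentioned above could look like the sketch below. This is not the line from the network extension script. The function name is hypothetical, and taking "total traffic" to mean a packet count read from netstat -i is an assumption.

```python
def collisions_exceed_limit(collisions, total_packets, limit=0.10):
    """Return True if collisions are more than 10% of total traffic."""
    if total_packets == 0:
        # An idle interface has no traffic to compare against.
        return False
    return collisions / total_packets > limit
```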


printers

The printers column is for checking the status of any network connected printers. It is specifically written around HP's hpnp software so you will need to have that for this to work. Code is run from the printers extension script provided in the new $BBHOME/ext directory which is only in release 1.2b and later.


prtdiag

The prtdiag column is strictly to show the output of a verbose prtdiag command on SUN. I know that a portion of this gets truncated, but it is very minimal. The negative status indicators are flagged on any offline, unsteady, or otherwise negative conditions. Code is run from the prtdiag extension script provided in the new $BBHOME/ext directory, which is only in release 1.2b and later.


top

TOP is a column for the status of the freeware top processes running on a system. This uses the -b flag to capture output for a snapshot of time. Code is run from the cpu2 extension script provided in the new $BBHOME/ext directory which is only in release 1.2b and later.


vx_check

The vx_check column is used, again on a SUN platform, to check the status of the disks through Veritas Volume Management. It gets a list of all disks under Veritas control and executes a "vxdisk check" on them. It gets the disk list dynamically, so there is no need to supply one. A static list would not be proper anyway, because if you added a disk and it had problems, you would not otherwise know about it. Code is run from the vx_check extension script provided in the new $BBHOME/ext directory, which is only in release 1.2b and later.


vx_group

The vx_group column is for checking the Veritas processes running, the vxiod daemon output, and of course a "vxdg list" command. Code is run from the vx_group extension script provided in the new $BBHOME/ext directory which is only in release 1.2b and later.


vx_list

The vx_list column is dedicated to checking for any offline or invalid disks listed through Veritas that don't have a "double dash" line. In other words, if the disk is configured for any disk group, whether it be raw or cooked, it will be checked. If it is not online, you get a negative color for it. Code is run from the vx_list extension script provided in the new $BBHOME/ext directory, which is only in release 1.2b and later.


pop3

The pop3 column denotes the pop3 check performed periodically. This is part of the generic test code in bb-network.sh. It checks that the pop3 server is alive and well. To test a machine for the pop3 server, put the word 'pop3' on that server's line in the bb-hosts file. You may have to put pop-3 instead on certain platforms. Check /etc/services for the correct spelling.


procs

The procs column denotes the procs check performed periodically. This code is located in bb-local.sh. It makes sure that the processes defined in etc/bbdef.sh in the PROCS variable exist on the local machine. If a process does not exist, and it has been defined in the PAGEPROCS variable, then the code is red and a notification is sent out. The ps command is used to get a current process listing.
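
The interplay between PROCS and PAGEPROCS can be sketched as below. This is an illustration, not the code in bb-local.sh. The function name is hypothetical, process names are assumed to be matched as substrings of the ps listing, and yellow for a missing PROCS entry is inferred from the Warning Conditions section.

```python
def procs_color(ps_output, procs, pageprocs):
    """Return red, yellow, or green from a ps listing and the two lists."""
    lines = ps_output.splitlines()

    def is_running(name):
        # Assumed matching rule: the name appears somewhere in a ps line.
        return any(name in line for line in lines)

    if any(not is_running(name) for name in pageprocs):
        return "red"     # missing PAGEPROCS entry: notification is sent
    if any(not is_running(name) for name in procs):
        return "yellow"  # missing PROCS entry: warning only
    return "green"
```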


smtp

The smtp column denotes the smtp check performed periodically. This is part of the generic server test code located in bb-network.sh. It makes sure that the SMTP process (usually sendmail) is alive and well.

Copyright © 1997-1999 The MacLawran Group Inc - All Rights Reserved