Recently I was asked to write a script to monitor the Oracle alert log for "ORA-" errors and send an automated email to the DBA team.
Here are the challenges:
1. How do I make the script scan only the lines that were not covered by the previous scan?
Another way to frame the question: if scan x happened at 3pm, how do I ensure that scan (x+1) picks up only the new lines written after 3pm? Should I store the timestamp of the last scan, or the line count it reached (a high-water mark, HWM)? I finally decided to go with the HWM line number.
2. How do I deal with an alert log that has been recycled? Will my script be able to detect that the alert log was recycled and automatically reset the line HWM so that it starts from line 1 of the new alert log?
Comparing the stored HWM line count with the current line count of the alert log answers this. If the current line count is smaller than the stored HWM, it is safe to assume the alert log was recycled. There is one edge case: if the log is recycled and the next scan runs so late (say, a couple of weeks later) that the new alert log has already grown past the stored HWM, the script will skip that many lines at the top of the new log for that one run, and normal operation resumes from the next scan. I chose to ignore this possibility because it is extremely remote, and even then only the first scan after recycling would miss ORA- errors.
#!/bin/bash
#Alert Log Monitoring Script
#Usage: alert_monitor.sh
#Author:Gautham Chandrasekaran
#Version 1.0 06 NOVEMBER 2013
EMAIL_LIST=gautha@hotmail.com
ALERT_LOG_FILE=/u01/app/oracle/diag/rdbms/prod/PROD_1/trace/alert_PROD_1.log
EMAIL_FILE=/tmp/alert_mail.log
#Error out if Alert Log is missing
if [ ! -f "$ALERT_LOG_FILE" ] ; then
echo "**** $ALERT_LOG_FILE FILE MISSING FROM ALERT LOG MONITOR ******" > "$EMAIL_FILE"
mailx -s "Alert Log Errors" $EMAIL_LIST < "$EMAIL_FILE"
exit 1
fi
ALERT_LOG_LINE_CNT_FILE=${ALERT_LOG_FILE}.ctr
#First run of the script, or somebody deleted the counter file. Therefore start from zero.
if [ ! -f "$ALERT_LOG_LINE_CNT_FILE" ] ; then
echo 0 > "$ALERT_LOG_LINE_CNT_FILE"
fi
STORED_LINE_CTR=$(cat "$ALERT_LOG_LINE_CNT_FILE")
ALERT_LOG_LINE_CTR=$(wc -l < "$ALERT_LOG_FILE")
#check to see whether somebody has recycled alert log file.
#in this case the STORED_LINE_CTR will be higher than ALERT_LOG_LINE_CTR
#If so just reset STORED_LINE_CTR to 0.
if [ $ALERT_LOG_LINE_CTR -lt $STORED_LINE_CTR ] ; then
STORED_LINE_CTR=0
fi
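#Scan only lines STORED_LINE_CTR+1 through ALERT_LOG_LINE_CTR.
#A fixed line range is used instead of tail -n because lines appended after the
#count was taken would shift the tail window; those lines are left for the next run.
awk -v s="$STORED_LINE_CTR" -v e="$ALERT_LOG_LINE_CTR" 'NR > e {exit} NR > s' "$ALERT_LOG_FILE"|grep -i "ora-" > "$EMAIL_FILE"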
#Reset line number to ctr file to skip the scanned rows in the next run
echo $ALERT_LOG_LINE_CTR > $ALERT_LOG_LINE_CNT_FILE
#Email only if the file is not empty
if [ -s "$EMAIL_FILE" ] ; then
mailx -s "Alert Log Errors" $EMAIL_LIST < "$EMAIL_FILE"
fi
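
The remote case described earlier (a recycled log that has already grown past the stored line count) could probably be narrowed by remembering the file's inode number alongside the line count. What follows is only a minimal sketch, not part of the script above. It assumes the log is recycled by renaming the old file so that the new one gets a different inode; a log truncated in place keeps its inode, so the line count comparison stays as a fallback. The function names and the "<inode> <line_count>" state file layout are hypothetical.

```shell
#!/bin/bash
# Sketch: detect a recycled alert log by inode as well as by line count.
# All names below (file_inode, line_count, resume_line, save_state) and the
# state file layout "<inode> <line_count>" are hypothetical.

# Print the inode number of file $1.
file_inode() {
    ls -i "$1" | awk '{print $1}'
}

# Print the number of lines in file $1 without surrounding spaces.
line_count() {
    wc -l < "$1" | tr -d ' '
}

# Print the line count already scanned (0 means scan from the top).
# $1 = log file, $2 = state file
resume_line() {
    rl_inode=0
    rl_count=0
    if [ -s "$2" ]; then
        read -r rl_inode rl_count < "$2"
    fi
    rl_count=${rl_count:-0}
    # A different inode or a smaller line count both suggest a new log file.
    if [ "$(file_inode "$1")" != "$rl_inode" ] || [ "$(line_count "$1")" -lt "$rl_count" ]; then
        echo 0
    else
        echo "$rl_count"
    fi
}

# Record the current inode and line count for the next run.
# $1 = log file, $2 = state file
save_state() {
    echo "$(file_inode "$1") $(line_count "$1")" > "$2"
}
```

A run would call resume_line first, scan the lines after the number it prints, and then call save_state. Note that if the old log is deleted rather than renamed, the filesystem may hand the same inode to the new file, in which case this check adds nothing beyond the line count comparison.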