The external monitoring process regularly checks jobs running in the scheduler and sends notifications when certain conditions occur. By default, jobs are checked every 15 minutes. This interval can be changed when the external job monitor is set up. Refer to the RED Installation Guide for details.
Job monitoring is defined by selecting the Monitor > Job Monitoring menu option from the Scheduler window, or from the stand-alone scheduler maintenance program.

When this menu option is selected, a check is made to ensure that the external monitoring process is active. If the external process has not reported in the last 24 hours, the following message appears:

In such a case, the external monitoring process on the UNIX platform needs to be checked or started. Refer to the RED Installation Guide for instructions on how to maintain the external monitoring process.
The following window is displayed:

In this example, the job called 'Enterprise Reporting Daily Refresh' is being monitored. Monday through Friday are checked, so monitoring occurs on these days. The nominal start time is 04:00 am. A notification is sent if any of the following conditions occur:

  1. The job fails to start within 60 minutes of this nominal start time.
  2. The job fails to finish within 240 minutes (4 hours) of the nominal start time.
  3. Errors occur while the job is running.
  4. The job fails with errors.

In the example screen above, the user has created an action script called 'monitor_db_mail' and saved it as a host script object in RED. This host script is used for every condition defined above that can cause a notification.

Monitor Days

When defining monitoring for a job, one or more days must be selected. In addition to the days of the week, the first and last days of the month can also be selected. These two special days make it possible to monitor a job that is critical at the start or end of a month. If Monday through Sunday are all checked, the special days do not need to be set. The monitoring day is the day on which the job starts (or could start).
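As an illustration of how the weekday and month-boundary selections combine, the following minimal sketch decides whether a given date is a monitoring day. The labels `"MON"` to `"SUN"`, `"FIRST"` and `"LAST"` are hypothetical stand-ins for the check boxes; RED's internal representation is not described here.

```python
import calendar
from datetime import date

# Hypothetical labels for the check boxes; "FIRST" and "LAST" stand in for
# the first-day-of-month and last-day-of-month options.
WEEKDAYS = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]


def is_monitor_day(day: date, selected: set[str]) -> bool:
    """Return True if monitoring applies on `day` for the checked boxes."""
    # Ordinary weekday selection (date.weekday(): Monday == 0).
    if WEEKDAYS[day.weekday()] in selected:
        return True
    # First day of the month.
    if "FIRST" in selected and day.day == 1:
        return True
    # Last day of the month; monthrange() returns (first weekday, days in month).
    last = calendar.monthrange(day.year, day.month)[1]
    return "LAST" in selected and day.day == last
```

For example, a month-end job monitored with only the last-day box checked would be picked up on 31 January 2024 and on 29 February 2024, but not on 30 January 2024.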

Note

The monitoring day is the day of the absolute minimum start time of a job. For example, to monitor a job that starts at 00:01 and has an earliest possible start of 5 minutes (i.e. 23:56 the previous day), monitoring must be set for the day of the earliest possible start. Therefore, to monitor the Monday morning run, the Sunday check box must also be selected.
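The arithmetic in the note can be checked with a short sketch: subtract the earliest possible start (in minutes) from the nominal start and take the weekday of the result. The function names are hypothetical and only illustrate the rule described above.

```python
from datetime import datetime, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def absolute_minimum_start(nominal: datetime, earliest_possible_minutes: int) -> datetime:
    """Nominal start time less the earliest possible start (in minutes)."""
    return nominal - timedelta(minutes=earliest_possible_minutes)


def monitoring_day(nominal: datetime, earliest_possible_minutes: int) -> str:
    """Name of the day whose check box must be selected for this run."""
    amin = absolute_minimum_start(nominal, earliest_possible_minutes)
    return DAY_NAMES[amin.weekday()]
```

With the values from the note, a Monday 00:01 run (for example 8 January 2024) with an earliest possible start of 5 minutes has an absolute minimum start of Sunday 23:56, so `monitoring_day(datetime(2024, 1, 8, 0, 1), 5)` returns `"Sunday"`.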

Nominal Start Time

The nominal start time is the time at which the job will normally start. This field must be populated for all jobs being monitored, even if not used. Monitoring is optimized for daily jobs, so this field is normally set to the start time of the job.

Earliest Possible Start

The earliest possible start field specifies how many minutes before the nominal start time the job may start. In most situations, this field is set to 0. When it is set to a value other than zero, consider how the earlier start time affects the days being monitored; see the note under Monitor Days above. The nominal start time less the earliest possible start gives the Absolute Minimum Start Time, which is used by most of the notification conditions as well as for the monitor days.

Periodicity of Job

The periodicity defines the interval between iterations of the job. For a daily job, this is 24 hours and 0 minutes (2400). The monitoring software uses the periodicity to determine whether the job being examined is the current iteration or a previous one.
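The exact rule RED uses to tell the current iteration from a previous one is not described here. The sketch below assumes one plausible rule: a run belongs to the current iteration if it started at or after the current absolute minimum start time and before the next cycle begins. It also assumes the periodicity is written as hours followed by two digits of minutes, as in `2400`. Both function names are hypothetical.

```python
from datetime import datetime, timedelta


def parse_periodicity(hhmm: str) -> timedelta:
    """Interpret a periodicity such as '2400' as 24 hours and 0 minutes.

    Assumes the last two digits are minutes and the rest are hours.
    """
    hours, minutes = int(hhmm[:-2]), int(hhmm[-2:])
    return timedelta(hours=hours, minutes=minutes)


def is_current_iteration(run_start: datetime, abs_min_start: datetime,
                         period: timedelta) -> bool:
    """Assumed rule: the run started inside the window that opens at the
    current absolute minimum start time and lasts one period."""
    return abs_min_start <= run_start < abs_min_start + period
```

For the daily example job, a run that started at 04:05 today falls inside the window opened by today's 04:00 absolute minimum start, whereas yesterday's 04:05 run is treated as a previous iteration.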

Notification Order

Notifications are processed in the following order. If a notification is sent for a job, then no further notification checks are made for that job.

  1. Finish Error
  2. Finish Warning
  3. Finish Success
  4. Must Finish In
  5. Run Error
  6. Run Warning
  7. Checkpoint 3
  8. Checkpoint 2
  9. Checkpoint 1
  10. Must Start Within
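The "first match wins" behaviour of this list can be sketched as a loop over the notifications in order, stopping at the first one whose condition holds. The `checks` mapping of notification names to condition functions is a hypothetical structure for illustration; notifications not configured for a job are simply left out of it.

```python
from typing import Callable, Optional

# Evaluation order, taken from the list above.
NOTIFICATION_ORDER = [
    "Finish Error", "Finish Warning", "Finish Success", "Must Finish In",
    "Run Error", "Run Warning", "Checkpoint 3", "Checkpoint 2",
    "Checkpoint 1", "Must Start Within",
]


def first_notification(checks: dict[str, Callable[[], bool]]) -> Optional[str]:
    """Return the first configured notification whose condition is true.

    Once one notification fires, no further checks are made for the job.
    """
    for name in NOTIFICATION_ORDER:
        check = checks.get(name)
        if check is not None and check():
            return name
    return None
```

For instance, if both Run Error and Must Finish In conditions hold for the same job, only Must Finish In is sent, because it comes earlier in the order.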

To cover all possible events, it is usually necessary to set several notifications.

Must Start Within (Notification)

The Must Start Within notification is triggered when a job fails to start within a specified number of minutes of the absolute minimum start time. In the example above, the job must start within 60 minutes of the start time.
If this criterion is not met, the monitoring software checks the Action Script and Parameter to Script fields to determine how to process the notification. If an action script has been defined, that script is executed in the UNIX environment and is passed the defined parameters. If no script is defined, the contents of the Parameter to Script field are executed in the UNIX environment.
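The choice between the two fields can be sketched as building the command line the monitor would run. This is a minimal sketch: it assumes the Parameter to Script value is split into arguments using shell-style quoting and that a command without a script is run through `/bin/sh -c`; neither detail is confirmed for RED. The function name is hypothetical.

```python
import shlex
from typing import Optional


def build_action_command(action_script: Optional[str], parameters: str) -> list[str]:
    """Return the argument vector for a triggered notification.

    - Action Script defined: run it, passing the Parameter to Script value
      as arguments (split with shell-style quoting; an assumption).
    - No Action Script: run the Parameter to Script value itself as a
      shell command.
    """
    if action_script:
        return [action_script, *shlex.split(parameters)]
    return ["/bin/sh", "-c", parameters]
```

With the example script, `build_action_command("monitor_db_mail", "ops@example.com 'Job late'")` yields `["monitor_db_mail", "ops@example.com", "Job late"]`, where the address and message are hypothetical parameters. The resulting list could then be passed to `subprocess.run`.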

Must Finish In (Notification)

The Must Finish In notification is triggered when a job fails to finish within a specified number of minutes of the absolute minimum start time. In the example above, the job must finish within 240 minutes (4 hours) of the start time. See Must Start Within above for the action taken when this notification is triggered.
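Must Start Within and Must Finish In both compare an event (start or finish) against a deadline measured from the absolute minimum start time. The sketch below expresses both with one hypothetical helper. It assumes that an event which happened after the deadline also counts as a miss; the documentation above only covers the case where the event has not yet happened.

```python
from datetime import datetime, timedelta
from typing import Optional


def deadline_missed(now: datetime, abs_min_start: datetime,
                    event_at: Optional[datetime], minutes: int) -> bool:
    """True if the start/finish did not happen within `minutes` of the
    absolute minimum start time.

    `event_at` is None while the start (or finish) has not happened yet.
    """
    deadline = abs_min_start + timedelta(minutes=minutes)
    if event_at is None:
        # Not started/finished yet: missed once the deadline has passed.
        return now > deadline
    # Started/finished already: missed if it happened after the deadline (assumption).
    return event_at > deadline
```

For the example job (04:00 start, 60 minutes to start, 240 minutes to finish), Must Start Within would be `deadline_missed(now, amin, started_at, 60)` and Must Finish In would be `deadline_missed(now, amin, finished_at, 240)`.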

Run Warning Count (Notification)

The Run Warning Count notification is triggered when a job exceeds a specified number of warning messages while running. This notification only occurs while a job is running. If the job has not started, or has already finished or failed, when the external monitor checks the number of warnings, this notification is ignored. See Must Start Within above for the action taken when this notification is triggered.

Run Error Count (Notification)

The Run Error Count notification is triggered when a job exceeds a specified number of error messages while running. This notification only occurs while a job is running. If the job has not started, or has already finished or failed, when the external monitor checks the number of errors, this notification is ignored. See Must Start Within above for the action taken when this notification is triggered.
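Both run-time count notifications follow the same rule, sketched below: only a running job is evaluated, and the notification fires when the count exceeds the configured number. The `JobStatus` values and the function name are hypothetical.

```python
from enum import Enum


class JobStatus(Enum):
    """Hypothetical job states used for illustration only."""
    NOT_STARTED = "not started"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


def run_count_triggered(status: JobStatus, count: int, threshold: int) -> bool:
    """Run Warning Count / Run Error Count check.

    A job that has not started, or that has already completed or failed,
    never triggers these notifications.
    """
    return status is JobStatus.RUNNING and count > threshold
```

With a hypothetical threshold of 2, a running job with 3 errors triggers Run Error Count, while a job that has already failed with 10 errors does not (that case is left to Finish Error Count).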

Finish Warning Count (Notification)

The Finish Warning Count notification is triggered when a job has exceeded a specified number of warning messages and has completed or failed. This notification only occurs when a job has completed or failed. If the job has not started or is still running when the external monitor checks the number of warnings, this notification is ignored. See Must Start Within above for the action taken when this notification is triggered.

Finish Error Count (Notification)

The Finish Error Count notification is triggered when a job has exceeded a specified number of error messages and has failed. This notification only occurs when a job has failed. If the job is still running when the external monitor checks the number of errors, this notification is ignored. If the job has been restarted and has gone on to complete normally, this notification is ignored even if errors occurred. See Must Start Within above for the action taken when this notification is triggered.
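The two finish-time count notifications differ only in which end states they accept, as this hypothetical sketch shows. Finish Warning Count accepts a completed or failed job; Finish Error Count accepts only a failed one, so a job that was restarted and then completed is ignored.

```python
from enum import Enum


class JobStatus(Enum):
    """Hypothetical job states used for illustration only."""
    NOT_STARTED = "not started"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


def finish_warning_triggered(status: JobStatus, warnings: int, threshold: int) -> bool:
    """Finish Warning Count: evaluated only once the job has completed or failed."""
    return status in (JobStatus.COMPLETED, JobStatus.FAILED) and warnings > threshold


def finish_error_triggered(status: JobStatus, errors: int, threshold: int) -> bool:
    """Finish Error Count: evaluated only for a failed job.

    A job that was restarted and then completed normally has status
    COMPLETED, so it is ignored here even if errors were logged earlier.
    """
    return status is JobStatus.FAILED and errors > threshold
```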

Finish Success (Notification)

The Finish Success notification is triggered when a job completes. It can be used, for example, to mail a log of the completed job to a specified user. See Must Start Within above for the action taken when this notification is triggered.

Checkpoint 1 (Notification)

The Checkpoint 1 notification is triggered when a job fails to achieve either a specified number of task completions or a specified number of information/OK messages within a specified elapsed time. The elapsed time is measured from the absolute minimum start time. This notification only occurs while the job is running. If the job has not started, or has completed or failed, when the external monitor checks the checkpoint, this notification is ignored. See Must Start Within above for the action taken when the criterion is not met.

Checkpoint 2 (Notification)

The Checkpoint 2 notification is triggered when a job fails to achieve either a specified number of task completions or a specified number of information/OK messages within a specified elapsed time. The elapsed time is measured from the absolute minimum start time. This notification only occurs while the job is running. If the job has not started, or has completed or failed, when the external monitor checks the checkpoint, this notification is ignored. See Must Start Within above for the action taken when the criterion is not met.

Checkpoint 3 (Notification)

The Checkpoint 3 notification is triggered when a job fails to achieve either a specified number of task completions or a specified number of information/OK messages within a specified elapsed time. The elapsed time is measured from the absolute minimum start time. This notification only occurs while the job is running. If the job has not started, or has completed or failed, when the external monitor checks the checkpoint, this notification is ignored. See Must Start Within above for the action taken when the criterion is not met.
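All three checkpoints apply the same test with their own settings, so one hypothetical helper can sketch them. It assumes each checkpoint counts a single measure (task completions or information/OK messages), passed in as `achieved`, and that the checkpoint is evaluated once its elapsed time has been reached.

```python
from datetime import datetime, timedelta


def checkpoint_missed(now: datetime, abs_min_start: datetime, elapsed_minutes: int,
                      job_running: bool, achieved: int, required: int) -> bool:
    """Checkpoint 1/2/3 check (hypothetical sketch).

    `achieved` is the measure this checkpoint uses, counted so far: task
    completions or information/OK messages (an assumption).
    """
    if not job_running:
        # Not started, completed or failed: the checkpoint is ignored.
        return False
    checkpoint_at = abs_min_start + timedelta(minutes=elapsed_minutes)
    return now >= checkpoint_at and achieved < required
```

For example, with a hypothetical checkpoint of 5 task completions within 60 minutes of a 04:00 absolute minimum start, a running job that has completed only 3 tasks at 05:00 misses the checkpoint.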
