Oracle® Light Weight Availability Collection Tool User's Guide Release 3.3 for Oracle Solaris Part Number E20940-01
This section lists the errors logged by the Oracle Lightweight Availability Collection Tool, their functional meaning, and any actions to take when these errors are displayed either in /var/adm/messages or on the screen.
Indicates that the /etc/default/lwact file contains an invalid cause code entry. In this file, the user can set up to three levels of default cause codes for outage events. The error message identifies the level that contains the incorrect entry in square brackets ([]); that is, XX is [L1CC], [L2CC], or [L3CC], depending on which cause code level is invalid.
Action: Enter a valid set of cause codes in the L1CC, L2CC, and L3CC fields in the /etc/default/lwact file. Use the logtime -M command to get the list of valid cause codes for all three levels.
This error message indicates that a user has tried to modify the cause code for an invalid event number; that is, for a non-outage event. Users can modify or assign cause codes only for halt and panic outage events.
Action: Use the ltreport -v command to display the list of outage events along with their corresponding event numbers.
This message indicates that a user has entered an invalid cause code for the level displayed in the message. X can be 1, 2, or 3.
Action: Each level 1 cause code has a corresponding umbrella of level 2 and level 3 cause codes under it. The only valid cause codes for a level are those listed under that umbrella. To obtain the list of valid cause codes, use the logtime -m command.
This message indicates that a user has successfully modified event number X, where X is the event number.
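The umbrella relationship described above can be pictured as a nested lookup. The following is only a minimal sketch of that idea, not the tool's implementation: every code name in it is a hypothetical placeholder, and the function name first_invalid_level is likewise an assumption made for illustration. Real cause codes come from the logtime listing.

```python
# Hypothetical cause-code tree: every name below is a made-up placeholder,
# not a real LWACT cause code. Use the tool's own listing for real values.
CAUSE_CODES = {
    "Planned": {
        "Maintenance": {"Hardware upgrade", "Software patch"},
    },
    "Unplanned": {
        "Failure": {"Disk", "Power"},
    },
}


def first_invalid_level(l1, l2, l3, tree=CAUSE_CODES):
    """Return 1, 2 or 3 for the first level whose code is invalid, else None."""
    if l1 not in tree:
        return 1
    if l2 not in tree[l1]:
        return 2  # l2 exists only under a different level 1 umbrella, or not at all
    if l3 not in tree[l1][l2]:
        return 3
    return None
```

In this sketch, a level 2 code that is valid under one level 1 code is still rejected under another, which mirrors the rule that a code is valid only within its own umbrella.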
Action: No action is required. Informational only.
This message is logged when the Oracle Lightweight Availability Collection Tool terminates (for example, in the case of pkgrm).
Action: No action is required. Informational only.
This error message indicates that a user has tried to start the tictimed daemon while it is already running.
Action: No action is required. Informational only.
LWACT is removing the zero-byte file and starting afresh. This message occurs when the availability datagram file becomes 0 bytes in size for an unknown reason.
Action: For installations earlier than LWACT 3.2, remove the zero-byte file; tictimed will recreate it. For LWACT 3.2 or later, no action is required because LWACT automatically removes the zero-byte file.
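For the manual cleanup on older installations, a guarded removal avoids deleting a datagram that still holds data. This is a minimal sketch under stated assumptions: the helper name remove_if_zero_byte is hypothetical, and the datagram path must be supplied by the caller because this section does not state it.

```python
import os


def remove_if_zero_byte(path):
    """Delete `path` only if it is a regular file of size 0.

    Returns True if the file was removed, False otherwise.
    The caller supplies the datagram path; it is not assumed here.
    """
    if os.path.isfile(path) and os.path.getsize(path) == 0:
        os.remove(path)
        return True
    return False
```

A file that is missing or non-empty is left untouched and the function returns False.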
The entire message is as follows:
[tictimed] datagram file corruption detected. LWACT is quarantining the corrupted file and starting afresh. If required user can pick up the uncorrupted datagram file from the last run explorer output in order to avoid considerable data loss.
Whenever the Availability datagram is found to be corrupted, the Oracle Lightweight Availability Collection Tool automatically quarantines it to the same folder where the Availability datagram is present with a filename of the format: lwact_corrupted_<UTC at which the corruption was detected> (for example: lwact_corrupted_1208531225). Quarantining the Availability datagram causes a data loss in the Oracle Lightweight Availability Collection Tool. Old data, collected before the file corruption occurred, will not be taken into account by the tool during the availability calculation.
Action: In order to minimize this data loss, you can manually obtain the uncorrupted copy of the datagram from the previous Oracle Explorer image.
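When deciding which Oracle Explorer image predates the corruption, it can help to turn the numeric suffix of the quarantined filename into a readable time. The sketch below assumes the suffix is a count of seconds since the Unix epoch, which is consistent with the example name above but is not stated explicitly in this guide. The function name corruption_time is hypothetical.

```python
from datetime import datetime, timezone

PREFIX = "lwact_corrupted_"


def corruption_time(filename):
    """Return the UTC datetime encoded in a quarantined datagram's filename."""
    if not filename.startswith(PREFIX):
        raise ValueError(f"not a quarantined datagram name: {filename!r}")
    # Assumption: the suffix is seconds since 1970-01-01 00:00:00 UTC.
    seconds = int(filename[len(PREFIX):])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(corruption_time("lwact_corrupted_1208531225").isoformat())
# prints 2008-04-18T15:07:05+00:00
```

Under that assumption, any Explorer image collected before the printed time should contain a datagram that predates the corruption.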
If the Availability datagram is lost or deleted for some reason, tictimed, which periodically updates the timestamp on the log file, will not be able to carry out this activity. Hence, it logs the error message. A few possible cases where this error can occur are the following:
The datagram file is corrupted and tictimed has quarantined it.
The Availability datagram file has been deleted by the user for some reason.
Action: No action is required. tictimed will automatically recreate the file afresh if it does not find it.
This message indicates that a user has attempted to start LWACT manually using the init script.
Action: No action is required. Informational only.
This message indicates that a user has attempted to start LWACT while it was already running.
Action: No action is required. Informational only.
This message indicates that a user has attempted to stop LWACT manually using the init script.
Action: No action is required. Informational only.
The entire message is as follows:
**ATTENTION** Event generation not in chronological order. It can affect availability metrics. Sudden fall back in system date may have caused this. Check and correct system date. Otherwise, quarantine current datagram to start monitoring availability afresh.
This message occurs when availability events are recorded out of sequence in the availability datagram. Out-of-sequence events can occur when the system date suddenly falls back (for example, the system shuts down today and boots with a date from last week). In such cases, LWACT detects the sudden shift in time and records the message, indicating the exact time when the system fell back in time. The affected system can report incorrect availability metrics.
Action: Check and correct the system date, or quarantine the current datagram to start monitoring the availability of the system afresh. Note that old availability metrics are lost when the datagram is quarantined.
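The out-of-sequence condition amounts to an event whose timestamp is earlier than that of the event recorded before it. The following sketch illustrates that check on a plain list of numeric timestamps. It is not LWACT's detection logic and it does not read the datagram format, which this section does not describe; the function name find_time_regressions is hypothetical.

```python
def find_time_regressions(timestamps):
    """Return (index, previous, current) for each event earlier than its predecessor."""
    regressions = []
    for i in range(1, len(timestamps)):
        if timestamps[i] < timestamps[i - 1]:
            regressions.append((i, timestamps[i - 1], timestamps[i]))
    return regressions
```

For example, the sequence 100, 200, 150, 300 yields one regression at index 2, where the recorded time fell back from 200 to 150.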