Oracle® Light Weight Availability Collection Tool User's Guide Release 3.3 for Oracle Solaris Part Number E20940-01
This chapter introduces you to the Oracle Light Weight Availability Collection Tool. This tool is a standalone product that collects availability data.
This section explains the architecture. The following graphic shows the end-to-end data flow of the Oracle Lightweight Availability Collection Tool.
Upon installation of the Oracle Lightweight Availability Collection Tool on the monitored host, the tool spawns a daemon (tictimed) that continuously monitors and collects the availability status of the host. This collected availability data is stored in the form of an XML file. The Oracle Lightweight Availability Collection Tool's reporting utility (ltreport) can be used to generate and view command line interface (CLI) based reports from this file. The tool also provides a few XSL sheets to generate HTML-based reports from the datagram.
The Availability datagram is picked up and transported back to Oracle by the Oracle Explorer Data Collector and is stored in Oracle's database. Oracle uses this data to improve its products, and account managers can use it to communicate improvement opportunities to customers.
The availability data collected by the Oracle Lightweight Availability Collection Tool is stored as a datagram within the file system of the monitored host. The availability data is embedded between XML tags. The Availability datagram can be broadly categorized into two sections:
Monitored System Information
Availability Data
The following is a sample of an Oracle Lightweight Availability Collection Tool Availability datagram file:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<single_system_availability_results>
  <systemInfo>
    <hostName>bs6-s0</hostName>
    <hostId>83254cb1</hostId>
    <zoneName>global</zoneName>
    <timeZone>US/Mountain</timeZone>
    <sysSerialNumber>unknown</sysSerialNumber>
    <OSName>SunOS</OSName>
    <OSVersion>5.10</OSVersion>
    <cpuArchitecture>sparc</cpuArchitecture>
    <productType>Serverblade1</productType>
    <lwactVersion>3.1</lwactVersion>
  </systemInfo>
  <event type="epoch" utc="1207784519" timeStamp="Wed Apr 9 17:41:59 2008" up="0" dwnPlnd="0" dwnUnplnd="0" dwnUndef="0" cksum="13c8" />
  <event type="boot" utc="1207784519" timeStamp="Wed Apr 9 17:41:59 2008" up="76820" dwnPlnd="0" dwnUnplnd="0" dwnUndef="0" cksum="13e4" />
  <event type="panic" utc="1207861339" timeStamp="Thu Apr 10 15:02:19 2008 -06:00" up="0" dwnPlnd="0" dwnUnplnd="1" dwnUndef="0" L1causeCode="Unplanned" L2causeCode="Undefined" L3causeCode="Undefined" wasPlanned="2" cksum="2708" />
  <event type="boot" utc="1207861340" timeStamp="Thu Apr 10 15:02:20 2008 -06:00" up="8931" dwnPlnd="0" dwnUnplnd="0" dwnUndef="0" cksum="143b" />
  <event type="time" utc="1207870271" timeStamp="Thu Apr 10 17:31:11 2008 -06:00" up="85751" dwnPlnd="0" dwnUnplnd="1" dwnUndef="0" elapsed="85752" totAvail="99.999" adjAvail="99.999" cksum="1c95" />
</single_system_availability_results>
Note:
In this sample datagram file, the epoch and boot events do not have a time zone offset. This might happen if you upgrade the Oracle Lightweight Availability Collection Tool from a pre-3.0 version to a later one. Events recorded by the latest Oracle Lightweight Availability Collection Tool always contain time zone offset information in their timestamps.
The information collected between the tags <systemInfo> and </systemInfo> constitutes the system information section. This section provides details about the monitored host, such as the following:
Hostname
Hostid
Zonename (if present)
Timezone of the host
System serial number (if known)
OS name and version
CPU architecture (SPARC/x86)
Product type
The version of the Oracle Lightweight Availability Collection Tool installed on the host
The Availability Data section contains the availability events (boot, epoch, halt, panic) and their corresponding timestamps. All availability calculations are based on the data collected in this section.
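The two sections of the datagram can be read with any XML parser. The following is a minimal sketch in Python, assuming a copy of the sample datagram above is available as a string; the function name parse_datagram is hypothetical and is not part of the Oracle Lightweight Availability Collection Tool, and some event attributes are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Copy of the sample datagram from this chapter (some event attributes omitted).
SAMPLE = """<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<single_system_availability_results>
  <systemInfo>
    <hostName>bs6-s0</hostName>
    <hostId>83254cb1</hostId>
    <zoneName>global</zoneName>
    <timeZone>US/Mountain</timeZone>
    <OSName>SunOS</OSName>
    <OSVersion>5.10</OSVersion>
    <lwactVersion>3.1</lwactVersion>
  </systemInfo>
  <event type="epoch" utc="1207784519" up="0" />
  <event type="boot" utc="1207784519" up="76820" />
  <event type="panic" utc="1207861339" up="0" dwnUnplnd="1" wasPlanned="2" />
  <event type="boot" utc="1207861340" up="8931" />
  <event type="time" utc="1207870271" up="85751" elapsed="85752" />
</single_system_availability_results>
"""


def parse_datagram(xml_bytes):
    """Hypothetical helper: split a datagram into its two sections."""
    root = ET.fromstring(xml_bytes)
    # Monitored System Information: one child element per field.
    system_info = {child.tag: child.text for child in root.find("systemInfo")}
    # Availability Data: one <event> element per event, data held in attributes.
    events = [dict(event.attrib) for event in root.findall("event")]
    return system_info, events


system_info, events = parse_datagram(SAMPLE.encode("utf-8"))
print(system_info["hostName"], system_info["lwactVersion"])
for event in events:
    print(event["type"], event["utc"])
```

The sketch keeps every attribute value as a string; a real consumer would convert utc, up, and the downtime counters to integers before doing arithmetic.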
This section identifies and describes the important fields of the Availability datagram (listed here in alphabetical order).
Adjusted availability
Represented in percentage as: ((Total uptime + Total Planned downtime) /Total elapsed time) * 100
Note:
Planned downtime is considered as uptime in this instance; hence, the term adjusted availability.
Downtime
The duration during which the host was out of run level 3 is considered as downtime (that is, the difference in coordinated universal time (UTC) between the outage event and its corresponding boot event). Downtime is recorded as a part of the outage event (panic/halt). The category of the downtime is decided by the wasPlanned field, which can have one of the following designations:
Undefined (value of 0)
Planned (value of 1)
Unplanned (value of 2)
In the sample datagram (above), event #2 is a panic event, and event #3 is its corresponding boot event; the difference in UTC of event #3 and event #2 is the downtime. Therefore, downtime = 1207861340 - 1207861339 (= 1 sec)
Since the wasPlanned flag is 2, the downtime is marked against the field dwnUnplnd (Unplanned downtime).
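The downtime arithmetic above can be sketched in a few lines of Python. The function name classify_downtime is hypothetical. The mapping of wasPlanned value 2 to dwnUnplnd comes from the sample; the mappings for values 0 and 1 are assumptions inferred from the attribute names dwnUndef and dwnPlnd.

```python
# wasPlanned value -> downtime attribute that receives the seconds.
# Only the entry for 2 is confirmed by the sample; 0 and 1 are inferred.
DOWNTIME_FIELD = {0: "dwnUndef", 1: "dwnPlnd", 2: "dwnUnplnd"}


def classify_downtime(outage_utc, boot_utc, was_planned):
    """Return (attribute name, downtime in seconds) for one outage."""
    downtime = boot_utc - outage_utc  # UTC of boot event minus UTC of outage event
    return DOWNTIME_FIELD[was_planned], downtime


# Event #2 (panic) and event #3 (boot) from the sample datagram.
field, seconds = classify_downtime(1207861339, 1207861340, 2)
print(field, seconds)  # dwnUnplnd 1
```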
Total availability
Represented in percentage as: (Total uptime/Total elapsed time) * 100
Types of Events
The following types of events are recorded in the Availability datagram by the Oracle Lightweight Availability Collection Tool:
epoch
Marks the beginning of event tracking. It is recorded only once in the Availability datagram (at the inception). The UTC of this event marks the inception time of the Oracle Lightweight Availability Collection Tool on the monitored host.
boot
Whenever the host returns to run level 3, a boot event is recorded in the datagram along with the corresponding timestamp.
halt
Whenever the host leaves run level 3 to any other level, a halt event is created with the time of halt being the time the host left run level 3.
panic
If the host goes down abnormally (for example, because of a system crash), a panic event is recorded upon the subsequent boot of the host (that is, a return to run level 3). The time of the panic event is the time at which the Oracle Lightweight Availability Collection Tool stopped running.
time
Indicates the last recorded UTC for offline reporting. This event contains the consolidated uptime and downtime information. It also reports the elapsed time (the duration, in seconds, for which the Oracle Lightweight Availability Collection Tool has been monitoring this host since inception). In addition, the time event reports system availability in two forms: Total availability and Adjusted availability.
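As an illustration, the two availability figures in the time event of the sample datagram can be recomputed from its up, dwnPlnd, and elapsed attributes using the formulas given under Total availability and Adjusted availability. This is a sketch only; the function names are hypothetical, and formatting to three decimal places is an assumption based on how the values appear in the sample.

```python
def total_availability(up, elapsed):
    """(Total uptime / Total elapsed time) * 100"""
    return up / elapsed * 100


def adjusted_availability(up, planned_down, elapsed):
    """((Total uptime + Total planned downtime) / Total elapsed time) * 100"""
    return (up + planned_down) / elapsed * 100


# Values taken from the time event of the sample datagram.
up, dwn_plnd, elapsed = 85751, 0, 85752

tot = total_availability(up, elapsed)
adj = adjusted_availability(up, dwn_plnd, elapsed)
print(f"{tot:.3f} {adj:.3f}")  # 99.999 99.999
```

Because the sample contains no planned downtime, the two figures are identical; they diverge only when dwnPlnd is greater than zero.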
Uptime
The difference in UTC between the current outage event and the last event before it, which would be a boot event, is measured as uptime.
In the sample datagram (above), the uptime field in event #1 (boot event) is calculated as the difference in UTC between event #2 (panic event) and event #1. Therefore, uptime = 1207861339 - 1207784519 (= 76820 secs)
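The same calculation can be applied to every boot event in a datagram. The following Python sketch uses the event types and UTC values from the sample; the function name uptime_per_boot is hypothetical. It also treats the trailing time event as the end of the last uptime interval, which is an assumption that happens to match the up values in the sample.

```python
# (event type, utc) pairs from the sample datagram, in recorded order.
EVENTS = [
    ("epoch", 1207784519),
    ("boot", 1207784519),
    ("panic", 1207861339),
    ("boot", 1207861340),
    ("time", 1207870271),
]


def uptime_per_boot(events):
    """Uptime for each boot event: UTC of the next event minus UTC of the boot."""
    uptimes = []
    for (kind, utc), (_, next_utc) in zip(events, events[1:]):
        if kind == "boot":
            uptimes.append(next_utc - utc)
    return uptimes


uptimes = uptime_per_boot(EVENTS)
print(uptimes)       # [76820, 8931]
print(sum(uptimes))  # 85751
```

The two values match the up attributes of the two boot events in the sample, and their sum matches the up attribute of the time event.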
Key capabilities of the Oracle Lightweight Availability Collection Tool are as follows:
Supported on Solaris 10 Containers/Zones
Note:
Oracle Light Weight Availability Collection Tool can be installed on non-global zones only through the global zone. If direct installation on a non-global zone is attempted, the installer exits and returns an appropriate error message.
Supports both SPARC and x86/x64 platforms
Stores the data in universally accepted datagram format
Does not generate network traffic
Tracks boot, halt, and panic events to a granularity of one second
Facilitates segregation of planned, unplanned, or undefined downtime for finer tracking
Deploys easily in Solaris package format
Is very lightweight on system resources
Facilitates both online and offline reporting
Basic reporting functionality is provided through the ltreport command line interface, which is part of the Oracle Lightweight Availability Collection Tool package. Additionally, the datagram enables a wide range of reporting options that are independent of any reporting database or applications. This enables availability reports to be generated on-site or through any report generating portals at Oracle.
Browser-based graphical reporting can also be performed at a system level. To enable such report generation capabilities, a predefined set of XSL style sheets are provided when the Oracle Lightweight Availability Collection Tool is installed. An XSL translator is required to generate the HTML reports using these style sheets. The XSL translator is not part of the Oracle Lightweight Availability Collection Tool.
Oracle Explorer Data Collector 6.0 or higher is required to collect the datagrams of Oracle Lightweight Availability Collection Tool 3.0 or higher. Although previous versions of Oracle Explorer Data Collector and Oracle Lightweight Availability Collection Tool may work, it is recommended that Explorer 6.0 or higher is used to collect the Oracle Lightweight Availability Collection Tool 3.0 (or higher) datagrams. Otherwise, Oracle may not be able to provide availability analysis based on older version Oracle Lightweight Availability Collection Tool datagrams.