4 Oracle Database High Availability Solutions for Unplanned Downtime

Oracle Database offers an integrated suite of high availability solutions that increase availability and eliminate or minimize both planned and unplanned downtime. These solutions help enterprises maintain business continuity 24 hours a day, 7 days a week. Beyond reducing downtime, the Oracle high availability solutions also increase utilization of the primary and secondary systems and help improve overall performance, scalability, and manageability.

Table 4-1 describes the various Oracle high availability solutions for unplanned downtime. The table shows how the features discussed in the subsequent sections can be used to address various causes of unplanned downtime. Where several Oracle solutions are listed, the MAA recommended solution is indicated in the Oracle Solution column.

Table 4-2 describes the high availability solutions in each of the MAA service-level tiers for the MAA reference architectures and multitenant architectures.

See the tables in Chapter 7, "High Availability Architectures" for a summary of the attainable recovery times for all of the types of unplanned downtime for each Oracle high availability architecture.

Table 4-1 Outage Types and Oracle High Availability Solutions for Unplanned Downtime

Outage Scope | Oracle Solution | Benefits

Site failures

Oracle Data Guard (MAA recommended) and Oracle Application Failover solution

  • Integrated client and application failover

  • Fastest and simplest database replication

  • Supports all data types

  • Zero data loss by eliminating propagation delay

  • Oracle Active Data Guard supports read-only services and DML on global temporary tables and sequences to off-load more work from the primary

Oracle GoldenGate

  • Flexible logical replication solution (target is open read/write)

  • Active-active high availability (with conflict resolution)

  • Heterogeneous platform and heterogeneous database support

Recovery Manager and Oracle Secure Backup

  • Fully managed database recovery and integration with Oracle Secure Backup

  • Non-real-time recovery

Instance or computer failures

Oracle Real Application Clusters and Oracle Clusterware (MAA recommended)

  • Integrated client and application failover

  • Automatic recovery of failed nodes and instances

  • Lowest application brownout with Oracle Real Application Clusters

Oracle RAC One Node

  • Integrated client and application failover

  • Online database relocation migrates connections and instances to another node

  • Better database availability than traditional cold failover solutions

Oracle Data Guard

  • Integrated client and application failover

  • Fastest and simplest database replication

  • Supports all data types

  • Zero data loss by eliminating propagation delay

  • Oracle Active Data Guard supports read-only services and DML on global temporary tables and sequences to off-load more work from the primary

Oracle GoldenGate

  • Flexible logical replication solution (target is open read/write)

  • Active-active high availability (with conflict resolution)

  • Heterogeneous platform and heterogeneous database support

Storage failures

Oracle Automatic Storage Management (MAA recommended)

  • Mirroring and online automatic rebalancing place redundant copies of the data in separate failure groups

Oracle Data Guard (MAA recommended)

  • Integrated client and application failover

  • Fastest and simplest database replication

  • Supports all data types

  • Zero data loss by eliminating propagation delay

  • Oracle Active Data Guard supports read-only services and DML on global temporary tables and sequences to off-load more work from the primary

Recovery Manager with Fast Recovery Area and Oracle Secure Backup (MAA recommended)

  • Fully managed database recovery and managed disk and tape backups

Oracle GoldenGate

  • Flexible logical replication solution (target is open read/write)

  • Active-active high availability (with conflict resolution)

  • Heterogeneous platform and heterogeneous database support

Data corruption

Corruption Prevention, Detection, and Repair (MAA recommended)

Database initialization settings such as DB_BLOCK_CHECKING, DB_BLOCK_CHECKSUM, and DB_LOST_WRITE_PROTECT

  • Different levels of data and redo block corruption prevention and detection at the database level

Data corruption

Oracle Data Guard (MAA recommended)

Oracle Active Data Guard Automatic Block Repair

DB_LOST_WRITE_PROTECT initialization parameter

  • In a Data Guard configuration with an Oracle Active Data Guard standby, physical block corruptions detected by Oracle at a primary database are automatically repaired using a good copy of the block retrieved from the standby, and vice versa. The repair is transparent to the user and application.

  • Strong database isolation of data corruptions with Oracle Active Data Guard.

  • With MAA recommended initialization settings, Oracle Active Data Guard, and Oracle Exadata Database Machine, you achieve the most comprehensive full stack corruption protection.

  • With DB_LOST_WRITE_PROTECT enabled, a lost write that occurred on the primary database is detected either by the physical standby database or during media recovery of the primary database. Recovery is then stopped to preserve the consistency of the database. Failing over to the standby database using Data Guard will result in some data loss. With Oracle Database 12c Release 1 (12.1.0.2) and Data Guard Broker, the PrimaryLostWrite property supports the SHUTDOWN and CONTINUE options (as in Oracle Database 11g Release 2 (11.2.0.4)), plus the FAILOVER and FORCEFAILOVER options, when lost writes are detected on the primary database. See Oracle Data Guard Broker.

  • If a lost write is detected on the standby database, you can restore the affected file and restart Redo Apply if the lost write is isolated and the hardware problem is corrected.

  • DB_LOST_WRITE_PROTECT initialization parameter provides lost write detection.

    Note: Lost writes can corrupt the entire database, which may require that you rebuild the affected database after resolving the hardware issue.

Dbverify, Analyze, Data Recovery Advisor and Recovery Manager, ASM Scrub with Fast Recovery Area (MAA recommended)

  • These tools allow the administrator to run manual checks that help detect and potentially repair various data corruptions.

  • Dbverify and Analyze conduct physical block and logical intra-block checks. Analyze can also conduct inter-object consistency checks.

  • Data Recovery Advisor automatically detects data corruptions and recommends the best recovery plan.

  • RMAN operations can conduct both physical and inter-block logical checks.

  • RMAN can execute online block-media recovery using flashback logs, backups, or the standby database to help recover from physical block corruptions.

  • ASM Scrub detects and attempts to repair physical and logical data corruptions with the ASM pair in normal and high redundancy disk groups.

Data corruption

Oracle Exadata Database Machine and Oracle Automatic Storage Management (MAA recommended)

DIX + T10 DIF Extensions (MAA recommended where applicable)

  • If Oracle ASM detects a corruption and has a good mirror, Oracle ASM returns the good block and repairs the corruption during a subsequent write I/O.

  • Exadata provides implicit HARD enabled checks to prevent data corruptions caused by bad or misdirected storage I/O.

  • Exadata provides automatic HARD disk scrub and repair. Detects and fixes bad sectors.

  • DIX + T10 DIF Extensions provide end-to-end data integrity for reads and writes through checksum validation from a vendor's host adapter to the storage device.

Oracle GoldenGate

  • Flexible logical replication solution (target is open read/write). Logical replica can be used as a failover target if partner replica is corrupted.

  • Active-active high availability (with conflict resolution)

  • Heterogeneous platform and heterogeneous database support

Human errors

Oracle Security Features (MAA recommended)

  • Restrict access to prevent human errors

Oracle Flashback Technology (MAA recommended)

  • Fine-grained error investigation of incorrect results

  • Fine-grained and database-wide rewind and recovery capabilities

Delays or slowdowns

Oracle Database and Oracle Enterprise Manager

Oracle Data Guard (MAA recommended) and Oracle Application Failover solution

  • Oracle Database automatically monitors for instance and database delays or cluster slowdowns and attempts to remove blocking processes or instances to prevent prolonged delays or unnecessary node evictions.

  • Oracle Enterprise Manager or a customized application heartbeat can be configured to detect application or response time slowdown and react to these SLA breaches.

    For example, you can configure the Enterprise Manager Beacon to monitor application response times. Then, when a configured threshold is exceeded, Enterprise Manager can call the Data Guard DBMS_DG.INITIATE_FS_FAILOVER PL/SQL procedure to initiate a failover. See the section about "Application Initiated Fast-Start Failover" in Oracle Data Guard Broker.

File system data

Oracle Replication Technologies for Non-Database Files

  • Enables full stack failover that includes non-database files
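The data corruption rows of Table 4-1 name three initialization parameters: DB_BLOCK_CHECKING, DB_BLOCK_CHECKSUM, and DB_LOST_WRITE_PROTECT. The following Python sketch shows one way a deployment script might validate the chosen values and render the corresponding ALTER SYSTEM statements. The helper name `render_alter_system`, the example values, and the use of SCOPE=BOTH (which assumes the database uses a server parameter file) are illustrative assumptions, not settings prescribed by this chapter. Check the allowed values against the Oracle Database Reference for your release.

```python
# Values each parameter is assumed to accept (verify against the
# Oracle Database Reference for the release in use).
ALLOWED_VALUES = {
    "DB_BLOCK_CHECKSUM": {"OFF", "TYPICAL", "FULL"},
    "DB_BLOCK_CHECKING": {"OFF", "LOW", "MEDIUM", "FULL"},
    "DB_LOST_WRITE_PROTECT": {"NONE", "TYPICAL", "FULL"},
}


def render_alter_system(settings):
    """Validate each parameter value and return ALTER SYSTEM statements."""
    statements = []
    for name, value in settings.items():
        key, val = name.upper(), value.upper()
        if key not in ALLOWED_VALUES:
            raise ValueError(f"unknown parameter: {name}")
        if val not in ALLOWED_VALUES[key]:
            raise ValueError(f"{val} is not valid for {key}")
        # SCOPE=BOTH is an assumption: it requires a server parameter file.
        statements.append(f"ALTER SYSTEM SET {key}={val} SCOPE=BOTH;")
    return statements


# Hypothetical choice of values for a primary database.
example_settings = {
    "DB_BLOCK_CHECKSUM": "FULL",
    "DB_BLOCK_CHECKING": "MEDIUM",
    "DB_LOST_WRITE_PROTECT": "TYPICAL",
}

if __name__ == "__main__":
    for statement in render_alter_system(example_settings):
        print(statement)
```

Rejecting an unsupported value before any statement is generated keeps a typing mistake from reaching the database. Higher checking levels add overhead, so the values to use depend on the workload.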


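The "Delays or slowdowns" row of Table 4-1 mentions a customized application heartbeat that reacts to response time breaches. The following Python sketch illustrates only the detection logic: it flags a breach after a number of consecutive slow probes and then calls a failover hook once. The class name, the threshold values, and the `initiate_failover` hook are hypothetical. In a real deployment the hook might wrap a database call to DBMS_DG.INITIATE_FS_FAILOVER, which is not shown here.

```python
from collections import deque


class ResponseTimeMonitor:
    """Track recent probe results and flag a sustained SLA breach."""

    def __init__(self, threshold_seconds, consecutive_breaches):
        self.threshold_seconds = threshold_seconds
        # Keep only the most recent N breach flags.
        self.recent = deque(maxlen=consecutive_breaches)

    def record(self, response_seconds):
        """Record one probe; return True when all recent probes breached."""
        self.recent.append(response_seconds > self.threshold_seconds)
        return len(self.recent) == self.recent.maxlen and all(self.recent)


def run_heartbeat(samples, monitor, initiate_failover):
    """Feed probe results to the monitor; call the hook once on a breach.

    Returns the index of the sample that triggered the hook, or None.
    """
    for index, seconds in enumerate(samples):
        if monitor.record(seconds):
            # Hypothetical hook; the condition string is illustrative.
            initiate_failover("application response time SLA breached")
            return index
    return None


if __name__ == "__main__":
    calls = []
    monitor = ResponseTimeMonitor(threshold_seconds=2.0, consecutive_breaches=3)
    probes = [0.2, 3.1, 0.4, 3.5, 4.0, 5.2]  # made-up response times
    triggered_at = run_heartbeat(probes, monitor, calls.append)
    print(triggered_at, calls)
```

Requiring several consecutive breaches, rather than reacting to a single slow probe, is a design choice that reduces the chance of a failover caused by a transient spike.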
If you are managing many databases in DBaaS, we recommend using the MAA tiers and Oracle Multitenant as described in Section 2.3.1, "Oracle MAA Reference Architectures." Table 4-2 identifies various unplanned outages that can impact a database in a multitenant architecture. It also identifies the Oracle HA solution that each HA tier provides to address each outage.

Table 4-2 Unplanned Outage Matrix for MAA Reference Architectures and Multitenant Architectures

Event | Solutions by MAA Tier | Recovery Window (RTO) | Data Loss (RPO)

Instance Failure

BRONZE: Oracle Restart

Minutes if server can restart

Zero

SILVER: Oracle RAC or optionally Oracle RAC One Node

Seconds with Oracle RAC; minutes with Oracle RAC One Node

Zero

GOLD: Oracle RAC

Seconds

Zero

PLATINUM: Oracle RAC with Application Continuity

Zero Application Outage

Zero

Permanent Node Failure (but storage available)

BRONZE: Restore and recover

Hours to Days

Zero

SILVER: Oracle RAC

Seconds

Zero

SILVER: Oracle RAC One Node

Minutes

Zero

GOLD: Oracle RAC

Seconds

Zero

PLATINUM: Oracle RAC with Application Continuity

Zero Application Outage

Zero

Storage Failure

ALL: Automatic Storage Management

Zero downtime

Zero

Data corruptions

BRONZE/SILVER: Basic protection. Some corruptions require restore and recovery of the pluggable database (PDB), the entire multitenant container database (CDB), or the non-container database (non-CDB)

Hours to Days

Since last backup if unrecoverable

GOLD/PLATINUM: Comprehensive corruption protection and Auto Block Repair with Oracle Active Data Guard

Zero with auto block repair

Seconds to minutes if the corruption is due to lost writes and Data Guard fast-start failover is used

Zero unless corruption due to lost writes

Human error

ALL: Logical failures resolved by flashback drop, flashback table, flashback transaction, flashback query and undo.

Dependent on detection time but isolated to PDB and applications using those objects.

Dependent on logical failure

ALL: Comprehensive logical failures impacting an entire database or PDB that require RMAN point-in-time recovery (PDB) or flashback database

Dependent on detection time

Dependent on logical failure

GOLD/PLATINUM: With Oracle GoldenGate, you can fail over just one PDB

Dependent on detection time but actual failover can take seconds

Dependent on logical failure

Database unusable, system, site or storage failures, widespread corruptions or disasters

BRONZE/SILVER: Restore and recover

Hours to Days

Since last backup

GOLD: Fail over to secondary (Oracle Active Data Guard or Oracle GoldenGate)

Seconds

Zero to Near Zero

PLATINUM: Active Data Guard Failover with Application Continuity

Zero Application Outage

Zero

Performance Degradation

ALL: Database Resource Manager and Tuning

No downtime but degraded service

Zero
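Table 4-2 can be treated as a lookup keyed by outage event and MAA tier. The following Python sketch encodes two of its events (instance failure, and the "database unusable, system, site or storage failures" event) and returns the solution, RTO, and RPO for a given tier. The short event keys, the `Outcome` type, and the `lookup` helper are assumptions introduced for illustration. The cell values are copied from the table.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    solution: str
    rto: str  # recovery window
    rpo: str  # data loss


# Two events from Table 4-2; the keys are hypothetical short names.
MATRIX = {
    "instance_failure": {
        "BRONZE": Outcome("Oracle Restart", "Minutes if server can restart", "Zero"),
        "SILVER": Outcome(
            "Oracle RAC or optionally Oracle RAC One Node",
            "Seconds with Oracle RAC; minutes with Oracle RAC One Node",
            "Zero",
        ),
        "GOLD": Outcome("Oracle RAC", "Seconds", "Zero"),
        "PLATINUM": Outcome(
            "Oracle RAC with Application Continuity",
            "Zero Application Outage",
            "Zero",
        ),
    },
    "database_unusable": {
        "BRONZE": Outcome("Restore and recover", "Hours to Days", "Since last backup"),
        "SILVER": Outcome("Restore and recover", "Hours to Days", "Since last backup"),
        "GOLD": Outcome(
            "Fail over to secondary (Oracle Active Data Guard or Oracle GoldenGate)",
            "Seconds",
            "Zero to Near Zero",
        ),
        "PLATINUM": Outcome(
            "Active Data Guard Failover with Application Continuity",
            "Zero Application Outage",
            "Zero",
        ),
    },
}


def lookup(event, tier):
    """Return the Table 4-2 entry for an event and tier (case-insensitive tier)."""
    try:
        return MATRIX[event][tier.upper()]
    except KeyError:
        raise KeyError(f"no entry for event={event!r}, tier={tier!r}") from None


if __name__ == "__main__":
    entry = lookup("database_unusable", "gold")
    print(entry.solution, "|", entry.rto, "|", entry.rpo)
```

A structure like this can help when comparing tiers for a group of databases, for example to list every event where a given tier cannot meet a required recovery window.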