4 Administering an Active Standby Pair Without Cache Groups

This chapter describes how to administer an active standby pair that does not replicate cache groups.

For information about administering active standby pairs that replicate cache groups, see Chapter 5, "Administering an Active Standby Pair with Cache Groups".

For information about managing failover and recovery automatically, see Chapter 7, "Using Oracle Clusterware to Manage Active Standby Pairs".

This chapter includes the following topics:

Overview of master database states

This section summarizes the possible states of a master database. These states are referenced in the tasks described in the rest of the chapter.

The master databases can be in one of the following states:

  • ACTIVE - A database in this state is the active database. Applications can update its replicated tables.

  • STANDBY - A database in this state is the standby database. Applications can update only nonreplicated tables in the standby database. Nonreplicated tables are tables that have been excluded from the replication scheme by using the EXCLUDE TABLE or EXCLUDE CACHE GROUP clauses of the CREATE ACTIVE STANDBY PAIR statement.

  • FAILED - A database in this state is a failed master database. No updates can be replicated to it.

  • IDLE - A database in this state has not yet had its role in the active standby pair assigned. It cannot be updated. Every database comes up in the IDLE state.

  • RECOVERING - When a previously failed master database is synchronizing updates with the active database, it is in the RECOVERING state.

You can use the ttRepStateGet built-in procedure to discover the state of a master database.

Duplicating a database

When you set up a replication scheme or administer a recovery, a common task is duplicating a database. You can use the ttRepAdmin -duplicate utility or the ttRepDuplicateEx C function to duplicate a database.

To duplicate a database, these conditions must be fulfilled:

  • The instance administrator performs the duplicate operation.

  • The instance administrator user name must be the same on both instances involved in the duplication.

  • You must provide the user name and password for a user with the ADMIN privilege on the source database.

  • The target DSN cannot include client/server attributes.

On the source database, create a user and grant the ADMIN privilege to the user:

CREATE USER ttuser IDENTIFIED BY ttuser;
User created.

GRANT admin TO ttuser;

Assume the user name of the instance administrator is timesten. Logged in as timesten on the target host, duplicate database dsn1 on host1 to dsn2:

ttRepAdmin -duplicate -from dsn1 -host host1 dsn2

Enter internal UID at the remote datastore with ADMIN privileges: ttuser 
Enter password of the internal Uid at the remote datastore:

Enter ttuser when prompted for the password of the internal user at the remote database.

If you are duplicating an active database that has cache groups, use the -keepCG option. You must also specify the cache administration user ID and password with the -cacheUid and -cachePwd options. If you do not provide the cache administration user password, ttRepAdmin prompts for a password. If the cache administration user ID is orauser and the password is orapwd, duplicate database dsn1 on host1:

ttRepAdmin -duplicate -from dsn1 -host host1 -keepCG "DSN=dsn2;UID=;PWD="

Enter internal UID at the remote datastore with ADMIN privileges: ttuser 
Enter password of the internal Uid at the remote datastore:

Enter ttuser when prompted for the password. ttRepAdmin then prompts for the cache administration user and password:

Enter cache administrator UID: orauser
Enter cache administrator password: 

Enter orapwd when prompted for the cache administration password.

The UID and PWD for dsn2 are specified as null values in the connection string so that the connection is made as the current OS user, which is the instance administrator. Only the instance administrator can run ttRepAdmin -duplicate. If dsn2 is configured with PWDCrypt instead of PWD, then the connection string should be "DSN=dsn2;UID=;PWDCrypt=".

When you duplicate a standby database with cache groups to a read-only subscriber, use the -nokeepCG option. In this example, dsn2 is the standby database and sub1 is the read-only subscriber:

ttRepAdmin -duplicate -from dsn2 -host host2 -nokeepCG "DSN=sub1;UID=;PWD="

The ttRepAdmin utility prompts for values for -uid and -pwd.

If you want to use a specific local or remote network interface over which the database duplication occurs, you can optionally specify either by providing an alias or the IP address of the network interface.

For more information about the ttRepAdmin utility, see "ttRepAdmin" in Oracle TimesTen In-Memory Database Reference. For more information about the ttRepDuplicateEx C function, see "ttRepDuplicateEx" in Oracle TimesTen In-Memory Database C Developer's Guide.

Setting up an active standby pair with no cache groups

To set up an active standby pair, complete the tasks in this section. See "Configuring an active standby pair with one subscriber" for an example.

If you intend to replicate read-only cache groups or asynchronous writethrough (AWT) cache groups, see Chapter 5, "Administering an Active Standby Pair with Cache Groups".

  1. Create a database.

  2. Create the replication scheme using the CREATE ACTIVE STANDBY PAIR statement. See Chapter 3, "Defining an Active Standby Pair Replication Scheme".

  3. Start the replication agent. See "Starting and stopping the replication agents".

  4. Call ttRepStateSet('ACTIVE') on the active database.

  5. Create a user on the active database and grant the ADMIN privilege to the user.

  6. Duplicate the active database to the standby database.

  7. Start the replication agent on the standby database. See "Starting and stopping the replication agents".

  8. Wait for the standby database to enter the STANDBY state. Use the ttRepStateGet procedure to check the state of the standby database.

  9. Duplicate all of the subscribers from the standby database. See "Duplicating a database", using the -nokeepCG option.

  10. Set up the replication agent policy and start the replication agent on each of the subscriber databases. See "Starting and stopping the replication agents".

Recovering from a failure of the active database

This section includes the following topics:

Recovering when the standby database is ready

This section describes how to recover the active database when the standby database is available and synchronized with the active database. It includes the following topics:

When replication is return receipt or asynchronous

Complete the following tasks:

  1. Stop the replication agent on the failed database if it has not already been stopped.

  2. On the standby database, execute ttRepStateSet('ACTIVE'). This changes the role of the database from STANDBY to ACTIVE.

  3. On the new active database, execute ttRepStateSave('FAILED', 'failed_database','host_name'), where failed_database is the former active database that failed. This step is necessary for the new active database to replicate directly to the subscriber databases. During normal operation, only the standby database replicates to the subscribers.

  4. Destroy the failed database.

  5. Duplicate the new active database to the new standby database.

  6. Set up the replication agent policy and start the replication agent. See "Starting and stopping the replication agents".

The standby database contacts the active database. The active database stops sending updates to the subscribers. When the standby database is fully synchronized with the active database, then the standby database enters the STANDBY state and starts sending updates to the subscribers. If you are replicating an AWT cache group, the new standby database takes over processing of the cache group automatically when it enters the STANDBY state.

Note:

You can verify that the standby database has entered the STANDBY state by using the ttRepStateGet built-in procedure.

When replication is return twosafe

Complete the following tasks:

  1. On the standby database, execute ttRepStateSet('ACTIVE'). This changes the role of the database from STANDBY to ACTIVE.

  2. On the new active database, execute ttRepStateSave('FAILED', 'failed_database','host_name'), where failed_database is the former active database that failed. This step is necessary for the new active database to replicate directly to the subscriber databases. During normal operation, only the standby database replicates to the subscribers.

  3. Connect to the failed database. This triggers recovery from the local transaction logs. If database recovery fails, you must continue from Step 5 of the procedure for recovering when replication is return receipt or asynchronous. See "When replication is return receipt or asynchronous".

  4. Verify that the replication agent for the failed database has restarted. If it has not restarted, then start the replication agent. See "Starting and stopping the replication agents".

When the active database determines that it is fully synchronized with the standby database, then the standby database enters the STANDBY state and starts sending updates to the subscribers. If you are replicating an AWT cache group, the new standby database takes over processing of the cache group automatically when it enters the STANDBY state.

Note:

You can verify that the standby database has entered the STANDBY state by using the ttRepStateSet built-in procedure.

Recovering when the standby database is not ready

Consider the following scenarios:

  • The standby database fails. The active database fails before the standby comes back up or before the standby has been synchronized with the active database.

  • The active database fails. The standby database becomes ACTIVE, and the rest of the recovery process begins. (See "Recovering from a failure of the active database".) The new active database fails before the new standby database is fully synchronized with it.

In both scenarios, the subscribers may have had more changes applied than the standby database.

When the active database fails and the standby database has not applied all of the changes that were last sent from the active database, there are two choices for recovery:

  • Recover the active database from the local transaction logs.

  • Recover the standby database from the local transaction logs.

The choice depends on which database is available and which is more up to date.

Recover the active database

  1. Connect to the failed active database. This triggers recovery from the local transaction logs.

  2. Verify that the replication agent for the failed active database has restarted. If it has not restarted, then start the replication agent. See "Starting and stopping the replication agents".

  3. Execute ttRepStateSet('ACTIVE') on the newly recovered database.

  4. Continue with Step 6 in "Setting up an active standby pair with no cache groups".

Recover the standby database

  1. Connect to the failed standby database. This triggers recovery from the local transaction logs.

  2. If the replication agent for the standby database has automatically restarted, you must stop the replication agent. See "Starting and stopping the replication agents".

  3. Drop the replication configuration using the DROP ACTIVE STANDBY PAIR statement.

  4. Re-create the replication configuration using the CREATE ACTIVE STANDBY PAIR statement.

  5. Set up the replication agent policy and start the replication agent. See "Starting and stopping the replication agents".

  6. Execute ttRepStateSet('ACTIVE') on the master database, giving it the ACTIVE role.

  7. Continue from Step 6 in "Setting up an active standby pair with no cache groups".

Failing back to the original nodes

After a successful failover, you may wish to fail back so that the active database and the standby database are on their original nodes. See "Reversing the roles of the active and standby databases" for instructions.

Recovering from a failure of the standby database

To recover from a failure of the standby database, complete the following tasks:

  1. Detect the standby database failure.

  2. If return twosafe service is enabled, the failure of the standby database may prevent a transaction in progress from being committed on the active database, resulting in error 8170, "Receipt or commit acknowledgement not returned in the specified timeout interval". If so, then call the ttRepSyncSet procedure with a localAction parameter of 2 (COMMIT) and commit the transaction again. For example:

    call ttRepSyncSet( null, null, 2);
    commit;
    
  3. Execute ttRepStateSave('FAILED','standby_database','host_name') on the active database. After this, as long as the standby database is unavailable, updates to the active database are replicated directly to the subscriber databases. Subscriber databases may also be duplicated directly from the active.

  4. If the replication agent for the standby database has automatically restarted, stop the replication agent. See "Starting and stopping the replication agents".

  5. Recover the standby database in one of the following ways:

    • Connect to the standby database. This triggers recovery from the local transaction logs.

    • Duplicate the standby database from the active database.

    The amount of time that the standby database has been down and the amount of transaction logs that need to be applied from the active database determine the method of recovery that you should use.

  6. Set up the replication agent policy and start the replication agent. See "Starting and stopping the replication agents".

The standby database enters the STANDBY state after the active database determines that the two master databases have been synchronized.

Note:

You can verify that the standby database has entered the STANDBY state by using the ttRepStateGet procedure.

Recovering from the failure of a subscriber database

If a subscriber database fails, then you can recover it by one of the following methods:

  • Connect to the failed subscriber. This triggers recovery from the local transaction logs. Start the replication agent and let the subscriber catch up.

  • Duplicate the subscriber from the standby database.

If the standby database is down or in recovery, then duplicate the subscriber from the active database.

After the subscriber database has been recovered, then set up the replication agent policy and start the replication agent. See "Starting and stopping the replication agents".

Reversing the roles of the active and standby databases

To change the role of the active database to standby and vice versa:

  1. Pause any applications that are generating updates on the current active database.

  2. Execute ttRepSubscriberWait on the active database, with the DSN and host of the current standby database as input parameters. This ensures that all updates have been transmitted to the current standby database.

  3. Stop the replication agent on the current active database. See "Starting and stopping the replication agents".

  4. Execute ttRepDeactivate on the current active database. This puts the database in the IDLE state.

  5. Execute ttRepStateSet('ACTIVE') on the current standby database. This database now acts as the active database in the active standby pair.

  6. Set up the replication agent policy and start the replication agent on the old active database. Use the ttRepStateGet procedure to determine when the database's state has changed from IDLE to STANDBY. The database now acts as the standby database in the active standby pair.

  7. Resume any applications that were paused in Step 1.

Detection of dual active databases

Ordinarily, the designation of the active and standby databases in an active standby pair is explicitly controlled by the user. However, in some circumstances the user may not have the ability to modify both the active and standby databases when changing the role of the standby database to active.

For example, if network communication to the site of an active database is interrupted, the user may need the standby database at a different site to take over the role of the active, but cannot stop replication on the current active or change its role manually. Changing the standby database to active without first stopping replication on the active leads to a situation where both masters are in the ACTIVE state and accepting transactions. In such a scenario, TimesTen can automatically negotiate the active/standby role of the master databases when network communication between the databases is restored.

If, during the initial handshake between the databases, TimesTen determines that the master databases in an active standby pair replication scheme are both in the ACTIVE state, TimesTen performs the following operations automatically:

  • The database which was set to the ACTIVE state most recently is left in the ACTIVE state and may continue to be connected to and updated by applications.

  • The database which was set to the ACTIVE state least recently is invalidated. All applications are disconnected.

  • When the invalidated database comes back up, TimesTen determines whether any transactions have occurred on the database that have not yet been replicated to the other master database. If such transactions have occurred, they are now trapped, and the database is left in the IDLE state. The database needs to be duplicated from the active in order to become a standby. If there are no trapped transactions, the database is safe to use as a standby database and is automatically set to the STANDBY state.