Replicating Oracle ACFS File Systems

This section discusses the operations to manage replication on an Oracle ACFS file system on Linux.

The disk groups on which volumes are created for the primary and standby file systems must have compatibility attributes for ASM and ADVM set to 11.2.0.3 or higher. For information about disk group compatibility, refer to "Disk Group Compatibility".

The steps to manage replication are:

Determine the storage capacity necessary for replication on the sites hosting the primary and standby file systems. The primary file system must have a minimum size of 4 GB for each node that is mounting the file system. The standby file system must have a minimum size of 4 GB and should be sized appropriately for the amount of data being replicated and the space necessary for the replication logs sent from the primary file system.

Calculate the replication-related storage requirement for the primary file system, then use the same size requirement for the standby file system. If Oracle ACFS tagging is used to replicate only a subset of the files in the primary file system, then the size requirement for the standby file system is proportional to that subset of the primary file system.

Run the acfsutil info fs command with the -s interval option on the node where the primary file system is mounted to display the amount and rate of change to the primary file system for the node. The amount of change includes all user and metadata modifications to the primary file system. This amount approximates the size of replication logs that are generated when recording changes to the file system. Changes are stored in temporary files called replication logs, which are kept in a special directory in the primary file system until they can be sent to the standby file system and applied. After confirmation is received that the changes contained in a replication log have been successfully applied to the standby file system, the replication logs on the primary file system are deleted.

To approximate the extra storage capacity necessary for the replication logs, determine the following:
- The time interval during which the site hosting the primary file system may experience network connectivity problems or slowdowns when accessing the site hosting the standby file system.
- The time interval during which the site hosting the standby file system may be taken offline for maintenance.
These time intervals are used in calculating the amount and rate of change in storage space. You must account for the time interval when the primary file system cannot send the replication logs over to the standby file system at its usual rate or when standby file systems are inaccessible while undergoing maintenance. The replication logs continue to accumulate on the site hosting the primary file system and may eventually cause that site to run out of space.

For the following scenario, assume t = 60 minutes is the time interval in your environment that would adequately account for network problems or maintenance on the site hosting the standby file system.

Run acfsutil info fs -s 900 on the primary file system to collect the average rate of change over a 24-hour period with a 15-minute (900-second) sampling interval. The sampling interval is t/4 (60/4 = 15 minutes). Do not use a sampling interval longer than t/2, because you may miss important peaks.
```
$ /sbin/acfsutil info fs -s 900 /acfsmounts/repl_data
```
With the output, you can determine the average rate of change, the peak rate of change, and how long the peaks last. However, the command displays information only for the node on which the command is run. To collect the total amount of change in the file system the command must be run on every node that is modifying the file system. The maximum number of supported nodes is eight.
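
The choice of sampling interval described above can be sketched as a small calculation. This is an illustration only and not part of the Oracle tooling; the function name `sampling_interval_seconds` and its `divisor` parameter are hypothetical.

```python
def sampling_interval_seconds(t_minutes: float, divisor: int = 4) -> int:
    """Return a sampling interval in seconds for `acfsutil info fs -s`.

    t_minutes is the outage window t. divisor=4 gives the t/4 interval
    used in the text; a divisor below 2 would exceed t/2 and is rejected.
    """
    if t_minutes <= 0:
        raise ValueError("t_minutes must be positive")
    if divisor < 2:
        raise ValueError("sampling interval must not exceed t/2")
    return int(t_minutes * 60 / divisor)


# t = 60 minutes gives the 900-second interval used in the command above.
print(sampling_interval_seconds(60))
```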

The following formula approximates the extra storage capacity needed:
```
Extra storage capacity to hold replication logs = 
    (Number-nodes-on-primary * 1GB) + P
```
where P is the peak amount of change generated across all nodes for time t as reported by the acfsutil info fs -s output.

In the example, you must total the changes from four 15-minute intervals to find the total amount of change that could occur in 60 minutes. You may choose to use the single hour that generated the largest amount of change, or you could select the top four 15-minute intervals even if they did not occur together to prepare for the worst-case scenario.
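
The two ways of choosing P described above can be sketched as follows. The sample values are hypothetical, not measured output, and each value is assumed to be the change in GB for one 15-minute interval already summed across all nodes.

```python
def peak_consecutive(samples_gb, window=4):
    """Largest total change over any `window` consecutive intervals."""
    if len(samples_gb) < window:
        raise ValueError("need at least `window` samples")
    return max(
        sum(samples_gb[i:i + window])
        for i in range(len(samples_gb) - window + 1)
    )


def peak_worst_case(samples_gb, window=4):
    """Sum of the `window` largest intervals, whether adjacent or not."""
    if len(samples_gb) < window:
        raise ValueError("need at least `window` samples")
    return sum(sorted(samples_gb, reverse=True)[:window])


# Hypothetical 15-minute samples in GB, summed across all nodes.
samples = [1.0, 6.0, 5.0, 2.0, 1.0, 7.0, 0.5, 0.5]
print(peak_consecutive(samples))  # busiest single hour
print(peak_worst_case(samples))   # four largest intervals combined
```

The worst-case figure is never smaller than the busiest-hour figure, so it gives the more conservative value for P.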

Assume that you have four nodes modifying the primary file system, and that during the measured interval, the peak amount of change reported for the 60 minutes is approximately 20 GB for all nodes. Using the storage capacity formula, 24 GB of extra storage capacity is required on the site hosting the primary file system for storage of the replication logs.
```
Extra storage capacity to hold replication logs = (4 * 1GB per node) + 
 20GB maximum change per hour = 24GB of extra storage capacity
```
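The storage capacity formula can also be expressed as a short function for checking other node counts and peak values. This is a minimal sketch; the function name is hypothetical, and the node limit reflects the maximum of eight supported nodes stated earlier.

```python
def extra_log_capacity_gb(nodes_on_primary: int, peak_change_gb: float) -> float:
    """Extra capacity = (number of nodes on primary * 1 GB) + P."""
    if not 1 <= nodes_on_primary <= 8:
        raise ValueError("expected between 1 and 8 nodes")
    if peak_change_gb < 0:
        raise ValueError("peak change cannot be negative")
    return nodes_on_primary * 1 + peak_change_gb


# Four nodes and a 20 GB peak, as in the example above.
print(extra_log_capacity_gb(4, 20))
```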
Next, check that the network transfer rate is greater than or equal to the rate of change observed during the monitoring period. In the previous example, the peak of 20 GB of changed data per hour is equivalent to a peak rate of change of about 5.5 MB/sec. To keep up with this rate of change, you must ensure that the network can reliably transfer at least this amount of data per second without negatively impacting your existing network workloads.
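
The conversion from a peak amount of change to a rate can be sketched as follows. The function name is hypothetical, and 1 GB is assumed to be 1000 MB to match the FTP calculation in this section.

```python
def change_rate_mb_per_sec(changed_gb: float, interval_minutes: float,
                           mb_per_gb: int = 1000) -> float:
    """Convert an amount of change over an interval to MB per second."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    return changed_gb * mb_per_gb / (interval_minutes * 60)


# 20 GB of change in 60 minutes, as in the example above.
rate = change_rate_mb_per_sec(20, 60)
print(f"{rate:.2f} MB/sec")
```

The result is slightly above 5.5 MB/sec, which is consistent with the approximate figure used in the text.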

To estimate your current actual network transfer rate, measure the elapsed time required to transfer a 1 GB file with FTP from the primary file system to the intended standby file system during an interval when network usage is low. For example, if the 1 GB file transfers in 30 seconds, then your current FTP transfer rate is 33 MB per second (1000 MB/30 seconds = 33 MB per second). Because of various delays inherent in the transfers, for planning purposes you should reduce this measured FTP transfer rate by 20%, and then by an additional 5% per node.

In the previous example with 4 nodes, the FTP transfer rate used for planning is:
```
33 MB/sec * (1 - 0.2 - (4 * 0.05)) = 33 * 0.6 = ~20 MB/sec
```
Because the peak rate of change was only 5.5 MB/second, you can expect the network to be able to handle this additional workload in this example. However, if the network capacity is close to fully utilized, you might want to consider increasing network capacity before implementing replication for this file system and workload.
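
The planning calculation above can be sketched as a function so that it can be repeated for other measurements. The function and parameter names are hypothetical; the 20% and 5% per node reductions are the ones given in the text.

```python
def planning_rate_mb_per_sec(measured_mb_per_sec: float, nodes: int) -> float:
    """Reduce a measured FTP rate by 20%, then by a further 5% per node."""
    factor = 1 - 0.20 - 0.05 * nodes
    if factor <= 0:
        raise ValueError("reductions leave no usable bandwidth")
    return measured_mb_per_sec * factor


planning_rate = planning_rate_mb_per_sec(33, 4)  # about 20 MB/sec
peak_change_rate = 5.5                           # MB/sec, from the example
print(f"planning rate: {planning_rate:.1f} MB/sec")
print("network can keep up:", planning_rate >= peak_change_rate)
```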

In addition, ensure you have sufficient network capacity to allow replication to catch up after times when network problems prevent a primary file system from sending replication logs to the standby file system.
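
One rough way to reason about catch-up time is sketched below. The model is an assumption made for illustration and does not come from the Oracle documentation: it treats the backlog as draining at the difference between the planning transfer rate and the ongoing rate of change, and it assumes 1 GB is 1000 MB. The function name is hypothetical.

```python
def catch_up_minutes(backlog_gb: float, planning_rate_mb_s: float,
                     change_rate_mb_s: float, mb_per_gb: int = 1000) -> float:
    """Estimate minutes needed to drain a backlog of replication logs.

    Assumes the backlog shrinks at (planning rate - ongoing change rate).
    """
    spare = planning_rate_mb_s - change_rate_mb_s
    if spare <= 0:
        raise ValueError("no spare bandwidth: the backlog would never drain")
    return backlog_gb * mb_per_gb / spare / 60


# 20 GB backlog, 19.8 MB/sec planning rate, 5.5 MB/sec ongoing change.
print(f"{catch_up_minutes(20, 19.8, 5.5):.1f} minutes")
```

Under these assumptions, a network whose planning rate is only slightly above the rate of change would take a very long time to recover from an outage.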

For more information, refer to "acfsutil info fs".

Set up tags, user names, and service names.

When starting replication on an Oracle ACFS file system, first perform the following steps:
- Determine the user name and password that the sites hosting the primary and standby file systems use to connect to the remote Oracle ASM instance as the Oracle ASM and DBA administrator. All nodes that have the file system mounted must support this user name and password. The user must have SYSASM and SYSDBA privileges. For example:
```
SQL> CREATE USER primary_admin IDENTIFIED BY primary_passwd;
SQL> GRANT sysasm,sysdba TO primary_admin;
```
  Oracle wallets should be used to manage security credentials.
  See Also:
  - Oracle Database Enterprise User Security Administrator's Guide for information about Oracle wallets
  - Oracle Database SecureFiles and Large Objects Developer's Guide for information about wallet management
  - Oracle Database Net Services Reference for information about wallet parameters in the SQLNET.ORA file
- Determine a unique service name for the replicated file system.
  
  When the primary and standby file systems are located in different clusters for disaster tolerance, the service names for the primary and standby file systems can be the same. However, if both file systems are mounted on the same node, such as in a test configuration, then unique service names must be used for the primary and standby file systems. Using unique service names for the primary and standby file systems requires the use of the -c option during replication initialization. Service names are limited to a maximum of 128 bytes.
  Note:
  - You must specify a service name other than +ASM because that service name is currently in use by the Oracle ASM instance.
  - You must specify a unique service name for each file system that you want to replicate when there are multiple replicated file systems on a node or cluster.
  Using this service name, create a net service alias on the sites hosting the primary and standby file system that connects to the remote site. This alias along with the user name and password are used as the connection string in the replication initialization commands.
  
  For example, the following connect descriptors define net service aliases for the sites hosting the primary and standby file systems.
```
primary_repl_site=(DESCRIPTION=
  (ADDRESS=(PROTOCOL=tcp)(HOST=primary1.example.com)(PORT=1521))
  (ADDRESS=(PROTOCOL=tcp)(HOST=primary2.example.com)(PORT=1521))
  (CONNECT_DATA=(SERVICE_NAME=primary_service)))

standby_repl_site=(DESCRIPTION=
  (ADDRESS=(PROTOCOL=tcp)(HOST=standby1.example.com)(PORT=1521))
  (CONNECT_DATA=(SERVICE_NAME=standby_service)))
```
  If you want to perform replication using a single client access name (SCAN) VIP, you must update the REMOTE_LISTENER initialization parameter in the Oracle ASM instance before initializing replication. You can update the parameter in the initialization file or with the ALTER SYSTEM SQL statement.
  
  For example:
```
SQL> ALTER SYSTEM SET remote_listener='SCAN_NAME:1521' sid='*' scope=both;
```
  See Also:
  
  Oracle Database Net Services Administrator's Guide for information about connect descriptors
- Optionally set tags on directories and files to replicate only selected files in an Oracle ACFS file system. You can also add tags to files after replication has started. For information about the steps to tag files, refer to "Tagging Oracle ACFS File Systems".
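
The service name rules given earlier in this step (a 128-byte limit, no use of +ASM, and a unique name for each replicated file system on a node or cluster) can be checked before initialization with a sketch such as the following. The function is hypothetical and not an Oracle utility; measuring length as UTF-8 bytes and comparing names without regard to case are assumptions.

```python
def check_service_names(names, reserved=("+ASM",), max_bytes=128):
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    seen = set()
    reserved_upper = {r.upper() for r in reserved}
    for name in names:
        key = name.upper()  # assumption: names compare case-insensitively
        if len(name.encode("utf-8")) > max_bytes:
            problems.append(f"{name!r}: longer than {max_bytes} bytes")
        if key in reserved_upper:
            problems.append(f"{name!r}: in use by the Oracle ASM instance")
        if key in seen:
            problems.append(f"{name!r}: used for more than one file system")
        seen.add(key)
    return problems


# Service names for file systems replicated on the same node or cluster.
print(check_service_names(["primary_repl_service", "standby_repl_service"]))
print(check_service_names(["+ASM", "repl_a", "repl_a"]))
```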

Configure the site hosting the standby file system.

Before replicating an Oracle ACFS file system, configure the site hosting the standby file system by performing the following procedures.
- Create a new file system of adequate size to hold the replicated files and associated replication logs from the primary file system. For example: /standby/repl_data
- Mount the file system on one node only.
- Run the acfsutil repl init standby command. If this command is interrupted for any reason, the user must re-create the file system, mount it on one node only, and rerun the command. This command requires the following configuration information:
  - The connect string to be used to connect to the site hosting the primary file system. For example:
    
    /@primary_repl_site
    
    The connect string searches in the user's wallet for the user name and password associated with the alias primary_repl_site. The user must have SYSASM and SYSDBA privileges.
  - If the standby file system is using a different service name than the primary file system, then use the -c option. This option specifies the service name for the standby file system. For example:
    
    standby_repl_service
  - The mount point of the standby file system. For example:
    
    /standby/repl_data
For example, run the following acfsutil repl init standby command on the site hosting the standby file system.
```
$ /sbin/acfsutil repl init standby \
    -p /@primary_repl_site \
    -c standby_repl_service /standby/repl_data
```
The acfsutil repl init standby command requires root or system administrator privileges to run.

For more information, refer to "acfsutil repl init".

Configure the site hosting the primary file system.

After the standby file system has been set up, configure the site hosting the primary file system and start replication by performing the following procedures.

Run the acfsutil repl init primary command. This command requires the following configuration information:
- The connect string to be used to connect to the site hosting the standby file system. For example:
  
  /@standby_repl_site
  
  The connect string searches in the user's wallet for the user name and password associated with the alias standby_repl_site. The user must have SYSASM and SYSDBA privileges.
- The mount point of the primary file system. For example: /acfsmounts/repl_data
- If the primary file system is using a different service name than the standby file system, then use the -c option. This option specifies the service name on the site hosting the primary file system. For example:
  
  primary_repl_service
- If the mount point is different on the site hosting the standby file system than it is on the site hosting the primary file system, specify the mount point on the standby file system with the -m standby_mount_point option. For example:
  
  -m /standby/repl_data
For example, run the following acfsutil repl init primary command on the site hosting the primary file system.
```
$ /sbin/acfsutil repl init primary \
     -s /@standby_repl_site \
     -m /standby/repl_data -c primary_repl_service \
     /acfsmounts/repl_data
```
The acfsutil repl init primary command requires root or system administrator privileges to run.
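
The way the two initialization commands are assembled from the configuration information can be sketched as follows. The helper functions are hypothetical and only build argument lists; they use only the options described in this section (-p, -s, -m, and -c) and do not run anything.

```python
import shlex

ACFSUTIL = "/sbin/acfsutil"


def standby_init_command(primary_connect, mount_point, standby_service=None):
    """Arguments for the command run on the site hosting the standby."""
    cmd = [ACFSUTIL, "repl", "init", "standby", "-p", primary_connect]
    if standby_service:
        cmd += ["-c", standby_service]
    return cmd + [mount_point]


def primary_init_command(standby_connect, mount_point,
                         standby_mount_point=None, primary_service=None):
    """Arguments for the command run on the site hosting the primary."""
    cmd = [ACFSUTIL, "repl", "init", "primary", "-s", standby_connect]
    # -m is needed only when the standby mount point differs.
    if standby_mount_point and standby_mount_point != mount_point:
        cmd += ["-m", standby_mount_point]
    if primary_service:
        cmd += ["-c", primary_service]
    return cmd + [mount_point]


# The standby is initialized first, then the primary.
print(shlex.join(standby_init_command(
    "/@primary_repl_site", "/standby/repl_data", "standby_repl_service")))
print(shlex.join(primary_init_command(
    "/@standby_repl_site", "/acfsmounts/repl_data",
    "/standby/repl_data", "primary_repl_service")))
```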

For more information, refer to "acfsutil repl init".

Monitor information about replication on the file system.

The acfsutil repl info command displays information about the state of the replication processing on the primary or standby file system.

For example, run the following acfsutil repl info command on the site hosting the primary file system to display configuration information.
```
$ /sbin/acfsutil repl info -c -v /acfsmounts/repl_data
```
You must have system administrator or Oracle ASM administrator privileges to run this command.

For more information, refer to "acfsutil repl info".

Manage replication background processes.

Run the acfsutil repl bg command to start, stop, or retrieve information about replication background processes.

The following example displays information about the replication processes for the /acfsmounts/repl_data file system.
```
$ /sbin/acfsutil repl bg info /acfsmounts/repl_data
```
You must have system administrator or Oracle ASM administrator privileges to run the acfsutil repl bg info command.

For more information, refer to "acfsutil repl bg".

Pause replication momentarily only if necessary.

Run the acfsutil repl pause command to momentarily stop replication. You should run the acfsutil repl resume command as soon as possible to resume replication.

For example, the following command pauses replication on the /acfsmounts/repl_data file system.
```
$ /sbin/acfsutil repl pause /acfsmounts/repl_data
```
The following command resumes replication on the /acfsmounts/repl_data file system.
```
$ /sbin/acfsutil repl resume /acfsmounts/repl_data
```
You must have system administrator or Oracle ASM administrator privileges to run these commands.
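
Because replication should be resumed as soon as possible after a pause, one option is to wrap the two commands so that the resume is always issued, even if the work done during the pause fails. The following is a minimal sketch; the `replication_paused` helper and its `run` parameter are hypothetical, and the demonstration uses a recording stand-in so that it can be tried without Oracle ACFS.

```python
import subprocess
from contextlib import contextmanager

ACFSUTIL = "/sbin/acfsutil"


@contextmanager
def replication_paused(mount_point, run=subprocess.run):
    """Pause replication for the duration of a `with` block."""
    run([ACFSUTIL, "repl", "pause", mount_point], check=True)
    try:
        yield
    finally:
        # Resume even if the work inside the block raises an exception.
        run([ACFSUTIL, "repl", "resume", mount_point], check=True)


# Dry run: record the commands instead of executing them.
issued = []


def record(cmd, check=True):
    issued.append(" ".join(cmd))


with replication_paused("/acfsmounts/repl_data", run=record):
    pass  # a short maintenance task would go here

print(issued)
```

With the default `run=subprocess.run`, the commands are executed for real and require the privileges noted above.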

For more information, refer to "acfsutil repl pause" and "acfsutil repl resume".

Fail over to a standby file system or turn a standby file system into an active file system.

If the primary file system no longer exists, you can run the acfsutil repl terminate standby mount_point command to turn the standby file system into an active file system. If the primary file system still exists, you should terminate the primary first with acfsutil repl terminate primary mount_point.

Before terminating replication with acfsutil repl terminate on the standby file system, you can determine the point in time of the primary file system that the standby file system represents. This timestamp is displayed by the acfsutil repl info -c command as Last sync time with primary. If the failover action must be coordinated with Oracle Data Guard, you can use the timestamp to set back the database if needed or to perform other necessary actions that are based on the timestamp. For more information, refer to "acfsutil repl info".
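
The order of the termination commands can be sketched as a small planning function. The function name is hypothetical; it only returns the commands and the site on which each would be run, and it executes nothing.

```python
ACFSUTIL = "/sbin/acfsutil"


def failover_plan(standby_mount_point, primary_mount_point=None):
    """Return (site, command) pairs in the order they should be run.

    Pass primary_mount_point only if the primary file system still exists.
    """
    plan = []
    if primary_mount_point is not None:
        # Terminate the primary first when it still exists.
        plan.append(("primary",
                     [ACFSUTIL, "repl", "terminate", "primary",
                      primary_mount_point]))
    plan.append(("standby",
                 [ACFSUTIL, "repl", "terminate", "standby",
                  standby_mount_point]))
    return plan


for site, cmd in failover_plan("/standby/repl_data", "/acfsmounts/repl_data"):
    print(f"{site}: {' '.join(cmd)}")
```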

Note:

On an Oracle ACFS file system, df reports space usage by internal metadata plus user files and directories, whereas du reports only the space usage of user files and directories. Depending on the size of the volume and the number of nodes, internal metadata is allocated in varying sizes. Additionally, with replication enabled, an internal replication log is allocated for each node. This log is used to record changes to the file system before the replication log is exposed to user space daemons for transport to the standby file system.

For more information about replicating an Oracle ACFS file system, refer to "Oracle ACFS Replication".