1 Overview of Oracle R Enterprise

R is an open source statistical programming language and environment. For information about R, see the R Project for Statistical Computing at http://www.r-project.org.

R provides an environment for statistical computing, including:

  • An easy-to-use language

  • A powerful graphical environment for visualization

  • Many out-of-the-box statistical techniques

  • R packages (An R package is a set of related functions, help files, and data files; currently, there are more than 3340 R packages.)

  • The R Console graphical user interface for analyzing data interactively

R's rapid adoption has earned it a reputation as a new statistical software standard.

Oracle R Enterprise is a component of the Oracle Advanced Analytics Option of Oracle Database 12c Release 1 (12.1) Enterprise Edition. For detailed information about Oracle R Enterprise, including links to software downloads, go to Oracle R Enterprise at http://www.oracle.com/technetwork/database/options/advanced-analytics/r-enterprise/index.html.

Oracle R Enterprise allows users to perform statistical analysis on data stored in tables in an Oracle Database. Oracle R Enterprise has these components:

  • The Oracle R Enterprise R transparency layer. The transparency layer is a collection of packages that support mapping of R data types to Oracle Database objects and generate SQL transparently in response to R expressions on mapped data types. The transparency layer allows an R user to directly interact with database-resident data using R language constructs. This enables R users to work with data too large to fit into the memory of a user's desktop system.

  • Oracle statistics engine, a collection of statistical functions and procedures corresponding to commonly-used statistical libraries. The statistics engine packages execute in Oracle Database.

  • SQL extensions supporting R engine execution through the database on the database server. These SQL extensions enable productizing R scripts, that is, running R scripts in a lights-out mode.

  • Oracle R Connector for Hadoop, an R package that runs MapReduce jobs. It enables R users to work directly with an Oracle Hadoop cluster, executing computations written in the R language on data resident in HDFS, Oracle Database, or local files.

The components of Oracle R Enterprise are described in Chapter 3.

Oracle R Connector for Hadoop is a related product.

Oracle R Enterprise also includes functions that perform most common or base statistical procedures; see Chapter 5 for more information.

The rest of this chapter describes Oracle R Enterprise Architecture, Oracle R Enterprise Data Types, and Oracle R Enterprise Supported Configurations.

Oracle R Enterprise Architecture

Oracle R Enterprise has the three components described below. Oracle R Connector for Hadoop, a related product, is covered separately later in this chapter.

[Illustration: oreug_vm_001.png]

  1. The Client R Engine is a collection of R packages that allows you to connect to an Oracle Database and to interact with data in that database.

    You can use any R commands from the client. In addition, the client supplies these functions:

    • The R SQL Transparency layer intercepts R functions for scalable in-database execution

    • Interception of data transforms, statistical functions, and Oracle R Enterprise-specific functions

    • Interactive display of graphical results and flow control as in open source R

    • Submission of R closures (functions) for execution in Oracle Database

  2. The Server is a collection of PL/SQL procedures and libraries that augment Oracle Database 12c Release 1 (12.1) with the capabilities required to support an Oracle R Enterprise client. The R engine is also installed on Oracle Database to support embedded R execution. Oracle Database spawns R engines, which can provide data parallelism.

    The Oracle R Enterprise Database engine provides this functionality:

    • Scaling to large datasets

    • Access to tables, views, and external tables in the database, as well as those accessible through database links

    • Use of SQL query parallel execution

    • Use of in-database statistical and data mining functionality

  3. R Engines spawned by Oracle Database support database-managed parallelism and provide lights-out scheduled execution of R scripts, that is, scheduling or triggering R scripts packaged inside a PL/SQL block or SQL query. Oracle R Enterprise provides efficient data transfer to and from the spawned engines. Embedded R execution can be used to emulate MapReduce-style programming.
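The MapReduce-style pattern mentioned in item 3 can be illustrated with a small, language-neutral sketch. The Python code below is not Oracle R Enterprise code and uses no Oracle API. The function names (partition_by_key, group_apply, mean_delay) and the sample data are hypothetical, and worker threads merely stand in for the R engines that the database would spawn. It shows the general idea only: partition the data by a key, apply a user function to each partition in parallel, and collect the per-partition results.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by_key(rows, key):
    """Split rows (dicts) into groups that share the same key value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def mean_delay(rows):
    """User function applied to one partition (the 'map' step)."""
    return sum(r["delay"] for r in rows) / len(rows)


def group_apply(rows, key, func, workers=2):
    """Run func on every partition in a pool of workers, then collect
    the per-partition results into one dict (the 'reduce' step)."""
    groups = partition_by_key(rows, key)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with the keys.
        results = pool.map(func, groups.values())
        return dict(zip(groups.keys(), results))


# Hypothetical sample data; in the real product the rows would stay in the database.
flights = [
    {"carrier": "AA", "delay": 10.0},
    {"carrier": "AA", "delay": 20.0},
    {"carrier": "UA", "delay": 5.0},
]
per_carrier = group_apply(flights, "carrier", mean_delay)
print(per_carrier)
```

In this sketch the caller supplies only the per-partition function; the partitioning and the degree of parallelism are handled by group_apply, which loosely mirrors the division of labor between a user's R script and database-managed parallelism.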

There are several data types specific to Oracle R Enterprise; see Oracle R Enterprise Data Types for details.

Oracle R Connector for Hadoop

Oracle R Connector for Hadoop (ORCH) is an R package that provides an interface between the local R environment and Hadoop. You install and load this package just as you would any other R package. Using R functions, you can copy data between R memory, the local file system, and HDFS. You can schedule R programs to execute as Hadoop MapReduce jobs and return the results to any of those locations.

ORCH is preinstalled on Oracle Big Data Appliance, but it is licensed separately as one of the Oracle Big Data Connectors. You can also install ORCH on a Hadoop cluster other than one on an Oracle Big Data Appliance.

For information about ORCH, see the Oracle Big Data Connectors User's Guide (http://docs.oracle.com/cd/E27101_01/doc.10/e27365/toc.htm), part of the Oracle Big Data Documentation library (http://docs.oracle.com/cd/E27101_01/index.htm).

Oracle R Enterprise Data Types

Oracle R Enterprise introduces variants of many R data types. The name of the Oracle R Enterprise data type is the name of the corresponding R data type prefixed by ore. (for example, ore.numeric corresponds to numeric). These data types establish a mapping between an R object and a database table or view. The mapping tracks metadata of the database object, which in turn aids SQL query generation. These data types form the foundation of the Oracle R Enterprise transparency layer.
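The idea of a mapped object that carries only metadata and turns operations into SQL can be sketched in a few lines. The Python code below is a conceptual illustration only, not the Oracle R Enterprise implementation. The TableProxy class, its methods, and the EMP table with its columns are all hypothetical. The proxy stores a table name, column names, and an accumulated filter, and produces a query string only when asked.

```python
class TableProxy:
    """Hypothetical stand-in for a mapped object: it holds only
    metadata about a database table, never the rows themselves."""

    def __init__(self, table, columns, where=None):
        self.table = table
        self.columns = list(columns)
        self.where = where

    def __getitem__(self, names):
        # Column selection returns a new proxy; nothing is executed yet.
        names = [names] if isinstance(names, str) else list(names)
        unknown = [n for n in names if n not in self.columns]
        if unknown:
            raise KeyError(f"unknown columns: {unknown}")
        return TableProxy(self.table, names, self.where)

    def filter(self, condition):
        # Row filters accumulate as SQL text, combined with AND.
        if self.where is None:
            where = condition
        else:
            where = f"({self.where}) AND ({condition})"
        return TableProxy(self.table, self.columns, where)

    def to_sql(self):
        # Generate the query from the tracked metadata.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.where is not None:
            sql += f" WHERE {self.where}"
        return sql


# Hypothetical table and column names, used for illustration only.
emp = TableProxy("EMP", ["ENAME", "SAL", "DEPTNO"])
query = emp.filter("SAL > 2000")[["ENAME", "SAL"]].to_sql()
print(query)
```

Because each operation returns a new proxy rather than data, the rows never need to fit in client memory; only the final query would be sent to the database.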

The following R data types have been overloaded for transparent in-database execution:

  • Character, Integer, Numeric and Logical vectors

  • Factors

  • Data Frame

  • Matrix is overloaded in two situations:

    • Linear algebra cross-products

    • Creating input matrices for advanced analytics

For more information and examples, see Oracle R Enterprise Transparency Layer.

Oracle R Enterprise Supported Configurations

Oracle R Enterprise consists of a client and a server. The client and server both run on Microsoft Windows (32-bit and 64-bit), Oracle Linux, Red Hat Linux, Solaris, and IBM AIX. The server is installed in Oracle Database 12c Release 1 (12.1), to which the client connects. Oracle R Enterprise also runs on Oracle Exadata machines with the Linux or Solaris operating system and on SPARC SuperCluster. For details, see Prerequisites.

Installation of Oracle R Enterprise is described in Chapter 2.