9 Using the XML Pipeline Processor for Java

This chapter contains these topics:

Introduction to the XML Pipeline Processor
Using the XML Pipeline Processor: Overview
Processing XML in a Pipeline

Introduction to the XML Pipeline Processor

This section contains the following topics:

Prerequisites
Standards and Specifications
Multistage XML Processing
Customized Pipeline Processes

Prerequisites

This chapter assumes that you are familiar with the following topics:

XML Pipeline Definition Language. This XML vocabulary enables you to describe the processing relations between XML resources. If you require a more thorough introduction to the Pipeline Definition Language, consult the XML resources listed in "Related Documents" of the preface.
Document Object Model (DOM). DOM is an in-memory tree representation of the structure of an XML document.
Simple API for XML (SAX). SAX is a standard for event-based XML parsing.
XML Schema language. Refer to Chapter 7, "Using the Schema Processor for Java" for an overview and links to suggested reading.

Standards and Specifications

The Oracle XML Pipeline processor is based on the W3C XML Pipeline Definition Language Version 1.0 Note. The W3C Note defines an XML vocabulary rather than an API. You can find the Pipeline specification at the following URL:

http://www.w3.org/TR/xml-pipeline/

"Pipeline Definition Language Standard for the XDK for Java" describes the differences between the Oracle XDK implementation of the Oracle XML Pipeline processor and the W3C Note.

Multistage XML Processing

The Oracle XML Pipeline processor is built on the XML Pipeline Definition Language. The processor can take an input XML pipeline document and execute pipeline processes according to derived dependencies. A pipeline document, which is written in XML, specifies the processes to be executed in a declarative manner. You can associate Java classes with processes by using the <processdef/> element in the pipeline document.

Use the Pipeline processor for multistage processing, which occurs when you process XML components sequentially or in parallel. The output of one stage of processing can become the input of another stage of processing. You can write a pipeline document that defines the inputs and outputs of the processes. Figure 9-1 illustrates a possible pipeline sequence.

Figure 9-1 Pipeline Processing


In addition to the XML Pipeline processor itself, the XDK provides an API for processes that you can pipe together in a pipeline document. Table 9-2 summarizes the classes provided in the oracle.xml.pipeline.processes package.

The typical stages of processing XML in a pipeline are as follows:

Parse the input XML documents. The oracle.xml.pipeline.processes package includes DOMParserProcess for DOM parsing and SAXParserProcess for SAX parsing.
Validate the input XML documents.
Serialize or transform the input documents. Note that the Pipeline processor does not enable you to connect the SAX parser to the XSLT processor, which requires a DOM.

In multistage processing, SAX is ideal for filtering and searching large XML documents. You should use DOM when you need to change XML content or require efficient dynamic access to the content.
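The three stages listed above can be sketched with the standard JAXP classes in the JDK. This is an illustration of the multistage idea only, not the XDK process classes: the class name `MultistageSketch` and the inline XML, schema, and stylesheet strings are assumptions made for this example. Each stage consumes the output of the previous stage, and a DOM is used because the transformation stage needs a tree.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class MultistageSketch {
    // Hypothetical sample inputs, inlined so that the sketch is self-contained.
    static final String XML = "<books><book><title>XML Basics</title></book></books>";
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='books'><xs:complexType><xs:sequence>"
      + "<xs:element name='book' maxOccurs='unbounded'><xs:complexType><xs:sequence>"
      + "<xs:element name='title' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='books/book'><xsl:value-of select='title'/></xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Stage 1: parse the input text into a namespace-aware DOM tree.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Stage 2: validate the DOM against the schema; validate() throws if the tree is invalid.
    static Document validate(Document doc, String xsd) throws Exception {
        SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(xsd)));
        schema.newValidator().validate(new DOMSource(doc));
        return doc;
    }

    // Stage 3: apply the stylesheet to the validated DOM and return the serialized result.
    static String transform(Document doc, String xsl) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Chains the stages: the output of each stage is the input of the next.
    static String run() throws Exception {
        return transform(validate(parse(XML), XSD), XSL);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

In a pipeline document, the same chaining is expressed declaratively through matching output and input labels rather than through nested method calls.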

Customized Pipeline Processes

The oracle.xml.pipeline.controller.Process class is the base class for all pipeline process definitions. The classes in the oracle.xml.pipeline.processes package extend this base class. To create a customized pipeline process, you need to create a class that extends the Process class.

At a minimum, every custom process should override the do-nothing initialize() and execute() methods of the Process class. If the customized process accepts SAX events as input, then it should override the SAXContentHandler() method to return the appropriate ContentHandler that handles incoming SAX events. It should also override the SAXErrorHandler() method to return the appropriate ErrorHandler. Table 9-1 provides further descriptions of the preceding methods.

Table 9-1 Methods in the oracle.xml.pipeline.controller.Process Class

Method	Description
`initialize()`	Initializes the process before execution. Call `getInput()` to fetch a specific input object associated with the process element and call `supportType()` to indicate the types of input supported. Analogously, call `getOutput()` and `supportType()` for output.
`execute()`	Executes the process. Call `getInParaValue()`, `getInput()`, or `getInputSource()` to fetch the inputs to the process. If a custom process outputs SAX events, then it should call the `getSAXContentHandler()` and `getSAXErrorHandler()` methods in `execute()` to get the SAX handlers of the following processes in the pipeline. Call `setOutputResult()`, `getOutputStream()`, `getOutputWriter()` or `setOutParam()` to set the outputs or outparams generated by this process. Call `getErrorSource()`, `getErrorStream()`, or `getErrorDocument()` to access the pipeline error element associated with this process element. If an exception occurs during `execute()`, call `error()` or `info()` to propagate it to the `PipelineErrorHandler`.
`SAXContentHandler()`	Returns the SAX `ContentHandler`. If dependencies from other processes are not yet available, then return `null`. When these dependencies become available, the method is executed to completion.
`SAXErrorHandler()`	Returns the SAX `ErrorHandler`. If you do not override this method, then the Pipeline processor uses the default error handler implemented by this class to handle SAX errors.
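The following sketch shows the kind of handler object that a custom process accepting SAX input might return from SAXContentHandler() and SAXErrorHandler(). It uses only the standard SAX classes in the JDK and deliberately does not extend the XDK Process class, whose exact signatures are not reproduced in this chapter. The class name `ElementCountingHandler` and the helper `countElements()` are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// DefaultHandler implements both ContentHandler and ErrorHandler,
// so one object can serve as the content handler and the error handler.
final class ElementCountingHandler extends DefaultHandler {
    private int elementCount;
    private int warningCount;

    // ContentHandler callback: invoked once for every start tag.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elementCount++;
    }

    // ErrorHandler callback: count warnings and continue parsing.
    @Override
    public void warning(SAXParseException e) {
        warningCount++;
    }

    // ErrorHandler callback: treat recoverable errors as failures by rethrowing them.
    @Override
    public void error(SAXParseException e) throws SAXParseException {
        throw e;
    }

    int getElementCount() {
        return elementCount;
    }

    int getWarningCount() {
        return warningCount;
    }

    // Drives the handler with the JDK SAX parser, standing in for an upstream SAX-producing process.
    static int countElements(String xml) throws Exception {
        ElementCountingHandler handler = new ElementCountingHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getElementCount();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<a><b/><b/></a>"));
    }
}
```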

Using the XML Pipeline Processor: Overview

This section contains the following topics:

Using the XML Pipeline Processor: Basic Process
Running the XML Pipeline Processor Demo Programs
Using the XML Pipeline Processor Command-Line Utility

Using the XML Pipeline Processor: Basic Process

The XML Pipeline processor is accessible through the following packages:

oracle.xml.pipeline.controller, which provides an XML Pipeline controller that executes XML processes in a pipeline based on dependencies.
oracle.xml.pipeline.processes, which provides wrapper classes for XML processes that can be executed by the XML Pipeline controller. The oracle.xml.pipeline.processes package contains the classes that you can use to design a pipeline application framework. Each class extends the oracle.xml.pipeline.controller.Process class.

Table 9-2 lists the components in the package. You can connect these components and processes through a combination of the XML Pipeline processor and a pipeline document.

Table 9-2 Classes in oracle.xml.pipeline.processes

Class	Description
`CompressReaderProcess`	Receives compressed XML and outputs parsed XML.
`CompressWriterProcess`	Receives XML parsed with DOM or SAX and outputs compressed XML.
`DOMParserProcess`	Parses incoming XML and outputs a DOM tree.
`SAXParserProcess`	Parses incoming XML and outputs SAX events.
`XPathProcess`	Accepts a DOM as input, uses an XPath pattern to select one or more nodes from an XML `Document` or an XML `DocumentFragment`, and outputs a `Document` or `DocumentFragment`.
`XSDSchemaBuilder`	Parses an XML schema and outputs a schema object for validation. This process is built into the XML Pipeline processor and builds schema objects used for validating XML documents.
`XSDValProcess`	Validates against a local schema, analyzes the results, and reports errors if necessary.
`XSLProcess`	Accepts DOM as input, applies an XSL stylesheet, and outputs the result of the transformation.
`XSLStylesheetProcess`	Receives an XSL stylesheet as a stream or DOM and creates an `XSLStylesheet` object.
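To make the DOM-in, nodes-out contract of a process such as XPathProcess concrete, the following sketch selects nodes from a DOM with the standard `javax.xml.xpath` API in the JDK. It is an analogy only and does not call the XDK class. The class name `XPathSelectSketch` and the sample document are assumptions of this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathSelectSketch {
    // Parses the XML into a DOM, evaluates the XPath expression against it,
    // and returns the text content of every selected node in document order.
    static List<String> select(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book><title>A</title></book><book><title>B</title></book></books>";
        System.out.println(select(xml, "/books/book/title"));
    }
}
```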

Figure 9-2 illustrates how to pass a pipeline document to a Java application that uses the XML Pipeline processor, configure the processor, and execute the pipeline.

Figure 9-2 Using the Pipeline Processor for Java


The basic steps are as follows:

Instantiate a pipeline document, which forms the input to the pipeline execution. Create the object by passing a FileReader to the constructor as follows:
```
// The reader must be opened on the pipeline document before it is passed on.
FileReader f = new FileReader(args[0]);
PipelineDoc pipe = new PipelineDoc((Reader) f, false);
```

Instantiate a pipeline processor. PipelineProcessor is the top-level class that executes the pipeline. Table 9-3 describes some of the available methods.

Table 9-3 PipelineProcessor Methods

Method	Description
`executePipeline()`	Executes the pipeline based on the `PipelineDoc` set by invoking `setPipelineDoc()`.
`getExecutionMode()`	Gets the type of execution mode: `PIPELINE_SEQUENTIAL` or `PIPELINE_PARALLEL`.
`setErrorHandler()`	Sets the error handler for the pipeline. This call is mandatory to execute the pipeline.
`setExecutionMode()`	Sets the execution mode. `PIPELINE_PARALLEL` is the default and specifies that the processes in the pipeline should execute in parallel. `PIPELINE_SEQUENTIAL` specifies that the processes in the pipeline should execute sequentially.
`setForce()`	Sets execution behavior. If `TRUE`, then the pipeline executes regardless of whether the target is up-to-date with respect to the pipeline inputs.
`setPipelineDoc()`	Sets the `PipelineDoc` object for the pipeline.

The following statement instantiates the pipeline processor:

```
proc = new PipelineProcessor();
```

Set the processor to the pipeline document. For example:
```
proc.setPipelineDoc(pipe);
```
Set the execution mode for the processor and perform any other needed configuration. For example, set the mode by passing a constant to PipelineProcessor.setExecutionMode().

The following statement specifies sequential execution:
```
proc.setExecutionMode(PipelineConstants.PIPELINE_SEQUENTIAL); 
```
Instantiate an error handler. The error handler must implement the PipelineErrorHandler interface. For example:
```
errHandler = new PipelineSampleErrHdlr(logname);
```
Set the error handler for the processor by invoking setErrorHandler(). For example:
```
proc.setErrorHandler(errHandler);
```
Execute the pipeline. For example:
```
proc.executePipeline();
```
See Also:
- Oracle Database XML Java API Reference to learn about the oracle.xml.pipeline subpackages
- "Creating a Pipeline Document"

Running the XML Pipeline Processor Demo Programs

Demo programs for the XML Pipeline processor are included in $ORACLE_HOME/xdk/demo/java/pipeline. Table 9-4 describes the XML files and Java source files that you can use to test the utility.

Table 9-4 Pipeline Processor Sample Files

File	Description
`README`	A text file that describes how to set up the Pipeline processor demos.
`PipelineSample.java`	A sample Pipeline processor application. The program takes `pipedoc.xml` as its first argument.
`PipelineSampleErrHdlr.java`	A sample program to create an error handler used by `PipelineSample`.
`book.xml`	A sample XML document that describes a series of books. This document is specified as an input by `pipedoc.xml`, `pipedoc2.xml`, and `pipedocerr.xml`.
`book.xsl`	An XSLT stylesheet that transforms the list of books in `book.xml` into an HTML table.
`book_err.xsl`	An XSLT stylesheet specified as an input by the `pipedocerr.xml` pipeline document. This stylesheet contains an intentional error.
`id.xsl`	An XSLT stylesheet specified as an input by the `pipedoc3.xml` pipeline document.
`items.xsd`	An XML schema document specified as an input by the `pipedoc3.xml` pipeline document.
`pipedoc.xml`	A pipeline document. This document specifies that process p1 should parse `book.xml` with DOM, process p2 should parse `book.xsl` and create a stylesheet object, and process p3 should apply the stylesheet to the DOM to generate `myresult.html`.
`pipedoc2.xml`	A pipeline document. This document specifies that process p1 should parse `book.xml` with SAX, process p2 should generate compressed XML `compxml` from the SAX events, and process p3 should regenerate the XML from the compressed stream as `myresult2.html`.
`pipedoc3.xml`	A pipeline document. This document specifies that process p5 should parse `po.xml` with DOM, process p1 should select a single node from the DOM tree with an XPath expression, process p4 should parse `items.xsd` and generate a schema object, process p6 should validate the selected node against the schema, process p3 should parse `id.xsl` and generate a stylesheet object, and a final process should apply the stylesheet to the validated node to produce `myresult3.html`.
`pipedocerr.xml`	A pipeline document. This document specifies that process p1 should parse `book.xml` with DOM, process p2 should parse `book_err.xsl` and generate a stylesheet object if it encounters no errors and apply an inline stylesheet if it encounters errors, and process p3 should apply the stylesheet to the DOM to generate `myresulterr.html`. Because `book_err.xsl` contains an error, the program should write the text contents of the input XML to `myresulterr.html`.
`po.xml`	A sample XML document that describes a purchase order. This document is specified as an input by `pipedoc3.xml`.

Documentation for how to compile and run the sample programs is located in the README. The basic steps are as follows:

Change into the $ORACLE_HOME/xdk/demo/java/pipeline directory (UNIX) or %ORACLE_HOME%\xdk\demo\java\pipeline directory (Windows).
Make sure that your environment variables are set as described in "Setting Up the Java XDK Environment".
Run make (UNIX) or Make.bat (Windows) at the system prompt to generate class files for PipelineSample.java and PipelineSampleErrHdlr.java and run the demo programs. The programs write output files to the log subdirectory.

Alternatively, you can run the demo programs manually by using the following syntax:
```
java PipelineSample pipedoc pipelog [ seq | para ]
```
The pipedoc option specifies which pipeline document to use. The pipelog option specifies the name of the pipeline log file. This option can be omitted unless you specify seq or para, in which case a filename is required. If you do not specify a log file, then the program generates pipeline.log by default. The seq option runs the pipeline processes sequentially; para runs them in parallel. If you specify neither seq nor para, then the default is parallel processing.
View the files generated from the pipeline, which are all named with the initial string myresult, and the log files.
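The argument rules described for the manual invocation can be sketched as a small, XDK-independent parser. The class `SampleArgs` and its field names are hypothetical and are not part of the demo sources. The defaults (pipeline.log and parallel execution) follow the description above.

```java
final class SampleArgs {
    final String pipedoc;      // required: the pipeline document
    final String logName;      // optional: defaults to pipeline.log
    final boolean sequential;  // optional: defaults to parallel (false)

    SampleArgs(String pipedoc, String logName, boolean sequential) {
        this.pipedoc = pipedoc;
        this.logName = logName;
        this.sequential = sequential;
    }

    // Interprets arguments in the order: pipedoc pipelog [ seq | para ].
    static SampleArgs parse(String[] args) {
        if (args.length < 1) {
            throw new IllegalArgumentException("usage: pipedoc [pipelog [seq | para]]");
        }
        String log = args.length > 1 ? args[1] : "pipeline.log";
        // The mode is the third argument, which is why a log name must precede it.
        boolean seq = args.length > 2 && args[2].startsWith("seq");
        return new SampleArgs(args[0], log, seq);
    }

    public static void main(String[] args) {
        SampleArgs parsed = parse(new String[] {"pipedoc.xml", "my.log", "seq"});
        System.out.println(parsed.pipedoc + " " + parsed.logName + " " + parsed.sequential);
    }
}
```

Because the mode is positional, the sketch makes it visible why a log filename becomes mandatory as soon as seq or para is given.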

Using the XML Pipeline Processor Command-Line Utility

The command-line interface for the XML Pipeline processor is named orapipe. The Pipeline processor is packaged with Oracle Database. By default, the Oracle Universal Installer installs the utility on disk in $ORACLE_HOME/bin.

Before running the utility for the first time, make sure that your environment variables are set as described in "Setting Up the Java XDK Environment". Run orapipe at the operating system command line with the following syntax:

```
orapipe options pipedoc
```

The pipedoc is the pipeline document, which is required. Table 9-5 describes the available options for the orapipe utility.

Table 9-5 orapipe Command-Line Options

Option	Purpose
`-help`	Prints the help message
`-log` `logfile`	Writes errors and messages to the specified log file. The default is `pipeline.log`.
`-noinfo`	Does not log informational items. By default, informational items are logged.
`-nowarning`	Does not log warnings. By default, warnings are logged.
`-validate`	Validates the input `pipedoc` against the pipeline schema. Validation is turned off by default. If the `outparam` feature is used, then validation fails with the current pipeline schema because `outparam` is an additional feature that is not defined in that schema.
`-version`	Prints the release version.
`-sequential`	Executes the pipeline in sequential mode. The default is parallel.
`-force`	Executes pipeline even if target is up-to-date. By default no force is specified.
`-attr` `name` `value`	Sets the value of `$name` to the specified `value`. For example, if the attribute name is `source` and the value is `book.xml`, then you can pass this value to an element in the pipeline document as follows: `<input ... label="$source">`.
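The effect of the -attr option can be pictured as a placeholder substitution over attribute values in the pipeline document. The following sketch illustrates that idea only. How orapipe performs the substitution internally is not described here, and the class name `AttrSubstitutionSketch`, the accepted name syntax, and the choice to leave unknown names untouched are all assumptions of this example.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AttrSubstitutionSketch {
    // Assumed placeholder syntax: a dollar sign followed by an identifier.
    private static final Pattern VAR = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    // Replaces every $name in the text with the value supplied for that name.
    static String expand(String text, Map<String, String> attrs) {
        Matcher matcher = VAR.matcher(text);
        StringBuffer expanded = new StringBuffer();
        while (matcher.find()) {
            String value = attrs.get(matcher.group(1));
            // Assumption of this sketch: names without a supplied value are left as written.
            String replacement = value != null ? value : matcher.group();
            matcher.appendReplacement(expanded, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(expanded);
        return expanded.toString();
    }

    public static void main(String[] args) {
        // Mirrors: -attr source book.xml applied to label="$source"
        Map<String, String> attrs = Collections.singletonMap("source", "book.xml");
        System.out.println(expand("$source", attrs));
    }
}
```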

Processing XML in a Pipeline

This section contains the following topics:

Creating a Pipeline Document
Writing a Pipeline Processor Application
Writing a Pipeline Error Handler

Creating a Pipeline Document

To use the Oracle XML Pipeline processor, you must create an XML document according to the rules of the Pipeline Definition Language specified in the W3C Note.

The W3C specification defines the XML processing components and the inputs and outputs for these processes. The XML Pipeline processor includes support for the following XDK components:

XML parser
XML compressor
XML Schema validator
XSLT processor

Example of a Pipeline Document

The XML Pipeline processor executes a sequence of XML processing according to the rules in the pipeline document and returns a result. Example 9-1 shows pipedoc.xml, which is a sample pipeline document included in the demo directory.

Example 9-1 pipedoc.xml

```
<pipeline xmlns="http://www.w3.org/2002/02/xml-pipeline"
          xml:base="http://example.org/">

  <param name="target" select="myresult.html"/>

  <processdef name="domparser.p"
   definition="oracle.xml.pipeline.processes.DOMParserProcess"/>
  <processdef name="xslstylesheet.p"
   definition="oracle.xml.pipeline.processes.XSLStylesheetProcess"/>
  <processdef name="xslprocess.p"
   definition="oracle.xml.pipeline.processes.XSLProcess"/>

  <process id="p2" type="xslstylesheet.p" ignore-errors="false">
    <input name="xsl" label="book.xsl"/>
    <outparam name="stylesheet" label="xslstyle"/>
  </process>

  <process id="p3" type="xslprocess.p" ignore-errors="false">
    <param name="stylesheet" label="xslstyle"/>
    <input name="document" label="xmldoc"/>
    <output name="result" label="myresult.html"/>
  </process>

  <process id="p1" type="domparser.p" ignore-errors="true">
    <input name="xmlsource" label="book.xml"/>
    <output name="dom" label="xmldoc"/>
    <param name="preserveWhitespace" select="true"></param>
    <error name="dom">
      <html xmlns="http://www.w3.org/1999/xhtml">
        <head>
          <title>DOMParser Failure!</title>
        </head>
        <body>
          <h1>Error parsing document</h1>
        </body>
      </html>
    </error>
  </process>

</pipeline>
```

Processes Specified in the Pipeline Document

In Example 9-1, three processes are called and associated with Java classes in the oracle.xml.pipeline.processes package. The pipeline document uses the <processdef/> element to make the following associations:

domparser.p is associated with the DOMParserProcess class
xslstylesheet.p is associated with the XSLStylesheetProcess class
xslprocess.p is associated with the XSLProcess class
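One way to inspect these associations without running the pipeline is to read the <processdef/> elements with the standard DOM API in the JDK. The sketch below is independent of the XDK. The class name `ProcessDefLister` is hypothetical, and the inline document is an abridged copy of Example 9-1.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ProcessDefLister {
    // Namespace used by the pipeline document in Example 9-1.
    static final String PIPELINE_NS = "http://www.w3.org/2002/02/xml-pipeline";

    // Abridged pipeline document, inlined so that the sketch is self-contained.
    static final String SAMPLE =
        "<pipeline xmlns='http://www.w3.org/2002/02/xml-pipeline'>"
      + "<processdef name='domparser.p'"
      + " definition='oracle.xml.pipeline.processes.DOMParserProcess'/>"
      + "<processdef name='xslprocess.p'"
      + " definition='oracle.xml.pipeline.processes.XSLProcess'/>"
      + "</pipeline>";

    // Returns a map from each processdef name to its Java class name, in document order.
    static Map<String, String> processDefs(String pipelineXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(pipelineXml)));
        NodeList defs = doc.getElementsByTagNameNS(PIPELINE_NS, "processdef");
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < defs.getLength(); i++) {
            Element def = (Element) defs.item(i);
            result.put(def.getAttribute("name"), def.getAttribute("definition"));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (Map.Entry<String, String> entry : processDefs(SAMPLE).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```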

Processing Architecture Specified in the Pipeline Document

The PipelineSample program accepts the pipedoc.xml document shown in Example 9-1 as input along with XML documents book.xml and book.xsl. The basic design of the pipeline is as follows:

Parse the incoming book.xml document and generate a DOM tree. This task is performed by DOMParserProcess.
Parse book.xsl as a stream and generate an XSLStylesheet object. This task is performed by XSLStylesheetProcess.
Receive the DOM of book.xml as input, apply the stylesheet object, and write the result to myresult.html. This task is performed by XSLProcess.

Note the following aspects of the processing architecture used in the pipeline document:

The target information set, http://example.org/myresult.html, is inferred from the default value of the target parameter and the xml:base setting.
The process p2 has an input of book.xsl and an output parameter with the label xslstyle, so it has to run to produce the input for p3.
The p3 process depends on input parameter xslstyle and document xmldoc.
The p3 process has an output with the label myresult.html, which resolves against xml:base to http://example.org/myresult.html, so it has to run to produce the target.
The process p1 depends on input document book.xml and outputs xmldoc, so it has to run to produce the input for p3.

In Example 9-1, more than one order of processing can satisfy all of the dependencies. Given the rules, the XML Pipeline processor must process p3 last but can process p1 and p2 in either order or process them in parallel.
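The dependency reasoning above can be made concrete with a small scheduling sketch. It is not the algorithm used by the XML Pipeline processor, which is not described in this chapter. It is a generic layering of processes by label dependencies, and the names `PipelineOrderSketch`, `Proc`, and `waves()` are hypothetical. Labels that no process produces, such as book.xml and book.xsl, are treated as external inputs that are already available.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class PipelineOrderSketch {
    // Hypothetical model of a process: its id, the labels it reads, and the labels it writes.
    static final class Proc {
        final String id;
        final Set<String> needs;
        final Set<String> produces;

        Proc(String id, Set<String> needs, Set<String> produces) {
            this.id = id;
            this.needs = needs;
            this.produces = produces;
        }
    }

    static Set<String> labels(String... names) {
        return new LinkedHashSet<>(Arrays.asList(names));
    }

    // The three processes of Example 9-1, in the order in which they appear in the document.
    static List<Proc> example() {
        return Arrays.asList(
            new Proc("p2", labels("book.xsl"), labels("xslstyle")),
            new Proc("p3", labels("xslstyle", "xmldoc"), labels("myresult.html")),
            new Proc("p1", labels("book.xml"), labels("xmldoc")));
    }

    // Groups processes into waves. Every process in a wave depends only on external
    // inputs or on labels produced by earlier waves, so the members of one wave
    // could run in either order or in parallel.
    static List<List<String>> waves(List<Proc> procs) {
        Set<String> producedSomewhere = new HashSet<>();
        for (Proc p : procs) {
            producedSomewhere.addAll(p.produces);
        }
        Set<String> available = new HashSet<>();
        List<Proc> pending = new ArrayList<>(procs);
        List<List<String>> result = new ArrayList<>();
        while (!pending.isEmpty()) {
            List<Proc> ready = new ArrayList<>();
            for (Proc p : pending) {
                if (isReady(p, producedSomewhere, available)) {
                    ready.add(p);
                }
            }
            if (ready.isEmpty()) {
                throw new IllegalStateException("Cyclic dependency among processes");
            }
            List<String> ids = new ArrayList<>();
            for (Proc p : ready) {
                ids.add(p.id);
                available.addAll(p.produces);
            }
            pending.removeAll(ready);
            result.add(ids);
        }
        return result;
    }

    // A process is ready when each label it needs is external or already produced.
    private static boolean isReady(Proc p, Set<String> producedSomewhere, Set<String> available) {
        for (String label : p.needs) {
            if (producedSomewhere.contains(label) && !available.contains(label)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(waves(example())); // prints [[p2, p1], [p3]]
    }
}
```

The first wave contains p2 and p1, which have no dependency on each other, and the second wave contains only p3, matching the constraint that p3 must be processed last.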

Writing a Pipeline Processor Application

The PipelineSample.java source file illustrates a basic pipeline application. You can use the application with any of the pipeline documents in Table 9-4 to parse and transform an input XML document.

The basic steps of the program are as follows:

Perform the initial setup. The program declares references of type FileReader (for the input XML file), PipelineDoc (for the input pipeline document), and PipelineProcessor (for the processor). The first argument is the pipeline document, which is required. If a second argument is received, then it is stored in the logname String. The following code fragment illustrates this technique:
```
public static void main(String[] args)
{
  FileReader f;
  PipelineDoc pipe;
  PipelineProcessor proc;
 
  if (args.length < 1)
  {
    System.out.println("First argument needed, other arguments are " +
                       "optional:");
    System.out.println("pipedoc.xml <output_log> <'seq'>");
    return;
  }
  if (args.length > 1)
    logname = args[1];
  ...
```
Create a FileReader object by passing the first command-line argument to the constructor as the filename. For example:
```
f = new FileReader(args[0]);
```
Create a PipelineDoc object by passing the reference to the FileReader object. The following example casts the FileReader to a Reader and specifies no validation:
```
pipe = new PipelineDoc((Reader)f, false);
```
Instantiate an XML Pipeline processor. The following statement instantiates the pipeline processor:
```
proc = new PipelineProcessor();
```
Set the processor to the pipeline document. For example:
```
proc.setPipelineDoc(pipe);
```
Set the execution mode for the processor and perform any other configuration. The following code fragment uses a condition to determine the execution mode. If three or more arguments are passed to the program, then it sets the mode to sequential or parallel depending on which argument is passed. For example:
```
String execMode = null;
if (args.length > 2)
{
   execMode = args[2];
   if(execMode.startsWith("seq"))
      proc.setExecutionMode(PipelineConstants.PIPELINE_SEQUENTIAL);
   else if (execMode.startsWith("para"))
      proc.setExecutionMode(PipelineConstants.PIPELINE_PARALLEL);
}
```
Instantiate an error handler. The error handler must implement the PipelineErrorHandler interface. The program uses the PipelineSampleErrHdlr class shown in PipelineSampleErrHdlr.java. The following code fragment illustrates this technique:
```
errHandler = new PipelineSampleErrHdlr(logname);
```
Set the error handler for the processor by invoking setErrorHandler(). The following statement illustrates this technique:
```
proc.setErrorHandler(errHandler);
```
Execute the pipeline. The following statement illustrates this technique:
```
proc.executePipeline();
```
See Also:
Oracle Database XML Java API Reference to learn about the oracle.xml.pipeline subpackages

Writing a Pipeline Error Handler

An application calling the XML Pipeline processor must implement the PipelineErrorHandler interface to handle errors received from the processor. Set the error handler in the processor by calling setErrorHandler(). When writing the error handler, you can choose to throw an exception for different types of errors.

The oracle.xml.pipeline.controller.PipelineErrorHandler interface declares the methods shown in Table 9-6, all of which return void.

Table 9-6 PipelineErrorHandler Methods

Method	Description
`error(java.lang.String msg, PipelineException e)`	Handles `PipelineException` errors.
`fatalError(java.lang.String msg, PipelineException e)`	Handles fatal `PipelineException` errors.
`warning(java.lang.String msg, PipelineException e)`	Handles `PipelineException` warnings.
`info(java.lang.String msg)`	Prints optional, additional information about errors.

The first three methods in Table 9-6 receive a reference to an oracle.xml.pipeline.controller.PipelineException object. The following methods of the PipelineException class are especially useful:

getExceptionType(), which obtains the type of exception thrown
getProcessId(), which obtains the process ID where the exception occurred
getMessage(), which returns the message string of this Throwable error

The PipelineSampleErrHdlr.java source file implements a basic error handler for use with the PipelineSample program. The basic steps are as follows:

Implement a constructor. The constructor accepts the name of a log file, opens it with a FileWriter, and wraps the FileWriter in a PrintWriter as follows:
```
PipelineSampleErrHdlr(String logFile) throws IOException
{
  log = new PrintWriter(new FileWriter(logFile));
}
```

Implement the error() method. This implementation prints the process ID, exception type, and error message. It also increments a variable holding the error count. For example:

```
public void error (String msg, PipelineException e) throws Exception
{
  log.println("\nError in: " + e.getProcessId());
  log.println("Type: " + e.getExceptionType());
  log.println("Message: " +  e.getMessage());
  log.println("Error message: " + msg);
  log.flush();
  errCount++;
}
```

Implement the fatalError() method. This implementation follows the pattern of error(). For example:

```
public void fatalError (String msg, PipelineException e) throws Exception
{
  log.println("\nFatalError in: " + e.getProcessId());
  log.println("Type: " + e.getExceptionType());
  log.println("Message: " +  e.getMessage());
  log.println("Error message: " + msg);
  log.flush();
  errCount++;
}
```

Implement the warning() method. This implementation follows the basic pattern of error() except that it does not print the exception type and it increments the warnCount variable rather than the errCount variable. For example:

```
public void warning (String msg, PipelineException e) throws Exception
{
  log.println("\nWarning in: " + e.getProcessId());
  log.println("Message: " +  e.getMessage());
  log.println("Error message: " + msg);
  log.flush();
  warnCount++;
}
```

Implement the info() method. Unlike the preceding methods, this method does not receive a PipelineException reference as input. The following implementation prints the String received by the method and increments the value of the warnCount variable:
```
public void info (String msg)
{
  log.println("\nInfo : " + msg);
  log.flush();
  warnCount++;   
}
```
Implement a method to close the PrintWriter. The following code implements the method closeLog(), which prints the number of errors and warnings and calls PrintWriter.close():
```
public void closeLog()
{
  log.println("\nTotal Errors: " + errCount + "\nTotal Warnings: " +
               warnCount);
  log.flush();
  log.close();
}
```
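The fragments above can be assembled into one class. The following sketch shows the same counting-and-logging pattern in a form that compiles without the XDK. The interface `SketchErrorHandler` is a hypothetical stand-in for PipelineErrorHandler, plain `Exception` stands in for PipelineException, and the handler writes to any `Writer` so that it can be exercised in memory. Unlike the sample program, this sketch does not count informational messages as warnings.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical stand-in for the XDK interface, so that the sketch compiles without the XDK.
interface SketchErrorHandler {
    void error(String msg, Exception e);
    void warning(String msg, Exception e);
    void info(String msg);
}

final class CountingLogHandler implements SketchErrorHandler, AutoCloseable {
    private final PrintWriter log;
    private int errCount;
    private int warnCount;

    // Accepts any Writer: a FileWriter for a log file, or a StringWriter for tests.
    CountingLogHandler(Writer out) {
        this.log = new PrintWriter(out);
    }

    @Override
    public void error(String msg, Exception e) {
        log.println("Error: " + msg + " (" + e.getMessage() + ")");
        log.flush();
        errCount++;
    }

    @Override
    public void warning(String msg, Exception e) {
        log.println("Warning: " + msg + " (" + e.getMessage() + ")");
        log.flush();
        warnCount++;
    }

    // Design choice of this sketch: informational messages are logged but not counted.
    @Override
    public void info(String msg) {
        log.println("Info: " + msg);
        log.flush();
    }

    int getErrCount() {
        return errCount;
    }

    int getWarnCount() {
        return warnCount;
    }

    // Writes the totals and closes the underlying writer, like closeLog() in the sample.
    @Override
    public void close() {
        log.println("Total Errors: " + errCount);
        log.println("Total Warnings: " + warnCount);
        log.flush();
        log.close();
    }

    public static void main(String[] args) {
        StringWriter sink = new StringWriter();
        try (CountingLogHandler handler = new CountingLogHandler(sink)) {
            handler.error("stylesheet could not be parsed", new RuntimeException("unexpected token"));
            handler.info("falling back to inline stylesheet");
        }
        System.out.print(sink);
    }
}
```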
See Also:
Oracle Database XML Java API Reference to learn about the PipelineErrorHandler interface and the PipelineException class