This chapter discusses global application development in a PHP and Oracle Database environment. It addresses the basic tasks associated with developing and deploying global Internet applications, including developing locale awareness, constructing HTML content in the user-preferred language, and presenting data following the cultural conventions of the locale of the user.
Building a global Internet application that supports different locales requires good development practices. A locale refers to a national language and the region in which the language is spoken. The application itself must be aware of the locale preference of the user and be able to present content following the cultural conventions expected by the user. It is important to present data with appropriate locale characteristics, such as the correct date and number formats. Oracle Database is fully internationalized to provide a global platform for developing and deploying global applications.
Correctly setting up the connectivity between the PHP engine and the Oracle database is the first step in building a global application. It helps preserve data integrity across all tiers. Most Internet-based standards support Unicode as a character encoding. This chapter focuses on using Unicode as the character set for data exchange.
PHP uses Oracle's C language OCI interface, and rules that apply to OCI also apply to PHP. Oracle locale behavior (including the client character set used in OCI applications) is defined by the NLS_LANG environment variable. This environment variable has the form:
<language>_<territory>.<character set>
For example, for a Portuguese user in Brazil running an application in Unicode, NLS_LANG should be set to:
BRAZILIAN PORTUGUESE_BRAZIL.AL32UTF8
The language and territory settings control Oracle behaviors such as the Oracle date format, error message language, and the rules used for sort order. The character set AL32UTF8 is the Oracle name for UTF-8.
For information on the NLS_LANG environment variable, see the Oracle Database installation guides.
When PHP is installed on Oracle Linux's Apache, you can set NLS_LANG in /etc/sysconfig/httpd:
export NLS_LANG='BRAZILIAN PORTUGUESE_BRAZIL.AL32UTF8'
You must restart the web server for the change to take effect.
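The character set can also be requested per connection from PHP itself. The following is a minimal sketch, assuming the OCI8 extension is loaded; it uses the optional fourth argument of oci_connect(), which names the client character set. The helper names build_nls_lang() and connect_utf8() are illustrative, and the credentials and connect string are placeholders. Language and territory are not affected by this argument and still come from NLS_LANG or from session settings.

```php
<?php
// Build an NLS_LANG value of the form <language>_<territory>.<character set>.
function build_nls_lang(string $language, string $territory, string $charset): string
{
    return sprintf('%s_%s.%s', $language, $territory, $charset);
}

// Open a connection that exchanges data in UTF-8. The fourth argument of
// oci_connect() is the client character set for this connection.
// $user, $password and $connectString are placeholders supplied by the caller.
function connect_utf8(string $user, string $password, string $connectString)
{
    $conn = oci_connect($user, $password, $connectString, 'AL32UTF8');
    if ($conn === false) {
        $e = oci_error();
        throw new RuntimeException($e['message']);
    }
    return $conn;
}

echo build_nls_lang('BRAZILIAN PORTUGUESE', 'BRAZIL', 'AL32UTF8'), "\n";
```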
PHP was designed to work with the ISO-8859-1 character set. To handle other character sets, specifically multibyte character sets, a set of "MultiByte String Functions" is available. To enable these functions, you must enable PHP's mbstring extension.
Your application code should use functions such as mb_strlen() to calculate the number of characters in strings. This may return different values than strlen(), which returns the number of bytes in a string.
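The difference between bytes and characters is easy to see with a short string that contains one accented letter. This sketch assumes the mbstring extension is enabled and that the string is encoded in UTF-8; the sample text is only an illustration.

```php
<?php
// "São Paulo" written with a Unicode escape so the example does not depend
// on the encoding of this source file (escape syntax needs PHP 7.0 or later).
$city = "S\u{00E3}o Paulo";

$bytes = strlen($city);              // counts bytes
$chars = mb_strlen($city, 'UTF-8');  // counts characters

printf("bytes=%d characters=%d\n", $bytes, $chars);
// prints "bytes=10 characters=9" because the letter ã occupies two bytes in UTF-8
```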
Once you have enabled the mbstring extension and restarted the web server, several configuration options become available. In older PHP releases you can change the behavior of the standard PHP string functions by setting mbstring.func_overload to one of the "Overload" settings. This option was deprecated in PHP 7.2 and removed in PHP 8.0, so new code should call the mb_ functions directly.
For more information, see the PHP mbstring reference manual.
The PHP intl extension, which wraps the ICU library, is also popular for manipulating strings.
In a global environment, your application should accommodate users with different locale preferences. Once it has determined the preferred locale of the user, the application should construct HTML content in the language of the locale and follow the cultural conventions implied by the locale.
A common method to determine the locale of a user is from the default ISO locale setting of the browser. Usually a browser sends its locale preference setting to the HTTP server with the Accept-Language HTTP header. If the Accept-Language header is missing or empty, then there is no locale preference information available, and the application should fall back to a predefined default locale.
The following PHP code retrieves the ISO locale from the Accept-Language HTTP header through the $_SERVER superglobal variable.
$s = $_SERVER["HTTP_ACCEPT_LANGUAGE"];
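The raw header value usually lists several language tags with optional quality weights, for example pt-BR,pt;q=0.9,en;q=0.8. The following is a minimal sketch of one way to turn that value into an ordered preference list and to fall back to a default locale. The function names parse_accept_language() and choose_locale(), and the list of supported locales, are assumptions made for this example; the matching is a simple case-insensitive comparison, not a full implementation of HTTP language negotiation.

```php
<?php
// Parse an Accept-Language value into language tags ordered by preference.
function parse_accept_language(?string $header): array
{
    if ($header === null || trim($header) === '') {
        return [];
    }
    $weighted = [];
    foreach (explode(',', $header) as $position => $part) {
        $pieces = array_map('trim', explode(';', $part));
        $tag = $pieces[0];
        if ($tag === '' || $tag === '*') {
            continue;
        }
        $q = 1.0; // a tag without a q parameter has full weight
        foreach (array_slice($pieces, 1) as $param) {
            if (preg_match('/^q\s*=\s*([0-9.]+)$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        if ($q > 0) {
            $weighted[] = [$tag, $q, $position];
        }
    }
    // Higher weight first; keep the header order for equal weights.
    usort($weighted, function ($a, $b) {
        return ($b[1] <=> $a[1]) ?: ($a[2] <=> $b[2]);
    });
    return array_column($weighted, 0);
}

// Return the first preferred tag the application supports, else the default.
function choose_locale(?string $header, array $supported, string $default): string
{
    foreach (parse_accept_language($header) as $tag) {
        foreach ($supported as $candidate) {
            if (strcasecmp($tag, $candidate) === 0) {
                return $candidate;
            }
        }
    }
    return $default;
}

$header = $_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? null;
echo choose_locale($header, ['en-US', 'pt-BR', 'de'], 'en-US'), "\n";
```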
Once the locale preference of the user has been determined, the application can call locale-sensitive functions, such as date, time, and monetary formatting to format the HTML pages according to the cultural conventions of the locale.
When you write global applications implemented in different programming environments, you should enable the synchronization of user locale settings between the different environments. For example, PHP applications that call PL/SQL procedures should map the ISO locales to the corresponding NLS_LANGUAGE and NLS_TERRITORY values and change the parameter values to match the locale of the user before calling the PL/SQL procedures. The PL/SQL UTL_I18N package contains mapping functions that can map between ISO and Oracle locales.
Table 14-1 shows how some commonly used locales are defined in ISO and Oracle environments.
Table 14-1 Locale Representations in ISO, SQL, and PL/SQL Programming Environments
Locale | Locale ID | NLS_LANGUAGE | NLS_TERRITORY |
---|---|---|---|
Chinese (P.R.C.) | zh-CN | SIMPLIFIED CHINESE | CHINA |
Chinese (Taiwan) | zh-TW | TRADITIONAL CHINESE | TAIWAN |
English (U.S.A.) | en-US | AMERICAN | AMERICA |
English (United Kingdom) | en-GB | ENGLISH | UNITED KINGDOM |
French (Canada) | fr-CA | CANADIAN FRENCH | CANADA |
French (France) | fr-FR | FRENCH | FRANCE |
German | de | GERMAN | GERMANY |
Italian | it | ITALIAN | ITALY |
Japanese | ja | JAPANESE | JAPAN |
Korean | ko | KOREAN | KOREA |
Portuguese (Brazil) | pt-BR | BRAZILIAN PORTUGUESE | BRAZIL |
Portuguese | pt | PORTUGUESE | PORTUGAL |
Spanish | es | SPANISH | SPAIN |
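On the PHP side, the mapping in Table 14-1 can be kept as a simple lookup array. The following is a minimal sketch under that assumption; the constant ORACLE_LOCALES and the functions oracle_locale() and alter_session_sql() are names invented for this example. Because the values come from a fixed list rather than from user input, they can be placed directly into the ALTER SESSION text.

```php
<?php
// Mapping from Table 14-1: ISO locale ID => [NLS_LANGUAGE, NLS_TERRITORY].
const ORACLE_LOCALES = [
    'zh-CN' => ['SIMPLIFIED CHINESE', 'CHINA'],
    'zh-TW' => ['TRADITIONAL CHINESE', 'TAIWAN'],
    'en-US' => ['AMERICAN', 'AMERICA'],
    'en-GB' => ['ENGLISH', 'UNITED KINGDOM'],
    'fr-CA' => ['CANADIAN FRENCH', 'CANADA'],
    'fr-FR' => ['FRENCH', 'FRANCE'],
    'de'    => ['GERMAN', 'GERMANY'],
    'it'    => ['ITALIAN', 'ITALY'],
    'ja'    => ['JAPANESE', 'JAPAN'],
    'ko'    => ['KOREAN', 'KOREA'],
    'pt-BR' => ['BRAZILIAN PORTUGUESE', 'BRAZIL'],
    'pt'    => ['PORTUGUESE', 'PORTUGAL'],
    'es'    => ['SPANISH', 'SPAIN'],
];

// Look up the Oracle language and territory for an ISO locale (case-insensitive).
function oracle_locale(string $isoLocale): ?array
{
    foreach (ORACLE_LOCALES as $id => $pair) {
        if (strcasecmp($id, $isoLocale) === 0) {
            return $pair;
        }
    }
    return null;
}

// Build the ALTER SESSION statement for a locale, or null if it is unknown.
function alter_session_sql(string $isoLocale): ?string
{
    $pair = oracle_locale($isoLocale);
    if ($pair === null) {
        return null;
    }
    return sprintf(
        "ALTER SESSION SET NLS_LANGUAGE='%s' NLS_TERRITORY='%s'",
        $pair[0],
        $pair[1]
    );
}

echo alter_session_sql('pt-BR'), "\n";
```

The resulting statement can be run with oci_parse() and oci_execute() on the open connection before the application calls its PL/SQL procedures.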
The encoding of an HTML page is important information for a browser and an Internet application. You can think of the page encoding as the character set used for the locale that an Internet application is serving. The browser must know about the page encoding so that it can use the correct fonts and character set mapping tables to display the HTML pages. Internet applications must know about the HTML page encoding so they can process input data from an HTML form.
Instead of using different native encodings for the different locales, Oracle recommends that you use UTF-8 (Unicode encoding) for all page encodings. This encoding not only simplifies the coding for global applications, but it also enables multilingual content on a single page.
You can specify the encoding of an HTML page either in the HTTP header or in the HTML page header.
To specify HTML page encoding in the HTTP header, include the Content-Type HTTP header in the HTTP response. It specifies the content type and character set. The Content-Type HTTP header has the following form:
Content-Type: text/html; charset=utf-8
The charset parameter specifies the encoding for the HTML page. The possible values for the charset parameter are the IANA names for the character encodings that the browser supports.
To specify HTML page encoding in the HTML page header, a method used primarily for static HTML pages, specify the character encoding in the HTML header as follows:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
The charset parameter specifies the encoding for the HTML page. As with the Content-Type HTTP Header, the possible values for the charset parameter are the IANA names for the character encodings that the browser supports.
You can specify the encoding of an HTML page in the Content-Type HTTP header by setting the PHP configuration variable default_charset (for example, in php.ini) as follows:
default_charset = UTF-8
This setting does not imply any conversion of outgoing pages. Your application must ensure that the server-generated pages are encoded in UTF-8.
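A script can also send the header itself and repeat the encoding in the HTML header. The following is a minimal sketch, assuming the mbstring extension is available for the validity check; render_utf8_page() is a name invented for this example. It rejects content that is not valid UTF-8, since neither the header nor default_charset converts anything.

```php
<?php
// Declare the page encoding in the HTTP header and in the HTML header.
// $bodyHtml is trusted markup; $title is plain text and gets escaped.
function render_utf8_page(string $title, string $bodyHtml): string
{
    // The application is responsible for the bytes actually being UTF-8.
    if (!mb_check_encoding($title, 'UTF-8') || !mb_check_encoding($bodyHtml, 'UTF-8')) {
        throw new InvalidArgumentException('Page content is not valid UTF-8');
    }
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=utf-8');
    }
    $safeTitle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    return "<!DOCTYPE html>\n<html>\n<head>\n"
        . "<meta http-equiv=\"Content-Type\" content=\"text/html;charset=utf-8\">\n"
        . "<title>{$safeTitle}</title>\n</head>\n<body>\n{$bodyHtml}\n</body>\n</html>\n";
}

echo render_utf8_page("Relat\u{00F3}rio", "<p>Ol\u{00E1}, mundo</p>");
```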
Making the user interface available in the local language of the user is a fundamental task in globalizing an application. Translatable sources for the content of an HTML page belong to the following categories:
Text strings included in the application code
Static HTML files, image files, and template files such as CSS
Dynamic data stored in the database
You should externalize translatable strings within your PHP application logic so that the text is readily available for translation. These text messages can be stored in flat files or database tables depending on the type and the volume of the data being translated.
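One simple way to externalize strings is a per-language message catalog keyed by message ID. The following is a minimal sketch with hypothetical in-memory catalogs; in practice each array could be loaded from a flat file or a database table. The function name translate() and the message keys are assumptions made for this example.

```php
<?php
// Hypothetical catalogs: locale => [message ID => text].
$catalogs = [
    'en-US' => ['greeting' => 'Welcome, %s!', 'logout' => 'Sign out'],
    'pt-BR' => ['greeting' => 'Bem-vindo, %s!'],
];

// Look up a message for the locale, then in the fallback locale, and
// finally return the message ID itself so a missing text is visible.
function translate(
    array $catalogs,
    string $locale,
    string $key,
    array $args = [],
    string $fallback = 'en-US'
): string {
    $message = $catalogs[$locale][$key] ?? $catalogs[$fallback][$key] ?? $key;
    return $args ? vsprintf($message, $args) : $message;
}

echo translate($catalogs, 'pt-BR', 'greeting', ['Ana']), "\n"; // Bem-vindo, Ana!
echo translate($catalogs, 'pt-BR', 'logout'), "\n";            // Sign out (fallback)
```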
Static files such as HTML files are readily translatable. When these files are translated, they should be translated into the corresponding language with UTF-8 as the file encoding. To differentiate the languages of the translated files, stage the static files of different languages in different directories or with different file names.
Dynamic information such as product names and product descriptions is typically stored in the database. To differentiate various translations, the database schema holding this information should include a column to indicate the language. To select the desired language, you must include a WHERE clause in your query.
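Such a query might look like the following sketch, which assumes the OCI8 extension and a hypothetical table product_descriptions with the columns product_id, language_id, name, and description. The table, the column names, and the function fetch_product_text() are illustrative only. The language is passed as a bind variable rather than concatenated into the SQL text.

```php
<?php
// Hypothetical schema: product_descriptions(product_id, language_id, name, description).
const PRODUCT_QUERY = 'SELECT name, description
    FROM product_descriptions
    WHERE product_id = :pid AND language_id = :lang';

// Fetch the translated text for one product, or null if no row matches.
// $conn is an open OCI8 connection supplied by the caller.
function fetch_product_text($conn, int $productId, string $languageId): ?array
{
    $stmt = oci_parse($conn, PRODUCT_QUERY);
    oci_bind_by_name($stmt, ':pid', $productId);
    oci_bind_by_name($stmt, ':lang', $languageId);
    oci_execute($stmt);
    $row = oci_fetch_assoc($stmt);
    oci_free_statement($stmt);
    return $row === false ? null : $row;
}
```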
Data in the application must be presented in a way that conforms to the expectation of the user. Otherwise, the meaning of the data can be misinterpreted. For example, the date '12/11/05' implies '11th December 2005' in the United States, whereas in the United Kingdom it means '12th November 2005'. Similar confusion exists for number and monetary formats. For example, the symbol '.' is a decimal separator in the United States; in Germany this symbol is a thousands separator.
Different languages have their own sorting rules. Some languages are collated according to the letter sequence in the alphabet, some according to the number of strokes in a character, and some according to the pronunciation of the words. Presenting data that is not sorted in the linguistic sequence your users are accustomed to can make searching for information difficult and time consuming.
Depending on the application logic and the volume of data retrieved from the database, it may be more appropriate to format the data at the database level rather than at the application level. Oracle Database offers many features that help refine the presentation of data when the locale preference of the user is known. The following sections provide examples of locale-sensitive operations in SQL.
The three different date presentation formats in Oracle Database are standard, short, and long dates. The following examples illustrate the differences between the short date and long date formats for users in the United States and for Portuguese-speaking users in Brazil.
    SQL> alter session set nls_territory=america nls_language=american;

    Session altered.

    SQL> select employee_id EmpID,
      2  substr(first_name,1,1)||'.'||last_name "EmpName",
      3  to_char(hire_date,'DS') "Hiredate",
      4  to_char(hire_date,'DL') "Long HireDate"
      5  from employees
      6* where employee_id <105;

         EMPID EmpName                     Hiredate   Long HireDate
    ---------- --------------------------- ---------- -----------------------------
           100 S.King                      06/17/1987 Wednesday, June 17, 1987
           101 N.Kochhar                   09/21/1989 Thursday, September 21, 1989
           102 L.De Haan                   01/13/1993 Wednesday, January 13, 1993
           103 A.Hunold                    01/03/1990 Wednesday, January 3, 1990
           104 B.Ernst                     05/21/1991 Tuesday, May 21, 1991
    SQL> alter session set nls_language = 'BRAZILIAN PORTUGUESE' nls_territory = 'BRAZIL';

    Sessão alterada.

    SQL> select employee_id EmpID,
      2  substr(first_name,1,1)||'.'||last_name "EmpName",
      3  to_char(hire_date,'DS') "Hiredate",
      4  to_char(hire_date,'DL') "Long HireDate"
      5  from employees
      6* where employee_id <105;

    EMPID EmpName   Hiredate  Long HireDate
    ----- --------- --------- -------------------------------
      100 S.King    17/6/2003 terça-feira, 17 de junho de 2003
      101 N.Kochhar 21/9/2005 quarta-feira, 21 de setembro de 2005
      102 L.De Haan 13/1/2001 sábado, 13 de janeiro de 2001
      103 A.Hunold  3/1/2006  terça-feira, 3 de janeiro de 2006
      104 B.Ernst   21/5/2007 segunda-feira, 21 de maio de 2007
The following examples illustrate the differences in the decimal character and group separator between users in the United States and Portuguese-speaking users in Brazil.
    SQL> alter session set nls_territory=america;

    Session altered.

    SQL> select employee_id EmpID,
      2  substr(first_name,1,1)||'.'||last_name "EmpName",
      3  to_char(salary, '99G999D99') "Salary"
      4  from employees
      5* where employee_id <105

         EMPID EmpName                     Salary
    ---------- --------------------------- ----------
           100 S.King                       24,000.00
           101 N.Kochhar                    17,000.00
           102 L.De Haan                    17,000.00
           103 A.Hunold                      9,000.00
           104 B.Ernst                       6,000.00

    SQL> alter session set nls_territory=brazil;

    Session altered.

    SQL> select employee_id EmpID,
      2  substr(first_name,1,1)||'.'||last_name "EmpName",
      3  to_char(salary, '99G999D99') "Salary"
      4  from employees
      5* where employee_id <105

         EMPID EmpName                     Salary
    ---------- --------------------------- ----------
           100 S.King                       24.000,00
           101 N.Kochhar                    17.000,00
           102 L.De Haan                    17.000,00
           103 A.Hunold                      9.000,00
           104 B.Ernst                       6.000,00
Note:
If the decimal and group separators used by Oracle are not '.' and ',' respectively, then you may see PHP errors when doing arithmetic on returned data values. This is because PHP will not correctly convert a string variable containing digits into an integer or float variable if the separators cannot be parsed in PHP style. To avoid this problem you can set the format explicitly with:
alter session set nls_numeric_characters = '.,'
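If the application needs to keep a localized number format for display, an alternative is to normalize such strings in PHP before doing arithmetic. The following is a minimal sketch of that idea; parse_localized_number() is a name invented for this example, and its default separators match the Brazilian output shown earlier.

```php
<?php
// Convert a number formatted with the given separators into a PHP float.
function parse_localized_number(string $value, string $decimal = ',', string $group = '.'): float
{
    // Drop group separators first, then turn the decimal character into '.'.
    $normalized = str_replace($group, '', trim($value));
    $normalized = str_replace($decimal, '.', $normalized);
    if (!is_numeric($normalized)) {
        throw new InvalidArgumentException("Not a number: {$value}");
    }
    return (float) $normalized;
}

var_dump(parse_localized_number('24.000,00'));          // float(24000)
var_dump(parse_localized_number('9,000.00', '.', ','));  // float(9000)
```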
Spain traditionally treats ch, ll and ñ as unique letters, ordered after c, l and n respectively. The following examples illustrate the effect of using a Spanish sort against the employee names Chen and Chung.
    SQL> alter session set nls_sort=binary;

    Session altered.

    SQL> select employee_id EmpID,
      2  last_name "Last Name"
      3  from employees
      4  where last_name like 'C%'
      5* order by last_name

         EMPID Last Name
    ---------- -------------------------
           187 Cabrio
           148 Cambrault
           154 Cambrault
           110 Chen
           188 Chung
           119 Colmenares

    6 rows selected.

    SQL> alter session set nls_sort=spanish_m;

    Session altered.

    SQL> select employee_id EmpID,
      2  last_name "Last Name"
      3  from employees
      4  where last_name like 'C%'
      5* order by last_name

         EMPID Last Name
    ---------- -------------------------
           187 Cabrio
           148 Cambrault
           154 Cambrault
           119 Colmenares
           110 Chen
           188 Chung

    6 rows selected.
The NLS_LANGUAGE parameter also controls the language of the database error messages being returned from the database. Setting this parameter before submitting your SQL statement ensures that the language-specific database error messages will be returned to the application.
Consider the following server message:
ORA-00942: table or view does not exist
When the NLS_LANGUAGE parameter is set to BRAZILIAN PORTUGUESE, the server message appears as follows:
ORA-00942: a tabela ou view não existe
For more discussion of globalization support features in Oracle Database, see "Working in a Global Environment" in Oracle Database Express Edition 2 Day Developer's Guide.