Globalization support in Oracle

This is a concise compilation attempting to explain Oracle’s support of multiple languages. It is geared towards helping a .NET developer understand Oracle’s globalization support and also understand how to program in such an environment. It is not comprehensive but does attempt to answer some basic questions in this area. Oracle data provider for .NET internally uses Oracle’s OCI (Oracle Call Interface) libraries (unless you are using the managed version) and so do many other third-party drivers (DevArt dotConnect for Oracle in native mode, for example). The easiest way to deploy OCI libraries is using the Oracle Instant Client. Hence, it is also important to understand how configure it to support globalization.
Much of the information comes verbatim from Oracle® Database Globalization Support Guide 10g Release 2 (10.2)

1 Overview of Globalization Support

Globalization Support Architecture

Oracle's globalization support enables you to store, process, and retrieve data in native languages. It ensures that database utilities, error messages, sort order, and date, time, monetary, numeric, and calendar conventions automatically adapt to any native language and locale.
In the past, Oracle's globalization support capabilities were referred to as National Language Support (NLS) features. National Language Support is a subset of globalization support. National Language Support is the ability to choose a national language and store data in a specific character set. Globalization support enables you to develop multilingual applications and software products that can be accessed and run from anywhere in the world simultaneously. An application can render content of the user interface and process data in the native users' languages and locale preferences.

Architecture to Support Multilingual Applications

The database is implemented to enable multitier applications and client/server applications to support languages for which the database is configured.
The locale-dependent operations are controlled by several parameters and environment variables on both the client and the database server. On the database server, each session started on behalf of a client may run in the same or a different locale as other sessions, and have the same or different language requirements specified.
The database has a set of session-independent NLS parameters that are specified when the database is created. Two of the parameters specify the database character set and the national character set, an alternate Unicode character set that can be specified for NCHAR, NVARCHAR2, and NCLOB data. The parameters specify the character set that is used to store text data in the database. Other parameters, such as language and territory, are used to evaluate check constraints.
If the client session and the database server specify different character sets, then the database converts character set strings automatically.
When the application connects to a database, a session is created on the server. The new session initializes its NLS environment from NLS instance parameters specified in the initialization parameter file. These settings can be subsequently changed by an ALTER SESSION statement. The statement changes only the session NLS environment. It does not change the local client NLS environment.
Immediately after the connection has been established, if the NLS_LANG environment setting is defined on the client side, then an implicit ALTER SESSION statement synchronizes the client and session NLS environments.

Using Unicode in a Multilingual Database

Unicode is a universal encoded character set that enables you to store information in any language, using a single character set. Unicode provides a unique code value for every character, regardless of the platform, program, or language.

Character Set Support

Oracle supports a large number of single-byte, multibyte, and fixed-width encoding schemes that are based on national, international, and vendor-specific standards.
See Also:

Chapter 2, "Choosing a Character Set"
"Character Sets" for a list of supported character sets

Unicode Support

You can store Unicode characters in an Oracle database in two ways:

You can create a Unicode database that enables you to store UTF-8 encoded characters as SQL CHAR datatypes.
You can support multilingual data in specific columns by using Unicode datatypes. You can store Unicode characters into columns of the SQL NCHAR datatypes regardless of how the database character set has been defined. The NCHAR datatype is an exclusively Unicode datatype.

2 Choosing a Character Set

Choosing an Oracle Database Character Set

Oracle uses the database character set for:

Data stored in SQL CHAR datatypes (CHAR, VARCHAR2, CLOB, and LONG)
Identifiers such as table names, column names, and PL/SQL variables
Entering and storing SQL and PL/SQL source code

The character encoding scheme used by the database is defined as part of the CREATE DATABASE statement. All SQL CHAR datatype columns (CHAR, CLOB,VARCHAR2, and LONG), including columns in the data dictionary, have their data stored in the database character set. In addition, the choice of database character set determines which characters can name objects in the database. SQL NCHAR datatype columns (NCHAR, NCLOB, and NVARCHAR2) use the national character set.
After the database is created, you cannot change the character sets, with some exceptions, without re-creating the database.

Client Operating System and Application Compatibility

The database character set is independent of the operating system because Oracle has its own globalization architecture. For example, on an English Windows operating system, you can create and run a database with a Japanese character set. However, when an application in the client operating system accesses the database, the client operating system must be able to support the database character set with appropriate fonts and input methods. For example, you cannot insert or retrieve Japanese data on the English Windows operating system without first installing a Japanese font and input method. Another way to insert and retrieve Japanese data is to use a Japanese operating system remotely to access the database server.

Character Set Conversion Between Clients and the Server

If you choo se a database character set that is different from the character set on the client operating system, then the Oracle database can convert the operating system character set to the database character set. Character set conversion has the following disadvantages:

Potential data loss
Increased overhead

Character set conversions can sometimes cause data loss. For example, if you are converting from character set A to character set B, then the destination character set B must have the same character set repertoire as A. Any characters that are not available in character set B are converted to a replacement character. The replacement character is often specified as a question mark or as a linguistically related character.
If all client applications use the same character set, then that character set is usually the best choice for the database character set. When client applications use different character sets, the database character set should be a superset of all the client character sets. This ensures that every character is represented when converting from a client character set to the database character set.

Restrictions on Character Sets Used to Express Names

Table 2-5 lists the restrictions on the character sets that can be used to express names.

Table 2-5 Restrictions on Character Sets Used to Express Names

Name	Single-Byte	Variable Width	Comments
Column names	Yes	Yes	-
Schema objects	Yes	Yes	-
Comments	Yes	Yes	-
Database link names	Yes	No	-
Database names	Yes	No	-
File names (datafile, log file, control file, initialization parameter file)	Yes	No	-
Instance names	Yes	No	-
Directory names	Yes	No	-
Keywords	Yes	No	Can be expressed in English ASCII or EBCDIC characters only

Choosing a National Character Set

A national character set is an alternate character set that enables you to store Unicode character data in a database that does not have a Unicode database character set.
SQL NCHAR, NVARCHAR2, and NCLOB datatypes have been redefined to support Unicode data only. You can use either the UTF8 or the AL 16UTF16 character set. The default is AL16UTF16.

Character Set Conversion in a Monolingual Scenario

Character set conversion may be required in a client/server environment if a client application resides on a different platform than the server and if the platforms do not use the same character encoding schemes. Character data passed between client and server must be converted between the two encoding schemes. Character conversion occurs automatically and transparently through Oracle Net.
You can convert between any two character sets

Restricted Multilingual Support

Some character sets support multiple languages because they have related writing systems or scripts. For example, the WE8ISO8859P1 Oracle character set supports the following Western European languages:

Catalan

Danish

Dutch

English

Finnish

French

German

Icelandic

Italian

Norwegian

Portuguese

Spanish

Swedish

These languages all use a Latin-based writing script.

Unrestricted Multilingual Support

If you need unrestricted multilingual support, then use a universal character set such as Unicode for the server database character set. Unicode has two major encoding schemes:

UTF-16: Each character is either 2 or 4 bytes long.
UTF-8: Each character takes 1 to 4 bytes to store.

The database provides support for UTF-8 as a database character set and both UTF-8 and UTF-16 as national character sets.
AL32UTF8 Oracle character set is based on the Unicode UTF-8 character set.

3 Setting Up a Globalization Support Environment

NLS (National Language Support) parameters determine the locale-specific behavior on both the client and the server. NLS parameters can be specified in the following ways:

As initialization parameters on the server
You can include parameters in the initialization parameter file to specify a default session NLS environment. These settings have no effect on the client side; they control only the server's behavior. For example:
```
NLS_TERRITORY = "CZECH REPUBLIC"
```
As environment variables on the client
You can use NLS environment variables, which may be platform-dependent, to specify locale-dependent behavior for the client and also to override the default values set for the session in the initialization parameter file. For example, on a UNIX system:
```
% setenv NLS_SORT FRENCH
```
With the ALTER SESSION statement
You can use NLS parameters that are set in an ALTER SESSION statement to override the default values that are set for the session in the initialization parameter file or set by the client with environment variables.
```
ALTER SESSION SET NLS_SORT = FRENCH;
```
See Also:
Oracle Database SQL Reference for more information about the ALTER SESSION statement
In SQL functions
You can use NLS parameters explicitly to hardcode NLS behavior within a SQL function. This practice overrides the default values that are set for the session in the initialization parameter file, set for the client with environment variables, or set for the session by the ALTER SESSION statement. For example:
```
TO_CHAR(hiredate, 'DD/MON/YYYY', 'nls_date_language = FRENCH') 
```

Table 3-1 shows the precedence order of the different methods of setting NLS parameters. Higher priority settings override lower priority settings. For example, a default value has the lowest priority and can be overridden by any other method.

Table 3-1 Methods of Setting NLS Parameters and Their Priorities

Priority	Method
1 (highest)	Explicitly set in SQL functions
2	Set by an `ALTER` `SESSION` statement
3	Set as an environment variable
4	Specified in the initialization parameter file
5	Default

Table 3-2 lists the available NLS parameters. Because the SQL function NLS parameters can be specified only with specific functions, the table does not show the SQL function scope.

Table 3-2 NLS Parameters

Parameter	Description	Default	Scope:I = Initialization Parameter File E = Environment Variable A = ALTER SESSION
NLS_CALENDAR	Calendar system	Gregorian	I, E, A
NLS_COMP	SQL, PL/SQL operator comparison	`BINARY`	I, E, A
NLS_CREDIT	Credit accounting symbol	Derived from`NLS_TERRITORY`	E
NLS_CURRENCY	Local currency symbol	Derived from`NLS_TERRITORY`	I, E, A
NLS_DATE_FORMAT	Date format	Derived from`NLS_TERRITORY`	I, E, A
NLS_DATE_LANGUAGE	Language for day and month names	Derived from`NLS_LANGUAGE`	I, E, A
NLS_DEBIT	Debit accounting symbol	Derived from`NLS_TERRITORY`	E
NLS_ISO_CURRENCY	ISO international currency symbol	Derived from`NLS_TERRITORY`	I, E, A
`NLS_LANG` See Also:Â "Choosing a Locale with the NLS_LANG Environment Variable"	Language, territory, character set	`AMERICAN_AMERICA. US7ASCII`	E
NLS_LANGUAGE	Language	Derived from`NLS_LANG`	I, A
NLS_LENGTH_SEMANTICS	How strings are treated	`BYTE`	I, E, A
NLS_LIST_SEPARATOR	Character that separates items in a list	Derived from`NLS_TERRITORY`	E
NLS_MONETARY_CHARACTERS	Monetary symbol for dollar and cents (or their equivalents)	Derived from`NLS_TERRITORY`	E
NLS_NCHAR_CONV_EXCP	Reports data loss during a character type conversion	`FALSE`	I, A
NLS_NUMERIC_CHARACTERS	Decimal character and group separator	Derived from`NLS_TERRITORY`	I, E, A
NLS_SORT	Character sort sequence	Derived from`NLS_LANGUAGE`	I, E, A
NLS_TERRITORY	Territory	Derived from`NLS_LANG`	I, A
NLS_TIMESTAMP_FORMAT	Timestamp	Derived from`NLS_TERRITORY`	I, E, A
NLS_TIMESTAMP_TZ_FORMAT	Timestamp with time zone	Derived from`NLS_TERRITORY`	I, E, A
NLS_DUAL_CURRENCY	Dual currency symbol	Derived from`NLS_TERRITORY`	I, E, A

Choosing a Locale with the NLS_LANG Environment Variable

A locale is a linguistic and cultural environment in which a system or program is running. Setting the NLS_LANG environment parameter is the simplest way to specify locale behavior for Oracle software. It sets the language and territory used by the client application and the database server. It also sets the client's character set, which is the character set for data entered or displayed by a client program.
NLS_LANG is set as an environment variable on UNIX platforms. NLS_LANG is set in the registry on Windows platforms.
The NLS_LANG parameter has three components: language, territory, and character set. Specify it in the following format, including the punctuation:

NLS_LANG = language_territory.charset

For example, if the Oracle Installer does not populate NLS_LANG, then its value by default is AMERICAN_AMERICA.US7ASCII. The language is AMERICAN, the territory is AMERICA, and the character set is US7ASCII. The values in NLS_LANG and other NLS parameters are case-insensitive.
Each component of the NLS_LANG parameter controls the operation of a subset of globalization support features:

language
Specifies conventions such as the language used for Oracle messages, sorting, day names, and month names. Each supported language has a unique name; for example, AMERICAN, FRENCH, or GERMAN. The language argument specifies default values for the territory and character set arguments. If the language is not specified, then the value defaults to AMERICAN.
territory
Specifies conventions such as the default date, monetary, and numeric formats. Each supported territory has a unique name; for example, AMERICA,FRANCE, or CANADA. If the territory is not specified, then the value is derived from the language value.
charset
Specifies the character set used by the client application (normally the Oracle character set that corresponds to the user's terminal character set or the OS character set). Each supported character set has a unique acronym, for example, US7ASCII, WE8ISO8859P1, WE8DEC, WE8MSWIN1252, or JA16EUC. Each language has a default character set associated with it.
Note:
All components of the NLS_LANG definition are optional; any item that is not specified uses its default value. If you specify territory or character set, then you must include the preceding delimiter [underscore (_) for territory, period (.) for character set]. Otherwise, the value is parsed as a language name.
For example, to set only the territory portion of NLS_LANG, use the following format: NLS_LANG=_JAPAN

Specifying the Value of NLS_LANG

In a UNIX operating system C-shell session, you can specify the value of NLS_LANG by entering a statement similar to the following:

% setenv NLS_LANG FRENCH_FRANCE.WE8ISO8859P1

Because NLS_LANG is an environment variable, it is read by the client application at startup time. The client communicates the information defined by NLS_LANG to the server when it connects to the database server.

Overriding Language and Territory Specifications

The NLS_LANG parameter sets the language and territory environment used by both the server session (for example, SQL command execution) and the client application (for example, display formatting in Oracle tools). Using this parameter ensures that the language environments of both the database and the client application are automatically the same.

Should the NLS_LANG Setting Match the Database Character Set?

The NLS_LANG character set should reflect the setting of the operating system character set of the client. For example, if the database character set is AL32UTF8 and the client is running on a Windows operating system, then you should not set AL32UTF8 as the client character set in the NLS_LANG parameter because there are no UTF-8 WIN32 clients. Instead, the NLS_LANG setting should reflect the code page of the client. For example, on an English Windows client, the code page is 1252. An appropriate setting for NLS_LANG is AMERICAN_AMERICA.WE8MSWIN1252.
Setting NLS_LANG correctly allows proper conversion from the client operating system character set to the database character set. When these settings are the same, Oracle assumes that the data being sent or received is encoded in the same character set as the database character set, so character set validation or conversion may not be performed. This can lead to corrupt data if the client code page and the database character set are different and conversions are necessary.

NLS Database Parameters

When a new database is created during the execution of the CREATE DATABASE statement, the NLS-related database configuration is established. The current NLS instance parameters are stored in the data dictionary along with the database and national character sets. The NLS instance parameters are read from the initialization parameter file at instance startup.
You can find the values for NLS parameters by using:

NLS Data Dictionary Views

Applications can check the session, instance, and database NLS parameters by querying the following data dictionary views:

NLS_SESSION_PARAMETERS shows the NLS parameters and their values for the session that is querying the view. It does not show information about the character set.
NLS_INSTANCE_PARAMETERS shows the current NLS instance parameters that have been explicitly set and the values of the NLS instance parameters.
NLS_DATABASE_PARAMETERS shows the values of the NLS parameters for the database. The values are stored in the database.

6 Supporting Multilingual Databases with Unicode

What is Unicode?

Oracle started supporting Unicode as a database character set in Oracle Database 7. In Oracle Database 10g, Unicode support has been expanded. Oracle Database 10g, Release 2 supports Unicode 4.0.

Supplementary Characters

The first version of Unicode was a 16-bit, fixed-width encoding that used two bytes to encode each character. This allowed 65,536 characters to be represented. However, more characters need to be supported, especially additional CJK ideographs that are important for the Chinese, Japanese, and Korean markets.
Unicode 4.0 defines supplementary characters to meet this need. It uses two 16-bit code points (also known as supplementary characters) to represent a single character. This enables an additional 1,048,576 characters to be defined. The Unicode 4.0 standard defines 45,960 supplementary characters.
Adding supplementary characters increases the complexity of Unicode, but it is less complex than managing several different encodings in the same configuration.

Unicode Encodings

The Unicode standard encodes characters in different ways: UTF-8, UCS-2, and UTF-16. Conversion between different Unicode encodings is a simple bit-wise operation that is defined in the Unicode standard.

UTF-8 Encoding

UTF-8 is the 8-bit encoding of Unicode. It is a variable-width encoding and a strict superset of ASCII. This means that each and every character in the ASCII character set is available in UTF-8 with the same code point values. One Unicode character can be 1 byte, 2 bytes, 3 bytes, or 4 bytes in UTF-8 encoding. Characters from the European scripts are represented in either 1 or 2 bytes. Characters from most Asian scripts are represented in 3 bytes. Supplementary characters are represented in 4 bytes.
UTF-8 is the Unicode encoding supported on UNIX platforms and used for HTML and most Internet browsers. Other environments such as Windows and Java use UCS-2 encoding.
The benefits of UTF-8 are as follows:

Compact storage requirement for European scripts because it is a strict superset of ASCII
Ease of migration between ASCII-based characters sets and UTF-8

UCS-2 Encoding

UCS-2 is a fixed-width, 16-bit encoding. Each character is 2 bytes. UCS-2 is the Unicode encoding used by Java and Microsoft Windows NT 4.0. UCS-2 supports characters defined for Unicode 3.0, so there is no support for supplementary characters.
The benefits of UCS-2 over UTF-8 are as follows:

More compact storage for Asian scripts because all characters are two bytes
Faster string processing because characters are fixed-width
Better compatibility with Java and Microsoft clients

UTF-16 Encoding

UTF-16 encoding is the 16-bit encoding of Unicode. UTF-16 is an extension of UCS-2 because it supports the supplementary characters by using two UCS-2 code points for each supplementary character. UTF-16 is a strict superset of UCS-2.
One character can be either 2 bytes or 4 bytes in UTF-16. Characters from European and most Asian scripts are represented in 2 bytes. Supplementary characters are represented in 4 bytes. UTF-16 is the main Unicode encoding used by Microsoft Windows 2000.
The benefits of UTF-16 over UTF-8 are as follows:

More compact storage for Asian scripts because most of the commonly used Asian characters are represented in two bytes.
Better compatibility with Java and Microsoft clients

Oracle's Support for Unicode

Oracle started supporting Unicode as a database character set in release 7. Table 6-1 summarizes the Unicode character sets supported by Oracle Database.
Table 6-1 Unicode Character Sets Supported by Oracle Database

Table 6-1 Unicode Character Sets Supported by Oracle Database

Character Set	Supported in RDBMS Release	Unicode Encoding	Unicode Version	Database Character Set	National Character Set
AL24UTFFSS	7.2 - 8i	UTF-8	1.1	Yes	No
UTF8	8.0 - 10g	UTF-8	For Oracle Database release 8.0 through Oracle8i release 8.1.6: 2.1 For Oracle8i Database release 8.1.7 and later: 3.0	Yes	Yes (Oracle9i Database and Oracle Database 10g only)
UTFE	8.0 - 10g	UTF-EBCDIC	For Oracle8i Database releases 8.0 through 8.1.6: 2.1 For Oracle8i Database release 8.1.7 and later: 3.0	Yes	No
AL32UTF8	9i - 10g	UTF-8	Oracle9i Database Release 1: 3.0 Oracle9i Database Release 2: 3.1 Oracle Database 10g, Release 1: 3.2 Oracle Database 10g, Release 2: 4.0	Yes	No
AL16UTF16	9i - 10g	UTF-16	Oracle9i Database Release 1: 3.0 Oracle9i Database Release 2: 3.1 Oracle Database 10g, Release 1: 3.2 Oracle Database 10g, Release 2: 4.0	No	Yes

Implementing a Unicode Solution in the Database

You can store Unicode characters in an Oracle database in two ways.
You can create a Unicode database that enables you to store UTF-8 encoded characters as SQL CHAR datatypes (CHAR, VARCHAR2, CLOB, and LONG).
If you prefer to implement Unicode support incrementally or if you need to support multilingual data only in certain columns, then you can store Unicode data in either the UTF-16 or UTF-8 encoding form in SQL NCHAR datatypes (NCHAR, NVARCHAR2, and NCLOB). The SQL NCHAR datatypes are called Unicode datatypes because they are used only for storing Unicode data.

Enabling Multilingual Support with Unicode Databases

The database character set specifies the encoding to be used in the SQL CHAR datatypes as well as the metadata such as table names, column names, and SQL statements. A Unicode database is a database with a UTF-8 character set as the database character set. There are three Oracle character sets that implement the UTF-8 encoding. The first two are designed for ASCII-based platforms while the third one should be used on EBCDIC platforms.

AL32UTF8
The AL32UTF8 character set supports the latest version of the Unicode standard. It encodes characters in one, two, or three bytes. Supplementary characters require four bytes. It is for ASCII-based platforms.
UTF8
The UTF8 character set encodes characters in one, two, or three bytes. It is for ASCII-based platforms.
The UTF8 character set has supported Unicode 3.0 since Oracle8i release 8.1.7 and will continue to support Unicode 3.0 in future releases of Oracle Database. Although specific supplementary characters were not assigned code points in Unicode until version 3.1, the code point range was allocated for supplementary characters in Unicode 3.0. If supplementary characters are inserted into a UTF8 database, then it does not corrupt the data in the database. The supplementary characters are treated as two separate, user-defined characters that occupy 6 bytes. Oracle recommends that you switch to AL32UTF8 for full support of supplementary characters in the database character set.
UTFE
The UTFE character set is for EBCDIC platforms. It is similar to UTF8 on ASCII platforms, but it encodes characters in one, two, three, and four bytes. Supplementary characters are converted as two 4-byte characters.

7 Programming with Unicode

Overview of Programming with Unicode

Oracle offers several database access products for inserting and retrieving Unicode data. Oracle offers database access products for commonly used programming environments such as Java and C/C++. Data is transparently converted between the database and client programs, which ensures that client programs are independent of the database character set and national character set. In addition, client programs are sometimes even independent of the character datatype, such as NCHAR or CHAR, used in the database.
To avoid overloading the database server with data conversion operations, Oracle always tries to move them to the client side database access products. In a few cases, data must be converted in the database, which affects performance. This chapter discusses details of the data conversion paths.

Database Access Product Stack and Unicode

The Oracle Call Interface (OCI) is the lowest level API that the rest of the client-side database access products use. It provides a flexible way for C/C++ programs to access Unicode data stored in SQL CHAR and NCHAR datatypes. Using OCI, you can programmatically specify the character set (UTF-8, UTF-16, and others) for the data to be inserted or retrieved.

OCI Programming with Unicode

OCIEnvNlsCreate() Function for Unicode Programming

The OCIEnvNlsCreate() function is used to specify a SQL CHAR character set and a SQL NCHAR character set when the OCI environment is created. It is an enhanced version of the OCIEnvCreate() function and has extended arguments for two character set IDs. The OCI_UTF16ID UTF-16 character set ID replaces the Unicode mode introduced in Oracle9i release 1 (9.0.1).
The Unicode mode, in which the OCI_UTF16 flag is used with the OCIEnvCreate() function, is deprecated.
When OCI_UTF16ID is specified for both SQL CHAR and SQL NCHAR character sets, all metadata and bound and defined data are encoded in UTF-16. Metadata includes SQL statements, user names, error messages, and column names. Thus, all inherited operations are independent of the NLS_LANG setting,….

OCI Unicode Code Conversion

Unicode character set conversions take place between an OCI client and the database server if the client and server character sets are different. The conversion occurs on either the client or the server depending on the circumstances, but usually on the client side.

Setting UTF-8 to the NLS_LANG Character Set in OCI

For OCI client applications that support Unicode UTF-8 encoding, use AL32UTF8 to specify the NLS_LANG character set, unless the database character set is UTF8. Use UTF8 if the database character set is UTF8.

Appendix A - Locale Data

There is a detailed reference information about languages, character sets etc in Appendix A - Locale Data

Some more relevant information about the Instant Client and OCI can be found in the Oracle® Call Interface Programmer's Guide, 10g Release 2 (10.2)

1 Introduction and Upgrading

OCI Instant Client

Table 1-3 OCI Instant Client Shared Libraries

Linux and UNIX	Description for Linux and UNIX	Windows	Description for Windows
`libclntsh.so.10.1`	Client Code Library	`oci.dll`	Forwarding functions that applications link with
`libociei.so`	OCI Instant Client Data Shared Library	`oraociei10.dll`	Data and code
`libnnz10.so`	Security Library	`orannzsbb10.dll`	Security Library

Oracle Database 10g Release 2 library names are used in the table.
To use the Microsoft ODBC and OLEDB driver, ociw32.dll must also be copied from ORACLE_HOME\bin.

OCI Instant Client Installation Process

The installation process is as simple as

Downloading and installing the Instant Client shared libraries to a directory such as instantclient_10_2.
Setting the operating system shared library path environment variable to the directory from step 1. For example, on Linux or UNIX, set the LD_LIBRARY_PATH to instantclient_10_2. On Windows, set PATH to locate the instantclient_10_2 directory.

After completing the preceding two steps you are ready to run the OCI application.
The OCI application operates in Instant Client mode when the OCI shared libraries are accessible through the operating system Library Path variable. In this mode, there is no dependency on ORACLE_HOME and none of the other code and data files provided in ORACLE_HOME are needed by OCI (except for the tnsnames.orafile described later).
For OCCI, the OCCI Library (libocci.so.10.1 on Linux or UNIX and oraocci10.dll on Windows) must also be installed in the Instant Client directory.
for the JDBC OCI driver, in addition to the three OCI shared libraries, you must also download OCI JDBC Library (for example libocijdbc10.so on Linux or UNIX and oraocijdbc10.dll on Windows) and ojdbcXY.jar (where XY is the version number, for example, ojdbc14.jar).

Database Connection Strings for OCI Instant Client

The OCI Instant Client can make remote database connections in all the ways that ordinary SQL clients can. However because the Instant Client does not have theORACLE_HOME environment and directory structure some of the database naming methods will require additional configuration steps.
All Oracle net naming methods that do not require use of ORACLE_HOME or TNS_ADMIN (to locate configuration files such as tnsnames.ora or sqlnet.ora) work in the Instant Client mode. In particular, the connect_identifier in the OCIServerAttach() call can be specified in the following formats:

A SQL Connect URL string of the form:

[//]host[:port][/service name]

such as:

//dlsun242:5521/bjava21

As an Oracle Net connect descriptor. For example:

"(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp) (HOST=dlsun242) (PORT=5521))
(CONNECT_DATA=(SERVICE_NAME=bjava21)))"

2 OCI Programming Basics

OCI Globalization Support

Client Character Set Control from OCI

The function OCIEnvNlsCreate() enables you to set character set information in applications, independently from NLS_LANG and NLS_NCHAR settings.
One application can have several environment handles initialized within the same system environment using different client side character set IDs and national character set IDs.

OCIEnvNlsCreate(OCIEnv **envhp, ..., csid, ncsid);

where csid is the value for character set ID, and ncsid is the value for national character set ID. Either can be 0 or OCI_UTF16ID. If both are 0, this is equivalent to using OCIEnvCreate() instead. The other arguments are the same as for the OCIEnvCreate() call.
OCIEnvNlsCreate() is an enhancement for programmatic control of character sets, because it validates OCI_UTF16ID.
When character set IDs are set through the function OCIEnvNlsCreate(), they will replace the settings in NLS_LANG and NLS_NCHAR. In addition to all character sets supported by NLSRTL, OCI_UTF16ID is also allowed as a character set ID in the OCIEnvNlsCreate() function, although this ID is not valid in NLS_LANG or NLS_NCHAR.
Any Oracle character set ID, except AL16UTF16, can be specified through the OCIEnvNlsCreate() function to specify the encoding of metadata, SQL CHAR data, and SQL NCHAR data.
You can retrieve character sets in NLS_LANG and NLS_NCHAR through another function, OCINlsEnvironmentVariableGet().

I also encourage you to read the previous post related to NLS_LANG and Oracle Instant Client