Monday, 10 September 2012

Introduction to DataSource



JDBC:- Java DataBase Connectivity is an Java programming Interface to access Data from a Java program.
The JDBC API is comprised of two packages:
- The java.sql package (the JDBC core API)
- The javax.sql package (the JDBC Standard Extension API)

JDBC provides two easy ways to connect Java Application to DB(filestore or real DataBase), lets' understand very basic part:-

(a) DriverManager:- DriverManager class is a very simple way of making an connection to DB from your Application, as its name suggest it manages the list of Drivers(provided by vendor) which can be used to make connection which is done implicitly. We will not go very deep here but a simple code extract of simple program generally used to connect if DriverManager is being used.

Class.forName("jdbc.odbc.JdbcOdbcDriver");

In above statement we are loading the Driver Class which implicitly registers the driver with the DriverManager Class. Then you are only left with getting an connection from DB. That's very simple, now when we have the Driver registered with DriverManager, use its getConnection() method to make a connection to the DB. Yes, you are right, Driver and DriverManager class have static sections where driver class registers drivers with DriverManager and DM contains the static methods too(e.g., getConnection(paratmeter1..2.3.)).

Connection con = DriverManager.getConnection(db_url,"user_name","password");

(b) DataSource:- JDBC API provides the DataSource Interface as an alternative to the DriverManager for establishing the connection with the DB. Think of DataSource as an object of real DB for making the connection. A data source represents a real-world DB. When a data source object has been registered with a JNDI naming service, an application can retrieve it from the naming service and use it to make a connection to the data source it represents. Information about the data source and how to locate it, such as its name, the server on which it resides, its port number, and so on, is stored in the form of properties on the DataSource object. This makes an application more portable because it does not need to hardcode a driver name, which often includes the name of a particular vendor. It also helps in maintaining the code easily, for example, the data source is moved to a different server, all that needs to be done is to update the relevant property in the data source file. None of the code using that data source needs to be touched. Once a data source has been registered with an application servers JNDI name space, application programmers can use it to make a connection to the data source it represents.

DataSource came up with major new implementations:-

(i) Portability by using JNDI:- Datasource properties(DB url, db hostname, port) aren't required to be hardcoded into the code. Properties of a DataSource can be kept separately. Any changes to the data source or database drivers are made in the configuration file. In case of a DriverManager, these properties are hard coded in the application and for any changes we must recompile the code.
Creating DataSource involves registering the DS name with JNDI(Java Naming and Directory Interface) and a unique name is bound to it. Lets name it 'myds'. From application side we have up to lookup for jndi and create an object out of it. It can be done as mentioned below:-

Lets go through simple example now-

InitialContext ic = new InitialContext();
DataSource ds = (DataSource)ic.lookup("jdbc/myds");
Connection con = ds.getConnection();

InitialContext(javax.naming.InitialContext) is the class that implements the context Interface and extends the object class. It provides the starting point for resolutions of names that are binded to objects. Now suppose we are using WebLogic/WebSphere as middleware technology and our DS is registered with naming service they provide, then following code will help our InitialContext object to fetch the starting point.

For WebSphere:-

Hashtable env = new Hashtable();
env.put(context.INITIAL_CONTEXT_FACTORY, "com.ibm.Webphere.naming.WsnInitialContextFactory");
InitialContext ic = new InitialContext(env);
DataSource ds = (DataSource)ic.lookup("jdbc/myds");
Connection con = ds.getConnection();

For Weblogic:-

Hashtable env = new Hashtable();
env.put(context.INITIAL_CONTEXT_FACTORY, "weblogic.jndi.WLInitialContext");
InitialContext ic = new InitialContext(env);
DataSource ds = (DataSource)ic.lookup("jdbc/myds");
Connection con = ds.getConnection();


Hashtable is well suited here as it stores the objects as key-value pair and is thread safe.

(ii) Connection Pooling :-
Connection pool is the pool of the real time database objects ready to make connection. Its advantages, as usually no developer wants his code to make new connection every time. This helps in decreasing the response time of the application.

Not only this, Connection pool comes with several detailed configuration using which one can enhance the way the application responds. For example, maximum number of connections to be available in the pool, reap time, purging of stale connections, recycle the connections and identifying old objects in the pool. All these configuration and settings are based on different Application server.

The connection will usually be a pooled connection. In other words, once the application closes the connection, the connection is returned to a connection pool, rather than being destroyed.

And if you come across term JDBC provider(usually in WebSphere), then by configuring it we are providing information about the set of classes used to implement the data source and the database driver. We are providing the environment settings for the DataSource object.

The programming model for accessing a data source is as follows:

1. An application retrieves a DataSource object from the JNDI naming space.

2. After the DataSource object is obtained, the application code calls getConnection() on the data source to get a Connection object. The connection is obtained from a pool of connections.

3. Once the connection is acquired, the application sends SQL queries or updates to the database.

(iii) Distributed Transaction:-

Provide interface for your application to have smart way to ensure your distributed/global transactions.

XA and Non-XA Transaction:-

Non-XA Transaction can be referred to as single transaction that involves least resources for transaction to complete, example can be if you are adding an name to single table of single DB. Whereas XA Transaction is more of an global transaction, which involves distributed commit of data(involves more than one database and other resources).

XA transaction is referred to as Two Phase Commit, which ensures once the transaction completes the data should be in sync with the all resources either after committing or rolling back the change. The two phases are prepare and commit.
In phase 1, the Transaction Manager instructs the resource Manager to prepare for the commit, to ensure that each resource can commit.
Phase 2, depends upon phase 1, if Transaction finds no issue with any resource to commit the change, proceeds with commit in this phase else proceeds with rollback.


XA Transaction involves Transaction Manager's coordination which is a service running in your Application Server(JBoss, WebSphere, WebLogic and many more). Non-XA transaction does not involve transaction manager, if your app have both single and distributed transactions then using XA DataSource will be enough. Smart XA DataSource will take care of the situation and will use 2 Phase commit only when required.