Software development techniques behind the magic user interface

Multi-Touch Developer Journal




Failover and Recovery of Enterprise Applications - Part 1

High availability - moving beyond clustering

WebLogic Platform Deployment Architecture
Before diving into the strategies used to make applications highly available, let us recap the WebLogic Platform deployment architecture as described in the BEA documentation (http://edocs.bea.com).

The basic administrative unit for a WebLogic Server installation is called a domain. A domain is a logically related group of WebLogic Server resources that are managed as a unit. A domain always includes at least one WebLogic Server instance called the administration server. The administration server serves as a central point of contact for server instances and system administration tools. A domain may also include additional WebLogic Server instances called managed servers.

Administrators can configure some or all of these managed servers to be part of a WebLogic Server cluster. A cluster is a group of WebLogic Server instances that work together to provide scalability and high availability for applications. A managed server in a cluster can act as a backup for services such as JMS and JTA that are hosted on another server instance in the cluster. Applications are also deployed and managed as part of a domain.
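
The relationships described above can be sketched as a small data model. This is only an illustration of the terminology, not a WebLogic API: the class names ServerInstance and Domain are hypothetical, and the sketch assumes a single cluster per domain for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

/** One server instance; the names here are illustrative only. */
final class ServerInstance {
    final String name;
    final boolean admin; // true only for the administration server

    ServerInstance(String name, boolean admin) {
        this.name = name;
        this.admin = admin;
    }
}

/** A domain: one administration server plus optional managed servers. */
final class Domain {
    final String name;
    final ServerInstance adminServer;
    final List<ServerInstance> managedServers = new ArrayList<>();
    // Assumption: one cluster per domain; it holds a subset of managedServers.
    final List<ServerInstance> cluster = new ArrayList<>();

    // A domain cannot be created without its administration server.
    Domain(String name, String adminServerName) {
        this.name = name;
        this.adminServer = new ServerInstance(adminServerName, true);
    }

    // Managed servers are optional; some or all of them may join the cluster.
    ServerInstance addManagedServer(String serverName, boolean inCluster) {
        ServerInstance server = new ServerInstance(serverName, false);
        managedServers.add(server);
        if (inCluster) {
            cluster.add(server);
        }
        return server;
    }
}

final class DomainDemo {
    public static void main(String[] args) {
        Domain domain = new Domain("exampleDomain", "AdminServer");
        domain.addManagedServer("managed1", true);
        domain.addManagedServer("managed2", true);
        domain.addManagedServer("managed3", false);
        System.out.println(domain.cluster.size() + " of "
                + domain.managedServers.size() + " managed servers are clustered");
    }
}
```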

Failover Scenario
While the details of the failover and recovery procedures will be covered in the next part of this article, a brief overview of what happens when a server instance in the cluster crashes may be useful. Figure 2 shows the activities that may take place when a WebLogic instance goes down and all of its services fail over to another available instance.

As indicated in the diagram, one or more of the following operations may be performed to achieve failover if one of the instances goes down.

  • All new incoming HTTP requests are rerouted to the other available servers in the cluster by the Web server plugins. The plugins detect that a server instance has failed and dynamically update their list of available servers. Subsequent requests are then routed to the appropriate server (usually the secondary server) from among the available servers.
  • All new EJB/RMI requests are rerouted to the available servers in the cluster by the client-side, cluster-aware EJB/RMI stubs. Essentially, the stubs generated by the EJB/RMI compiler are aware of all of the available servers in the cluster. When a request to a server fails, the stub intercepts the exception and, depending on the type of the exception (such as a network exception), may redirect the request to another available server. For stateless session beans the request may be routed to any available server, while for stateful EJBs the stub sends the request to the secondary server, which is the location to which the primary server replicates its state.
  • Internal requests for JMS are routed to other available servers in the cluster. If the destination is created as a distributed destination with physical members on any of the available servers, the producer may continue to send messages without interruption. The consumers, however, may need to reconnect to the available members by incorporating logic within their exception listeners. In the case of an MDB, the container may provide the logic for reconnecting to the destination.
  • The JMS server can be migrated administratively to another available server. Migrating the JMS server helps drain the messages from the queue that went down with the failed server instance.
  • Any in-flight transactions are handled according to the JTA specification. Essentially, the administrator can move the transaction logs from the failed instance to another available instance. The Transaction Manager within the application server then attempts to complete those transactions based on the tlog entries.
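
The stub behavior described in the EJB/RMI item above can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the code that the EJB/RMI compiler generates: FailoverStub, RemoteCall, and ServerUnavailableException are hypothetical names, and a network-level failure is modeled as a single unchecked exception type. The server list is tried in order, so listing the primary first and the secondary second approximates the stateful case.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a network-level failure (server unreachable). */
final class ServerUnavailableException extends RuntimeException {
    ServerUnavailableException(String message) {
        super(message);
    }
}

/** Hypothetical cluster-aware stub; not a WebLogic class. */
final class FailoverStub<R> {

    /** A remote call directed at one named server. */
    interface RemoteCall<T> {
        T invoke(String server);
    }

    private final List<String> servers; // known cluster members, in preferred order

    FailoverStub(List<String> servers) {
        this.servers = new ArrayList<>(servers);
    }

    /**
     * Tries each known server in order. Only ServerUnavailableException
     * triggers failover; any other exception propagates unchanged.
     */
    R call(RemoteCall<R> call) {
        ServerUnavailableException last = null;
        for (String server : servers) {
            try {
                return call.invoke(server);
            } catch (ServerUnavailableException e) {
                last = e; // this server is unreachable, try the next one
            }
        }
        throw new IllegalStateException("no server in the cluster accepted the request", last);
    }
}

final class FailoverStubDemo {
    public static void main(String[] args) {
        FailoverStub<String> stub =
                new FailoverStub<>(List.of("primary:7001", "secondary:7001"));
        String result = stub.call(server -> {
            if (server.startsWith("primary")) {
                throw new ServerUnavailableException("connection refused");
            }
            return "handled by " + server;
        });
        System.out.println(result); // prints "handled by secondary:7001"
    }
}
```

A real stub also has to decide whether retrying is safe at all, since a request that reached the failed server before the error may already have had side effects; the sketch ignores that question.
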
Highly Available Deployment Strategies
Having read the high-level introduction to a typical failover scenario, let's now discuss the possible high-availability strategies for deploying applications on the WebLogic Platform. It is no secret to architects that the bottom line in achieving high availability is avoiding single points of failure (SPOFs). On the face of it, one would always want to avoid a SPOF, but in reality there are many constraints in the application design, vendor product architecture, and deployment topology that force one or more single points of failure on the infrastructure. The following sections explain what the infrastructure team should do to avoid these in their deployment architecture.

Hardware Load Balancer
One of the first entry points to the application is the hardware load balancer. Hardware load balancers have gone beyond simple distribution of incoming HTTP traffic and now provide sophisticated algorithms that distribute IP traffic more efficiently and offer a much higher level of fault tolerance. WebLogic clusters can use any of the load balancing/failover algorithms supported by the hardware load balancer. Hardware load balancers are generally more fault resilient than Web server plugins. For a detailed description of the capabilities of a hardware load balancer, readers should see the vendor-specific documentation and the BEA documentation on configuring hardware load balancers with a WebLogic cluster (http://e-docs.bea.com/wls/docs81/cluster/load_balancing.html#1026240).

Web Server Farm
In some conventional Web-facing applications, Web servers front-end the application servers. Each Web server runs a plugin that routes the HTTP traffic. While a single Web server can distribute the traffic to multiple back-end application servers, the Web server itself then becomes a single point of failure. A common strategy to avoid this is to create a Web server farm. Typically, the load balancers are configured to maintain a sticky session with the Web server to which the first request from a given client was routed. In addition, the Web server plugin maintains stickiness to the application server that the first request was routed to. The plugin maintains a list of available back-end servers, along with the primary/secondary pair for that particular client. If one of the back-end application servers fails, the plugin routes the request to the appropriate available server. Regardless of which Web server receives the request, it routes the request to the correct application server by inspecting the cookie in the HTTP header.
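
The cookie-based routing decision described above might look roughly like the following. This is a hedged sketch, not the plugin's actual implementation: StickyRouter is a hypothetical class, and the cookie value is assumed, purely for illustration, to have the form "<sessionId>!<primary>!<secondary>" with plain server names in the last two fields.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of a plugin's routing decision; not a WebLogic class. */
final class StickyRouter {
    // Servers currently believed to be up, in configuration order.
    private final Set<String> available = new LinkedHashSet<>();

    StickyRouter(List<String> servers) {
        available.addAll(servers);
    }

    /** Called when the plugin detects that a back-end server has failed. */
    void markDown(String server) {
        available.remove(server);
    }

    /**
     * Picks a back-end server for a request.
     * Assumed cookie format: "<sessionId>!<primary>!<secondary>".
     */
    Optional<String> route(String cookieValue) {
        if (cookieValue != null) {
            String[] parts = cookieValue.split("!");
            // parts[1] is the primary, parts[2] the secondary; prefer them in that order.
            for (int i = 1; i < parts.length && i <= 2; i++) {
                if (available.contains(parts[i])) {
                    return Optional.of(parts[i]);
                }
            }
        }
        // No cookie, or neither server of the pair is up: use any available server.
        return available.stream().findFirst();
    }
}

final class StickyRouterDemo {
    public static void main(String[] args) {
        StickyRouter router = new StickyRouter(List.of("app1", "app2", "app3"));
        String cookie = "abc123!app2!app3";
        System.out.println(router.route(cookie).get()); // prints "app2"
        router.markDown("app2");
        System.out.println(router.route(cookie).get()); // prints "app3"
    }
}
```

Because the routing decision depends only on the cookie and the list of live servers, every Web server in the farm reaches the same answer for a given client, which is what allows the load balancer to send the request to any of them.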

More Stories By Sudhir Upadhyay

Sudhir Upadhyay is currently with Architecture and Shared services at JP Morgan Chase where he is an application architect. Prior to joining JPMorgan, he was a principal consultant with BEA Professional Services where he helped customers design and implement enterprise J2EE solutions. He is a BEA Certified WebLogic Developer and a Sun Certified Java Developer.



Most Recent Comments
Thirunavukarasu 02/04/06 06:31:00 AM EST

A neatly explained technical article. Very useful, very crisp. Thanks.

Viktor Lioutyi 08/12/05 10:34:27 AM EDT

Hi,

How to test failover in automatic manner?

"In several scenarios, administrators end up performing basic failover testing by shutting down the processes and verifying that the subsequent requests succeeded.

Although this level of testing can satisfy the failover requirements for the records, more robust failover testing needs to be performed to ensure a proper recovery if failures do occur."

We did the manual testing and failover worked. But we would like to do automatic testing of failover to make sure that it works for all our 1000+ pages. BEA does not have any tool for such testing.

There are different reasons why someone may want to test all pages for failover.

1) WebLogic only replicates attributes that were modified. A call to the session's setAttribute() method tells WebLogic that an attribute was modified. This call may be made explicitly, or implicitly when JSP tags are used. It is possible that on some pages members of complex attributes were modified but WebLogic was not notified about it, so it will not replicate such attributes.

2) Complex attributes may reference other objects and attributes. After replication these references may be broken. For example, attributes A and B both reference object C. Only attribute A was modified, so only A will be replicated. After the replication A and B may point to different copies of C and the program may not work correctly anymore.

3) Some objects are assumed to be singletons. The developer needs to provide a special serialization implementation to support replication of singleton objects. If this implementation is omitted, replication may create copies of a singleton object.

4) Transient fields are not going to be replicated, but there should be recovery code that restores the values of these fields after replication. Without testing we do not know whether all our recovery code works correctly.

There are probably other reasons too.

Does anybody know about any tool for automatic testing of failover (or at least just session replication) for WebLogic and/or WebSphere?

Thanks,
Viktor