
My database career pretty much started in 1998 with an internship at a software consulting firm, where I spent late nights with my manager partitioning a SCO Unix server as part of a disaster recovery effort for a bank that had lost its database server earlier in the day. Needless to say, the bank had reverted to serving clients manually to maintain a business-as-usual facade for the greater part of the business day.

In those early days, the database server was not as clearly delineated as it is today. The hardware, operating system, Oracle software and storage were pretty much seen as one integrated piece. Something as mundane as a broken power supply unit on the server meant downtime and manual processes (read: labor) until the problem was resolved.

Needless to say, much has changed since that era, with high availability and disaster recovery and/or business continuity (depending on which side of the table you’re sitting on) taking center stage in IT and, of course, in business operations. Some redundancy at the level of person, server, application and even business process is the required minimum for any technology-driven business or process in 2016. Redundancy has become an absolute necessity at several points of the IT infrastructure – networks with their switches and backhauls, servers with redundant power and partitions, software zones and versions with Docker.

To get some confusion out of the way, permit me to offer a few definitions:

High Availability: In technology, refers to a system or component that is continuously operational for a desirably long length of time. Availability can be measured relative to “100% operational” or “never failing.”

Disaster Recovery: In the sense of our narrative, this is the area of IT operations and planning that deals with protecting an organization and its business processes and assets from the effects of significant negative events (such as a broken database).

Business Continuity: Is defined as the capability of the organization to continue delivery of products or services at acceptable predefined levels following a disruptive incident. (Source: ISO 22301:2012)
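To make the idea of measuring availability relative to “100% operational” a little more concrete, here is a minimal Python sketch. It assumes the common convention of expressing availability as uptime divided by total time; the function names and sample figures are my own illustrations, not part of any standard or Oracle tool.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def availability(uptime_minutes, downtime_minutes):
    """Fraction of the observed period during which the system was up."""
    total = uptime_minutes + downtime_minutes
    if total <= 0:
        raise ValueError("observed period must be longer than zero")
    return uptime_minutes / total


def annual_downtime_minutes(availability_fraction):
    """Minutes of downtime per year implied by an availability target."""
    if not 0.0 <= availability_fraction <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability_fraction) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Hypothetical targets, purely for illustration.
    for target in (0.99, 0.999, 0.9999):
        minutes = annual_downtime_minutes(target)
        print(f"{target:.2%} available -> about {minutes:.1f} minutes down per year")
```

Under these assumptions, a “three nines” target (99.9%) still allows roughly 525.6 minutes, or a little under nine hours, of downtime in a year, which is a useful number to put in front of the business.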

Oracle Real Application Clusters (Oracle RAC) is one of the many offerings from Oracle geared to address database high availability. It is a clustered version of Oracle Database built on a stack of at least two servers with access to shared storage. Other benefits include scalability (the ability to add more servers, or nodes, and users without much hassle) and agility (the ability to add and remove nodes quite easily) for any application. In very simple terms, Oracle RAC ensures that when a database server becomes unavailable on account of a fault (hardware, power, OS or other), database services are maintained UNINTERRUPTED. It does this through redundancy in the form of another instance (server) that shares the load and is available to take on additional work, possibly with some more pain, but with the aim that database sessions and the work they are doing are not lost. In practice this relies on client connections being configured to fail over to a surviving instance. So no dropped sessions, no dropped connections, none of that! In fact, the occurrence of a fault can go completely unnoticed by the business users of the database. Consider a case where a buyer in South Japan needs to complete a trade with a broker in New York: lots of money could be lost if a glitch were to be introduced.

You could say that one of the primary drivers of the adoption of Oracle RAC is the promise of zero downtime for the Oracle database. Add to that agility and scalability at the drop of a hat. Other solutions play in that space as well, such as server virtualization, which has seen major adoption with the demand for always-on systems. These require a combination of virtual server fault tolerance, replication, clustering and load balancing, with all virtual servers stored on a shared volume so that different physical hosts can access the files.

REDUNDANCY is essentially the matter under debate. Any effective solution strategy should include measures to protect against outages. The best defense you can have is to plan for redundancy. Redundancy at every level: servers, which is essentially what Oracle RAC addresses, but also shared storage, the network, the site, the application and even personnel. And that is where CLOUD trumps most, if not all, of the alternatives.
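The case for redundancy at every level can be sketched with some simple arithmetic. The Python below is only an illustration under a strong assumption, namely that components fail independently of one another, and every figure in it is hypothetical. A redundant layer is treated as up while at least one copy survives, and the whole stack is up only when every layer is up.

```python
from math import prod


def redundant_layer(component_availability, copies):
    """Availability of a layer that stays up while at least one copy is up.

    Assumes the copies fail independently, which real systems only approximate.
    """
    if copies < 1:
        raise ValueError("a layer needs at least one component")
    return 1.0 - (1.0 - component_availability) ** copies


def stack_availability(layer_availabilities):
    """Availability of a stack in which every layer must be up at once."""
    return prod(layer_availabilities)


if __name__ == "__main__":
    # Hypothetical figures: each server and each storage unit is 99% available.
    servers = redundant_layer(0.99, copies=2)          # two database nodes
    single_storage = redundant_layer(0.99, copies=1)   # no storage redundancy
    mirrored_storage = redundant_layer(0.99, copies=2)

    print(f"two servers, one storage unit: {stack_availability([servers, single_storage]):.6f}")
    print(f"two servers, mirrored storage: {stack_availability([servers, mirrored_storage]):.6f}")
```

With these made-up numbers, doubling the servers alone leaves the stack at roughly 99% because the single storage unit dominates, which is why redundancy at only one level is not enough.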

A cloud-based IT strategy delivers agility and scalability with built-in redundancies, mostly. Even more, the idea of paying only for what you use, without the upfront capital investment, is a sound business argument. An absolute requirement, however, is the clear, unbiased and proper think-through that must go into the selection process.

In summary, a business could decide to host IT systems and processes in-house or in the cloud. Either way, redundancy and the assurance of zero downtime must be built into the business IT strategy. For Oracle databases, Oracle RAC is a valid option that should be given careful consideration. Virtualization as a means of achieving redundancy of systems should be considered alongside it. Most importantly, redundancy of the personnel who provide the IT services required by the business should be given proper thought.

This treatise started out as a consideration of Oracle RAC as a reasonable option for database high availability. More importantly, the argument must be one of a carefully thought-through redundant solution for the business processes. I always recommend getting the business involved in determining what levels of redundancy are required for how much downtime and loss they are willing to consider (I picked that up from dealing with a trusted insurer).