StorageWorld : Data Storage, RAID, SAN, NAS, Disaster Recovery & Backup


Disaster Recovery Overview

Disaster Recovery (DR) focuses on the continuity of IT operations in the event of disaster scenarios, and is a logical subset of the Business Continuity Planning (BCP) process. The design of any IT infrastructure should be compatible with the organisation's Disaster Recovery Planning (DR Planning) procedure. As most modern businesses rely heavily on electronic data to perform essential tasks, it is critical that key IT systems and components are given adequate protection from preventable loss.

Although the information here primarily focuses on critical data storage and backup systems, the importance of individual IT and network components (e.g. specialised transaction servers, bespoke development platforms, access etc.) must also be assessed and addressed.

However, regardless of business model, the most critical component of any IT structure is the business data it contains and processes. Without the availability of critical business data, the business cannot operate properly, and, in cases of total and irretrievable loss, full business recovery is unlikely. In most cases, the loss of critical business data carries higher financial penalties than the loss of the systems that stored and processed it.

DR Planning needs to address any inherent flaws in the existing data storage infrastructure that present barriers to fundamental Business Continuity. The primary objective of any proposed storage network is to move the business IT environment to a more stable and reliable platform that is capable of actively participating in DR and BCP.

DR Objectives:
Getting the Basic Data Protection Strategy Right

The primary objectives of a data storage strategy are complex. At the basic level, it needs to address capacity management and availability, ensuring that it is devised in a manner which guarantees flexible headroom (storage capacity will be able to scale in order to meet increasing data growth demands), provides a level of protection against hardware failure (fault tolerance), and also permits capacity to be added without disruption or downtime (transparent dynamic expansion). Similarly, the backup sub-system must be devised so that it can cope with, and scale alongside, the primary storage systems.
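Headroom can be made concrete with a simple estimate of how long the current capacity will last. The sketch below is illustrative only: the function name `months_of_headroom`, the constant compound monthly growth rate, and the 80% utilisation threshold at which expansion should be triggered are all assumptions, not figures from any particular product or policy.

```python
import math


def months_of_headroom(used_tb: float, usable_tb: float,
                       monthly_growth: float, threshold: float = 0.8) -> float:
    """Estimate the months until utilisation crosses `threshold`.

    Hypothetical helper: assumes data grows at a constant compound rate
    per month, which is a simplification of real-world growth.
    """
    if used_tb <= 0 or usable_tb <= 0:
        raise ValueError("capacities must be positive")
    limit = usable_tb * threshold  # point at which expansion should start
    if used_tb >= limit:
        return 0.0  # already at or beyond the expansion threshold
    if monthly_growth <= 0:
        return math.inf  # no growth, so the threshold is never reached
    # Solve used * (1 + g) ** m == limit for m.
    return math.log(limit / used_tb) / math.log(1 + monthly_growth)
```

For example, 40 TB used on 100 TB of usable capacity growing at 3% a month gives roughly 23 months before the 80% threshold is reached. Real growth is rarely this smooth, so the figure is a planning aid rather than a forecast; transparent dynamic expansion is what allows capacity to be added when the estimate turns out to be optimistic.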

At the highest level, a data storage strategy needs to comply with the rigorous demands set by disaster recovery and business continuity planning, offering options for off-site storage, remote replication, automated backup to increase reliability, and restoration procedures that encompass every scenario - from single file recoveries triggered by user requests, to the ability to retrieve and provide access to critical data in disaster situations where primary storage systems are temporarily or permanently lost.

DR strategies and solutions are themselves very complex, and to help categorise the various solutions and their characteristics, the varying levels of DR and their required components can be defined. The model used is a standard industry-wide tier structure that breaks down the varying strategies into a series of seven defined and escalating tiers. Although each component of the strategy can be implemented individually through a series of phases, an overall organisational data strategy is essential, and every component should be designed to comply with it. This ensures that equipment procurement, software compliance, active procedures, and strategy designs are all integrated seamlessly.

The first step is to ensure that the basic structure meets a minimum set of requirements. An example set could be as follows:

A reliable internal data storage platform providing scalable, fault-tolerant storage systems
A shared data network to reduce LAN congestion, and provide fast, reliable access to servers, users, and data backup services
Automated backup such as LAN-Free backup and/or Serverless backup to improve reliability
Automated backup copying services with off-site vaulting
Scalability, with the system providing adequate capacity headroom
Flexibility to provide support for future high-level data management/disaster recovery such as remote replication, hotsite or multi-site data storage centres, High Availability, and real-time imaging and backup that provides the ability to roll back systems to an exact point in time.

Disaster Recovery & Integrated Backup

The storage models presented for consideration do not constitute a DR process in themselves; however, they do provide the base upon which initial DR techniques can be employed.

By introducing processes such as automated backup (through LAN-Free or Serverless backup), the reliability, performance, quality of service, and regularity of data backups can be improved dramatically. LAN-Free backup moves backup data over the storage network rather than the LAN, and with Serverless backup the storage devices on the network communicate and transfer data directly between themselves, with little or no server involvement or processor overhead.

The speed and quality of the connection mean that fast, efficient backups and copies can be made at any time, even during peak operating hours. This also allows the Recovery Point Objective (RPO) to be reduced: the most recent copy of the data that can be restored in the event of a disaster becomes newer, so less data is lost. The automated backup process also allows a direct tape-to-tape copy of the backed-up data to be made onto a second single-slot drive (the existing LTO device if appropriate), producing a physical copy suitable for daily off-site vaulting in a secure storage facility. This is known as PTAM (Pick-up Truck Access Method), as it usually involves a secure courier service.
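The link between backup frequency and RPO can be checked with a small calculation: if a failure strikes just before a backup completes, everything written since the previous successful backup is lost. The sketch below, with hypothetical function names `worst_case_data_loss` and `meets_rpo`, takes a list of successful backup times and reports the largest such exposure window. It assumes every listed backup completed and is restorable.

```python
from datetime import datetime, timedelta


def worst_case_data_loss(backup_times: list[datetime], now: datetime) -> timedelta:
    """Return the longest window in which a failure would lose data.

    The exposure is the largest gap between consecutive successful
    backups, including the gap from the latest backup up to `now`.
    """
    if not backup_times:
        raise ValueError("no successful backups recorded")
    points = sorted(backup_times) + [now]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def meets_rpo(backup_times: list[datetime], now: datetime, rpo: timedelta) -> bool:
    """True if the worst-case data loss is within the RPO target."""
    return worst_case_data_loss(backup_times, now) <= rpo


# Usage: nightly backups at 02:00 versus a copy every four hours.
start = datetime(2024, 1, 1, 2, 0)
nightly = [start + timedelta(days=d) for d in range(3)]
frequent = [start + timedelta(hours=4 * i) for i in range(13)]
print(worst_case_data_loss(nightly, start + timedelta(days=2, hours=12)))
print(worst_case_data_loss(frequent, start + timedelta(hours=50)))
```

With nightly backups only, the exposure is a full 24 hours; taking a copy every four hours, as a faster storage network permits, cuts it to four hours, which is the RPO reduction described above.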

The introduction of these backup procedures forms the basic first steps towards developing a DR environment, by placing critical company information on secure systems that can be replicated, rolled back, or otherwise restored in the event of major failures or losses.

From an IT-centric perspective, outages are classified as planned or unplanned disruptions to operations. The list below shows the types of outages commonly experienced in enterprise computing environments. Most of these outages are familiar risks that apply across all business centres, not just the IT division. From a business perspective, these risks should already be well defined and understood.

IT Outages & Risk Assessment

An unplanned IT outage can equate to an IT disaster, depending on the scope and severity of the problem. Many Disaster Recovery plans focus solely on risks within the data centre. However, it is essential to look beyond data centre operations by implementing the BCP process in addition to traditional IT Disaster Recovery Planning. Beyond the immediate control of the data centre, IT operations face a variety of risks such as:

Computer Failure
Corrupted Data
Lost Data
Network Failure
Software Errors
Computer Virus
Electro-Magnetic Pulse
Hacking
Sabotage
Theft
Blackouts
Brownouts
Flood or Burst Pipe
Environmental Hazard
Epidemic
Evacuation
Halon Discharge
HVAC Failure
WAN/ISP Failure
Power Surge
Power Grid Failure
Sprinkler System Discharge
Transportation Disruptions
Bomb Threat
Bomb Blast
Biological Attack
Chemical Spill/Attack
Civil Unrest
Earthquakes
Electrical Storms
Fire
Flooding

The quality of the DR procedures in place is directly related to the number of scenarios or disasters that the strategy is designed to compensate for and recover from.

In-house backup and security policies should be built to handle the majority of basic situations, such as restoration of corrupt data, viral infection, hacking & external access, network failure, etc. DR policies will need to cover external risks that the business cannot predict or prevent. However, not every possible risk has to be addressed individually.

For example, if a business has a DR process that provides highly accessible, up-to-date copies of business-critical data at a secure remote location, and also provides a method of automatically retrieving and restoring that data to redundant systems at a secondary site (a hotsite), most building-centric disasters can be eliminated from (or at least greatly reduced in) the IT DR risk assessment.
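The point that risks need not be addressed one by one can be illustrated by grouping risks by the scope of their impact and recording which scopes each DR measure covers. In the sketch below, the scope assigned to each risk and the two mitigations listed are hypothetical examples chosen for illustration; a real risk assessment would set them from the business's own analysis.

```python
# Hypothetical grouping of some risks from the list above by scope of impact.
RISK_SCOPE = {
    "Computer Failure": "hardware",
    "Corrupted Data": "data",
    "Lost Data": "data",
    "Computer Virus": "data",  # assumed recoverable from a clean backup
    "Network Failure": "network",
    "WAN/ISP Failure": "network",
    "Fire": "site",
    "Flooding": "site",
    "Bomb Threat": "site",
    "Power Grid Failure": "site",
}

# Hypothetical mitigations and the scopes each one is assumed to cover.
MITIGATIONS = {
    "in-house backup and restore": {"data"},
    "remote replica with hotsite failover": {"hardware", "site"},
}


def uncovered_risks(risk_scope: dict[str, str],
                    mitigations: dict[str, set[str]],
                    in_place: list[str]) -> list[str]:
    """Return, alphabetically, the risks whose scope no active mitigation covers."""
    covered: set[str] = set()
    for name in in_place:
        covered |= mitigations[name]
    return sorted(risk for risk, scope in risk_scope.items() if scope not in covered)


print(uncovered_risks(RISK_SCOPE, MITIGATIONS, ["in-house backup and restore"]))
print(uncovered_risks(RISK_SCOPE, MITIGATIONS, list(MITIGATIONS)))
```

With only in-house backup in place, every hardware, network, and site risk remains open; adding a replicated hotsite closes the building-centric risks in one step, leaving only the network failures to be addressed separately.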

To provide a basis for testing the resilience of your DR procedures, there are seven defined levels, or tiers, of DR. These tiers are accepted as de facto standards for DR Planning, so every DR Plan can be given a classification based upon which DR criteria it satisfies. Classifications run from Tier 0, the lowest, which denotes the absence of any DR ability, to Tier 6, the highest, which provides for zero data loss in multiple disaster scenarios.

Important information on StorageWorld

StorageWorld is a reference for my clients, colleagues in the data storage industry, and end-users (hopefully potential clients), providing an overview of the range of data storage services offered. There is a wide & diverse network of independent data professionals in the UK providing consultancy & engineering services on all aspects of data storage, data network design, and project management of data system implementations, and offering vendor-independent advice either through a direct relationship with end clients or through third-party suppliers.

The objective of this site is to use the term 'StorageWorld' as a name to describe an independent group of professional storage colleagues who offer their services directly, or through contracting, to clients, and to provide a platform to promote and explain the type of services we provide. It also serves as a contact point for services currently rendered to clients, with restricted sections that we may use as a central reference library. Suggestions, comments, contributions, error corrections, or other expressions of interest are welcome.