RAIDhelp: Introduction to RAID Technology

Hotswap Functionality
The ability to quickly and easily exchange failed components in a RAID system is vital. Most, if not all, commercial RAID systems offer modular components in quick release modules that allow you to simply release and remove the failed component and insert the new one. This is a fundamental design that no serious RAID system should be built without.

However, a RAID system has objectives beyond this type of simplicity. Replacing components while the system is powered down or off-line is not enough for a mission critical storage system that is under constant use and access. Bringing down an Internet server on a busy web site just to exchange a power unit - no matter how quickly it may be done - simply cannot be tolerated. As more and more organisations rely on their IT systems and, as an obvious consequence, their data storage, the provision of continuous and uninterrupted access is the main motivation behind the implementation of protected storage systems. Protected storage systems that need to be taken off-line for the replacement of basic consumer components such as PSUs, fans, hard drives, etc. defeat the entire purpose of their installation.

As one of the main objectives of a RAID array is the provision of non-stop data access, the ability of the system to allow the exchange of critical components without any disruption to user access is extremely important. The two main ways in which this can be implemented are Hot-Swap and Warm-Swap functionality. Normal powering down of a system for maintenance (replacing memory in a server, for example) is usually termed Cold-Swap in comparison.

Hot-swap is the ability to exchange the component with no disruption to I/O requests and transactions, no powering down of any part of the system except the failed component, and bringing the new component on-line and integrated as part of the array with no further action necessary to the remaining working components. The most common hot-swap functionality the majority of higher-end RAID systems offer is the ability to replace hard drives.

In a true hot-swap system, the failed hard drive may be released from its bay in the enclosure without first stopping any I/O requests to the array that the drive is a member of. All user access and transactions must continue as normal. Once the drive is removed, a new drive is added to the array and brought on-line by the controller. If the drive contained part of the logical drive data, the controller should then begin the rebuild or reconstruction process. Depending on the controller, this may be a manual rebuild or an automatic rebuild. At no point should any disruption occur to the I/O processing of the array. All good professional or enterprise level RAID systems should include a hot-swap ability for as many components as possible as a fundamental part of the basic design, including power supplies, cooling units, and hard drives.
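
The hot-swap sequence described above can be sketched as a toy model in Python. This is only an illustration under stated assumptions: the array is taken to be a simple mirror, and the class MirroredArray and its methods are hypothetical names invented here, not the interface of any real RAID controller.

```python
class MirroredArray:
    """Toy mirror: every block is stored on all member drives (hypothetical model)."""

    def __init__(self, drive_count=2, auto_rebuild=True):
        # Each drive is a dict of block number -> data; None marks an empty bay.
        self.drives = [{} for _ in range(drive_count)]
        self.auto_rebuild = auto_rebuild

    def write(self, block, data):
        # Writes go to every drive that is currently present.
        for drive in self.drives:
            if drive is not None:
                drive[block] = data

    def read(self, block):
        # Reads are served by the first present drive that holds the block.
        for drive in self.drives:
            if drive is not None and block in drive:
                return drive[block]
        raise IOError(f"block {block} unavailable")

    def remove_drive(self, bay):
        # Hot removal: the bay empties, but I/O is never paused.
        self.drives[bay] = None

    def insert_drive(self, bay):
        # Hot insertion: a blank drive joins the array.
        self.drives[bay] = {}
        if self.auto_rebuild:
            self.rebuild(bay)

    def rebuild(self, bay):
        # Copy every block from a surviving member onto the new drive.
        source = next(
            d for i, d in enumerate(self.drives) if d is not None and i != bay
        )
        self.drives[bay] = dict(source)


array = MirroredArray(auto_rebuild=True)
array.write(0, "alpha")
array.remove_drive(0)           # failed drive pulled from its bay
value = array.read(0)           # reads still succeed from the surviving drive
array.write(1, "beta")          # writes continue while the array is degraded
array.insert_drive(0)           # replacement inserted; rebuild starts automatically
```

With auto_rebuild=False the model stands in for a controller that waits for a manual rebuild: the new drive stays blank until rebuild() is called, while reads continue to be served by the surviving member.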

Warm-swap is a compromise between hot-swap and cold-swap. In a typical warm-swap of a hard drive the array may require I/O transactions to be halted whilst the failed drive is exchanged, but the system does not have to be powered down. This eliminates the delays incurred by the drive spin-up, controller boot-up, and negotiation with the host. In a warm-swap the controller simply places I/O requests on hold until the component is ready, then resumes operation. This is normally the only type of component exchange offered by PCI-based RAID systems.
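
The hold-and-resume behaviour of a warm-swap can be illustrated with a short Python sketch. It assumes, purely for illustration, that held requests are queued and then serviced in arrival order once the component is ready; the class WarmSwapController and its method names are hypothetical.

```python
from collections import deque


class WarmSwapController:
    """Toy controller that holds I/O during a component exchange (hypothetical model)."""

    def __init__(self):
        self.on_hold = False
        self.pending = deque()   # requests received while I/O is halted
        self.completed = []      # requests that have been serviced

    def submit(self, request):
        # During a swap, requests are queued rather than failed.
        if self.on_hold:
            self.pending.append(request)
        else:
            self.completed.append(request)

    def begin_swap(self):
        # I/O is halted, but the system stays powered up.
        self.on_hold = True

    def end_swap(self):
        # The component is ready: service the held requests in arrival order.
        self.on_hold = False
        while self.pending:
            self.completed.append(self.pending.popleft())


controller = WarmSwapController()
controller.submit("read A")
controller.begin_swap()
controller.submit("write B")    # held until the exchange is finished
controller.end_swap()           # operation resumes; "write B" is now serviced
```

The contrast with the hot-swap case is that users see a pause for the duration of the exchange, although no request is lost and nothing has to be rebooted.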

The ability to provide a hot-swap function depends on two major components of an array: the RAID Controller and the Enclosure. The physical handling of power and data disconnection and reconnection without disruption must be built into the drive enclosure. Once this is available the controller must have the ability to recognise and use this function.
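
The dependency on both components can be written as a simple check. This is a minimal sketch; the Enclosure and Controller classes and their fields are hypothetical and do not correspond to any vendor's management interface.

```python
from dataclasses import dataclass


@dataclass
class Enclosure:
    # True if bays can connect and disconnect power and data without disruption.
    hot_plug_bays: bool


@dataclass
class Controller:
    # True if the controller can recognise and use the enclosure's hot-plug function.
    hot_swap_aware: bool


def supports_hot_swap(enclosure: Enclosure, controller: Controller) -> bool:
    # Hot-swap is only available when both components provide their part.
    return enclosure.hot_plug_bays and controller.hot_swap_aware


available = supports_hot_swap(Enclosure(hot_plug_bays=True),
                              Controller(hot_swap_aware=False))
```

Here available is False: a hot-plug enclosure alone is not enough if the controller cannot make use of it.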

Related topics:

Components: RAID Controllers
Components: Enclosures
Global & Local Spare Drives
The Interface Decision

RAIDhelp© Copyright 1999-2004 Antony Kershaw