RAIDhelp: Introduction to RAID Technology


Raidhelp

SAF-TE - SCSI Accessed Fault-Tolerant Enclosures
The leading hardware causes of downtime in servers and storage systems are failures of hard drives, power supplies and cooling systems. These components are the weakest links in any server or RAID system because they carry the highest probability of failure. Redundant components can compensate for such failures, but SAFTE aims for a more proactive approach: instead of simply waiting for a component to fail and then fixing it, the enclosure reports its own condition so that problems can be anticipated. The SAFTE specification allows considerable flexibility in enclosure design and an open choice of components, and it reduces reliance on the reactive, repair-after-failure policies that most systems operate under.

SAFTE is independent of hardware I/O cabling, operating system, server platform and RAID implementation because the SAFTE enclosure is treated as just another device on the SCSI bus, with its own SCSI target ID and LUN. This addressing scheme also simplifies integrating a SAFTE enclosure with a RAID controller or monitoring software: they only need to address that one target ID to receive all the available information about the enclosure's working environment. Readings can be gathered from any monitoring device installed in the enclosure, such as the operating temperature of individual components or of the enclosure as a whole. The SAFTE specification may be applied to either a server or an add-on storage enclosure, and the manufacturer decides how many separate monitoring points to provide.
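Because the enclosure is simply another target on the bus, a host adapter or RAID controller can query it with an ordinary SCSI command rather than with special signals or extra cabling. The minimal sketch below assumes that status information is read with the standard SCSI READ BUFFER(10) command (operation code 3Ch). The mode and buffer ID values, the `ScsiAddress` type and the byte layout in `parse_config` are illustrative assumptions, not values quoted from the specification, and should be checked against the published SAF-TE document before use.

```python
from dataclasses import dataclass

READ_BUFFER_10 = 0x3C  # SCSI READ BUFFER(10) operation code

# Assumed SAFTE buffer IDs and mode value (illustrative; verify against the specification)
BUF_ENCLOSURE_CONFIG = 0x00
BUF_ENCLOSURE_STATUS = 0x01
ASSUMED_MODE = 0x01


@dataclass(frozen=True)
class ScsiAddress:
    """Hypothetical address of the enclosure: its own target ID and LUN on the bus."""
    target_id: int
    lun: int


def read_buffer_cdb(mode, buffer_id, offset, alloc_len):
    """Build a 10-byte READ BUFFER command descriptor block (CDB)."""
    if not (0 <= offset < 1 << 24 and 0 <= alloc_len < 1 << 24):
        raise ValueError("offset and allocation length are 24-bit fields")
    return bytes([
        READ_BUFFER_10,
        mode & 0x1F,                    # byte 1: mode field (low 5 bits)
        buffer_id & 0xFF,               # byte 2: which buffer (status page) to read
        *offset.to_bytes(3, "big"),     # bytes 3-5: buffer offset
        *alloc_len.to_bytes(3, "big"),  # bytes 6-8: allocation length
        0x00,                           # byte 9: control byte
    ])


def status_request(enclosure, alloc_len=64):
    """Pair the enclosure's address with a CDB asking it for its status page."""
    return enclosure, read_buffer_cdb(ASSUMED_MODE, BUF_ENCLOSURE_STATUS, 0, alloc_len)


def parse_config(buf):
    """Decode component counts from a configuration page.

    Assumed layout for illustration: byte 0 = fans, byte 1 = power supplies,
    byte 2 = device slots.
    """
    if len(buf) < 3:
        raise ValueError("configuration page too short")
    return {"fans": buf[0], "power_supplies": buf[1], "device_slots": buf[2]}
```

Actually delivering the CDB to the target is the host adapter driver's job (on Linux, for example, through the SCSI generic interface). The point of the sketch is that nothing beyond a standard command addressed to the enclosure's own target ID and LUN is required.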

In a properly implemented SAFTE enclosure, a RAID controller should be able to detect an impending failure before it occurs and compensate accordingly. The controller may be configured to monitor certain components against known parameters, such as excessive heat from array member hard drives, one of the most obvious signs that a drive is about to fail. If a drive monitored via SAFTE reaches a preset threshold, the controller could be designed to stop all I/O to that target ID, shut down the offending drive, and bring a standby drive on-line to replace it.
The controller may then be programmed to rebuild the logical drive automatically in the background, without any user intervention. In a critical high-availability environment, the controller could also be programmed to shut down failing power supplies or cooling units and bring redundant units on-line. The SAFTE specification makes this level of control over individual components possible: it ties the separate parts together into a single cohesive unit, with an intelligent management layer that automates the monitoring and replacement of its own components.
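The monitoring policy described above can be sketched as a poll-and-compare loop. This is a minimal illustration under stated assumptions, not any vendor's controller firmware: the `Drive` and `ArrayController` classes, the temperature threshold and the "first available spare" selection are all hypothetical, and the temperature readings are assumed to have already been collected from the enclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Drive:
    target_id: int
    temp_c: float            # latest temperature reading reported via the enclosure
    online: bool = True
    io_enabled: bool = True


@dataclass
class ArrayController:
    temp_limit_c: float                  # hypothetical preset threshold
    members: Dict[int, Drive]            # array member position -> drive
    spares: List[Drive]                  # stand-by drives, used in order
    rebuild_queue: List[int] = field(default_factory=list)
    events: List[str] = field(default_factory=list)

    def poll(self):
        """Compare each on-line member with the threshold and fail over any that reach it."""
        for position, drive in list(self.members.items()):
            if drive.online and drive.temp_c >= self.temp_limit_c:
                self._fail_over(position, drive)

    def _fail_over(self, position, drive):
        drive.io_enabled = False         # 1. stop all I/O to the drive's target ID
        drive.online = False             # 2. shut down the offending drive
        self.events.append(
            f"ID {drive.target_id}: {drive.temp_c} C >= {self.temp_limit_c} C"
        )
        if not self.spares:
            self.events.append(f"member {position}: no spare available, array degraded")
            return
        spare = self.spares.pop(0)       # 3. bring a stand-by drive on-line
        self.members[position] = spare
        self.rebuild_queue.append(position)  # 4. schedule a background rebuild
```

The same poll, compare and act loop would apply to power supplies and cooling units, switching in a redundant unit instead of a spare drive. How the rebuild itself proceeds depends on the RAID level and is not shown here.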

Building enclosures and controllers to the SAFTE specification lets integrators choose the best combination of third-party controllers, enclosures and monitoring software while greatly reducing compatibility problems. Automatic monitoring and alert notification for the storage subsystem, whether local or remote, adds a further layer of protection to a RAID storage system.

The SAFTE specification was originally drawn up by two companies, nStor Corporation and Intel, with the goal of supporting standardised alert detection and status reporting over SCSI's existing transport mechanism. This approach allows any standard SCSI host adapter or RAID controller to work with a SAFTE enclosure without reserved signals on the SCSI bus or additional cabling, and allows multiple manufacturers and integrators to implement it consistently. nStor and Intel have published the specification as a document entitled SCSI Accessed Fault-Tolerant Enclosures Specification.

Related topics:

Components of a RAID System
Redundant Controllers


RAIDhelp© Copyright 1999-2004 Antony Kershaw