SAF-TE - SCSI Accessed Fault-Tolerant Enclosures
The leading hardware causes of operating and storage system downtime
include the failure of hard drives, power supplies and cooling systems.
These components are the weakest links in any server or RAID system,
carrying the highest probability of failure. Although there is always
scope to include redundant components to compensate for these failures,
SAFTE attempts to offer a more professional and proactive environment
to address these problems rather than simply waiting for failures
to occur and be fixed. The SAFTE specification allows for extensive
flexibility in enclosure design, openness of components, and attempts
to alleviate the reactive policies that most systems operate on.
SAFTE is independent of hardware I/O cabling, operating systems,
server platforms, and RAID implementation because the SAFTE enclosure
itself is treated as simply another device on the SCSI bus, with
its own SCSI target ID and LUN. This addressing scheme also
simplifies the integration of any SAFTE bus into a RAID
controller or monitoring software, as they simply need to address
a specific target ID to receive all the necessary information concerning
the working environment of the enclosure. Details may be gathered
from any monitoring device installed on the SAFTE bus - such as
the operating temperature of separate components or the overall
enclosure. The SAFTE specification may be applied to either a server
or an add-on storage enclosure with the manufacturer supplying as
many separate monitoring options as they feel is appropriate.
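The addressing model described above can be sketched in code. The following Python fragment is a minimal illustration, not an implementation of the specification. It assumes that a host reads enclosure status with the standard SCSI READ BUFFER(10) command (opcode 0x3C) in data mode. The names EnclosureAddress and build_read_buffer_cdb, and the buffer ID value, are assumptions introduced here and should be checked against the SAFTE specification. The code only builds the command bytes; sending them needs an OS-specific SCSI pass-through interface, which is not shown.

```python
from dataclasses import dataclass

READ_BUFFER_OPCODE = 0x3C          # standard SCSI READ BUFFER(10) opcode
MODE_DATA = 0x01                   # assumption: status reads use data mode
BUFFER_ID_ENCLOSURE_STATUS = 0x01  # assumption: verify against the specification


@dataclass(frozen=True)
class EnclosureAddress:
    """Hypothetical holder for the enclosure's address on the SCSI bus."""
    target_id: int
    lun: int = 0


def build_read_buffer_cdb(buffer_id: int, alloc_len: int,
                          mode: int = MODE_DATA) -> bytes:
    """Build a 10-byte READ BUFFER command descriptor block (CDB)."""
    if not 0 <= buffer_id <= 0xFF:
        raise ValueError("buffer_id must fit in one byte")
    if not 0 <= alloc_len <= 0xFFFFFF:
        raise ValueError("alloc_len must fit in three bytes")
    return bytes([
        READ_BUFFER_OPCODE,
        mode & 0x1F,               # mode field (low bits of byte 1)
        buffer_id,                 # which buffer to read
        0x00, 0x00, 0x00,          # buffer offset, always zero here
        (alloc_len >> 16) & 0xFF,  # allocation length, big-endian
        (alloc_len >> 8) & 0xFF,
        alloc_len & 0xFF,
        0x00,                      # control byte
    ])


# Example: the monitoring software only needs the enclosure's target ID
# and a command to send to it.
enclosure = EnclosureAddress(target_id=6)
cdb = build_read_buffer_cdb(BUFFER_ID_ENCLOSURE_STATUS, alloc_len=64)
print(f"target {enclosure.target_id} lun {enclosure.lun}: {cdb.hex(' ')}")
```

Because the enclosure is addressed like any other SCSI target, the same command bytes could in principle be issued by a RAID controller or by host monitoring software without extra cabling.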
In a properly implemented SAFTE enclosure a RAID controller should
be able to determine an impending failure before it occurs and compensate
accordingly. The controller may be configured to monitor certain
components on known parameters - such as excessive heat-generation
from array member hard drives - one of the most obvious signs of
impending hard drive failure. If one of the drives being monitored
via the SAFTE specification reaches a preset threshold, the controller
could be built with the ability to stop all I/O access to the target
ID, shut down the offending device, and bring a stand-by drive on-line
to replace it.
The controller may then be programmed to automatically rebuild the
logical drive in the background without any user intervention. In a
high-level critical environment the controller could also be programmed
to shut down problem power supplies or cooling units and bring redundant
units on-line. The SAFTE specification makes this total control over individual
components possible. It ties together the different parts into a
single cohesive unit with an intelligent management unit that automates
the monitoring and replacement of its own components.
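The proactive behaviour described above (monitor a parameter, take the offending drive off-line, bring a stand-by drive on-line, then rebuild) can be illustrated with a small simulation. This is a sketch under stated assumptions: the Drive and Controller classes, the state names and the temperature threshold are all hypothetical and are not taken from the SAFTE specification or from any real controller firmware.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Drive:
    """Hypothetical drive record as a controller might track it."""
    slot: int
    temperature_c: float
    state: str = "online"  # "online", "standby", "offline" or "rebuilding"


@dataclass
class Controller:
    """Hypothetical controller applying a preset temperature threshold."""
    max_temp_c: float
    drives: List[Drive] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def check(self) -> None:
        """Inspect every on-line drive and replace any that runs too hot."""
        for drive in self.drives:
            if drive.state == "online" and drive.temperature_c >= self.max_temp_c:
                self._replace(drive)

    def _replace(self, failing: Drive) -> None:
        # Stop I/O to the failing drive and shut it down.
        failing.state = "offline"
        self.log.append(f"slot {failing.slot}: taken off-line")
        # Bring the first available stand-by drive on-line, if there is one.
        spare = next((d for d in self.drives if d.state == "standby"), None)
        if spare is None:
            self.log.append("no stand-by drive available; array is degraded")
            return
        # Start a background rebuild onto the spare.
        spare.state = "rebuilding"
        self.log.append(f"slot {spare.slot}: rebuilding in background")


controller = Controller(
    max_temp_c=55.0,
    drives=[Drive(0, 35.0), Drive(1, 62.0), Drive(2, 30.0, state="standby")],
)
controller.check()
for entry in controller.log:
    print(entry)
```

The same pattern could be extended to power supplies or cooling units by tracking them as monitored components with their own thresholds and redundant spares.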
Use of the SAFTE specification in building enclosures and compliant
controllers allows the best combination of third party controllers,
enclosures, and monitoring software to be used while vastly reducing
compatibility issues. The automatic monitoring and alert notification
of the storage subsystem, locally or remotely, adds an additional
layer of protection to a RAID storage system.
The SAFTE specification was originally drawn up by two commercial
interests - nStor Corporation and Intel - and their goal was to
support a standardised alert detection and status reporting system
using SCSI's underlying transport mechanism. This approach allows
all standard SCSI host adapters or RAID controllers to work without
special considerations for reserved signals on the SCSI bus or additional
cabling, and allows for consistent implementation by multiple manufacturers
and integrators. nStor and Intel have published a document entitled
SCSI Accessed Fault-Tolerant Enclosures Specification.