This bulletin gives an introduction on how to go about developing a disaster
recovery strategy for SQL Server. Also included is a case study demonstrating
typical considerations when designing a disaster recovery strategy.
What Is a Disaster Recovery Strategy?
It is a list of well-documented steps, tried and tested, which will assist
an organization in restoring production operation after a disaster strikes.
A disaster can be defined as any incident that may result in production outage,
e.g. server hardware failure, site disaster, etc.
What Are the Factors to Consider When Designing a Disaster Recovery
Strategy?
There are several factors that are crucial in determining an organization's
disaster recovery plan. Some of these factors are:
1. How critical is the data to the operability of the organization's core business?
2. How much data can the organization afford to lose (which may be recreated
or reentered) in case there is a disaster?
3. How much downtime can the organization afford if there is a disaster and
how much downtime can the organization allocate for regular database maintenance?
4. What kind of funding and resources are available for the disaster recovery
plan?
These factors will give a typical set of parameters within which a disaster
recovery plan needs to be formulated.
What Are the Disaster Recovery Options Available within SQL Server?
SQL Server provides several options that may be combined to formulate a robust
disaster recovery plan. It is important to understand all options available,
and their respective limitations, before designing a disaster recovery plan.
Clustering
Refers to architecture where two or more nodes may be set up to share disk
and some other resources. These shared resources are owned by the active node.
This architecture provides protection against server failure. However, it does
not protect against failure of the shared resources. This is probably the reason
that clustering is more often referred to as a High Availability solution rather
than a disaster recovery solution. A High Availability solution refers to hardware
architecture that is designed to withstand a partial outage.
SQL Database Backups
SQL Server databases may be backed up in a variety of modes. These backups
may eventually be used to recover the production environment in case of a disaster.
We will discuss these in a little bit more detail further in the bulletin.
Log Shipping
Log shipping is a process of backing up, copying, and restoring transaction
log backups to a warm standby server. This process may be set up in fully automated
mode and if you are using SQL Server 2000 Enterprise Edition, you get advanced
setup and monitoring capabilities built into the core SQL Server product. Log
shipping provides complete machine and environment redundancy.
What Are the Different Types of Database Backups That Are Available
within SQL Server?
There are several types of database backups available within SQL Server. These
provide the flexibility of backing up the largest of enterprise level systems
in a manner that provides the quickest recovery mechanism.
Complete Database Backups
A complete database backup backs up all information stored in the respective
database. A complete backup provides the easiest means of recovery in a disaster
situation. The downside is that this takes the longest to complete.
Differential Database Backups
A differential backup backs up portions of the database that have been modified
since the last complete backup. These backups are cumulative, meaning the most
recent differential backup contains changes from previous differential backups.
Differential backups provide an excellent alternative to complete backups for
very large databases and also require less management since only the latest
backups need to be maintained.
Transaction Log Backups
Transaction log backups back up the changes that have been logged in the transaction
log for a database. This backup is not allowed if a database is in Simple recovery
model (on SQL Server 2000) or has the Truncate Log on Checkpoint setting on
(for SQL 7.0). A transaction log for a database is a journal of all changes
made to the database. Once the changes have been propagated to the database,
they are cleared out when the transaction log is backed up or truncated. Backing
up the transaction log at frequent intervals provides the flexibility to the
recovery scenario. Transaction log backups could be performed at more frequent
intervals than differential backups. The downside would be the management aspect
of managing a large number of transaction log backups.
File/Filegroup Backups
Filegroup backups provide a means to recover from failure of individual files
and/or filegroups. Filegroup is a logical grouping of SQL Server database files.
A SQL Server database may consist of several filegroups. Each filegroup may
be backed up individually using filegroup backups. This provides the flexibility
of recovering an individual filegroup if the disaster involves losing only a
single or multiple filegroup, without having to restore a complete database
backup. The downside to using filegroup backups is manageability. Each time
a filegroup backup is performed, a transaction log backup should be performed
since a transaction log backup is required to recover the databases successfully
if the filegroup backup is restored.
Snapshot Backups
Snapshot backups are performed at hardware level. This method of backup is supported
only for certified snapshot capable hardware. It provides an extremely quick
disk level backup of entire database through hardware implementation. Snapshot
backups are implemented using the Virtual Device Interface (VDI). VDI is a programming
interface that exposes the backup/restore functionality of SQL Server. Snapshot
backup functionality is accomplished in cooperation with third-party hardware
and/or software vendors. For more information, refer to the Snapshot Backups
topic in SQL Server 2000 Books Online.
What Are My Next Steps in Designing the Disaster Recovery Strategy?
The backup/restore of SQL databases is the most widely used and most easily
implemented disaster recovery solution. A combination of the different types
of backups described above usually provides the best recovery path in case of
a disaster.
The decision to include some or all of the types of backups mentioned previously
would be determined from the recovery needs for the organization.
General Tips on Maintaining the Effectiveness of the Disaster Recovery
Strategy
For the disaster recovery strategy to be effective, some simple steps should
be followed:
1. Close watch on the backup jobs is required. If the jobs are failing, the
reason for failure should be corrected and any corrective actions should be
taken so the recovery path is not disturbed.
2. Document and rehearse the disaster recovery procedures. This will provide
all staff members the ability to cope with the disaster in the most efficient
manner and will also provide a good timing estimate on how long it would take
to restore production operations if the real disaster strikes.
3. Restore the backups being performed at regular intervals since merely performing
the backups does not guarantee restorability of the backups.
4. Revise the disaster recovery procedures each time there is a major release
or event that involves major changes to the database. If it makes sense to add/remove
or modify the frequency of a specific type of backup, this should be documented.
A Case Study
An organization using SQL Server on a round-the-clock basis may require a disaster
recovery strategy that takes into account the following aspects:
- If a disaster was to strike, the production environment should be operational
within 30 minutes or less.
- The company cannot afford to lose more than 15 minutes worth of processing.
- There are resources available to have a warm backup server setup for
disaster recovery purposes.
The above aspects could drive the management to formulate a disaster recovery
plan encompassing the following:
- The operational databases will be in Full recovery model.
- A complete database backup will be performed each Sunday night at 11:00
pm.
- A Differential backup for each database will be performed all other nights
at 11:00 pm.
- Transaction log backups will be performed every 10 minutes.
- A warm backup database will be setup on the redundant server that is
available as the disaster recovery server.
- Log shipping will be used for the warm backup solution.
- Transaction logs will be restored every 10 minutes on the warm backup
database.
With the above strategy, the organization meets all the desired goals of the
disaster recovery strategy. Implementation of such a plan may require further
technical actions, but such a plan will satisfy all requirements laid out during
the planning phase of the disaster recovery strategy.
© 2003 Microsoft