Disaster recovery planning has the goal of minimizing the effects of a disaster.
Contingency planning provides methods and procedures for handling longer-term outages and disasters.
The most critical piece overall is management support.
The Business Impact Analysis (BIA) is a crucial first step in disaster recovery and contingency planning. The goal is to see exactly how a business will be affected by different threats.
Time-loss curves show the total impact over specific time periods.
The main goals of disaster recovery planning are to:
- Improve responsiveness by the employees in different situations.
- Ease confusion by providing written procedures and participation in drills.
- Help make logical decisions during a crisis.
Disaster Recovery Planning:
A disaster recovery plan is a comprehensive statement of consistent actions to be taken before, during, and after a disruptive event that causes a significant loss of information system resources.
Phases of Development:
The phases of development for a DRP/BCP program should be:
- Business impact analysis
- Strategy development
- Plan development
The 4 primary elements of BCP are:
- Scope and plan initiation
- Business impact analysis, including the vulnerability assessment
- Business continuity plan development
- Plan approval and implementation
Scope and Plan initiation:
Steps involved in the scope and plan initiation include creating an account of the work required, listing the resources to be used and defining the management practices to be employed.
A BCP committee should be formed and given the responsibility to create, implement and test the plan.
Business Impact Analysis:
The purpose of a BIA is to create a document to be used to help understand what impact a disruptive event would have on the business.
The business impact analysis has 3 primary goals:
Criticality Prioritization: Critical business units must be identified and prioritized.
Downtime Estimation: Estimate the maximum tolerable downtime (MTD), the longest a function can be unavailable before the business suffers unacceptable harm.
Resource Requirements: Identify resource requirements for the critical processes.
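The three BIA goals above can be illustrated with a minimal sketch. The business functions, MTD figures, and resource names below are hypothetical examples, not values from any real analysis; the only idea shown is that a shorter MTD means a higher recovery priority.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BusinessFunction:
    name: str
    mtd_hours: float        # estimated maximum tolerable downtime (hypothetical)
    resources: List[str]    # resources the function depends on (hypothetical)


def prioritize(functions):
    """Order functions so the shortest MTD (most time-critical) comes first."""
    return sorted(functions, key=lambda f: f.mtd_hours)


functions = [
    BusinessFunction("Payroll", 72, ["HR database"]),
    BusinessFunction("Order processing", 4, ["Web server", "Order database"]),
    BusinessFunction("Internal wiki", 168, ["Wiki server"]),
]

ranked = prioritize(functions)
for rank, f in enumerate(ranked, start=1):
    print(f"{rank}. {f.name}: MTD {f.mtd_hours} h, needs {', '.join(f.resources)}")
```

In practice the MTD estimates come from interviews and the assessment materials gathered during the BIA, and criticality may weigh factors other than downtime alone.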
A business impact analysis generally takes 4 steps:
- Gathering the needed assessment materials
- The vulnerability assessment
- Analyzing the information compiled
- Documenting the results and presenting recommendations to management.
There is a general 6-step approach to contingency planning:
- Identify critical business functions
- Identify the resources and systems that support these critical functions.
- Estimate potential disasters
- Select planning strategies: determine how to recover the critical resources and evaluate alternatives. A disaster recovery and contingency plan usually consists of emergency response, recovery, and resumption activities.
- Implement the strategies.
- Test and revise the plan.
Plan Approval and Implementation:
Plan approval and implementation consists of:
1. Approval by senior management. (APPROVAL)
2. Creating an awareness of the plan enterprise-wide. (AWARENESS)
3. Maintenance of the plan, including updating when needed. (MAINTENANCE)
End User Environment:
The first issue pertaining to users is how they will be notified of the disaster and who will tell them where to go and when. A call tree (call list) is necessary for this.
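A call tree can be sketched as a simple mapping from each person to the people they are responsible for phoning. The roles and names below are hypothetical, and the level-by-level ordering is only one reasonable assumption about how calls fan out.

```python
from collections import deque

# Hypothetical call tree: each person phones the people listed under them.
CALL_TREE = {
    "Incident manager": ["IT lead", "Facilities lead"],
    "IT lead": ["Sysadmin A", "Sysadmin B"],
    "Facilities lead": ["Receptionist"],
}


def notification_order(tree, root):
    """Return (caller, callee) pairs in the order calls are placed, level by level."""
    calls = []
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        for callee in tree.get(caller, []):
            calls.append((caller, callee))
            queue.append(callee)   # the callee may have people of their own to call
    return calls


calls = notification_order(CALL_TREE, "Incident manager")
for caller, callee in calls:
    print(f"{caller} calls {callee}")
```

A real call list would also record phone numbers, alternates for people who cannot be reached, and the message each caller should pass on.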
The hardware backup procedures should address on-site and off-site strategies. There are 3 main categories of disruption:
Non-Disaster: Disruption in service from device malfunction or user error.
Disaster: Entire facility unusable for a day or longer.
Catastrophe: Major disruption that destroys the facility altogether. Requires both a short-term and a long-term solution.
Off-site backup facility options are:
Hot-Site: Fully configured and ready to be operating within a few hours. Expensive but the company has exclusive use.
Warm-Site: Partially configured with some equipment, but not the actual computers.
Cold-Site: Basic environment such as wiring, AC, plumbing is in place, but no equipment. This is the least expensive option but has much longer recovery time.
Different Backup Types:
Incremental: All files changed since the last full or incremental backup. Removes archive attribute.
Differential: All files changed since the last full backup. Does not remove archive attribute.
Full: All files. Removes archive attribute.
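The difference between the three backup types comes down to which files are selected and whether the archive attribute is cleared afterwards. The following is a minimal simulation of that behavior; the file names and the `run_backup` helper are hypothetical and no real backup tool is being modeled.

```python
def run_backup(files, kind):
    """Simulate one backup run.

    `files` maps a file name to its archive attribute (True = changed since
    the attribute was last cleared). Returns the sorted names that were copied.
    """
    if kind == "full":
        selected = list(files)                             # every file
    elif kind in ("incremental", "differential"):
        selected = [n for n, bit in files.items() if bit]  # only flagged files
    else:
        raise ValueError(f"unknown backup type: {kind}")

    # Full and incremental backups clear the archive attribute;
    # a differential backup leaves it set.
    if kind in ("full", "incremental"):
        for name in selected:
            files[name] = False
    return sorted(selected)


files = {"a.txt": True, "b.txt": True, "c.txt": True}

monday = run_backup(files, "full")                  # copies all three files
files["a.txt"] = True                               # a.txt modified
tuesday_diff = run_backup(files, "differential")    # a.txt, attribute stays set
files["b.txt"] = True                               # b.txt modified
wednesday_diff = run_backup(files, "differential")  # a.txt and b.txt again
wednesday_inc = run_backup(files, "incremental")    # a.txt and b.txt, attribute cleared
files["c.txt"] = True                               # c.txt modified
thursday_inc = run_backup(files, "incremental")     # only c.txt
```

This is why a restore from differentials needs only the last full backup plus the latest differential, while a restore from incrementals needs the last full backup plus every incremental taken since.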
Other backup strategies include:
Electronic Vaulting: Makes a copy of a changed file or transaction and sends it electronically to a remote location where the original backup is stored. Physically moving backup tapes off-site serves the same purpose, but the transfer is manual rather than electronic.
Remote Journaling: Transmitting only the journal or transaction logs to the off-site facility and not the actual files.
Database Shadowing: Database shadowing is similar to remote journaling, but the transactions are shadowed to multiple databases.
Disk Shadowing: Mirrored disks for redundancy.
Disk Duplexing: More than one disk controller is used. If one fails, another takes over.
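Remote journaling works because the off-site copy can be brought up to date by replaying the transmitted journal on top of the last backup. The sketch below shows that replay step under simplifying assumptions: the journal format (`set` and `delete` operations on a key-value store) and the account names are hypothetical.

```python
def apply_journal(snapshot, journal):
    """Rebuild current state by replaying journal entries over a copy of the last backup."""
    state = dict(snapshot)          # leave the stored backup untouched
    for op, key, value in journal:
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return state


# Last backup already held at the off-site facility (hypothetical data).
offsite_backup = {"acct-1": 100, "acct-2": 250}

# Only these journal entries are transmitted, not the files themselves.
journal = [
    ("set", "acct-1", 80),
    ("set", "acct-3", 40),
    ("delete", "acct-2", None),
]

recovered = apply_journal(offsite_backup, journal)
print(recovered)
```

Database shadowing follows the same idea, with the journal applied to more than one remote database.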
A company is not considered out of an emergency until it is back at the original site operating under normal circumstances. The least critical systems should be moved back first.
Disaster Recovery Testing:
Reasons for testing include:
- Inform management of the recovery capabilities of the enterprise.
- Verify accuracy of the recovery procedures and identify deficiencies.
- Prepare and train the personnel to execute emergency duties.
- Verify processing capability of the remote backup site.
Disaster recovery tests should be performed at least once a year!
The recovery team is used to get critical business functions running at the alternate site.
The salvage team is used to return the primary site to normal processing conditions.
Tests and Drills:
There are a few different types of tests and drills that can take place, each with its own pros and cons:
Checklist Test: Copies of the DR plan and continuity plan are distributed to each functional area for review.
Structured Walk-Through Test: Group comes together to walk through scenarios in detail.
Simulation Test: DR team or groups of employees come together to simulate a specific scenario.
Parallel Test: Done to ensure that critical systems can perform adequately at the off-site facility. The systems are moved to the alternate site and processing takes place.
Full Interruption Test: Original site is actually shut down and processing takes place at the alternate site.
“Emergency response procedures are the prepared actions that are developed to help people in a crisis situation better cope with the disruption. They are the first line of defense when dealing with a crisis situation”