Key Concepts:

Service Level Agreements (SLAs), sometimes called maintenance contracts, guarantee the quality of service that a network service provider delivers to a subscriber. SLAs often include descriptions for the following:
• The mean time between failures (MTBF) identifies the average operating time between failures of a system or component and is commonly used as an estimate of its expected lifetime. Components should be replaced at about the time that the MTBF is reached.
• The mean time to repair (MTTR) identifies the average amount of time necessary to repair a failed component or to restore operations.
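The two metrics above are often combined into a single availability figure, which relates directly to the uptime guarantees an SLA may contain. The following is a minimal sketch, assuming the common steady-state approximation availability = MTBF / (MTBF + MTTR); that formula, the function names, and the sample figures are illustrative assumptions and are not taken from these notes.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a system is expected to be operational.

    Assumes the steady-state approximation MTBF / (MTBF + MTTR).
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def expected_downtime_hours(avail: float, period_hours: float = 8760.0) -> float:
    """Expected downtime over a period (default: one 365-day year)."""
    return (1.0 - avail) * period_hours


# Hypothetical figures for illustration only.
a = availability(mtbf_hours=990.0, mttr_hours=10.0)
print(f"availability: {a:.3f}")
print(f"expected downtime per year: {expected_downtime_hours(a):.1f} hours")
```

A calculation like this can help check whether a provider's stated MTBF and MTTR are consistent with the uptime it promises.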
SLAs can include guarantees for:
• Turn-around times
• Average response times
• Number of online users
• System utilization rates
• System uptimes
• Volume of transactions
• Production problems
Keep in mind the following recommendations for SLAs:
• SLAs should define, in sufficient detail, any penalties incurred if the level of service is not maintained.
• In the information security realm, it is also vital that the provider's role in disaster recovery operations and continuity planning be clearly defined.
• Industry-standard templates are frequently used as a starting point for SLA design, but they must be tailored to the specific project or relationship to be effective.
• If you depend on an SLA for mission-critical code, you should consider a code escrow arrangement. Code escrow is a storage arrangement hosted by a trusted third party that ensures access to the mission-critical code even if the development company (the company with whom you have the SLA) goes out of business.
After you have identified the risks and their associated costs, you can determine how best to respond to the risk. Responses include:
• Taking measures to reduce (or mitigate) the likelihood of the threat by deploying security controls or other protections. When deploying countermeasures, the annual cost of the countermeasures should not exceed the annualized loss expectancy (ALE). If it does, you are paying more to protect the asset than it is worth. Security control types include:
o Management
o Operational
o Technical
Consider the following factors when implementing security controls to reduce risk:
o Compatibility with the existing infrastructure
o Effectiveness
o Regulatory compliance
o Organizational policies
o Operational (performance) impact
o Feasibility (technical requirements or usability)
o Safety and reliability
• Transferring (or assigning) risk by purchasing insurance to protect the asset. When the incident occurs, the cost of replacing or repairing the asset is covered by insurance. When deciding to transfer the risk, be sure to compare the cost of insurance with the ALE. Purchase the insurance only if its cost is less than the ALE.
• Accepting the risk and choosing to do nothing. For example, you might decide that the cost associated with a threat is acceptable or that the cost of protecting the asset from the threat is unacceptable. In this case, you would plan for how to recover from the threat, but not implement any measures to avoid it.
• Risk rejection (or denial) is choosing not to respond to the risk even though the risk is not at an acceptable level. Risk rejection introduces the possibility of negligence and may lead to liability. Risk rejection is not an appropriate response.
• Risk deterrence is letting threat agents know of the consequences they face if they choose to attack the asset. This could include posting warnings on login pages to indicate prosecution policies.
• Distributive allocation responds to the risk by spreading it through redundancy and high-availability techniques such as clustering, load balancing, and redundant storage arrays.
It is not possible to eliminate all risk. Instead, the goal is to take actions that reduce risk to an acceptable level. The risk that remains after it has been reduced or transferred is called residual risk.
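The cost comparisons described for mitigation and transfer can be sketched as a small calculation. This is a minimal illustration, not a complete risk methodology: it assumes the standard definition ALE = SLE × ARO (single loss expectancy times annualized rate of occurrence), which is not spelled out in these notes, and the function names and dollar figures are hypothetical.

```python
from __future__ import annotations


def annualized_loss_expectancy(sle: float, aro: float) -> float:
    """ALE = single loss expectancy (SLE) x annualized rate of occurrence (ARO)."""
    return sle * aro


def cost_justified_responses(
    ale: float, countermeasure_cost: float, insurance_cost: float
) -> list[str]:
    """Return the responses whose annual cost is justified by the ALE."""
    options = []
    # Mitigation: annual countermeasure cost should not exceed the ALE.
    if countermeasure_cost <= ale:
        options.append("mitigate")
    # Transfer: buy insurance only if it costs less than the ALE.
    if insurance_cost < ale:
        options.append("transfer")
    # Assumption: if neither is cost-justified, accept the risk and plan recovery.
    if not options:
        options.append("accept")
    return options


# Hypothetical figures for illustration only.
ale = annualized_loss_expectancy(sle=20_000.0, aro=0.5)
print(ale)
print(cost_justified_responses(ale, countermeasure_cost=4_000.0, insurance_cost=12_000.0))
```

Cost is only one input. The other factors listed above, such as regulatory compliance and operational impact, still apply when choosing among the options this returns.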
*Plans for resumption of applications, data, hardware, communications, and other IT infrastructure in case of disaster.
*Attempts to take into consideration every possible failure.
*Plans for converting operations to alternate processing sites in case of disaster.
*Plans for converting back to the original site after the disaster has concluded.
*Disaster recovery exercises (such as fire drills) that simulate a possible disaster.

Decisions about alternate site locations need to be guided by the following requirements:

*Maintain adequate geographic distance between primary and secondary sites. Such geographic diversity can minimize the possibility of a disaster bringing down both sites.
*Site locations can have legal implications, especially when data is stored in multiple countries. Data sovereignty refers to the fact that every country has its own laws and regulations regarding digital data storage. Data safety and privacy concerns may need to be reassessed for each location.
*Decide whether the backup site will be hot or cold. A hot site is set up with servers and workstations that have almost immediate access to data that is continuously replicated from the main site. If this is too expensive, a cold site, such as an empty warehouse, can be used. The disadvantage of a cold site is that it will take much longer to install the hardware and software necessary to resume business operations.
Whether a hot or a cold site is chosen as a backup, alternate business practices and processes need to be defined and stored in each location. Critical tasks should be described in sufficient detail to allow business staff to carry them out with minimal training.
Keep in mind the following when creating the disaster recovery and business continuity plans:

A good plan documents all important decisions before the disaster strikes. When a disaster occurs, staff members simply need to follow the documented procedures.